I recently have started playing around with the Task Parallel Library (TPL) that will be shipping in .NET 4, which lets you easily spread a workload across multiple cores using a simple Parallel.ForEach statement. I have a few IronPython scripts that I want to convert to start using multiple threads, so I thought I would try using the .NET 3.5 version of the TPL from inside of IronPython. It turns out that the TPL works just fine, but writing thread safe code in IronPython can be tricky. First off, the System.Threading.Interlocked class does not work on IronPython integers because python integers are immutable. That really sucks, because it means that the only way to change a global integer value in a thread safe manner is to use locks or the System.Threading.Monitor class. One interesting workaround is to use a python list instead of an integer. Appending a value to a list is an atomic operation, so instead of calling Interlocked.Add you can just append all the values to an empty list an use sum to return the aggregate value or len() to get the number of items in the list (emulating incrementing a variable). It is a bit of a hack, but sometimes easier that using locks. Also, you have to remember that the print statement is not thread safe, so instead you should use something like System.Text.StringBuilder to buffer your print statements and then print them once you are back to a single thread code section.
Below is a small sample script that I wrote to test processing a large number of text files using IronPython. It will search through all files in a given directory that match a given pattern (ie: *.txt) and return the total number of files and lines. It will also search for a given tolken inside each file and return the total number of matches. This is a task that is very easy to run on multiple cores and is a perfect fit for the Parallel.ForEach method. By default it will search your temporary files folder, which on my workstation had 69 .txt files with a total of 469,286 lines. The single threaded version took about 2.3 seconds to run and the multi-threaded version took between 0.8 and 1.0 seconds. The workload was spread across 8 cores, which caused a 2x improvement in speed, even thought this example is still primarily bound by the drive IO speed. Still it shows how using the TPL can greatly increase performance for common tasks.