Introduction
As chip makers focus on adding processor cores rather than increasing clock speeds, developers need to take advantage of the multiple cores in modern CPUs. One way to do this is to implement parallel algorithms in our software.
One task I recently needed to perform at home was to find and document large files in certain folders. I regularly back up documents and source code, and large binaries in those folders can overflow the media I use for storage. So I wanted a program that could scan all the files in a folder and build a list of files over a certain size. I thought this would be a good opportunity to use some multithreading and see how much it improved performance.
In terms of parallel processing, .NET developers have several choices. BackgroundWorker objects are useful for performing lengthy calculations off the UI thread so that the UI remains responsive. You can create your own threads by instantiating new Thread objects with delegates to the routines you want to run. You can also use the built-in ThreadPool.
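For comparison with the pool-based approach, here is a rough Python analogue of the "create your own Thread with a delegate" option. It is not .NET code. `threading.Thread(target=...)` plays the role of a Thread constructed with a delegate. The function `count_files` and the throwaway folder are hypothetical, made up for illustration.

```python
import os
import tempfile
import threading

def count_files(folder, results):
    """Store the number of regular files directly inside `folder` in `results`."""
    with os.scandir(folder) as entries:
        results[folder] = sum(1 for entry in entries if entry.is_file())

# Demo: a throwaway folder holding three files and one subfolder.
with tempfile.TemporaryDirectory() as root:
    for name in ("a.txt", "b.txt", "c.txt"):
        open(os.path.join(root, name), "w").close()
    os.mkdir(os.path.join(root, "sub"))

    results = {}
    # Roughly what constructing a Thread with a delegate and calling Start() does in .NET.
    worker = threading.Thread(target=count_files, args=(root, results))
    worker.start()
    worker.join()  # block until the dedicated thread has finished
    file_count = results[root]

print(file_count)  # 3
```

Each dedicated thread like this is created and torn down for a single job. A pool avoids that cost by reusing its workers.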
If you want an aggressively multithreaded application, the ThreadPool class is ideal. You queue work items, and as worker threads in the pool become available, they execute the items you queued. It's convenient because the framework handles the business of queuing the work and signaling the threads to start.
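Python's `concurrent.futures.ThreadPoolExecutor` is not the .NET ThreadPool, but it follows the same model: you submit work items to a queue, and a set of worker threads picks them up as they become free. The sketch below queues 50 small work items and records which thread ran each one. The cap of 4 workers and the `work_item` function are arbitrary choices made for illustration.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

def work_item(n):
    """Stand-in for one queued task; reports which pool thread executed it."""
    time.sleep(0.01)  # simulate a little I/O
    return n, threading.get_ident()

with ThreadPoolExecutor(max_workers=4) as pool:
    # Queue 50 work items; idle workers take them from the queue as they free up.
    futures = [pool.submit(work_item, n) for n in range(50)]
    outcomes = [f.result() for f in futures]

thread_ids = {tid for _, tid in outcomes}
print(len(outcomes), "work items ran on", len(thread_ids), "threads")
```

The workers are reused, so the number of threads stays bounded no matter how many items are queued.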
Implementation
In terms of design, I took the easy way out: I create a new work item for each folder and drop it in the queue. The ThreadPool has no easy built-in way to determine the state of individual work items. Instead, the worker threads use a ManualResetEvent object to signal that they are finished.
I use a simple class that holds thread state information and is passed into each work item.
Here are some form-level declarations, which will be explained as we proceed.
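The article's state class is not shown here. As a rough Python analogue, the per-work-item state might bundle the folder to scan with the size threshold. `ScanState`, its fields, and `large_files_in` are hypothetical names. They stand in for the ThreadState object that gets passed to each work item.

```python
import os
import tempfile
from dataclasses import dataclass

@dataclass(frozen=True)
class ScanState:
    """Hypothetical per-work-item state, like the object passed to QueueUserWorkItem."""
    folder: str      # the folder this work item is responsible for
    threshold: int   # report files larger than this many bytes

def large_files_in(state):
    """Return (path, size) pairs for files directly in state.folder above the threshold."""
    found = []
    with os.scandir(state.folder) as entries:
        for entry in entries:
            if entry.is_file():
                size = entry.stat().st_size
                if size > state.threshold:
                    found.append((entry.path, size))
    return found

# Demo: one small file and one large file in a throwaway folder.
with tempfile.TemporaryDirectory() as root:
    with open(os.path.join(root, "small.txt"), "wb") as f:
        f.write(b"x" * 10)
    with open(os.path.join(root, "big.bin"), "wb") as f:
        f.write(b"x" * 5000)
    hits = large_files_in(ScanState(folder=root, threshold=1000))

print([(os.path.basename(p), s) for p, s in hits])  # [('big.bin', 5000)]
```

Making the state immutable (`frozen=True`) is a design choice. No worker can change another worker's inputs, so the state object itself needs no locking.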
The call to QueueUserWorkItem takes a WaitCallback parameter, which is a delegate to the subroutine that you want the work item to execute. The second parameter is an object that I use to pass the ThreadState object to the subroutine. To know when the ThreadPool has finished processing the queue, we keep a form-level counter that is decremented each time a work item finishes. When the counter reaches zero, Set is called on the ManualResetEvent. Meanwhile, the calling thread calls WaitOne on the reset event, which blocks until a worker thread calls Set.
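Here is a rough Python analogue of this completion-signaling pattern, not the article's VB.NET code. `threading.Event` stands in for the ManualResetEvent (`set`/`wait` versus `Set`/`WaitOne`), and a lock-protected global counter stands in for the form-level counter. Two details are made explicit in this sketch. First, the counter is set to the total number of work items *before* any are queued, so an early-finishing item cannot drive it to zero while others are still being queued. Second, the decrement sits in a `finally` block, so a failing item cannot leave the caller waiting forever. All names (`scan_folder`, `scan_all`, `pending`, `sizes`) are hypothetical.

```python
import os
import tempfile
import threading
from concurrent.futures import ThreadPoolExecutor

done = threading.Event()         # plays the role of the ManualResetEvent
counter_lock = threading.Lock()
pending = 0                      # work items not yet finished
sizes = {}                       # folder -> total bytes of the files directly in it
sizes_lock = threading.Lock()

def scan_folder(folder):
    global pending
    try:
        with os.scandir(folder) as entries:
            total = sum(e.stat().st_size for e in entries if e.is_file())
        with sizes_lock:
            sizes[folder] = total
    finally:
        with counter_lock:
            pending -= 1
            if pending == 0:
                done.set()       # the last work item to finish signals the waiter

def scan_all(root):
    global pending
    folders = [e.path for e in os.scandir(root) if e.is_dir()]
    if not folders:
        return                   # nothing queued, so nothing would ever call done.set()
    pending = len(folders)       # set before queuing so the count cannot hit 0 early
    done.clear()
    with ThreadPoolExecutor(max_workers=4) as pool:
        for folder in folders:
            pool.submit(scan_folder, folder)   # one work item per folder
        done.wait()              # like WaitOne: block until done.set() is called

# Demo: three subfolders holding 100, 200 and 300 bytes.
with tempfile.TemporaryDirectory() as root:
    for i, size in enumerate((100, 200, 300)):
        sub = os.path.join(root, f"dir{i}")
        os.mkdir(sub)
        with open(os.path.join(sub, "data.bin"), "wb") as f:
            f.write(b"\0" * size)
    scan_all(root)
    totals = sorted(sizes.values())

print(totals)  # [100, 200, 300]
```

In idiomatic Python you would normally just wait on the futures or let the `with` block shut the pool down. The explicit counter and event are there only to mirror the .NET approach.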
The code that each work item runs is in the ScanFiles subroutine. Notice the use of Interlocked.Add and Interlocked.Increment. The Interlocked class provides thread-safe methods that perform these operations atomically, so concurrent updates to a shared variable are never lost. Similarly, the call to FileList.Add is wrapped in a SyncLock block. SyncLock ensures that only one thread at a time can execute the enclosed code (among threads that lock on the same object). Use it sparingly: if large blocks of code are restricted this way, you create a bottleneck that undermines your multithreading.
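Python has no direct equivalent of the Interlocked class, so this sketch uses a small lock-guarded counter as a stand-in for Interlocked.Add and Interlocked.Increment, and a `with lock:` block in place of SyncLock. It is an analogue, not the article's code. The point it shows is to keep the slow work and decisions outside the lock and hold the lock only for the shared-state update. `AtomicCounter`, `record`, and the fake file names are hypothetical.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

class AtomicCounter:
    """Lock-based stand-in for .NET's Interlocked.Add / Interlocked.Increment."""
    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()

    def add(self, amount):
        with self._lock:
            self._value += amount
            return self._value

    def increment(self):
        return self.add(1)

    @property
    def value(self):
        with self._lock:
            return self._value

file_count = AtomicCounter()
byte_count = AtomicCounter()
big_files = []
big_files_lock = threading.Lock()    # guards big_files, like SyncLock on a shared object

def record(path, size, threshold):
    file_count.increment()
    byte_count.add(size)
    if size > threshold:             # decide outside the lock
        with big_files_lock:         # keep the critical section as small as possible
            big_files.append(path)

# Demo: 1000 fake files of sizes 0..999 recorded from 8 threads at once.
with ThreadPoolExecutor(max_workers=8) as pool:
    for i in range(1000):
        pool.submit(record, f"file{i}", i, 989)

print(file_count.value, byte_count.value, len(big_files))  # 1000 499500 10
```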
Testing and Results
I set up Stopwatch objects and implemented a single-threaded method that performed the same task, timing it the same way.
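Below is a rough Python sketch of this kind of timing harness. It is an analogue of the setup described, not the article's code: `time.perf_counter` stands in for Stopwatch, and the same scan runs once sequentially and once on a thread pool. The harness also checks that both versions produce the same file list. All function names, the 1000-byte threshold, and the worker count are hypothetical. The sketch makes no claim about the timings it prints. Those depend on the disk, the file-system cache, and the core count, and whichever run goes second may benefit from a warm cache.

```python
import os
import tempfile
import time
from concurrent.futures import ThreadPoolExecutor

def large_files(folder, threshold):
    """Files directly inside `folder` larger than `threshold` bytes."""
    with os.scandir(folder) as entries:
        return [e.path for e in entries
                if e.is_file() and e.stat().st_size > threshold]

def scan_single(folders, threshold):
    result = []
    for folder in folders:
        result.extend(large_files(folder, threshold))
    return sorted(result)

def scan_pooled(folders, threshold, workers=4):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(large_files, folders, [threshold] * len(folders))
        return sorted(p for part in parts for p in part)

def timed(fn, *args):
    start = time.perf_counter()          # comparable to starting a Stopwatch
    value = fn(*args)
    return value, time.perf_counter() - start

# Demo: eight folders, each with one small and one large file.
with tempfile.TemporaryDirectory() as root:
    folders = []
    for i in range(8):
        sub = os.path.join(root, f"dir{i}")
        os.mkdir(sub)
        folders.append(sub)
        with open(os.path.join(sub, "small.bin"), "wb") as f:
            f.write(b"\0" * 10)
        with open(os.path.join(sub, "big.bin"), "wb") as f:
            f.write(b"\0" * 4096)
    single, t1 = timed(scan_single, folders, 1000)
    pooled, t2 = timed(scan_pooled, folders, 1000)

print(f"single: {t1:.4f}s  pooled: {t2:.4f}s  same result: {single == pooled}")
```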
Multithreading definitely gives a large performance improvement on my quad-core system. The multithreaded scan consistently takes around 30%-40% of the time of the single-threaded scan. This was better than I expected, because I thought that queuing a single directory per work item might be less efficient due to the extra overhead. But the ThreadPool does a good job of keeping that overhead low.
Conclusion
The ThreadPool class is a must-have tool for your kit if you have a large number of I/O operations that need to happen asynchronously, or if you can split calculations into multiple chunks. The full source code for the test project is here:
FileSizes Example Project Download
It was written in Visual Studio 2005.
by: DanRollins on 2010-09-13 at 16:46:27 (ID: 19394)
I do have one question...
I'm inexperienced with .NET and the ThreadPool, but it appears to me that the code starts a thread for every folder in the StartFolder dir. I would expect Task Manager to show dozens of new threads. But instead, it starts with 15 and climbs to only about 20. I get similar numbers if I insert a ThreadPool.GetAvailableThr
Is this as expected?
Incidentally, single-threaded, my quad-core maxes out at 25% (as expected). Multithreaded, it goes up to 95% (firing on all four cylinders). That's just to confirm that it is working as advertised :-)
-- Dan