The ThreadPool: Multithreading for Performance

Introduction

As chip makers focus on adding processor cores rather than increasing clock speed, developers need to take advantage of the features of modern CPUs.  One of the ways we can do this is by implementing parallel algorithms in our software.

One recent task I needed to perform at home was to find and document large files in certain folders.  I regularly back up documents and source code, and large binaries in those folders can overflow the media I use for storage.  So I wanted a program that could scan through all the files in a folder and build a list of the files over a certain size.  I thought this would be a good opportunity to use some multithreading and see how much the performance improved.

In terms of parallel processing, .NET developers have several choices.  BackgroundWorker objects are useful for performing lengthy calculations off the UI thread so that the UI can remain responsive.  You can create your own threads by instantiating new Thread objects with delegates to the routines you wish to call.  You can also use the built-in ThreadPool.

If you want an aggressively multithreaded application, the ThreadPool class is ideal.  It is designed so that you can queue work items, and as worker threads in the pool become available, they execute the items you queued.  It's convenient because the framework handles the business of queuing the items and signaling the threads to start.
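
As a minimal sketch of that queuing model (not code from the test project), the console module below queues a single work item and waits for it to finish.  The names ThreadPoolSketch, RunOnPool, DoWork, Done and Result are made up for this illustration; only ThreadPool.QueueUserWorkItem, WaitCallback and ManualResetEvent come from the framework.

```vbnet
Imports System
Imports System.Threading

'Illustrative module; all names here are assumptions, not the article's code.
Module ThreadPoolSketch
    'Signaled by the work item when it has finished.
    Private Done As New ManualResetEvent(False)
    'Written by the worker, read by the caller after WaitOne returns.
    Private Result As String

    'Queues one work item and blocks until it has run.
    Public Function RunOnPool(ByVal item As String) As String
        Done.Reset()
        'The second argument arrives in the callback as its Object parameter.
        ThreadPool.QueueUserWorkItem( _
            New WaitCallback(AddressOf DoWork), item)
        Done.WaitOne()
        Return Result
    End Function

    'Runs on whichever pool thread the framework assigns.
    Private Sub DoWork(ByVal state As Object)
        Try
            Result = String.Format("{0} handled on pool thread {1}", _
                CStr(state), Thread.CurrentThread.ManagedThreadId)
        Finally
            'Always signal, even on failure, so the caller never hangs.
            Done.Set()
        End Try
    End Sub

    Sub Main()
        Console.WriteLine(RunOnPool("first item"))
    End Sub
End Module
```

The caller never creates or starts a thread itself; it only hands the pool a delegate and a state object.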


Implementation

In terms of design, I took the easy way out: I create a new work item for each folder and dump it in the queue.  The ThreadPool does not have an easy built-in way to determine the state of individual work items, so the worker threads use a ManualResetEvent object to signal that they are finished.

I have a simple class that saves thread state information and is passed into the work item.

Private Class ThreadState
    Private mDirInfo As DirectoryInfo
    Public Property DirInfo As DirectoryInfo
        Get
            Return mDirInfo
        End Get
        Set(ByVal value As DirectoryInfo)
            mDirInfo = value
        End Set
    End Property

    'Sum of size of all files in this directory (non-recursive).
    Private mDirectorySize As Long
    Public Property DirectorySize As Long
        Get
            Return mDirectorySize
        End Get
        Set(ByVal value As Long)
            mDirectorySize = value
        End Set
    End Property

    Public Sub New(ByVal DirInfo As DirectoryInfo)
        Me.DirInfo = DirInfo
    End Sub
End Class


Here are some form-level declarations, which will be explained as we proceed.

'Track files which match the criteria
Private FileList As New List(Of FileInfo)

'Keep track of number of work items still unfinished
Private mNumActiveThreads As Integer
'Allow worker threads to signal back to main thread via waithandle.
Private mEv As New ManualResetEvent(False)

'This is just a handy list to maintain all the directory size information that is gathered.
'ThreadState objects are passed into the procedure used by the individual threads.
Private ThreadStateList As New List(Of ThreadState)


The call to QueueUserWorkItem takes a WaitCallback parameter, which is a delegate to the subroutine that you want the work item to execute.  The second parameter is an object, which I use to pass the ThreadState object to the subroutine.  So that we know when the ThreadPool has finished processing the queue, we have a form-level counter that is decremented each time a work item finishes.  When the counter reaches zero, the worker calls .Set on the ManualResetEvent.  The calling thread calls the WaitOne method on the reset event object, which blocks until .Set is called by a worker thread.


''' <summary>
''' This is the main routine that adds each directory as a
''' new threadpool work item.
''' </summary>
''' <param name="StartFolder">Path of the folder to scan.</param>
Public Sub ScanAllFilesMultithreaded(ByVal StartFolder As Object)
    FolderSize = 0
    NumFiles = 0
    mEv.Reset()
    Dim di As New DirectoryInfo(CStr(StartFolder))
    Dim DirInfoList As New List(Of DirectoryInfo)
    GetDirectories(di, DirInfoList)

    sw = Stopwatch.StartNew
    'Set the counter before queuing anything, so that no worker can
    'decrement it to zero while items are still being added.
    mNumActiveThreads = DirInfoList.Count
    For Each di In DirInfoList
        AddWorkItem(di)
    Next

    'WaitOne blocks until Set is called on the object in the
    'worker thread.  With no work items nothing would ever call
    'Set, so skip the wait in that case.
    If DirInfoList.Count > 0 Then
        mEv.WaitOne()
    End If

    sw.Stop()

End Sub

''' <summary>
''' This adds items to the ThreadState list
''' and adds new work items to the thread pool, all
''' based on the directoryinfo object that is passed in.
''' </summary>
''' <param name="di">The directory to scan in the new work item.</param>
Private Sub AddWorkItem(ByVal di As DirectoryInfo)
    Dim ts As New ThreadState(di)
    ThreadStateList.Add(ts)
    Threading.ThreadPool.QueueUserWorkItem( _
        New WaitCallback(AddressOf ScanFiles), ts)
End Sub
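
The GetDirectories helper called above is not listed in the article.  The following is a hypothetical sketch of what such a helper might look like, assuming it adds the start folder itself plus every folder beneath it to the list; the module name DirectoryWalkSketch and the parameter names are made up, and the version in the downloadable project may well differ.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO

'Hypothetical helper; the article's own GetDirectories is not shown.
Module DirectoryWalkSketch
    'Adds root and, recursively, every folder beneath it to list.
    'Including root itself is an assumption of this sketch.
    Public Sub GetDirectories(ByVal root As DirectoryInfo, _
                              ByVal list As List(Of DirectoryInfo))
        list.Add(root)
        Try
            For Each subDir As DirectoryInfo In root.GetDirectories()
                GetDirectories(subDir, list)
            Next
        Catch ex As UnauthorizedAccessException
            'Skip the children of folders the current user cannot read.
        End Try
    End Sub
End Module
```

Whether the start folder is included matters for the counter: the number of entries in the list is the number of work items that must finish before the event is set.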

The code that is called by each work item is in the ScanFiles subroutine.  Notice the use of Interlocked.Add and Interlocked.Increment.  The Interlocked class provides thread-safe methods that update a shared variable as a single atomic operation, so concurrent updates from different threads cannot be lost.  Similarly, the FileList.Add call is wrapped in a SyncLock block.  SyncLock allows only one thread at a time to execute the enclosed code.  Don't overuse it: if large blocks of code are restricted this way, you create a bottleneck for your multithreading.

''' <summary>
''' This is the procedure called by the thread pool work items.
''' See AddWorkItem for how this is set up.
''' </summary>
''' <param name="state">The ThreadState object for the directory to scan.</param>
Private Sub ScanFiles(ByVal state As Object)
    Dim TS As ThreadState = CType(state, ThreadState)
    Try

        Dim FInfo As FileInfo
        For Each FInfo In TS.DirInfo.GetFiles
            'Since this routine can run simultaneously
            'in different threads, it is important to use
            'the Interlocked methods to ensure that the
            'form-level variables are thread safe and
            'not subject to race conditions.
            Interlocked.Increment(NumFiles)
            'TS belongs to this work item alone, so a plain add is
            'enough.  Passing a property ByRef to Interlocked.Add
            'would only update a temporary copy, which is not atomic.
            TS.DirectorySize += FInfo.Length
            If FInfo.Length > Limit Then
                'SyncLock ensures that only this thread
                'can access the filelist during this Add.
                'so is a form-level object used only as the lock.
                SyncLock so
                    FileList.Add(FInfo)
                End SyncLock
            End If
        Next
        Interlocked.Add(FolderSize, TS.DirectorySize)
    Catch ex As Exception
        Debug.WriteLine(ex)
    Finally
        'When mNumActiveThreads reaches 0, all the workers are finished.
        If Interlocked.Decrement(mNumActiveThreads) = 0 Then
            mEv.Set()
        End If
    End Try
End Sub


Testing and Results

I set up Stopwatch objects and implemented a single-threaded method that performed the same task, timing it in the same way.
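
The single-threaded method is not listed here, so the following is only a sketch of what a comparable baseline could look like, not the routine from the test project.  The module, function and parameter names are made up, and the 10 MB limit in Main is an arbitrary example value.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Diagnostics
Imports System.IO

'Hypothetical baseline; names and the size limit are assumptions.
Module SingleThreadedSketch
    'Scans startFolder and everything beneath it on the calling thread.
    'Returns the total size in bytes and adds files over limit to largeFiles.
    Public Function ScanAllFilesSingleThreaded(ByVal startFolder As String, _
            ByVal limit As Long, _
            ByVal largeFiles As List(Of FileInfo)) As Long
        Dim totalSize As Long = 0
        Dim pending As New Stack(Of DirectoryInfo)
        pending.Push(New DirectoryInfo(startFolder))
        While pending.Count > 0
            Dim current As DirectoryInfo = pending.Pop()
            Try
                For Each f As FileInfo In current.GetFiles()
                    totalSize += f.Length
                    If f.Length > limit Then largeFiles.Add(f)
                Next
                For Each d As DirectoryInfo In current.GetDirectories()
                    pending.Push(d)
                Next
            Catch ex As UnauthorizedAccessException
                'Skip folders the current user cannot read.
            End Try
        End While
        Return totalSize
    End Function

    Sub Main(ByVal args() As String)
        If args.Length = 0 Then
            Console.WriteLine("Usage: pass the folder to scan as the first argument.")
            Return
        End If
        Dim largeFiles As New List(Of FileInfo)
        Dim sw As Stopwatch = Stopwatch.StartNew()
        Dim total As Long = ScanAllFilesSingleThreaded( _
            args(0), 10L * 1024L * 1024L, largeFiles)
        sw.Stop()
        Console.WriteLine("{0} bytes, {1} large files, {2} ms", _
            total, largeFiles.Count, sw.ElapsedMilliseconds)
    End Sub
End Module
```

No locking or Interlocked calls are needed here because only one thread touches the totals and the list.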

There are definitely large improvements in performance on my quad-core system when using multithreading.  The multithreaded scan consistently takes around 30%-40% of the time taken to scan in a single thread.  This was better than I expected, as I had thought that handling a single directory per work item might be less efficient because of the extra overhead.  But the ThreadPool does a pretty good job of making your multithreading as efficient as it can be.

[Image: First run]

Conclusion

The ThreadPool class is a must-have tool for your kit if you have a large number of I/O operations that need to happen asynchronously, or if you can split calculations into multiple chunks.  The full source code for the test project is here:
 
      FileSizes Example Project Download

It was written in Visual Studio 2005.

Comments (1)

I was able to build and run the project.  I got similar timing results.

I do have one question...
I'm inexperienced with .NET and the ThreadPool, but it appears to me that the code starts a thread for every folder in the StartFolder dir.  I would expect that Task Manager would show dozens of new threads.  But instead, it starts with 15, and climbs to only about 20.  I get similar numbers if I insert a ThreadPool.GetAvailableThreads call in the scan loop.  That's true even for a full-disk scan (which takes over a minute).

Is this as expected?  

Incidentally, single-threaded my quad-core maxes at 25% (as expected).  Multi-threaded, it goes up to 95% (firing on all four cylinders)  -- that's just to confirm that it is working as advertised :-)

-- Dan
