
The ThreadPool: Multithreading for Performance


Introduction

As chip makers focus on adding processor cores over increasing clock speed, developers need to utilize the features of modern CPUs.  One of the ways we can do this is by implementing parallel algorithms in our software.  

One recent task I needed to perform at home was to find and document large files in certain folders.  I regularly back up documents and source code, and large binaries in those folders can overflow the media I use for storage.  So I wanted a program that could scan through all the files in a folder and build a list of files over a certain size.  I thought this would be a good opportunity to use some multithreading and see how much the performance improved.

In terms of parallel processing, .NET developers have several choices.  BackgroundWorker objects are useful for performing lengthy calculations off the UI thread so that the UI can remain responsive.  You can create your own threads by instantiating new Thread objects with delegates to the routines you wish to call.  You can also use the built-in ThreadPool.

If you want an aggressively multithreaded application, the ThreadPool object is ideal.  It is designed so that you can queue work items, and as worker threads become available to the pool, the threads will execute the items you queued.  It’s convenient as the framework handles the business of queuing and signaling the threads to start.  


Implementation

In terms of design, I took the easy way out: I create a new work item for each folder and dump it in the queue.  Because the ThreadPool does not have an easy built-in way to determine the state of individual work items, I use a ManualResetEvent object that the worker threads signal when they are finished.

I have a simple class that saves thread state information and is passed into the work item.

Private Class ThreadState
	Private mDirInfo As DirectoryInfo
	Public Property DirInfo As DirectoryInfo
		Get
			Return mDirInfo
		End Get
		Set(ByVal value As DirectoryInfo)
			mDirInfo = value
		End Set
	End Property
	'Sum of size of all files in this directory (non-recursive.)
	Private mDirectorySize As Long
	Public Property DirectorySize As Long
		Get
			Return mDirectorySize
		End Get
		Set(ByVal value As Long)
			mDirectorySize = value
		End Set
	End Property

	Public Sub New(ByVal DirInfo As DirectoryInfo)
		Me.DirInfo = DirInfo
	End Sub
End Class



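Here are some form level declarations, which will be explained as we proceed.  (The listings also use FolderSize, NumFiles, Limit, so and sw, whose declarations are not shown here.)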
'Track files which match the criteria
Private FileList As New List(Of FileInfo)

'Keep track of number of threads still unfinished
Private mNumActiveThreads As Integer
'Allow worker threads to signal back to main thread via waithandle.
Private mEv As New ManualResetEvent(False)

'This is just a handy list to maintain all the directory size information that is gathered.
'ThreadState objects are passed into the procedure used by the individual threads.
Private ThreadStateList As New List(Of ThreadState)



The call to QueueUserWorkItem takes a WaitCallback parameter, which is a delegate to the subroutine that you want the work item to execute.  The second parameter is an Object, which I use to pass the ThreadState object to the subroutine.  To know when the ThreadPool has finished processing the queue, we keep a form level counter that is decremented each time a work item finishes.  When the counter reaches zero, the worker that finished last calls .Set on the ManualResetEvent.  Meanwhile, the calling thread calls the WaitOne method on the reset event object, which blocks until .Set is called.
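
Stripped of the file scanning, the countdown pattern described above might look like the following console sketch.  It is only an illustration under my own assumptions: the names CountdownSketch, SumOnPool, Work, pending, allDone and total are hypothetical and do not come from the project, and the guard for an empty array is an addition, since with nothing queued no work item would ever call .Set.

```vbnet
Imports System
Imports System.Threading

Module CountdownSketch
    'Number of queued work items that have not finished yet.
    Private pending As Integer
    'Signalled by whichever work item finishes last.
    Private ReadOnly allDone As New ManualResetEvent(False)
    'Running total, only here so that there is a result to check.
    Private total As Integer

    'Queues one work item per value, blocks until all of them have run,
    'and returns the sum of the values.
    Public Function SumOnPool(ByVal values As Integer()) As Integer
        total = 0
        'Nothing queued means nothing would ever call Set (assumed guard).
        If values.Length = 0 Then Return 0
        allDone.Reset()
        'Set the counter before queuing, so an early finisher
        'cannot bring it to zero while items are still being added.
        pending = values.Length
        For Each v As Integer In values
            ThreadPool.QueueUserWorkItem(New WaitCallback(AddressOf Work), v)
        Next
        'Block the calling thread until the last work item calls Set.
        allDone.WaitOne()
        Return total
    End Function

    Private Sub Work(ByVal state As Object)
        Try
            Interlocked.Add(total, CInt(state))
        Finally
            'Decrement in Finally so a failing item still counts as done.
            If Interlocked.Decrement(pending) = 0 Then allDone.Set()
        End Try
    End Sub

    Sub Main()
        Console.WriteLine(SumOnPool(New Integer() {1, 2, 3, 4}))
    End Sub
End Module
```

The counter is assigned before the first item is queued for the same reason the article's routine does it: a work item that finishes early must never see the count reach zero prematurely.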


''' <summary>
''' This is the main routine that adds each directory as a 
''' new threadpool work item.
''' </summary>
''' <param name="StartFolder"></param>
''' <remarks></remarks>
Public Sub ScanAllFilesMultithreaded(ByVal StartFolder As Object)
	FolderSize = 0
	NumFiles = 0
	mEv.Reset()
	Dim di As New DirectoryInfo(CStr(StartFolder))
	Dim DirInfoList As New List(Of DirectoryInfo)
	GetDirectories(di, DirInfoList)

	sw = Stopwatch.StartNew
	mNumActiveThreads = DirInfoList.Count
	For Each di In DirInfoList
		AddWorkItem(di)
	Next

	'WaitOne blocks until Set is called on the object in the
	'worker thread.
	mEv.WaitOne()

	sw.Stop()

End Sub

''' <summary>
''' This adds items to the ThreadState list 
''' and adds new work items to the thread pool, all
''' based on the directoryinfo object that is passed in.
''' </summary>
''' <param name="di"></param>
''' <remarks></remarks>
Private Sub AddWorkItem(ByVal di As DirectoryInfo)
	Dim ts As New ThreadState(di)
	ThreadStateList.Add(ts)
	Threading.ThreadPool.QueueUserWorkItem( _
		New WaitCallback(AddressOf ScanFiles), ts)
End Sub


     
The code that is called by each work item is in the ScanFiles subroutine.  Notice the use of Interlocked.Add and Interlocked.Increment.  The Interlocked class provides these thread-safe methods, which update a shared variable as a single atomic operation so that concurrent updates from other threads are not lost.  Similarly, the FileList.Add call is wrapped in a SyncLock block.  SyncLock allows only one thread at a time to execute the code guarded by a given lock object.  It is recommended that you don't overuse it: if large blocks of code are restricted this way, you create a bottleneck for your multithreading.
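
To see the two techniques side by side outside the project, here is a small console sketch.  It is an assumption-laden illustration rather than code from the download: LockingSketch, Worker, Run, counter, evens and evensLock are hypothetical names, and it uses four plain Thread objects instead of the ThreadPool so that Join can be used to wait for them.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Threading

Module LockingSketch
    'Shared counter, updated atomically with Interlocked.
    Private counter As Integer
    'Shared list; List(Of T) is not thread safe, so it needs a lock.
    Private ReadOnly evens As New List(Of Integer)
    'Dedicated lock object; every access to evens goes through it.
    Private ReadOnly evensLock As New Object()

    'Processes 1000 consecutive numbers starting at the value passed in.
    Private Sub Worker(ByVal state As Object)
        Dim start As Integer = CInt(state)
        For i As Integer = start To start + 999
            Interlocked.Increment(counter)
            If i Mod 2 = 0 Then
                'Keep the locked region as small as possible.
                SyncLock evensLock
                    evens.Add(i)
                End SyncLock
            End If
        Next
    End Sub

    'Runs four workers and returns {counter, number of even values stored}.
    Public Function Run() As Integer()
        counter = 0
        evens.Clear()
        Dim started As New List(Of Thread)
        For n As Integer = 0 To 3
            Dim th As New Thread(New ParameterizedThreadStart(AddressOf Worker))
            th.Start(n * 1000)
            started.Add(th)
        Next
        For Each t As Thread In started
            t.Join()
        Next
        Return New Integer() {counter, evens.Count}
    End Function

    Sub Main()
        Dim r As Integer() = Run()
        Console.WriteLine("{0} increments, {1} even numbers", r(0), r(1))
    End Sub
End Module
```

Only the single Add call sits inside the SyncLock block, which keeps the time any one thread holds the lock short.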

''' <summary>
''' This is the procedure called by the thread pool work items.  
''' See AddWorkItem for how this is set up.
''' </summary>
''' <param name="state"></param>
''' <remarks></remarks>
Private Sub ScanFiles(ByVal state As Object)
	Dim TS As ThreadState = CType(state, ThreadState)
	Try

		Dim FInfo As FileInfo
		For Each FInfo In TS.DirInfo.GetFiles
			'This routine can run simultaneously in several
			'threads, so the form level counters are updated
			'with the Interlocked methods to avoid race conditions.
			Interlocked.Increment(NumFiles)
			'TS belongs to this work item alone, so a plain add is enough.
			TS.DirectorySize += FInfo.Length
			If FInfo.Length > Limit Then
				'SyncLock ensures that only this thread 
				'can access the filelist during this Add 
				SyncLock so
					FileList.Add(FInfo)
				End SyncLock
			End If
		Next
		Interlocked.Add(FolderSize, TS.DirectorySize)
	Catch ex As Exception
		Debug.WriteLine(ex)
	Finally
		'When mNumActiveThreads reaches 0, all the workers are finished.
		If Interlocked.Decrement(mNumActiveThreads) = 0 Then
			mEv.Set()
		End If
	End Try
End Sub



Testing and Results

I set up stopwatch objects and implemented a single threaded method that performed the same task, timing it in the same way.  
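
The single threaded version is not listed in this article, so the following is only a guess at what such a baseline could look like, not the routine from the project.  BaselineSketch and CountLargeFiles are hypothetical names, the recursion and the handling of unreadable folders are my assumptions, and the temp folder and 1 MB limit in Main are arbitrary choices for the demonstration.

```vbnet
Imports System
Imports System.Diagnostics
Imports System.IO

Module BaselineSketch
    'Walks the tree under root on the calling thread and returns
    'the number of files larger than limit bytes.
    Public Function CountLargeFiles(ByVal root As DirectoryInfo, ByVal limit As Long) As Integer
        Dim found As Integer = 0
        Try
            For Each f As FileInfo In root.GetFiles()
                If f.Length > limit Then found += 1
            Next
            For Each subDir As DirectoryInfo In root.GetDirectories()
                found += CountLargeFiles(subDir, limit)
            Next
        Catch ex As UnauthorizedAccessException
            'Skip folders that cannot be read (assumed behaviour).
        End Try
        Return found
    End Function

    Sub Main()
        'Time the whole scan with a Stopwatch, as the article does.
        Dim sw As Stopwatch = Stopwatch.StartNew()
        Dim n As Integer = CountLargeFiles(New DirectoryInfo(Path.GetTempPath()), 1024 * 1024)
        sw.Stop()
        Console.WriteLine("{0} large files in {1} ms", n, sw.ElapsedMilliseconds)
    End Sub
End Module
```

When comparing timings, it helps to run each version more than once, because the operating system caches directory information after the first scan.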

There are definitely large improvements in performance on my quad core system when using multithreading.  The multithreaded scan consistently takes around 30%-40% of the time taken to scan in a single thread.  This was better than I expected, as I had thought that handling a single directory per work item might be less efficient because of the extra overhead.  But the ThreadPool does a pretty good job of making your multithreading as efficient as it can be.

[Figure: First run]

Conclusion

The ThreadPool class is a must-have tool for your kit if you have a large number of IO operations that need to happen asynchronously, or if you can spread calculations out in multiple chunks.  The full source code for the test project is here:
 
      FileSizes Example Project Download

It was written in Visual Studio 2005.
Author:PaulHews
1 Comment
 

Expert Comment

by:DanRollins
I was able to build and run the project.  I got similar timing results.

I do have one question...
I'm inexperienced with .NET and the ThreadPool, but it appears to me that the code starts a thread for every folder in the StartFolder dir.  I would expect that Task Manager would show dozens of new threads.  But instead, it starts with 15, and climbs to only about 20.  I get similar numbers if I insert a ThreadPool.GetAvailableThreads call in the scan loop.  That's true even for a full-disk scan (which takes over a minute).

Is this as expected?  

Incidentally, single-threaded, my quad-core maxes at 25% (as expected).  Multi-threaded, it goes up to 95% (firing on all four cylinders) -- that's just to confirm that it is working as advertised :-)

-- Dan