Multithreading -- Why and When

Adding multiple threads of execution to your program may seem like an obvious "next step" in your development plans. This article discusses the pros and cons of making that decision. In a separate article, I'll show how to start a thread, stop it, and display progress messages in your U/I.

To Thread or Not To Thread...
I recall being very excited back in 1987 when I found that the then-new OS/2 supported pre-emptive multithreading. Surely that would revolutionize programming... wouldn't it? We'd no longer need to wait for one thing to finish before starting another... right?

But then I needed to think about it for a while to find a practical use for it. Consider that a single-CPU computer is not actually doing two things at once. It is doing one thing for a while, then it puts that down and does the other thing for a while... So, for the most part, there's no true speedup. In 1987, CPUs were primitive 80286s and the task switch had a lot of overhead. Dual-processor boxes cost as much as a house, and the idea of a multiple-core CPU-on-a-chip was nearly two decades in the future.

Surely there are algorithms that can benefit from multithreading... An article I wrote back then described one. I had written a program that would generate and solve a maze.

The maze has exactly one path from start to finish. To solve the maze, my program begins at one opening and works its way to the other, reversing and backtracking when it hits a dead end. In some cases, it spends a lot of time eliminating all those dead ends. What if I had two threads of execution? I could start one thread on each end of the maze and when they met (presumably, somewhere in the middle), I'd have the solution. The idea panned out. Even with the extra overhead of the threading context switches, the chance of skipping some dead-end processing by working from two ends was a clear win. Testing and timing on thousands of large randomly generated mazes, I found an average 20% advantage by using that multithreaded algorithm. That's nowhere near the hoped-for 50%, but it's a significant improvement.
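On a single CPU, two threads working from opposite ends of a maze really just take turns. The sketch below makes that turn-taking explicit in one thread. It is a bidirectional breadth-first search on a grid maze, where the two frontiers alternate one step at a time and stop as soon as they touch. This is a swapped-in technique (breadth-first search), not the back-tracking solver described above. The function name `solveFromBothEnds`, the `'#'`-for-wall grid format, and the assumption that the grid is non-empty and rectangular with open, in-bounds start and goal cells are all illustrative choices.

```cpp
#include <array>
#include <queue>
#include <string>
#include <utility>
#include <vector>

using Cell = std::pair<int, int>;   // (row, column)

// Bidirectional breadth-first search on a grid maze ('#' = wall).
// Two frontiers take turns expanding one cell each, the way two threads
// would take turns on one CPU. Returns true as soon as the frontiers meet.
bool solveFromBothEnds(const std::vector<std::string>& maze, Cell start, Cell goal)
{
    if (start == goal) return true;
    const int rows = static_cast<int>(maze.size());
    const int cols = static_cast<int>(maze[0].size());
    // 0 = unvisited, 1 = reached from the start, 2 = reached from the goal
    std::vector<std::vector<int>> owner(rows, std::vector<int>(cols, 0));
    std::array<std::queue<Cell>, 2> frontier;
    frontier[0].push(start);
    owner[start.first][start.second] = 1;
    frontier[1].push(goal);
    owner[goal.first][goal.second] = 2;
    const int dr[] = {1, -1, 0, 0};
    const int dc[] = {0, 0, 1, -1};

    while (!frontier[0].empty() || !frontier[1].empty()) {
        for (int side = 0; side < 2; ++side) {          // alternate "time slices"
            if (frontier[side].empty()) continue;
            const Cell cur = frontier[side].front();
            frontier[side].pop();
            for (int d = 0; d < 4; ++d) {
                const int r = cur.first + dr[d];
                const int c = cur.second + dc[d];
                if (r < 0 || r >= rows || c < 0 || c >= cols || maze[r][c] == '#')
                    continue;                           // off the grid or a wall
                if (owner[r][c] == 0) {                 // new cell for this side
                    owner[r][c] = side + 1;
                    frontier[side].push({r, c});
                } else if (owner[r][c] != side + 1) {
                    return true;                        // touched the other search
                }
            }
        }
    }
    return false;                                       // no path exists
}
```

Because each side stops exploring the moment it reaches a cell already claimed by the other, dead ends near either opening are only partly explored. That is the same effect the two-thread solver was exploiting.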

However, I realized that this is a special case. Most searches gain nothing from working at both ends. For instance, if you are searching a document for a specific text string, there is exactly one "path" and there are no dead ends to skip. Starting a second thread in the middle or working back from the end just divides the same work between two threads, which yields no benefit on a single-CPU architecture.

In fact, it is hard to think of many scenarios where a time-sharing algorithm is better than a well-written single-threaded algorithm. Usually, whatever you think you are gaining by using a secondary thread for pre-processing or "simultaneously" processing elsewhere is an illusion. There are some algorithms that can benefit from a "lucky find" -- a promising route to a solution turns up at a random location in "solution space" and you have a better chance of finding a solution sooner by exploring that avenue while also continuing on with the original search. But that's a sophisticated scenario -- not at all like the day-to-day U/I-centric programming most of us spend our working hours doing.

It Already Does That
Where multithreading is a clear win is in avoiding delays in processing user interaction. For instance, when a web browser requests a page, it must not stop cold while it waits for the response. It must continue to process keystrokes and mouse clicks and screen updates in a timely manner. Another example is in printing -- actually any time you must access a slow device -- you don't want to hold up the show while the slow device finishes.

The thing is, all of the most common "blocking" tasks use system tools that provide ways to avoid unwanted delays in U/I processing. For instance, when you use, say, IXMLHTTPRequest to download a webpage or image file, you can specify asynchronous operation. The system spins off a thread to do all the waiting while your program continues running smoothly, and all you need to do is listen for an event or poll for a status change. Likewise, printing actually sends data to a dedicated system service (the print spooler), so sending the page is fast and does not interrupt your processing of U/I tasks while the slow printer catches up.
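Standard C++ has a portable analog of the "start it asynchronously, then poll for a status change" pattern. `std::async` runs a call on another thread, and `std::future::wait_for` with a zero timeout checks whether the result is ready without blocking. This is a sketch, not the IXMLHTTPRequest API. `fakeDownload` is a hypothetical stand-in for a slow request, and the polling loop marks the place where a GUI program would keep handling its messages.

```cpp
#include <chrono>
#include <future>
#include <string>
#include <thread>

// Hypothetical stand-in for a slow network request.
std::string fakeDownload(const std::string& url)
{
    std::this_thread::sleep_for(std::chrono::milliseconds(50));
    return "<html>" + url + "</html>";
}

// Start the request asynchronously, then keep doing other work while polling.
// Returns how many polling passes ran before the result was ready.
int downloadWhilePolling(const std::string& url, std::string& result)
{
    std::future<std::string> pending =
        std::async(std::launch::async, fakeDownload, url);

    int passes = 0;
    // A zero timeout turns wait_for into a non-blocking status check.
    while (pending.wait_for(std::chrono::milliseconds(0)) != std::future_status::ready) {
        ++passes;   // a GUI program would process its pending messages here
        std::this_thread::sleep_for(std::chrono::milliseconds(5));
    }
    result = pending.get();   // already ready, so this does not block
    return passes;
}
```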

If you have several MDI windows or perhaps some floating modeless dialog boxes, you might think it a clever idea to have a thread for each one. But that is unnecessary and almost certainly inadvisable. The system already handles all of the things that need to be done to keep these multiple windows updated and active.

What I'm saying is just this:

You probably don't need to use multithreading in your program.

Multithreading introduces certain complexities that are best avoided. Debugging a multithreaded application can be very confusing... You stop at a breakpoint and single-step for a while, then click GO and the breakpoint is immediately hit by another thread -- you are now looking at a different thread's call to that function, not the one you were debugging a moment ago. What's more, you have just screwed up the timing from what it would have been in a non-debugging scenario. The result is a sort of Heisenberg "observer effect" that cannot be avoided.

Most importantly, there are many types of programming constructs, such as linked lists, that can get monumentally bollixed up when multiple threads are accessing them. You have to go out of your way to set up mutexes, semaphores, and/or critical sections to prevent the horrendous possibility of a task switch at just the wrong time. Why horrendous? Because the kinds of errors that occur are inherently non-reproducible. They seem to come at random times, with neither rhyme nor reason. Debugging can turn into your worst nightmare.
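Here is what "going out of your way" looks like in standard C++. In this minimal sketch, every access to a shared linked list happens inside a critical section guarded by a `std::mutex`. `GuardedList` and `pushFromThreads` are illustrative names, not part of any library.

```cpp
#include <mutex>
#include <thread>
#include <vector>

// A minimal singly linked list whose operations are guarded by a mutex.
// Without the lock, two threads could read the same head pointer at once,
// and one new node would be silently lost when the second write lands.
class GuardedList {
public:
    ~GuardedList() {
        while (head_) {
            Node* next = head_->next;
            delete head_;
            head_ = next;
        }
    }
    void push(int value) {
        std::lock_guard<std::mutex> lock(mutex_);   // critical section begins
        head_ = new Node{value, head_};
    }                                               // lock released here
    int count() const {
        std::lock_guard<std::mutex> lock(mutex_);
        int n = 0;
        for (const Node* p = head_; p != nullptr; p = p->next) ++n;
        return n;
    }

private:
    struct Node { int value; Node* next; };
    Node* head_ = nullptr;
    mutable std::mutex mutex_;                      // lockable from const count()
};

// Several threads push into the same list; returns the final node count.
int pushFromThreads(int threadCount, int pushesPerThread)
{
    GuardedList list;
    std::vector<std::thread> workers;
    for (int t = 0; t < threadCount; ++t)
        workers.emplace_back([&list, pushesPerThread] {
            for (int i = 0; i < pushesPerThread; ++i) list.push(i);
        });
    for (auto& w : workers) w.join();
    return list.count();
}
```

With the lock in place, four threads pushing 1,000 items each always leave exactly 4,000 nodes. Remove the `lock_guard` from `push()` and the count can come up short on some runs and not others. That is exactly the non-reproducible kind of failure described above.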

It turns out that GUI processing often runs into that linked-list type of scenario. The user clicks in a ListView and there is a rapid-fire exchange of messages and events -- to itself and the parent window. If that gets interrupted by some thread deciding it was time to start a new action (such as adding or deleting a ListView item) the entire structure can get hosed. The symptoms are erratic and non-reproducible, making your job much harder (and the program not one whit better).

Other "headaches" involving multithreading include:

Resource ownership: You generally need to create sockets, database connections, windows, and the like in the thread that will use them. Other threads cannot (or should not) access them.

DLL access: You need to take certain precautions when calling DLL functions from multiple threads (especially with MFC, but also with the standard C Runtime Library).

Pseudo-multithreading
So, before we look at multithreading, I want to present a very real alternative that will work just as well in most situations, while avoiding all of the potential pitfalls.

Rather than spin off a thread to do a separate task, make use of Windows' built-in event-driven nature. All Windows GUI programs contain a message loop, and programs basically just sit in that loop processing messages. If you want some program functionality "interleaved" into that sequence and processed "in the background," all you need to do is have the system insert your own message into the loop at timed intervals. That's called a Window Timer (SetTimer and WM_TIMER in the Win32 API), and virtually every Windows GUI framework supports it.
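To make the idea concrete without tying it to one framework, here is a toy, single-threaded model of a message loop with one periodic timer. `ToyMessageLoop` is hypothetical and is not the Win32 API. It is loosely modeled on the fact that Windows delivers timer messages at low priority, after other pending messages. The loop advances a simulated clock in 1 ms steps instead of reading a real clock.

```cpp
#include <functional>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Toy, single-threaded model of a message loop with one periodic timer.
// Each simulated millisecond handles at most one thing: a queued message if
// one is waiting, otherwise the timer callback if its interval has elapsed.
// The callback therefore never runs in the middle of a message handler,
// but it can fire late when the queue is busy.
class ToyMessageLoop {
public:
    ToyMessageLoop(int intervalMs, std::function<void(int)> timerProc)
        : interval_(intervalMs), timerProc_(std::move(timerProc)) {}

    void post(const std::string& msg) { queue_.push(msg); }

    void run(int durationMs) {
        for (int now = 1; now <= durationMs; ++now) {
            if (!queue_.empty()) {                  // pending messages first
                handled.push_back(queue_.front());
                queue_.pop();
            } else if (now - lastTick_ >= interval_) {
                lastTick_ = now;
                timerProc_(now);                    // like a TimerProc call
            }
        }
    }

    std::vector<std::string> handled;               // messages processed, in order

private:
    int interval_;
    int lastTick_ = 0;
    std::function<void(int)> timerProc_;
    std::queue<std::string> queue_;
};
```

Posting 60 messages ahead of a 50 ms timer pushes its first tick to 61 ms and its second to 111 ms. The timer runs late, but it never runs while another handler is half-finished.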

Examples:

You need to "continuously" check the status of a card-reader in your Point-of-Sale application. You might consider writing a thread dedicated to that. But it will be conceptually much easier to just start a Window Timer to check that status periodically.

A lengthy process is taking place (say, a database stored procedure is running). Rather than monitoring the progress with a separate thread, just have a timer take a quick peek every so often. Update the screen with status text or move a progress gauge by one tick in a way that appears to be "continuous."

You need to run an external program and keep tabs on it to know when it is finished (see Execute a Program with C++). You might consider using a separate thread to monitor that program, but a timer could do that just as well -- with no multithreading headaches.

The beauty of putting these background tasks in a window timer is that when the TimerProc gets called, there are no other messages being handled; nothing else is happening at that particular point in time. No linked lists are partially updated. No sequence of window events has been interrupted. You own the CPU (at least your program's slice of it) and if the program fails, you can breakpoint and single-step through it and find out what's going wrong.

If you resist the Window Timer idea, you should ask yourself: Why not?

Are you worried that it won't be responsive and immediate enough? From a U/I standpoint, that's foolish. Motion looks smooth at about thirty updates per second (the approximate frame rate for TV and videos), or one update roughly every 33ms. A Window Timer can't really fire a thousand times per second: Windows raises any interval below 10ms to 10ms, and the default system clock tick is about 15.6ms. Even so, that allows roughly 60 updates per second, about twice what the eye needs. If you schedule a timer interval at 50ms and it comes in at 63ms, do you really think your user will notice?

Are you worried that your timer message might be lost? That you might miss a critical event? That is a legitimate concern if you are trying to do real-time monitoring of some hardware. But that sort of work is almost invariably done at the device driver level -- running with high priority inside the system kernel. A device driver is designed to record such events and pass them to you upon request, with the full understanding that your GUI program may not be able to handle them immediately. Thus, this issue is probably NOT going to affect your program's functionality. You'll be able to do whatever needs to be done via a TimerProc "in the background" without actually creating a background-processing thread.

In summary, there are plenty of good reasons to avoid multithreading: The programming is complex, it rarely buys you what you thought you were going to get, and there is a simple alternative (a Window Timer) that works just as well in most situations. That said...

Use Multithreading When...
First, identify a specific task that does not involve direct user interaction. Do all of your U/I programming in the main thread. What's left might be a candidate for one or more worker threads (threads that do not handle U/I tasks).

Next, verify that you are not fooling yourself. Is there an actual physical delay that you can avoid by running a worker thread? Or will your program (and your user) be stuck looking at an hourglass cursor anyway? Unless you are running on a multi-core or multi-CPU box, you are usually just dividing the same resource (the single CPU) among several subtasks and the final result won't be available any faster.

Here are some examples where multithreading, even on a single CPU, can help:

You have a lot of idle time between keystrokes when waiting for the user to type. You can do a spell-check in that time... and as long as it does not interfere with the user, this is time saved. For instance, Microsoft Word puts a wavy red underline below misspellings, but it does not stop normal U/I activity. This is a reasonably good use of spare CPU cycles.

You do high-volume access with a remote host. When you make a request, you know that the remote host may need several seconds to process it. If you have a queue of requests to make, then letting your program sit idle during that period would be a big waste. A secondary thread could be posting the next request or processing the previous response. This is a clear win. You do not give up anything, and though no single transaction is sped up, overall throughput (transactions completed per unit of time) improves.
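A common way to structure that overlap is a producer/consumer queue. One thread keeps receiving responses while a worker thread processes the ones already received. Below is a minimal C++17 sketch. `BlockingQueue` and `runPipeline` are illustrative names, and squaring the request id is a hypothetical stand-in for a real remote call.

```cpp
#include <condition_variable>
#include <mutex>
#include <optional>
#include <queue>
#include <thread>
#include <utility>

// Minimal blocking queue: push() from one thread, pop() from another.
// pop() waits until an item arrives, or returns nullopt once the queue
// has been closed and fully drained.
template <typename T>
class BlockingQueue {
public:
    void push(T item) {
        { std::lock_guard<std::mutex> lock(m_); q_.push(std::move(item)); }
        cv_.notify_one();
    }
    void close() {
        { std::lock_guard<std::mutex> lock(m_); closed_ = true; }
        cv_.notify_all();
    }
    std::optional<T> pop() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return !q_.empty() || closed_; });
        if (q_.empty()) return std::nullopt;        // closed and drained
        T item = std::move(q_.front());
        q_.pop();
        return item;
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<T> q_;
    bool closed_ = false;
};

// Hypothetical pipeline: this thread "receives" responses while a worker
// thread processes earlier ones. Returns the sum of processed responses.
long long runPipeline(int requestCount)
{
    BlockingQueue<int> responses;
    long long total = 0;                            // touched only by the worker
    std::thread worker([&] {
        while (auto r = responses.pop())
            total += *r;                            // "process" one response
    });
    for (int id = 1; id <= requestCount; ++id)
        responses.push(id * id);                    // stand-in for a slow remote call
    responses.close();
    worker.join();                                  // total is safe to read after join
    return total;
}
```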

You are processing a stream of data, where a partially processed stream can be displayed immediately, and the screen can be updated when the entire dataset has been collected. For instance, the task of receiving and buffering incoming video data (and indicating readiness) can be handled in a separate thread. Or a web browser can begin processing the HTML while other threads are busy downloading the images and other items. The final page might not be available any sooner either way, but the user can start interacting sooner and will feel like things are going faster.

You need to process multiple database requests as you generate a complex output. If your web page can't ship until three separate database requests have been completed, you could start all three "at one time." If the first one you need comes back first, then you can begin generating your final output while the other two database requests are still in progress.
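With `std::async`, starting all three requests "at one time" takes only a few lines. This sketch assumes a hypothetical `fakeQuery` function that just sleeps to simulate a slow database. The point is the ordering: all three requests are launched before the first result is awaited, so the total wait is roughly that of the slowest query rather than the sum of all three.

```cpp
#include <chrono>
#include <future>
#include <string>
#include <thread>

// Hypothetical stand-in for a database query that takes delayMs to return.
std::string fakeQuery(const std::string& name, int delayMs)
{
    std::this_thread::sleep_for(std::chrono::milliseconds(delayMs));
    return name + " rows";
}

// Launch all three queries first, then consume results in the order the
// page needs them. While waiting on the first get(), the other two queries
// keep running on their own threads.
std::string buildPage()
{
    auto header  = std::async(std::launch::async, fakeQuery, "header", 100);
    auto body    = std::async(std::launch::async, fakeQuery, "body", 100);
    auto sidebar = std::async(std::launch::async, fakeQuery, "sidebar", 100);

    std::string page = header.get();    // start output as soon as this arrives
    page += " | " + body.get();
    page += " | " + sidebar.get();
    return page;
}
```

Run one after another, the three 100 ms queries would take about 300 ms. Launched together, they finish in about 100 ms.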

Multi-core and Multi-CPU Scenarios
Much of what I've said above about "fooling yourself" on single-CPU time-sharing systems is not necessarily true when your program is running with multiple CPUs. In such cases, the secondary threads might not steal time from the main thread; rather, they can use time-slices available on the secondary processor(s).
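Before splitting CPU-bound work across threads, a program can ask how many hardware threads are available. `std::thread::hardware_concurrency()` returns that number, or 0 if it cannot be determined. The helper below, `workerCountFor`, is an illustrative name. It simply caps the worker count at what the hardware (and the task) can actually use.

```cpp
#include <algorithm>
#include <thread>

// Decide how many worker threads to use for CPU-bound work.
// hardware_concurrency() may return 0 when the count is unknown; in that
// case, conservatively assume a single CPU. Extra workers beyond the
// hardware count would only time-share the same processors.
unsigned workerCountFor(unsigned maxUsefulWorkers)
{
    unsigned hw = std::thread::hardware_concurrency();
    if (hw == 0) hw = 1;                               // unknown: assume one CPU
    return std::max(1u, std::min(hw, maxUsefulWorkers));
}
```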

If you are not waiting on I/O and not waiting for a prerequisite step to finish before starting the next step, then there are often situations where you can improve performance by multithreading via parallel processing.

For instance, consider a matrix multiplication. If you can set things up so that the bottom half of the matrix is being processed at the same time as the top half, then the final calculation will be finished sooner. If you need to search a large document quickly, you can start one thread at the top and another in the middle, and the search will often finish sooner (just make sure that a match straddling the midpoint is not missed). Image manipulation that involves traversing a large bitmap from top to bottom could be done faster if the task is split into two threads running on two CPUs.
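The document-search case can be sketched with two threads. The subtle part is the split point. The work is divided by match *starting position* rather than by raw text, so each thread may read past its own half to finish comparing a candidate. That way a match straddling the midpoint is still found. `parallelFind` is an illustrative name, and the sketch returns the first occurrence or `std::string::npos`.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <thread>

// Find the first occurrence of needle in text using two threads.
// The top thread checks starting positions [0, mid); a second thread checks
// [mid, lastStart]. A thread may read past its own range while comparing,
// so matches that straddle the midpoint are not missed.
std::size_t parallelFind(const std::string& text, const std::string& needle)
{
    if (needle.empty()) return 0;
    if (needle.size() > text.size()) return std::string::npos;
    const std::size_t lastStart = text.size() - needle.size();   // inclusive
    const std::size_t mid = (lastStart + 1) / 2;

    std::size_t topHit = std::string::npos;
    std::size_t bottomHit = std::string::npos;
    auto searchStarts = [&text, &needle](std::size_t from, std::size_t to,
                                         std::size_t& hit) {
        for (std::size_t pos = from; pos < to; ++pos)
            if (text.compare(pos, needle.size(), needle) == 0) { hit = pos; return; }
    };

    std::thread bottom(searchStarts, mid, lastStart + 1, std::ref(bottomHit));
    searchStarts(0, mid, topHit);        // this thread takes the top half
    bottom.join();                       // bottomHit is safe to read after join

    // A hit in the top half always comes first in the document.
    return topHit != std::string::npos ? topHit : bottomHit;
}
```

To keep things simple, the bottom thread runs to completion even when the top thread finds a match early. A fuller version might signal it to stop, for example through a `std::atomic<bool>` flag.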

With multiple CPUs, you'll still have the same set of headaches that I've described earlier, but at least (if you have really isolated tasks that can run in parallel), you'll get some benefit from the extra programming efforts.

If, after reading all of this, you still think that your program can benefit from multithreading, then check out:

Simple Multithreading in C++

In that article, I describe the basics of using multithreading -- How to create a thread, how to end it gracefully (or not so gracefully, in an emergency) and how to avoid some of the multithreading pitfalls.
