  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 239

How to do safe multi-threading?

Dear Experts,

Hi.  My app was designed as an old-fashioned batch program.  (I ported it from COBOL...)  It has a handful of programs that just crunch through a big data file and spit out reports.

I'm now being asked to make them run faster by using multi-threading.  Most likely, that would mean making those programs run concurrently, rather than consecutively as they do now.  My problem is, I've done absolutely nothing yet in terms of thread safety (multi-threading had been explicitly rejected in the original design), and I've now got about one hundred classes and probably 100,000 lines of code...

Could anyone suggest how I'd start?  What would be the top 5 things to do?  I know I'd need to declare a lot of methods synchronized, but what else?  

Thanks,
BrianMc1958

0
3 Solutions
 
sciuriwareCommented:
Rule 1: if you have only 1 CPU there is nothing to gain!

;JOOP!
0
 
sciuriwareCommented:
Rule 2: better to split a program into multiple running instances than to
complicate the program into multi threads.

Threads are necessary only when they NEED synchronising,
not when they cause its need.

Do those batches need to wait for each other? No? Split the program!

;JOOP!
0
 
imladrisCommented:
W.r.t. to my post on my last question, there may be opportunities to speed up this code through multithreading. It is possible that there is time spent waiting for data from the file to be read that could be exploited in some way.

However, in this as in any other performance improvement project, the first thing you should do is profile the program. Most programs spend 80% of their time executing 20% of their code. The way to improve performance is to find out where the bottlenecks are, and then optimize them.

Spending thousands of man-hours multithreading your program when it is only spending 5% of its time waiting for data from the disk is going to give you a maximum 5% speed boost (probably a lot less). On the other hand, if you can identify a method that is using 20% of the program's time (maybe doing a bunch of string concatenation for a report), you can get a much bigger gain, for a lot less effort, by improving that method.
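To put rough numbers on that comparison, here is a minimal sketch using Amdahl's law. The class and method names are made up for illustration, and the "4x faster" figure for the hot method is an assumption, not a measurement of anyone's program.

```java
// Hypothetical helper: estimates the overall speedup when one part of a run gets faster.
final class SpeedupEstimate {
    /**
     * Amdahl's law.
     *
     * @param fraction share of total run time spent in the part being improved (0..1)
     * @param factor   how many times faster that part becomes
     *                 (Double.POSITIVE_INFINITY means "it now takes no time at all")
     * @return overall speedup of the whole run
     */
    static double overallSpeedup(double fraction, double factor) {
        return 1.0 / ((1.0 - fraction) + fraction / factor);
    }

    public static void main(String[] args) {
        // 5% of the time is disk wait, and threading hides all of it (best case).
        System.out.printf("disk wait removed:    %.3fx%n",
                overallSpeedup(0.05, Double.POSITIVE_INFINITY));
        // A method taking 20% of the time is made 4x faster (assumed figure).
        System.out.printf("hot method 4x faster: %.3fx%n",
                overallSpeedup(0.20, 4.0));
    }
}
```

Under those assumptions the first case comes out at roughly 1.05x and the second at roughly 1.18x, which is the point: the cheaper fix can easily be the bigger win.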
0
 
imladrisCommented:
Sorry first bit:

W.r.t. to my post on my last question

should have been:

W.r.t. to my post on *your* last question
0
 
BrianMc1958Author Commented:
My boss wants multi-threading because different programs can be kicked off by different users, and they have a wide variance in run-time.  So assuming one user starts a really long program, and a different user then starts a quick one, the latter still has to wait until the long one finishes for the quick one to start.

Sorry.  I should have said that first...

To sciuriware:

Could you explain this some more:

>>Rule 2: better to split a program into multiple running instances than to
complicate the program into multi threads.

I don't understand "multiple running instances"...


Thanks,
--BrianMc1958
0
 
BrianMc1958Author Commented:
If, by "multiple running instances", you mean make the whole program one thread--I think that's my current plan.  I'm already invoking them polymorphically, so spawning them as threads should be simple enough.  However, I'm not sure what else--beyond a lot of synchronizing--I need to do...  
0
 
imladrisCommented:
I'm guessing from the way you're describing your boss's statement that the program currently produces a "batch" of reports. And that different people are interested in different reports that are being generated.

And that your boss's idea is that if each user could kick off only the piece they were interested in, *when* they were interested in it, and that, when that happens, the program could only do as much work as it needed to do in order to produce the report that particular person was interested in, then everyone would be better off.

If I've got all that right, then sciuriware's suggestion of splitting the program into multiple instances is probably the right tactic. Note also that, technically in this case, your boss is asking for *multiuser* capability, not multithreading.

To go down this path you would, at least conceptually, split the program into multiple programs. From your description the objective would be a situation where user A can run program A to get report A, and user B can run program B to get report B etc. etc. If the big datafile doesn't get altered while the various tasks "crunch" through it, then this should work. In the brute force tactic you would literally make as many copies of the program as there are reports needed, and then alter each to avoid doing any work that is unneeded for the report it is supposed to produce. From a coding perspective it would be nicer if it could remain one program. This might be done by passing in command-line arguments to indicate which report is desired. The program then sets one or more parameters to use internally to manage doing only what is needed to get the requested report.
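As a rough illustration of the one-program-with-arguments idea, something like the following would do. The report names and the `ReportLauncher` class are invented for the example; the real program would plug in its own report code.

```java
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical single entry point that runs only the report named on the command line.
final class ReportLauncher {
    // Invented report names; each Supplier stands in for the real report code.
    private static final Map<String, Supplier<String>> REPORTS = Map.of(
            "sales", () -> "sales report done",
            "inventory", () -> "inventory report done");

    static String run(String[] args) {
        if (args.length != 1 || !REPORTS.containsKey(args[0])) {
            throw new IllegalArgumentException("usage: ReportLauncher <report-name>");
        }
        // Only the requested report's work is executed.
        return REPORTS.get(args[0]).get();
    }

    public static void main(String[] args) {
        System.out.println(run(args));
    }
}
```

Each user then starts their own copy, e.g. `java ReportLauncher sales`, and the operating system runs the copies side by side.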

If the big datafile *does* change, then things get more complicated of course. Or if temporary storage is used you might need to ensure that is stored in different places when processing different reports. Etc. etc. The possible complications are endless. It depends on what your program does.

But, just to close the loop, if I didn't get that right (i.e. my assumptions about what you mean are wrong), then you'll have to try explaining what is desired in more detail.
0
 
Mayank S (Associate Director - Product Engineering) Commented:
Of course, creating many processes means many process control blocks in memory, whereas threads all share the same process space, and there will probably be fewer expensive context switches. If the tasks share the same memory/objects, then it's better to use threads instead of multiple process instances, because sharing memory between processes will be a problem:

http://java.sun.com/docs/books/tutorial/essential/threading/
0
 
BrianMc1958Author Commented:
Let me try to explain this right one more time...

I actually have about six programs.  Users can run them interactively, with different parameters.  They can run the same program more than once.  They are all under the control of a master scheduling program, which uses a daemon to see if any requests to run a program have come in.  If so, the master program then runs all the desired programs--one at a time.

The change my boss is requesting would probably mean running all the desired programs at once.  I had assumed the way to do this would be to run each desired program in its own thread.  So I would need to add threading to the master program, but probably not to the sub-programs.
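For reference, the thread-per-program approach described here might look roughly like the sketch below. `BatchProgram`, `MasterScheduler`, `runAll` and `maxThreads` are invented names (only `doMain` comes from this thread), and the sketch says nothing about whether the sub-programs themselves are safe to run side by side.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical common interface for the sub-programs.
interface BatchProgram {
    void doMain();
}

final class MasterScheduler {
    /**
     * Runs every requested program on a small fixed pool instead of one at a time.
     * Returns true if all of them finished, false if the wait timed out or was interrupted.
     */
    static boolean runAll(List<BatchProgram> requested, int maxThreads) {
        ExecutorService pool = Executors.newFixedThreadPool(maxThreads);
        for (BatchProgram program : requested) {
            pool.execute(program::doMain);   // each program gets its own worker thread
        }
        pool.shutdown();                     // accept no new work, let queued work finish
        try {
            return pool.awaitTermination(1, TimeUnit.HOURS);  // assumed upper bound on a run
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

The scheduling part really is this small; the hard part is everything the programs share (statics, temp files, the utilities class), which this code does nothing to protect.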

So to answer imladris, my "program" is already split into sub-programs.  (I'm sorry I hadn't really said that before!).  I should also say that the total number of threads will always be very small.

So, I'm still unclear on what "splitting the program into multiple instances" means.  Based on this latest description, does it sound like I've already done that?  

Thanks one more time,
BrianMc1958
0
 
sciuriwareCommented:
Try to convince your boss that running multiple threads in ONE program for multiple users is
a waste of time in this case.
Unless you could make it a client server case with the server on a very fast machine.

;JOOP!
0
 
imladrisCommented:
>So, I'm still unclear on what "splitting the program into multiple instances" means.  Based on this latest description, does it sound like I've already done that?  

It does sound like you have already done that. I wonder though, if you have six separate programs, where is the difficulty in running the programs "at once". I don't think it will run much faster, but if they are separate programs it should be easy enough to do? So, I'm guessing there is still something I'm not getting. What is involved (in detail) in running one of these programs? What is the interface to this master scheduler like? What does it do to "see if any requests" have come in? Why is there a master scheduler anyway? Why not just run one of the 6 programs as needed?

0
 
BrianMc1958Author Commented:
The six programs all instantiate a "utilities" class, for instance.  It was designed expecting never to be multi-threaded, and so I have no idea at this point how thread-safe it might be.  At a minimum, it seems like I would need to find every "static" anything in my code, and figure out how to protect it from converging threads...
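To show the kind of thing a hunt for statics turns up, here is a small made-up example (this `Utilities` class and its counter are illustrative only, not the actual class from the application). A plain static counter can lose updates when two threads increment it at once; an `AtomicLong` is one possible fix.

```java
import java.util.concurrent.atomic.AtomicLong;

// Illustrative stand-in for a shared utilities class with static state.
final class Utilities {
    // NOT thread-safe: ++ is read, add, write, so concurrent calls can overwrite each other.
    private static long unsafeRecordCount = 0;

    static void countRecordUnsafe() {
        unsafeRecordCount++;
    }

    // One possible fix: let AtomicLong do the increment as a single atomic step.
    private static final AtomicLong safeRecordCount = new AtomicLong();

    static void countRecord() {
        safeRecordCount.incrementAndGet();
    }

    static long recordCount() {
        return safeRecordCount.get();
    }

    /** Calls countRecord() from several threads and returns how many increments were recorded. */
    static long countFromThreads(int threads, int perThread) {
        long before = recordCount();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    countRecord();
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return recordCount() - before;
    }
}
```

With the atomic version, 4 threads doing 10,000 increments each always record 40,000. Swap in `countRecordUnsafe()` and the total can come up short, and not reproducibly, which is what makes these bugs hard to find.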

I think I've probably taken up too much of all your time on this one question.  But before I sign off, could you folks confirm that if I am to run the six programs at the same time, that necessarily means putting each in its own thread?  Short of invoking six JVM's, that's the only way, right?

Thanks for that last time (this time),
BrianMc1958

0
 
imladrisCommented:
OK, so there aren't six programs, right? How many "main"s are there? How does each program get invoked?

>At a minimum, it seems like I would need to find every "static" anything in my code, and figure out how to protect it from converging threads...

Possibly.

>But before I sign off, could you folks confirm that if I am to run the six programs at the same time, that necessarily means putting each in its own thread?

No, it doesn't. That question is still outstanding for your code. Depending on the internal structure of your program(s), it may be easier and better to follow Joop's suggestion of splitting the code into separate programs. You can run multiple instances of Excel at the same time, right? That's got nothing to do with threads.

>Short of invoking six JVM's, that's the only way, right?

Yes, running separate programs would (conceptually) mean invoking six JVM's. Is there a problem with that?

>I think I've probably taken up too much of all your time on this one question.

No! Don't go now! We haven't resolved a thing yet. And I still think your boss is barking up the completely wrong tree. I still see no evidence that a massive multithreading project is going to deliver any performance benefit whatsoever.
0
 
BrianMc1958Author Commented:
Please excuse me for ignoring you all for five days!  I was kind of hoping the issue might go away.  I have responded to my boss, and this has now been put on the back burner, with the promise that I'll try to get more concurrent--somehow--in the future.

If anyone is still listening, then, I'd love to follow up!

 >>How many "main"s are there? How does each program get invoked?

There is just one actual main, in the master program.  It instantiates six other different classes, and calls a method called "doMain" in each to make it run.

>>Yes, running separate programs would (conceptually) mean invoking six JVM's. Is there a problem with that?

I would need to re-work the basic structure of the master program, but that is doable within a few days at most.  I had assumed multiple JVM's would be a major resource drain.   For good design, I thought I would need multiple instances running concurrently in one JVM.  If not, then yes, multiple JVM's would eliminate all of the concurrent-thread problems, leaving me with (just) the shared-resources problems...  Would that be a fairly "normal" solution?

BTW, my boss pointed out that two threads (actual threads--not instances) WILL run faster even on a single CPU, because the CPU tends to wait a lot for IO anyway.  I think he's right.  What do you think?


Thanks one more time.

Will award points and start a new thread (!) if nobody's listening here anymore.
--BrianMc1958
0
 
sciuriwareCommented:
Your boss is partly wrong: the scheduling of two jobs
(=2 JVM's) by the operating system is better than 2
threads within one JVM, unless there is a lot of inter-thread communication.

When you put 'tasks' together into ONE program you can:
- for distinct tasks, run the program N times at once, starting it as task-#m via the command line.
You will notice that in fact many programs and utilities are like that:
the good old pkzip/unzip was 2 separate tasks, with the job name controlling which one to run.
- for coherent tasks that MUST run simultaneously and inter-communicating,
the 'main' must "spray" all necessary threads into the air at once.

Unless programs must share memory or such, multiple JVM's won't eat your resources.
And, remember, a program can crash: in the case of threads they die together!
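If the master program is the one that has to start the separate JVMs, a minimal sketch with `ProcessBuilder` could look like this. `JvmLauncher` and the main-class name passed in are placeholders, and reusing the parent's classpath is an assumption that may not suit every deployment.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical launcher: starts one sub-program in its own JVM.
final class JvmLauncher {
    /** Builds the command line: <java.home>/bin/java -cp <current classpath> <mainClass> <args...> */
    static List<String> buildCommand(String mainClass, List<String> programArgs) {
        String javaBin = System.getProperty("java.home")
                + File.separator + "bin" + File.separator + "java";
        List<String> command = new ArrayList<>();
        command.add(javaBin);
        command.add("-cp");
        command.add(System.getProperty("java.class.path"));  // assumption: child uses the same classpath
        command.add(mainClass);
        command.addAll(programArgs);
        return command;
    }

    /** Starts the child JVM; its output goes to the same console as the parent's. */
    static Process launch(String mainClass, List<String> programArgs) throws IOException {
        return new ProcessBuilder(buildCommand(mainClass, programArgs))
                .inheritIO()
                .start();
    }
}
```

The master can keep the returned `Process` objects and call `waitFor()` on each to collect exit codes. If one child crashes, the others keep running.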

;JOOP!
0
 
imladrisCommented:
>I had assumed multiple JVM's would be a major resource drain.

I wouldn't expect it to be. Each JVM has the same code, of course, so I would expect most operating systems to only load it into memory once.

>Would that be a fairly "normal" solution?

Absolutely.

>BTW, my boss pointed out that two threads (actual threads--not instances) WILL run faster even on a single CPU, because the CPU tends to wait a lot for IO anyway.  I think he's right.  What do you think?

It certainly *can* be the case, as I already indicated in one of my early posts here, and the one on your previous question. Given the description of your program since then (crunch through a big file and spit out reports) it certainly seems reasonable. One would imagine that if it is crunching through a big file there is synchronous I/O waiting that can be exploited. But whether that is in the 5% range, or the 50% range I'm not in a position to say.
Note that this is not a statement that can be made in general. There is lots of software out there that is CPU bound. That is, it spends little or no time waiting for file i/o. Its maximum speed is limited by the speed of the CPU. For such software this tactic will deliver no results whatsoever.
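A toy demonstration of the I/O-wait case, with `Thread.sleep` standing in for a blocking disk read (the class and method names are made up). Sleeping uses no CPU, so this only models jobs that mostly wait; replace the sleep with a busy computation on a single core and the gain disappears.

```java
// Toy model: two "jobs" that spend their time blocked, as if waiting on disk.
final class IoOverlapDemo {
    // Stand-in for an I/O-bound job: the sleep models time blocked in a read call.
    static void simulatedIoJob(long waitMillis) {
        try {
            Thread.sleep(waitMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Runs two jobs back to back and returns the elapsed milliseconds. */
    static long runSequentially(long waitMillis) {
        long start = System.nanoTime();
        simulatedIoJob(waitMillis);
        simulatedIoJob(waitMillis);
        return (System.nanoTime() - start) / 1_000_000;
    }

    /** Runs the same two jobs in two threads and returns the elapsed milliseconds. */
    static long runInTwoThreads(long waitMillis) {
        long start = System.nanoTime();
        Thread first = new Thread(() -> simulatedIoJob(waitMillis));
        Thread second = new Thread(() -> simulatedIoJob(waitMillis));
        first.start();
        second.start();
        try {
            first.join();
            second.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return (System.nanoTime() - start) / 1_000_000;
    }
}
```

With a 200 ms wait per job, the sequential run takes about twice as long as the threaded one, because the waits overlap. How much of a real batch run is that kind of waiting is exactly what profiling would tell you.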

Two more points. First a repetition. This project is actually about performance improvement. The whole multithreading thing appears to be an assumed solution to the problem. But it is a big assumption. A performance improvement project should always start with profiling. First find out where the program is spending its time. Then you can make effective decisions about what performance improvements are possible by various means, and what kind of a payoff you can expect.
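A real profiler is the better tool, but even a crude timer wrapped around the main phases gives a first picture. The sketch below is one way to do that; `PhaseTimer` and the phase names in the usage note are invented for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Crude stopwatch: accumulates wall-clock time per named phase of a run.
final class PhaseTimer {
    private final Map<String, Long> nanosByPhase = new LinkedHashMap<>();

    /** Runs the work and adds its elapsed time to the total for that phase. */
    void time(String phase, Runnable work) {
        long start = System.nanoTime();
        try {
            work.run();
        } finally {
            nanosByPhase.merge(phase, System.nanoTime() - start, Long::sum);
        }
    }

    Map<String, Long> totals() {
        return nanosByPhase;
    }

    /** Prints each phase's share of the total measured time. */
    void report() {
        long total = 0;
        for (long nanos : nanosByPhase.values()) {
            total += nanos;
        }
        for (Map.Entry<String, Long> entry : nanosByPhase.entrySet()) {
            System.out.printf("%-12s %5.1f%%%n",
                    entry.getKey(), 100.0 * entry.getValue() / Math.max(1, total));
        }
    }
}
```

Usage would be along the lines of `timer.time("read file", () -> loadData())` and `timer.time("format", () -> buildReport())`, followed by `timer.report()`, where `loadData` and `buildReport` stand for whatever the real program calls.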

Second, if disk i/o does chew up a noticeable amount of time, splitting the program into 6 pieces would be a much simpler way of exploiting that. The different programs will quite naturally be safe from interference with each other, and the operating system can switch between them quite effectively, thereby exploiting all the disk waits there are. The only advantage to be had by threading it instead is that the context switch would be a little cheaper. The disadvantages of threading (all the coding you would have to do, and risk of problems, the subsequent, difficult to find, bugs you would have to deal with) will quickly swamp its advantages in this scenario.

The most common use of threading is to replace polling. Stuff like pushing bytes to be printed into a buffer, and having a separate thread that deals with doling them out to the printer as it is able. Or getting incoming data at a buffer, and having another thread that waits at the actual communications port and puts data in the buffer as it gets it. Or a webserver that is running numerous connections that are essentially identical. In these scenarios you can't reasonably split the code into separate programs. And you can't reasonably freeze the program while it prints, or waits for incoming data. In these cases threading is a nice clean relatively simple way of managing various things going on in a program.
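The printer-buffer case can be sketched with a `BlockingQueue` from `java.util.concurrent`. `PrintSpooler` is a made-up class, and adding lines to a list stands in for actually talking to a printer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Illustrative spooler: the caller drops lines in a buffer, a worker thread "prints" them.
final class PrintSpooler {
    // Unique marker object telling the worker to stop (compared by identity, not content).
    private static final String STOP = new String("STOP");

    private final BlockingQueue<String> buffer = new ArrayBlockingQueue<>(16);
    private final List<String> printed = new ArrayList<>();
    private final Thread worker = new Thread(this::drain);

    void start() {
        worker.start();
    }

    /** Called by the main program; blocks only if the buffer is full. */
    void submit(String line) {
        try {
            buffer.put(line);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    private void drain() {
        try {
            while (true) {
                String line = buffer.take();   // sleeps until data arrives: no polling loop
                if (line == STOP) {
                    return;
                }
                printed.add(line);             // stand-in for sending the line to the printer
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Signals the worker to finish and waits until everything queued has been handled. */
    void shutdownAndWait() {
        submit(STOP);
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    List<String> printed() {
        return printed;
    }
}
```

The only shared object between the two threads is the queue, which is designed for exactly this, so no hand-written `synchronized` blocks are needed.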

Actual distributed programming (where you're trying to do more than one thing at the same time) is rarely done. And that's because it is difficult to do it right, and even when you have done it right, it is difficult to gain a commensurate performance advantage.
0
 
BrianMc1958Author Commented:
Well, Experts, I think we can finally put this question to bed.  Now that you have finally got me to state my question fully, it looks like we have a definite answer.  When this next comes up, I'll work on running each of the six "programs" in its own JVM.  

Your experience and guidance is, as always, very much appreciated.

--BrianMc1958  
0
