• Status: Solved

Delphi multi-thread program apparently not using a multi-core processor

After decades of programming, I am taking my first steps in parallel programming using, as always, Delphi Professional (now XE3) on a Toshiba Satellite A665 laptop running Windows 8 Professional.
System.CPUCount returns 4, as it should.
I have now written some minimal test programs, and copied and tried some from the web, running anywhere from 1 to 16 threads simultaneously. To my surprise, however, the processing time is the same as, or somewhat greater than, when I simply run the same processing sequentially. I wonder whether the program is using all the cores or just one, whether some special switches have to be set, etc. (I have also read somewhere that Windows does not really allow parallel use of cores?)
4 Solutions
Sinisa Vuk Commented:
It can be done. Multithreading alone is not enough; you must assign each thread to a specific processor core.
Some components can help you do the job:
Geert G (Oracle dba) Commented:
You might want to check out the Delphi Geek's OmniThreadLibrary (OTL).
It has a steep learning curve.

It will take care of assigning the different threads to the different CPUs.
Ephraim Wangoya Commented:
Multi-threading will not necessarily increase your processing speed.
It all depends on what type of processing you are doing. If one thread has to wait for resources being used by another thread, then you don't get any speed advantage.
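As a rough illustration of this point, here is a minimal console sketch that times the same work run sequentially and then on one thread per core. It assumes the work is CPU-bound and that each thread owns its own input and result, so nothing is shared while the threads run. The names TSampleThread and AnalyseSample are hypothetical, and the loop inside AnalyseSample is only a stand-in for real analysis code. Timings will vary by machine.

```pascal
program ParallelSamples;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes, System.Diagnostics;

type
  // Hypothetical worker: each thread owns its input and its result,
  // so no locking is needed while it runs.
  TSampleThread = class(TThread)
  private
    FSeed: Integer;
    FValue: Double;
  protected
    procedure Execute; override;
  public
    constructor Create(ASeed: Integer);
    property Value: Double read FValue;
  end;

// Stand-in for a computation-heavy analysis of one sample (assumption).
function AnalyseSample(ASeed: Integer): Double;
var
  i: Integer;
begin
  Result := 0;
  for i := 1 to 50000000 do
    Result := Result + Sqrt(i + ASeed);
end;

constructor TSampleThread.Create(ASeed: Integer);
begin
  inherited Create(True); // create suspended; started explicitly below
  FSeed := ASeed;
end;

procedure TSampleThread.Execute;
begin
  FValue := AnalyseSample(FSeed);
end;

var
  Threads: array of TSampleThread;
  i: Integer;
  sw: TStopwatch;
  Total: Double;
begin
  // Sequential run: one sample after another on the main thread.
  sw := TStopwatch.StartNew;
  Total := 0;
  for i := 0 to CPUCount - 1 do
    Total := Total + AnalyseSample(i);
  Writeln('Sequential: ', sw.ElapsedMilliseconds, ' ms, total ', Total:0:2);

  // Parallel run: one thread per logical CPU, each with its own sample.
  sw := TStopwatch.StartNew;
  SetLength(Threads, CPUCount);
  for i := 0 to High(Threads) do
  begin
    Threads[i] := TSampleThread.Create(i);
    Threads[i].Start;
  end;
  Total := 0;
  for i := 0 to High(Threads) do
  begin
    Threads[i].WaitFor; // results are read only after the thread is done
    Total := Total + Threads[i].Value;
    Threads[i].Free;
  end;
  Writeln('Parallel:   ', sw.ElapsedMilliseconds, ' ms, total ', Total:0:2);
end.
```

If the parallel timing is not clearly lower on a multi-core machine, that usually points to something shared between the threads (a lock, a Synchronize call, a shared memory manager hot spot) rather than to the cores not being used.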
Geert G (Oracle dba) Commented:
Threading has a lot of pitfalls.
The most important items to pay attention to:
1: protecting resources from simultaneous access
2: synchronisation of threads, e.g. a thread can only start when another has finished

Consider your program as a bath tub and the people as threads.
Everybody has a task to do: take a bath.
Synchronisation is very important here:
>> you probably don't want everybody to take a bath together: the six kids + parents + grandparents
>> you need to set certain rules, like who can actually take a bath together

The data would be the items that go into the bath with the person.
Rules need to be set here too: having a load of ducks in with the kids is nice,
but granddad will probably like it more if they are removed before he goes in.
It would help if we knew more about what you are attempting.

  Ray tracing is a great example of multiple processor cores being a great value.  In Ray tracing a great deal of math is done to render every pixel versus an in memory list of 3D objects.  Since each pixel can be rendered independently it would help to have thousands of cores.

  Other types of work are not easily threaded.  If you were doing database updates in a set of database tables that use foreign keys, then each update would need to wait until the previous one had finished.  Without that wait, SQL updates that would succeed when run in order could fail because the commands they depend on had not yet completed.

  One of the biggest mistakes made by people new to multi-threading is trying to have all of the threads cause live screen updates.  Each update requires a Synchronize call.  This, in effect, makes all of the threads wait on each other.  If you need to have threads running, limit their screen updates to something like once a second.
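A minimal console sketch of that last suggestion, under stated assumptions: the worker never touches the display, it only bumps a shared counter, and the main thread reports progress about once a second. TWorker and Progress are hypothetical names, Sleep(10) stands in for one unit of real work, and in a VCL application the polling loop would more likely be a TTimer on the form.

```pascal
program ThrottledProgress;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes, System.SyncObjs;

var
  Progress: Integer = 0; // shared counter, written only via TInterlocked

type
  TWorker = class(TThread)
  protected
    procedure Execute; override;
  end;

procedure TWorker.Execute;
var
  i: Integer;
begin
  for i := 1 to 300 do
  begin
    Sleep(10);                        // stand-in for one unit of real work
    TInterlocked.Increment(Progress); // no Synchronize, no waiting on the UI
  end;
end;

var
  Worker: TWorker;
begin
  Worker := TWorker.Create(False);
  try
    // The "screen update" happens here, roughly once a second,
    // instead of once per item inside the worker.
    while not Worker.Finished do
    begin
      Sleep(1000);
      Writeln('Items done: ', Progress);
    end;
    Worker.WaitFor;
  finally
    Worker.Free;
  end;
end.
```

The worker runs at full speed because it never blocks on the main thread; the display simply lags by up to a second.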
magnussonms (Author) Commented:
The task I am dealing with seems well suited for multi-CPU processing. It is basically the processing of independent samples where all the data and work arrays for each can be stored in a separate record, object etc. Each processor could thus work on a sample without any interaction with the others. (Actually, tasks within each sample also have this character, providing another parallel processing approach.) The job would be finished when all the samples had been analyzed (using computation-intensive pattern detection in each case, which should lend itself well to truly parallel processing). Moreover, there is no need for UI updating before the longest task has been completed. So, as far as I can see, this situation is particularly well suited, rather like the ray tracing mentioned above (even more so).

While I have not done any parallel process programming until now, I have read about it for some time. As a matter of fact, my main research area during the last few decades has been the analysis/modeling of real-time (social) interaction in humans, animals and brain cells, developing and using highly optimized but still computation-hungry algorithms, so I do not underestimate the complexities of interactive parallelism, but the present case seems particularly simple.

It has come as a surprise to me that in Delphi (Professional XE3), assigning tasks to particular CPU cores appears quite difficult, at least partly due to the way Windows operates. Simply using parallel threads does not provide any speed advantage, as I have now found out: sequential processing is easily faster while using much less processing (using sinisav's program). What I really need is to assign each independent task to a separate core, as its only (or main) task. I think this is at the heart of my issue. I have read about the OTL library, which according to Geert_Gruwez above should allow this, but has a "steep learning curve". I will now continue studying it.

All the replies have been useful, but I am still looking for a realistic solution and I will write back when I have found it or postponed trying until better tools become available.
I don't believe that Windows will allow you to hoard a processor for your thread.  You can assign a thread to a processor but there is no guarantee that nothing else will be scheduled on it.  You can up the priority to try to squeeze others out though.
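For reference, a minimal sketch of the two Windows calls this refers to, applied to the current thread: SetThreadAffinityMask restricts where the thread may run, and SetThreadPriority raises its priority. The choice of CPU 0 and of THREAD_PRIORITY_ABOVE_NORMAL is only an assumption for illustration. Neither call reserves the core, so other threads can still be scheduled on it.

```pascal
program PinThread;

{$APPTYPE CONSOLE}

uses
  Winapi.Windows, System.SysUtils;

var
  OldMask: DWORD_PTR;
begin
  // Restrict the current thread to logical CPU 0 (bit 0 of the mask).
  // The call returns the previous mask, or 0 on failure.
  OldMask := SetThreadAffinityMask(GetCurrentThread, 1);
  if OldMask = 0 then
    RaiseLastOSError;

  // Ask the scheduler to favour this thread; this is a hint, not a reservation.
  if not SetThreadPriority(GetCurrentThread, THREAD_PRIORITY_ABOVE_NORMAL) then
    RaiseLastOSError;

  Writeln('Previous affinity mask: ', OldMask);
end.
```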
Geert G (Oracle dba) Commented:
I built a very lightweight threading unit once:
check this Q

The question contains the uQueuedThreads unit.
It is basically a FIFO queue monitored by a thread.
When a task is added to the queue, the monitor starts a new thread.
The new thread is given the information from the queued item.

It doesn't contain processor assignments yet.
These would be set in the thread's constructor,

manually setting the thread affinity to the next CPU like this:

constructor TQueuedThread.Create(aInfo: TThreadQueueData; aOnThreadDone: TNotifyEvent);
begin
  inherited Create(False);
  FreeOnTerminate := True;
  OnTerminate := aOnThreadDone;
  fReturnInfoMsg := TStringList.Create;
  fInfo := aInfo;
  fReturnInfo := nil;
  if Assigned(aInfo) then
    fReturnInfo := aInfo.FReturnInfo;
  // set thread affinity to next cpu; lCpu is a one-bit mask (1, 2, 4, ...)
  lCpu := lCpu shl 1;
  // wrap to the first core once the bit passes the last core (also covers lCpu = 0)
  if (lCpu = 0) or (lCpu >= (NativeUInt(1) shl lMaxCores)) then lCpu := 1;
  SetThreadAffinityMask(Handle, lCpu);
end;


lCpu is a variable in the uQueuedThread unit.
lMaxCores is too, and is initialised as

var info: TSystemInfo;
begin
  GetSystemInfo(info); // fill the record before reading it
  lMaxCores := info.dwNumberOfProcessors;


magnussonms (Author) Commented:
I will need more time to find out what I can realistically do as my programming experience  is almost exclusively in Fortran and Delphi. Many thanks for most valuable help.
Geert G (Oracle dba) Commented:
Check the pipeline technique in OTL.

I got some multi-threading programs working with OTL, so I thought I'd come back to this.

You indicated you need to process samples.
A pipeline lends itself to working in stages: the first stage gets a sample, the next processes it (in different ways), and a later one captures the processed data and stores it.

pipeline := Parallel.PipeLine.Throttle(10000)


Each procedure of the Stage is something like this:
procedure TDataProcessor.ProcessSample(const input, output: IOmniBlockingCollection;
  const task: IOmniTask);
var
  Value: TOmniValue;
begin
  while not task.Terminated do
    if input.TryTake(Value, 1000) then
    begin
      // Process the Value (can be any object)
      // when finished ... pass the calculated data to the next step
      repeat until TryAdd(ProcessedData, 1000);
    end;
end;
