We help IT Professionals succeed at work.

Problem with multi-threads

Alvein
Alvein asked
on
1,010 Views
Last Modified: 2008-01-09
Hello,

This is VC++ 6.

I'm coding a multi-thread application, and have a problem. At random times, not all the threads ends. Sometimes a few of them stay running forever.

Before to go into the threads internal coding, I need to be sure the "framework" I'm using is ok.

So this is the basic source code, with all the unnecessary stuff removed.

Take into account this is the code for a DLL. Not sure if this is important. As you can see, there are two exported functions. The first one, StartSession(), creates and starts the threads. They run forever until the 2nd function, StopSession() is called.

Inside the While block in the ThreadProc() function, there's a bunch of If's with some function calls, but nothing that could make the thread to get hung.

I'm waiting for your comments.


Thanks in advance.


------->8-------

BOOL bRunning=FALSE;

ULONG ulTotalSlots=128; // Let's say it's a constant value, for simplicity

HANDLE *ahThread;

....

unsigned int __stdcall ThreadProc(void *PIndex) {
  while(bRunning) {
    ....
    }
  _endthreadex(0);
  return(0);
}

VOID WINAPI StartSession(VOID) {
ULONG ulK;
  if(!bRunning) {
      bRunning=TRUE;
      ahThread=(HANDLE *)calloc(ulTotalSlots,sizeof(HANDLE));
      for(ulK=0;ulK<ulTotalSlots;ulK++)
        ahThread[ulK]=(HANDLE)_beginthreadex(NULL,0,ThreadProc,(void *)ulK,0,NULL);
      }
}

VOID WINAPI StopSession(VOID) {
ULONG ulK;
  if(bRunning) {
    bRunning=FALSE;
    (void)WaitForMultipleObjects(ulTotalSlots,ahThread,TRUE,INFINITE);
    for(ulK=0;ulK<ulTotalSlots;ulK++)
      CloseHandle(ahThread[ulK]);
    free(ahThread);
    }
}
Comment
Watch Question

Hi Alvein,

This code:

  if(!bRunning) {
      bRunning=TRUE;

Is not thread safe. If one thread sets 'bRunning' while the other is between these two lines of code your locking process will fail.

Either use semaphores or increment/decrement the lock value. Semaphores are safer, look up 'mutex' in the help system.

Paul

Author

Commented:
Hello,

Thanks for your reply.

I understand your reasoning, but, notice how the two if(!bRunning) statemets used in the code are inside two top level functions never called from a thread (at least not from the threads this DLL creates). In a few words, no thread sets the bRunning value. Just checks its value, and nothing more.
jkr
CERTIFIED EXPERT
Top Expert 2012

Commented:
It *might* work here, but I agree with Paul. I simply wouldn't do it like that.You're talking about a framework, and there will be a moment where you might start to extend that framework still carring potentially dangerous legacy code, so do it right from the beginning, e.g. using a proper Win32 primitive like a sync object, e.g.

HANDLE g_hEvent;

BOOL IsRunning () {

    return WAIT_TIMEOUT == WaitForSingleObject(g_hEvent, 1);
}

void StopRunning () {

    SetEvent(g_hEvent);
}

void StartRunning () {

    ResetEvent(g_hEvent);
}

// ... setup the event properly by calling 'CreateEvent()'

unsigned int __stdcall ThreadProc(void *PIndex) {
  while(IsRunning()) {
    ....
    }
  _endthreadex(0);
  return(0);
}

VOID WINAPI StartSession(VOID) {
ULONG ulK;
  if(!bRunning) {
      StartRunning();
      ahThread=(HANDLE *)calloc(ulTotalSlots,sizeof(HANDLE));
      for(ulK=0;ulK<ulTotalSlots;ulK++)
        ahThread[ulK]=(HANDLE)_beginthreadex(NULL,0,ThreadProc,(void *)ulK,0,NULL);
      }
}

VOID WINAPI StopSession(VOID) {
ULONG ulK;
  if(bRunning) {
    StopRunning();
    (void)WaitForMultipleObjects(ulTotalSlots,ahThread,TRUE,INFINITE);
    for(ulK=0;ulK<ulTotalSlots;ulK++)
      CloseHandle(ahThread[ulK]);
    free(ahThread);
    }
}

Author

Commented:
Hi,

Well, thanks for the tips. Apart of the proposed changes, I will have to wait a big while to see if something happens (or does not). If the problem persist, I guess I will have to post the real ThreadProc(), but let's hope not.

....

There's a really weird thing I've seen by using this DLL. The caller does create a global array and pass the pointer to the DLL. The ThreadProc() function updates this array, but each thread has it's own array slot and does not interfere with the rest, so there's no need of to synchronize anything, right? Well, I added a "debug" field to the array base type, and I'm setting it to TRUE at the beginning of ThreadProc(), and resetting to FALSE near the end, just before _endthreadex().

When look the contents of that field, in the caller, after calling StopSession(), there's a bunch of slots with the debug field still being TRUE! only after about a second, everybody is FALSE. If StopSession() is visually a blocking function, what's the reason for some threads continue running (which is what I can infere of this behavior) for a little time after an explicit WaitForMultipleObjects(,,TRUE,INFINITE)?

Thanks again.
>>>> bRunning = TRUE;
>>>> Is not thread safe.

Actually, that doesn't matter in this case provided StopSession is not called prior to the last thread was started.

>>>> At random times, not all the threads ends.

Actually the code above seems safe - beside of one weak point that I will explain below - provided the global bool 'bRunning' is really global, what means that if there are more than one cpp file involved you would need to define the global variables in one cpp file - most  likely in the main module - and have an external reference in aller other cpp files. In case of dlls you would need to 'export' all global variables.

The week point I spoke of is the " bRunning=FALSE; " statement in the StopSession function. After bRunning is false and prior to calling WaitForMultipleObjects there is a (little) chance that one of the threads already exited. By calling endthreadex the handle already was closed (freed). So it wouldn't get to signal state anymore what might cause the WaitForMultipleObjects to hang infinite....

Generally, there are a few unnecessary code parts which I would recommend to change:

1. unsigned int __stdcall ThreadProc(void *PIndex)

When using beginthreadex the prototype of a thread procedure is "void threadproc(void*)". That is different thant to using CreatedThread. To keep it simple and clean you simply should take the prototype not making unneeded castings.

2. __stdcall WINAPI

You don't need the __stdcall specifier nor any other (equivalent) specifier like WINAPI  as long as you are not using mixed-language libraries. Note, __stdcall definitively isn't necessarty when a function has only one argument (cause the order of the arguments isn't different for one argument). All these specifiers are an inheritage of a long (and evil) past which should be left behind wherever possible (esspecially if you ever intend to write portable code.

3. _endthreadex(0);

Remove that statement as all threads correctly end without it if reaching (one of) the return statements in the thread procedure. As told above that statement most likely is responsibly for the hung up as it frees the thread handles.

4. calloc

If 'ulTotalSlots' really is a constant you should init a global array for the handles. If not you should define a maximum number of threads and create an array on the the stack like that:

  enum { MAX_THREADS = 256 };

  HANDLE ahthread[MAX_THREADS] = { NULL };

You still could use 'ulTotalSlots' as an upper bound. Note, in C++ you should create dynamic memory by using the 'new' operator. But here it isn't necessary (and an additional error source).

5. _beginthreadex

I couldn't see any need for beginthreadex as you didn't use any of the extra arguments. Use beginthread for simplicity reasons.

Regards, Alex







 

Author

Commented:
Hello,

After the last changes, SetEvent(), etc., the program is leaving more threads running with no apparent reason. Last night they were 7, compared with the 2 of the previous version.

@ itsmeandnobodyelse,

>>>> In case of dlls you would need to 'export' all global variables.

The bRunning variable is only used in the main module.

>>>> By calling endthreadex the handle already was closed (freed). So it wouldn't get to signal state anymore what might cause the WaitForMultipleObjects to hang infinite....

According to MSDN, _endthreadex() does not close the thread handle.

>>>> You don't need the __stdcall specifier nor any other (equivalent) specifier like WINAPI  as long as you are not using mixed-language libraries.

The caller is coded in VB.

>>>> As told above that statement most likely is responsibly for the hung up as it frees the thread handles.

I will try this as soon as I can. I'm not in front of my PC right now.

>>>> If 'ulTotalSlots' really is a constant you should init a global array for the handles.

It's a global variable and its value is set by the caller (thru another exported function). What's the problem with calloc() anyways?


Thanks for your comments.
>>>> According to MSDN, _endthreadex() does not close the thread handle.

You are right, but it may influence the signal status. Actually, you  should/could try to remove endthreadex statement. If you have no endless loop somewhere else it seems to me the only thing what could prevent the thread handle from getting signalled.

>>>> The caller is coded in VB.

As I told __stdcall has no influence on functions with less than two arguments. AFAIK VB has the same calling conventions than VC.

>>>> What's the problem with calloc() anyways?

No problem. You can't 'delete' storage that was allocated by malloc, calloc or realloc. And you can't 'free' memory allocated by 'new'. Using malloc/calloc/realloc needs a cast while 'new' is the C++ way. Furthermore for class types the constructor wouldn't be called if using 'alloc functions.. Generally, you shouldn't use C allocation in C++ cause there isn't any advantage (but some disadvantages). Moreover, as I told for the purpose above you don't need dynamic memory but should use an array of maximum size allocated on the stack (or using a class member).

Regards, Alex



Author

Commented:
Hello,

Well, after remove the _endthreadex() statement, the half of the total threads left running after a call to StopScrape(), so I continued using _endthreadex(). Not sure what follows now.

In the meanwhile, please help me to synchronize these two routines. They are called from the IPThread() code. At first sight, they are not able to make the thread code hangs. However, they access and modify a global resource, and I need to be sure that the multi-threading stuff is not skewing the results, and in fact, is what it's doing:

unsigned long AddToResults(char *szId,unsigned long ulTimestamp) {
unsigned long ulK;
  for(ulK=0;(ulK<ulTotalSlots)&&strcmp(arResults[ulK].szId,szId)&&arResults[ulK].ulTimestamp;ulK++);
  if(ulK<ulTotalSlots)
    if(!arResults[ulK].ulTimestamp) {
      strcpy(arResults[ulK].szId,szId);
      arResults[ulK].ulTimestamp=ulTimestamp;
      }
    else
      if(arResults[ulK].ulTimestamp<ulTimestamp)
        arResults[ulK].ulTimestamp=ulTimestamp;
  return(ulK);
}

int IsFoundInResults(char *szId) {
unsigned long ulK;
  for(ulK=0;(ulK<ulTotalSlots)&&strcmp(arResults[ulK].szId,szId)&&arResults[ulK].ulTimestamp;ulK++);
  return((ulK<ulTotalSlots)&&arResults[ulK].ulTimestamp);
}

Don't pay much attention to the code itself. AddToResults() inserts one item in a global array, and IsFoundInResults() is TRUE if a given item is found in that global array. I need to block that array the way only one thread at time can insert or search for items there. By not doing this, sometimes the array ends with duplicated items. I hope you get the point.


Thanks again.

Author

Commented:
Paul,

It sounded like a new question, but read well, two functions called from the same thread code that has a problem. I'm trying to examine all the possible sources of failure. The functions does not behave like I want, so maybe there lies the bug. I'm not the expert in such stuff, I just want to give enough details for the rest of people can give a diagnose, since the first recommendations I've got are making the things worse.


Thanks.
>>>> I need to block that array the way only one thread at time can insert or search for items there.

For synchronisation you may use that little helper class:

// mutex.h
#ifndef MUTEX_H
#define MUTEX_H

#include <windows.h>

class Mutex
{
private:
    CRITICAL_SECTION mutex;
public:
    Mutex() {  InitializeCriticalSection(&mutex);  }

    // enter critical section and block others threads or wait
    void enter() { EnterCriticalSection(&mutex); }
   
    // leave critical section and release any other thread blocked
    void leave() { LeaveCriticalSection(&mutex); }
};

#endif

#endif // MUTEX_H


Use it by either defining a global (and shared) Mutex object or by adding it as a member to an already shared class (object).

// main.cpp

   ...

  #include "mutex.h"
  Mutex theMutex;   // the one and only

// thread.cpp

  #include "mutex.h"
  extern Mutex theMutex;


  ...

  unsigned long AddToResults(char *szId,unsigned long ulTimestamp)
  {
        theMutex.enter();

        // put here your original code
        ...

        theMutex.leave();
  }

  int IsFoundInResults(char *szId)
  {
        theMutex.enter();

        // put here your original code
        ...

        theMutex.leave();
  }


Note, if you have additional returns (or exceptions) you *have* to call 'theMutex.leave() ' prior to returning or you'll get a deadlock.

>>>> remove the _endthreadex() statement, the half of the total threads left running after a call to StopScrape()

How did you know that?

Note, after the call to WaitForMultipleObjects you should wait a little while (some milliseconds) before freeing the handles. The handle may get signalled though the thread actually hasn't terminated.

>>>> I'm trying to examine all the possible sources of failure.

Yes. If you are operating on (globally) shared structures and arrays where some were inserting while others were evaluating, there is a good chance of running into an infinite loop.

Regards, Alex


 


Author

Commented:
Hello,

>>>> For synchronisation you may use that little helper class

Ok. Thanks. I'll do the edits and tell you.

>>>> How did you know that?

I don't remember if it was exactly the half. I did not count it, but there was a lot of threads still running, seconds, minutes after a call to StopSession(). I'm using Sysinternals' Process Explorer to check this condition.

>>>> after the call to WaitForMultipleObjects you should wait a little while (some milliseconds) before freeing the handles.

Isn't INFINITE enough? ;)

>>>> If you are operating on (globally) shared structures and arrays where some were inserting while others were evaluating, there is a good chance of running into an infinite loop.

As I said, I'm just trying to examine everything. I don't see how these routines could get hung in an infinite loop, but who knows.


Thanks again.

Author

Commented:
BTW, isn't required to add a destructor with a call to DeleteCriticalSection()?
This one is on us!
(Get your first solution completely free - no credit card required)
UNLOCK SOLUTION
>>>> Isn't INFINITE enough? ;)

INFINITE only works as long as you are waiting. I thought, you already would reach the for loop that happens after the call of WaitForMultipleObjects. Then, all thread handles were signalled but some of the threads might be still not be removed from memory. If your main thread closes immediately after wait, some of the threads still might be terminated by force what could mean that not all resources were freed.


>>>> I'm using Sysinternals' Process Explorer to check this condition.

Actually, the condition easily could be verified by simply using a debugger. Put the breakpoint at the while condition that checks the stop flag and every thread *must* break there. If you don't know how to identify the threads you either could assign the current thread id to a temporary variable

    unsigned long threadID =  __threadid();
    while (bRunning)
     ...

or you make a copy of the thread function for any thread - adding a unique suffix to each function. Then, you need to set breakpoints to each of these functions:

   unsigned int __stdcall ThreadProc_A(void *PIndex) {
   while(bRunning)   // set the breakpoint here
   {

Actually, if your synchronisation is done well, *all* threads will recognize the stop flag and the call to WaitForMultipleObjects definitively returns.

>>>> isn't required to add a destructor with a call to DeleteCriticalSection()?

Yes, it would be better cause it would free all resources associated with. However, if the mutex was defined globally or as  a static class member resources were freed anyway.

Regards, Alex


Author

Commented:
I had to put a "Sleep(10000);" statement just after WaitForMultipleObjects(). That's ridicule. I've never been this way before. I don't have idea about what to do.

Author

Commented:
Look this, I've changed the line

(void)WaitForMultipleObjects(ulTotalSlots,ahThread,TRUE,INFINITE);

To

for(ulK=0;ulK<ulTotalSlots;ulK++)
  (void)WaitForSingleObject(ahThread[ulK],INFINITE);

And after some tests, the threads are ending ok. Aren't those lines the same???? The call to StopSession() is taking rather more time than expected, but who cares if the bug disappeared?
This one is on us!
(Get your first solution completely free - no credit card required)
UNLOCK SOLUTION
Unlock the solution to this question.
Join our community and discover your potential

Experts Exchange is the only place where you can interact directly with leading experts in the technology field. Become a member today and access the collective knowledge of thousands of technology experts.

*This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

OR

Please enter a first name

Please enter a last name

8+ characters (letters, numbers, and a symbol)

By clicking, you agree to the Terms of Use and Privacy Policy.