SIGALRM lost?

I'm developing a multithreaded C++ app on a Linux 2.4.19 system.  Occasionally (say, 1% of the time), the process gets stuck and I don't understand why.

I have two threads.  The master thread checks an event list every minute; when an event's start time arrives, it processes the event and removes it from the event queue.  The worker thread monitors a message queue and reloads the event list when requested to do so.

After much debugging, I saw on one of our systems that alarm( 60 ) was called but sigwait() never caught SIGALRM.  How could this have happened?
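//------------------ main.cpp ----------------------
#include <pthread.h>
#include <signal.h>
#include <time.h>
#include <unistd.h>

void* MsgHandlerFunc( void* pData ); // defined further down

pthread_mutex_t DataLock = PTHREAD_MUTEX_INITIALIZER;
Schedule MySchedule;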
 
void ProcessSchedule()
{
    // Run the next event once its start time has arrived.
    if ( MySchedule.nextEvent().startTime() <= time( NULL ))
    {
        MySchedule.nextEvent().run(); // launch a child process
        MySchedule.pop(); 
    }
 
    // Seconds until the next event, clamped to the range [1, 60].
    int waitAgain = MySchedule.top().startTime() - time( NULL );
    if ( waitAgain > 60 ) waitAgain = 60;
    if ( waitAgain < 1 )  waitAgain = 1;
 
    alarm( waitAgain ); // check schedule again in at most 1 min
}
 
 
int main()
{
    // ---- block all signals ---
    sigset_t allSignals;
    sigfillset( &allSignals );
    sigdelset( &allSignals, SIGCHLD );
    pthread_sigmask( SIG_BLOCK, &allSignals, NULL );
 
    struct sigaction ignore;
    ignore.sa_handler = SIG_IGN; 
    sigemptyset( &ignore.sa_mask ); // initialize the remaining fields before use
    ignore.sa_flags = 0;
    sigaction( SIGCHLD, &ignore, NULL );
 
    // --- launch message queue listener thread ---
    pthread_t threadId;
    pthread_create( &threadId, NULL, MsgHandlerFunc, NULL );
 
    while( true )
    {
        int signalCaught = 0;
        sigwait( &allSignals, &signalCaught );
 
        if ( signalCaught == SIGALRM ) 
        {
            if ( pthread_mutex_lock( &DataLock ) == 0 )
            {
                ProcessSchedule();
                pthread_mutex_unlock( &DataLock );
            }
        }
        else if ( signalCaught == SIGUSR1 ) // for debug only
        {
            if ( pthread_mutex_lock( &DataLock ) == 0 )
            {
                MySchedule.Dump();
                pthread_mutex_unlock( &DataLock );
            }
        }
        else // unexpected signals
        {
            Debug::Out( "caught signal %d, exit application", signalCaught );
            break;
        }
    }
 
    return 0;
}
 
 
 
// ------------------- Msg queue handler thread ---------------------
void* MsgHandlerFunc( void* pData )
{
    MsgQueue myMsgQueue = MsgQueue( MY_KEY );
 
    while( true ) 
    {
        // receive message
        if ( myMsgQueue.receiveMsg() == MsgQueue::MSG_RELOAD_SCHEDULE )
        {
            if ( pthread_mutex_lock( &DataLock ) == 0 )
            {
                MySchedule.reloadFromFile();                
                ProcessSchedule();
                pthread_mutex_unlock( &DataLock );
            }
            myMsgQueue.sendMsg( MSG_OK );
        }
    }
 
    pthread_exit( NULL );
    return NULL;
}


Asked by LogicInnovations
 
Infinity08 commented:
You're right. So, that doesn't explain the problem either.

Maybe I'm misunderstanding your problem.

>> Occasionally( say 1% of the time ), the process got stuck and I don't understand why.

Could you give a bit more detail? Under what conditions does it block? Does it "unblock" again? After how much time? How does this behavior manifest itself to the user?
 
Infinity08 commented:
The first thing I think of is that standard signals such as SIGALRM don't queue, i.e. if a second SIGALRM is sent before the first is handled, there will still be only one pending SIGALRM, not two.

Would that explain your problem?
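
The non-queuing behavior described above can be checked with a small single-threaded sketch. The helper name `countDeliveredAlarms` is made up for this illustration and is not part of the code in the question; it uses `sigprocmask()` because the demo has only one thread (a threaded program would use `pthread_sigmask()`).

```cpp
#include <signal.h>

// Hypothetical helper for illustration: raises SIGALRM `raises` times while
// the signal is blocked, then counts how many times sigwait() can consume it.
int countDeliveredAlarms( int raises )
{
    sigset_t alarmOnly;
    sigemptyset( &alarmOnly );
    sigaddset( &alarmOnly, SIGALRM );
    sigprocmask( SIG_BLOCK, &alarmOnly, NULL ); // keep SIGALRM pending

    for ( int i = 0; i < raises; ++i )
        raise( SIGALRM ); // every raise after the first merges into the pending one

    int delivered = 0;
    for ( ;; )
    {
        sigset_t pending;
        sigpending( &pending );
        if ( sigismember( &pending, SIGALRM ) != 1 )
            break; // nothing left to consume

        int caught = 0;
        sigwait( &alarmOnly, &caught ); // returns at once, the signal is pending
        ++delivered;
    }
    return delivered;
}
```

Calling `countDeliveredAlarms( 3 )` should return 1 on Linux, since the three raises collapse into a single pending SIGALRM.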
 
LogicInnovations (author) commented:
Theoretically speaking it's possible, but it's hard to believe my first thread gets starved for over a minute.
 
LogicInnovations (author) commented:
Actually, even one SIGALRM will ensure the app continues, so no, that won't explain my problem.
 
Infinity08 commented:
Sorry for the delay - this question somehow disappeared off my radar.

Ok, I had a look over your code (after my initial shot in the dark), and I notice that you don't block the signals in the MsgHandlerFunc thread (you only block the signals in the main thread).

You need to make sure that the SIGALRM (and other signals you want to handle with sigwait) are blocked in every thread, except the main thread with the sigwait, or the signal might be delivered to the wrong thread.
 
Infinity08 commented:
>> You need to make sure that the SIGALRM (and other signals you want to handle with sigwait) are blocked in every thread, except the main thread with the sigwait

That was an unfortunate way of saying it. Of course the signals have to be blocked in the main thread too, but they will be handled in that thread by sigwait.

So, to avoid confusion: block the signals in all threads, and have the sigwait in your main thread, which will then be the only thread that gets the signals.
 
LogicInnovations (author) commented:
Actually, according to the POSIX standard, a thread inherits the signal mask of the thread that created it.  So as long as a signal is blocked first thing in main, it will be blocked in every thread created afterwards.
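
The inheritance rule can be verified with a short sketch along these lines. The names `reportMask` and `childInheritsBlockedAlarm` are invented for the example and do not appear in the question's code.

```cpp
#include <pthread.h>
#include <signal.h>

// Hypothetical thread function: reports whether SIGALRM is blocked in the
// thread that runs it.
static void* reportMask( void* out )
{
    sigset_t current;
    sigemptyset( &current );
    pthread_sigmask( SIG_SETMASK, NULL, &current ); // NULL set: query only
    *static_cast<bool*>( out ) = ( sigismember( &current, SIGALRM ) == 1 );
    return NULL;
}

// Blocks SIGALRM in the calling thread, starts a new thread, and returns
// whether the new thread saw SIGALRM as blocked.
bool childInheritsBlockedAlarm()
{
    sigset_t alarmOnly;
    sigemptyset( &alarmOnly );
    sigaddset( &alarmOnly, SIGALRM );
    pthread_sigmask( SIG_BLOCK, &alarmOnly, NULL );

    bool blockedInChild = false;
    pthread_t tid;
    if ( pthread_create( &tid, NULL, reportMask, &blockedInChild ) != 0 )
        return false; // thread could not be created

    pthread_join( tid, NULL );
    return blockedInChild;
}
```

On a POSIX-conforming threads implementation `childInheritsBlockedAlarm()` is expected to return true. Build with `-pthread`.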

 
LogicInnovations (author) commented:
It turned out that the pthread library on the system is LinuxThreads, not NPTL.
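
The thread does not spell out why LinuxThreads matters. A likely explanation is that LinuxThreads runs each thread as a separate kernel process and does not share interval timers between threads. An alarm( ) set from the worker thread (through ProcessSchedule( )) would then raise SIGALRM in the worker, where it stays blocked, and never reach the sigwait( ) in main. One way to avoid depending on which thread sets the alarm is to drop alarm( ) and let the main thread wait with a timeout. The sketch below assumes sigtimedwait( ) is available on the target system; `waitForSignalOrTimeout` is a made-up name, and this is not necessarily the fix the author used.

```cpp
#include <errno.h>
#include <signal.h>
#include <time.h>

// Hypothetical helper: waits up to `seconds` for one of the blocked signals
// in `signals`. Returns the signal number, 0 on timeout, or -1 on error.
int waitForSignalOrTimeout( const sigset_t* signals, int seconds )
{
    struct timespec timeout;
    timeout.tv_sec  = seconds;
    timeout.tv_nsec = 0;

    for ( ;; )
    {
        int caught = sigtimedwait( signals, NULL, &timeout );
        if ( caught > 0 )
            return caught;   // a signal from the set was pending
        if ( errno == EAGAIN )
            return 0;        // timed out: time to check the schedule
        if ( errno != EINTR )
            return -1;       // unexpected failure
        // EINTR: interrupted by an unrelated handler; wait again
        // (this sketch restarts the full timeout)
    }
}
```

In the main loop from the question, a return value of 0 would take the place of SIGALRM, and ProcessSchedule( ) would hand back the number of seconds to wait instead of calling alarm( ).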