Solved

SIGALRM lost ?

Posted on 2008-11-04
Last Modified: 2013-11-13
I'm developing a multithreaded C++ app on a Linux 2.4.19 system.  Occasionally (say 1% of the time), the process gets stuck and I don't understand why.

I have 2 threads.  The master thread checks an event list every minute; when an event's start time arrives, it processes the event and removes it from the event queue.  The worker thread monitors a message queue and reloads the event list when requested to do so.

After much debugging, I saw on one of our systems that alarm( 60 ) was called but sigwait() never caught SIGALRM.  How could this have happened?
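//------------------ main.cpp ----------------------
#include <pthread.h>
#include <signal.h>
#include <time.h>
#include <unistd.h>   // alarm()

pthread_mutex_t DataLock = PTHREAD_MUTEX_INITIALIZER;
Schedule MySchedule;

void* MsgHandlerFunc( void* pData ); // defined below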
 
void ProcessSchedule()
{
    // run the next event once its start time has arrived
    if ( MySchedule.nextEvent().startTime() <= time( NULL ))
    {
        MySchedule.nextEvent().run(); // launch a child process
        MySchedule.pop(); 
    }
 
    // seconds until the next event, clamped to the range 1..60
    int waitAgain = MySchedule.top().startTime() - time( NULL );
    if ( waitAgain > 60 ) waitAgain = 60;
    if ( waitAgain < 1 )  waitAgain = 1;
 
    alarm( waitAgain ); // check schedule again in about 1 min
}
 
 
int main()
{
    // ---- block all signals ---
    sigset_t allSignals;
    sigfillset( &allSignals );
    sigdelset( &allSignals, SIGCHLD );
    pthread_sigmask( SIG_BLOCK, &allSignals, NULL );
 
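    struct sigaction ignore;
    ignore.sa_handler = SIG_IGN; 
    sigemptyset( &ignore.sa_mask ); // sa_mask and sa_flags must not be left uninitialized
    ignore.sa_flags = 0;
    sigaction( SIGCHLD, &ignore, NULL );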
 
    // --- launch message queue listener thread ---
    pthread_t threadId;
    pthread_create( &threadId, NULL, MsgHandlerFunc, NULL );
 
    while( true )
    {
        int signalCaught = 0;
        sigwait( &allSignals, &signalCaught );
 
        if ( signalCaught == SIGALRM ) 
        {
            if ( pthread_mutex_lock( &DataLock ) == 0 )
            {
                ProcessSchedule();
                pthread_mutex_unlock( &DataLock );
            }
        }
        else if ( signalCaught == SIGUSR1 ) // for debug only
        {
            if ( pthread_mutex_lock( &DataLock ) == 0 )
            {
                MySchedule.Dump(); // MySchedule is an object, so use '.' rather than '::'
                pthread_mutex_unlock( &DataLock );
            }
        }
        else // unexpected signals
        {
            Debug::Out( "caught signal %d, exit application", signalCaught );
            break;
        }
    }
 
    return 0;
}
 
 
 
// ------------------- Msg queue handler thread ---------------------
void* MsgHandlerFunc( void* pData )
{
    MsgQueue myMsgQueue = MsgQueue( MY_KEY );
 
    while( true ) 
    {
        // receive message
        if ( myMsgQueue.receiveMsg() == MsgQueue::MSG_RELOAD_SCHEDULE )
        {
            if ( pthread_mutex_lock( &DataLock ) == 0 )
            {
                MySchedule.reloadFromFile();                
                ProcessSchedule();
                pthread_mutex_unlock( &DataLock );
            }
            myMsgQueue.sendMsg( MSG_OK );
        }
    }
 
    pthread_exit( NULL );
    return NULL;
}

Question by:LogicInnovations
 
LVL 53

Expert Comment

by:Infinity08
ID: 22881620
The first thing I think of is that signals don't queue, i.e. if a second SIGALRM is sent before the first is handled, there will still be only one pending SIGALRM, not 2.

Would that explain your problem ?
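
To illustrate the point about signals not queuing, here is a minimal sketch that is not part of the original post. The function name `countDeliveries` is made up for illustration. It blocks a signal, raises it several times, and counts how often `sigwait()` hands it back. For a standard signal such as SIGUSR1 the count should be 1; realtime signals (SIGRTMIN and above) are queued and would behave differently.

```cpp
#include <cassert>
#include <pthread.h>
#include <signal.h>

// Hypothetical helper: raises `signo` `times` times while it is blocked,
// then counts how many times sigwait() returns it.
int countDeliveries(int signo, int times)
{
    sigset_t set, old;
    sigemptyset(&set);
    sigaddset(&set, signo);
    pthread_sigmask(SIG_BLOCK, &set, &old);   // keep the signal pending

    for (int i = 0; i < times; ++i)
        raise(signo);   // a standard signal is either pending or not

    int delivered = 0;
    for (;;) {
        sigset_t pending;
        sigpending(&pending);
        if (sigismember(&pending, signo) != 1)
            break;                            // nothing left to collect
        int caught = 0;
        sigwait(&set, &caught);               // consumes the pending signal
        ++delivered;
    }

    pthread_sigmask(SIG_SETMASK, &old, NULL); // restore the caller's mask
    return delivered;
}
```

Calling `countDeliveries(SIGUSR1, 3)` should report a single delivery, which is why a second SIGALRM arriving before the first is handled is simply merged with it.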
 

Author Comment

by:LogicInnovations
ID: 22881842
Theoretically speaking it's possible, but it's hard to believe my first thread gets starved for over a minute.
 

Author Comment

by:LogicInnovations
ID: 22888648
Actually, even 1 SIGALRM will ensure the continuation of the app, so no, that won't explain my problem.

 
LVL 53

Expert Comment

by:Infinity08
ID: 22902912
Sorry for the delay - this question somehow disappeared off my radar.

Ok, I had a look over your code (after my initial shot in the dark), and I notice that you don't block the signals in the MsgHandlerFunc thread (you only block the signals in the main thread).

You need to make sure that the SIGALRM (and other signals you want to handle with sigwait) are blocked in every thread, except the main thread with the sigwait, or the signal might be delivered to the wrong thread.
0
 
LVL 53

Expert Comment

by:Infinity08
ID: 22902921
>> You need to make sure that the SIGALRM (and other signals you want to handle with sigwait) are blocked in every thread, except the main thread with the sigwait

That was an unfortunate way of saying it. Of course the signals have to be blocked in the main thread too, but they will be handled in that thread by sigwait.

So, to avoid confusion : block the signals in all threads. And have the sigwait in your main thread, which will then be the only thread that gets the signals.
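
The pattern described here can be sketched as follows. This is an illustration under assumptions, not the asker's code: the names `armAlarmWorker` and `waitForWorkerAlarm` are hypothetical, and the sketch assumes a Linux thread library with POSIX process-wide signal semantics. SIGALRM is blocked before the worker is created, the worker arms the alarm, and the calling thread collects the signal. `sigtimedwait()` is used instead of `sigwait()` only so that the sketch gives up after 5 seconds rather than hanging.

```cpp
#include <cassert>
#include <pthread.h>
#include <signal.h>
#include <time.h>
#include <unistd.h>

// Hypothetical worker: arms the alarm from a non-main thread, much as
// ProcessSchedule() does when called from the message-queue thread.
static void* armAlarmWorker(void*)
{
    alarm(1);
    return NULL;
}

// Blocks SIGALRM, starts the worker, and waits for the signal in the
// calling thread. Returns the signal number, or -1 on timeout or error.
int waitForWorkerAlarm()
{
    sigset_t set, old;
    sigemptyset(&set);
    sigaddset(&set, SIGALRM);
    pthread_sigmask(SIG_BLOCK, &set, &old);   // block before creating threads

    pthread_t tid;
    if (pthread_create(&tid, NULL, armAlarmWorker, NULL) != 0) {
        pthread_sigmask(SIG_SETMASK, &old, NULL);
        return -1;
    }

    struct timespec timeout = { 5, 0 };       // 5 s safety net
    int caught = sigtimedwait(&set, NULL, &timeout);

    pthread_join(tid, NULL);
    pthread_sigmask(SIG_SETMASK, &old, NULL); // restore the caller's mask
    return caught;
}
```

If `waitForWorkerAlarm()` returns SIGALRM, an alarm armed in the worker reached the waiting thread as intended; a return of -1 would suggest the signal went somewhere else.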
 

Author Comment

by:LogicInnovations
ID: 22905564
Actually, according to the POSIX standard, a thread inherits the signal mask of the thread that created it.  So as long as a signal is masked first thing in main, it will be masked everywhere.
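
The inheritance rule can be checked with a small sketch. The names `MaskProbe`, `reportBlocked`, and `childInheritsBlockedSignal` are made up for illustration. The parent blocks a signal, creates a thread, and the thread queries its own mask with `pthread_sigmask()` (passing NULL as the new set only reads the current mask).

```cpp
#include <cassert>
#include <pthread.h>
#include <signal.h>

// Hypothetical probe passed to the new thread.
struct MaskProbe {
    int  signo;     // signal to look for
    bool blocked;   // set by the thread: is signo in its mask?
};

static void* reportBlocked(void* arg)
{
    MaskProbe* probe = static_cast<MaskProbe*>(arg);
    sigset_t current;
    pthread_sigmask(SIG_BLOCK, NULL, &current);   // query only, no change
    probe->blocked = (sigismember(&current, probe->signo) == 1);
    return NULL;
}

// Blocks `signo` in the caller, then asks a freshly created thread
// whether it starts out with the same signal blocked.
bool childInheritsBlockedSignal(int signo)
{
    sigset_t set, old;
    sigemptyset(&set);
    sigaddset(&set, signo);
    pthread_sigmask(SIG_BLOCK, &set, &old);

    MaskProbe probe = { signo, false };
    pthread_t tid;
    if (pthread_create(&tid, NULL, reportBlocked, &probe) == 0)
        pthread_join(tid, NULL);

    pthread_sigmask(SIG_SETMASK, &old, NULL);     // restore the caller's mask
    return probe.blocked;
}
```

`childInheritsBlockedSignal(SIGALRM)` returning true matches the claim above: a mask set in main before `pthread_create()` carries over to the new thread.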

 
LVL 53

Accepted Solution

by:
Infinity08 earned 1500 total points
ID: 22911637
You're right. So, that doesn't explain the problem either.

Maybe I'm misunderstanding your problem.

>> Occasionally( say 1% of the time ), the process got stuck and I don't understand why.

Could you give a bit more detail ? Under what conditions does it block ? Does it "unblock" again ? After how much time ? How does this behavior manifest itself to the user ?
 

Author Comment

by:LogicInnovations
ID: 23331126
It turned out that the pthread library on the system is LinuxThreads, not NPTL.
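
For anyone who wants to check this on their own system, here is a minimal sketch, assuming glibc. The function name `threadLibraryVersion` is hypothetical. `confstr()` with `_CS_GNU_LIBPTHREAD_VERSION` is a glibc extension, so the code falls back to "unknown" where that constant is not defined. The distinction matters because LinuxThreads, per the Linux pthreads(7) manual page, has no notion of process-directed signals and does not share interval timers between threads, which would be consistent with a SIGALRM armed from the worker thread never reaching the sigwait() in main.

```cpp
#include <cassert>
#include <string>
#include <unistd.h>

// Hypothetical helper: returns the thread library name and version as
// reported by glibc, or "unknown" if the query is not available.
std::string threadLibraryVersion()
{
#ifdef _CS_GNU_LIBPTHREAD_VERSION
    char buf[64];
    size_t n = confstr(_CS_GNU_LIBPTHREAD_VERSION, buf, sizeof buf);
    if (n > 0 && n <= sizeof buf)   // n counts the terminating null byte
        return std::string(buf);
#endif
    return "unknown";
}
```

On an NPTL system the returned string starts with "NPTL", and on a LinuxThreads system it starts with "linuxthreads". The same information is available from the shell with `getconf GNU_LIBPTHREAD_VERSION`.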