Years ago (in the early development of a system I am still working on), we encountered an issue concerning select().
We used to use select() for a "select server" that is embedded in an application running on a network of Linux computers. This select server allowed an arbitrary number of GUIs to connect to the computers to display a control panel for the application's engineering process.
Back then, we were using a 2.4 kernel. At that time pselect() was not implemented in the kernel, but was available in glibc. We were plagued by mysterious application crashes. After weeks of study, we determined that the crashes coincided with user terminals disconnecting their ssh client connections. The problem appeared to be a race condition in the handling of SIGHUP signals, which on occasion caused the process running the select server to crash by executing the default action for SIGHUP (which is to terminate the process immediately).
The cure for that problem was to stop using select() and use pselect() instead. It has two advantages:
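1. it allows a timeout to prevent the call from hanging if no data arrives (select() does too, but pselect() takes a struct timespec and leaves it unmodified), and
2. it replaces the signal mask for the duration of the call and restores it afterwards, as a single atomic operation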
Item 2 allowed me to write a signal handler that does nothing other than set an action specifier variable to 'terminate' when SIGHUP actually arrives at the process.
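For reference, the arrangement described above can be sketched roughly as follows. This is not our production code: the names terminate_requested, on_sighup, setup_sighup and wait_readable are made up for illustration, and the sketch assumes (as I read the pselect() documentation) that SIGHUP is kept blocked during normal execution and is only unblocked by the mask handed to pselect().

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/select.h>
#include <time.h>

/* Hypothetical stand-in for the "action specifier" variable. */
static volatile sig_atomic_t terminate_requested = 0;

/* The handler only records that SIGHUP arrived. */
static void on_sighup(int signo)
{
    (void)signo;
    terminate_requested = 1;
}

/*
 * Install the handler and block SIGHUP for normal execution.
 * On success, *wait_mask holds the previous mask with SIGHUP removed,
 * i.e. the mask pselect() should install while it waits.
 * Returns 0 on success, -1 on failure.
 */
int setup_sighup(sigset_t *wait_mask)
{
    struct sigaction sa;
    sigset_t block;

    sa.sa_handler = on_sighup;
    sa.sa_flags = 0;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGHUP, &sa, NULL) == -1)
        return -1;

    sigemptyset(&block);
    sigaddset(&block, SIGHUP);
    if (sigprocmask(SIG_BLOCK, &block, wait_mask) == -1)
        return -1;

    sigdelset(wait_mask, SIGHUP);
    return 0;
}

/*
 * Wait until fd is readable.
 * Returns 1 if readable, 0 on timeout, -1 if SIGHUP requested
 * termination, -2 on any other error.
 */
int wait_readable(int fd, const sigset_t *wait_mask, long timeout_sec)
{
    fd_set rfds;
    struct timespec ts;
    int rc;

    for (;;) {
        /* SIGHUP is blocked here, so this check cannot race with it. */
        if (terminate_requested)
            return -1;

        FD_ZERO(&rfds);
        FD_SET(fd, &rfds);
        ts.tv_sec = timeout_sec;
        ts.tv_nsec = 0;

        /* SIGHUP is unblocked only for the duration of this call. */
        rc = pselect(fd + 1, &rfds, NULL, NULL, &ts, wait_mask);
        if (rc > 0)
            return 1;
        if (rc == 0)
            return 0;
        if (errno != EINTR)
            return -2;
        /* EINTR: some handler ran; loop to re-check the flag. */
    }
}
```

In this sketch the handler can only run while pselect() is waiting, so a SIGHUP that arrives between the flag check and the call stays pending and interrupts the call with EINTR instead of being missed.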
The termination problem went away for a long time, but I have now discovered two things:
My select server, which uses pselect(), prepares the signal mask this way:
but does NOT do this:
In retrospect, it seems that I should be masking SIGHUP when calling pselect() (because I don't want the default action of 'terminate' to run). Is that right?
Otherwise, I am not sure I understand why this code fixed our earlier problem with SIGHUP.
In other words, is simply giving pselect() a sigmask to install and restore atomically around the call enough to solve the problem?
Or does 'sigaddset(sigsetptr, SIGHUP);' block the default action for SIGHUP and enable the application's SIGHUP handler?
Or does 'sigaddset(sigsetptr, SIGHUP);' block the application's SIGHUP handler and enable the default system action for SIGHUP?
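A small experiment along the following lines might help settle this. It is only a sketch: the names hup_count, count_hup and blocked_sighup_is_deferred are hypothetical, and it uses sigprocmask() rather than pselect() to keep it short. My understanding is that a signal that is a member of the mask in effect is simply held pending, so neither the handler nor the default action runs until the signal is unblocked.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>

/* Hypothetical counter of how many times the handler has run. */
static volatile sig_atomic_t hup_count = 0;

static void count_hup(int signo)
{
    (void)signo;
    hup_count++;
}

/*
 * Returns 1 if a SIGHUP raised while blocked stays pending without
 * running the handler, and the handler then runs once on unblock.
 * Returns 0 otherwise (including on setup failure).
 */
int blocked_sighup_is_deferred(void)
{
    struct sigaction sa, old_sa;
    sigset_t set, old_mask, pending;
    int was_pending, ran_while_blocked, ran_after_unblock;

    sa.sa_handler = count_hup;
    sa.sa_flags = 0;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGHUP, &sa, &old_sa) == -1)
        return 0;

    /* Same call as in the question: add SIGHUP to a signal set. */
    sigemptyset(&set);
    sigaddset(&set, SIGHUP);
    if (sigprocmask(SIG_BLOCK, &set, &old_mask) == -1)
        return 0;

    hup_count = 0;
    raise(SIGHUP);                 /* blocked, so it is only queued */

    sigemptyset(&pending);
    sigpending(&pending);
    was_pending = (sigismember(&pending, SIGHUP) == 1);
    ran_while_blocked = (hup_count != 0);

    /* Unblocking delivers the pending SIGHUP to the handler. */
    sigprocmask(SIG_UNBLOCK, &set, NULL);
    ran_after_unblock = (hup_count == 1);

    /* Put the original mask and disposition back. */
    sigprocmask(SIG_SETMASK, &old_mask, NULL);
    sigaction(SIGHUP, &old_sa, NULL);

    return was_pending && !ran_while_blocked && ran_after_unblock;
}
```

If this returns 1, then membership in the mask says nothing about which action is taken; it only defers delivery, and the disposition set with sigaction() decides what happens once the signal is unblocked.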
I also noticed that a new driver, which I only recently studied closely for the first time, uses select() instead of pselect().
The latest symptom is that on occasion, our GUIs simply stop being able to connect to the application, and on rare occasions the application just stops running for reasons that are still unclear, and the application must be manually restarted.
Does anyone out there know if the pselect() versus select() problem that was such an issue in the 2.4 kernel is still of concern?
Should I be using pselect() instead of select() in the new driver? The new driver is running in a different thread than the original select server.