joeb62

asked on

posix semaphore deadlock

Hi,

I have a problem with POSIX semaphores. If the program is killed with SIGKILL while it is inside the critical section, the semaphore remains locked. Does anyone know a solution?

The production code handles signals/exit/pthread_exit/... and unlocks the semaphore :=)

#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>      /* O_CREAT */
#include <sys/stat.h>   /* S_IRWXU */
#include <unistd.h>     /* sleep */
#include <semaphore.h>
#include <errno.h>

int main (void)
{
  sem_t           *mysem;
  int              sv;

  /* Named semaphore shared between processes, initial value 1.
     The leading slash is added for portability. */
  if ((mysem= sem_open ("/test004", O_CREAT, S_IRWXU, 1)) == SEM_FAILED)
    {
      printf ("sem_open error %i\n", errno);
      exit (1);
    }

  if (sem_getvalue (mysem, &sv) == 0)
    printf ("semval %i\n", sv);

  while (1)
    {
      sem_wait (mysem);
      printf ("start critical section\n");
      sleep (3);   /* a kill -9 during this sleep leaves the semaphore at 0 */
      sem_post (mysem);
      printf ("stop critical section\n");
      sleep (1);
    }

  return 0;
}


ASKER CERTIFIED SOLUTION
Brian Utterback
Does lead to the question of what program is using kill -9 on your process and why?
joeb62

ASKER

First of all thanks for the answers :=)

>If you are protecting a critical section, then you might be better off using a mutex
All internal synchronizations use a mutex. I need this for the inter-process synchronization (some processes in a server system). The problem occurred with an AIX system. For Linux and Solaris, I have better solutions.

>Does lead to the question of what program is using kill -9 on your process and why?
Hm, an admin (at the customer) killed a process with -9 and hit the 600 microsecond time window :=(
And then it was over for the inter-process synchronization: deadlock!

>There is no easy mechanism for handling the case of a process that is killed by a "kill -9" with POSIX semaphores.
This is my question :=) I have not found any solution (removing the semaphore is not possible). Today I changed the synchronization to the old System V semaphores, and it works. But this is a software change and not a fix.
Well, it gets kind of ugly, but you could use a timed wait on the semaphore. Store the pid of the process that is in the critical section. When the timeout occurs, the thread that timed out can check the pid, then wait on the semaphore again. If it times out a second time with the same pid, it could assume that the process has exited and post the semaphore.
Or it could actually check whether a process with that pid still exists.
a signal handler should be able to release the semaphore in case of a kill.

but for the above scenario i would use neither a semaphore nor a mutex but simply a shared boolean flag.

Sara
If you could use a shared boolean flag then it isn't a critical section, by definition.
the sequence

if (shared_flag)
{
    do_something();
    shared_flag = false;
}


is absolutely safe for the case where shared_flag was set by another thread which itself is not dependent on the flag. it is like green traffic lights where the cross traffic waits on separate red lights. synchronization by flags is not fast enough for time-critical scenarios, but for handling an initial synchronization order as it is here, it is optimal.

Sara
I don't understand your code segment. Where does the waiting occur? Why would you set the flag to false after you are done with the critical segment? How would that prevent more than one thread from executing do_something() at the same time?
main thread:

thread1_tl = green;
thread2_tl = red;

// start both threads

thread1:

while (1)
{
    if (thread1_tl == green)
    {
        do_critical();
        thread1_tl = red;
        thread2_tl = green;  // or tell_scheduler_that_i_was_done(1);
    }
    else
        do_some_other_action_or_sleep();
}

thread2:

while (1)
{
    if (thread2_tl == green)
    {
        do_critical();
        thread2_tl = red;
        thread1_tl = green;  // or tell_scheduler_that_i_was_done(2);
    }
    else
        do_some_other_action_or_sleep();
}


you can have a third thread (scheduler) which decides which thread is next (and is the only one which would set the flags to green). that also would be the way if more than two threads need to be synchronized. of course, instead of polling, the threads could wait on an event.

note, waiting on semaphores or conditional waits are the better solutions in many cases. my point is that for simple scenarios, simple solutions which are slower but don't have drawbacks often can be a way out.

Sara
This is a limitation with POSIX semaphores:

If a process crashes inside the critical section (before the call to sem_post), the semaphore stays locked and is unusable. The terminated process never unlocks it.

In this scenario you would want the semaphore to be posted back automatically when the application terminates. This is what the SEM_UNDO facility of System V semaphores does when a process terminates abnormally. POSIX semaphores have no such facility.
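As a rough illustration of SEM_UNDO, here is a minimal sketch of a System V semaphore used as an inter-process lock. The names sysv_semarg, sysv_lock_create, sysv_lock, sysv_unlock and sysv_lock_destroy are invented for this example, and IPC_PRIVATE is used only to keep it self-contained; cooperating processes would normally agree on a key, for example via ftok().

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/* Locally named stand-in for "union semun", which some platforms
 * require the caller to define and others already provide. */
union sysv_semarg {
    int val;
    struct semid_ds *buf;
    unsigned short *array;
};

/* Create a set with one semaphore and set it to 1. Returns the id or -1. */
static int sysv_lock_create(void)
{
    int id = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);
    if (id == -1)
        return -1;

    union sysv_semarg arg;
    arg.val = 1;
    if (semctl(id, 0, SETVAL, arg) == -1) {
        semctl(id, 0, IPC_RMID);
        return -1;
    }
    return id;
}

/* Apply one operation with SEM_UNDO, retrying if a signal interrupts it. */
static int sysv_op(int id, short delta)
{
    struct sembuf op;
    op.sem_num = 0;
    op.sem_op  = delta;
    op.sem_flg = SEM_UNDO;  /* kernel reverts this adjustment if the process dies */

    int rc;
    do {
        rc = semop(id, &op, 1);
    } while (rc == -1 && errno == EINTR);
    return rc;
}

static int sysv_lock(int id)         { return sysv_op(id, -1); }
static int sysv_unlock(int id)       { return sysv_op(id, 1); }
static int sysv_lock_destroy(int id) { return semctl(id, 0, IPC_RMID); }
```

Because both the decrement and the increment carry SEM_UNDO, a process that is killed between sysv_lock and sysv_unlock has its outstanding -1 undone by the kernel, so the semaphore value goes back to 1 for the other processes.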

Some alternatives that can be suggested are as follows.

* Use System-V semaphores (if there is no chance to port).

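* Use "file locks", because these are process resources that are cleaned up automatically when a process is terminated:
   Replace all your calls to sem_wait() & sem_post() with calls to lockf( fd, F_LOCK, 0 )
   & lockf( fd, F_ULOCK, 0 ) respectively. (F_TLOCK is the non-blocking variant; it
   returns an error instead of waiting when the file is already locked.)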

I think this is still not a perfect solution, because lockf is basically not designed for locking and unlocking shared memory, and it is not really an alternative to semaphores.

Any better solution would be greatly appreciated.

Thanks,
Shiva
blu's comment #a38275951 is not a solution since a lock by mutex or semaphore also would dead-lock given that the process or thread was killed after the lock was established.

there are only two ways to accomplish a safe termination:
(1) a signal handler would delay termination until all locks were released (and resources were freed).
(2) don't use a system lock but wait on events or do polling.

i don't see an answer that could be called a solution.

Sara