sephi

asked on

Clean-up IPC resources after a process crash

Unix does not clean up IPC resources (semaphores and shared memory) after a process crash.
Is there a way to bypass the problem?
sephi

ASKER

I'd also appreciate a reference to online or printed literature.
Here is the output of man ipcrm, the command that removes IPC resources by ID or key. The IDs and keys are listed by ipcs (IPC status).


 ipcrm(1)                                                           ipcrm(1)

 NAME
      ipcrm - remove a message queue, semaphore set, or shared memory
      identifier

 SYNOPSIS
      ipcrm [option]...

 DESCRIPTION
      The ipcrm command removes one or more specified message queue,
      semaphore set, or shared memory identifiers.

    Options
      The identifiers are specified by the following options:

           -m shmid       Remove the shared memory identifier shmid from the
                          system.  The shared memory segment and data
                          structure associated with it are destroyed after
                          the last detach.

           -q msqid       Remove the message queue identifier msqid from the
                          system and destroy the message queue and data
                          structure associated with it.

           -s semid       Remove the semaphore identifier semid from the
                          system and destroy the set of semaphores and data
                          structure associated with it.

           -M shmkey      Remove the shared memory identifier, created with
                          key shmkey, from the system.  The shared memory
                          segment and data structure associated with it are
                          destroyed after the last detach.

           -Q msgkey      Remove the message queue identifier, created with
                          key msgkey, from the system and destroy the
                          message queue and data structure associated with
                          it.

           -S semkey      Remove the semaphore identifier, created with key
                          semkey, from the system and destroy the set of
                          semaphores and data structure associated with it.

      The details of the removals are described in msgctl(2), shmctl(2), and
      semctl(2).  The identifiers and keys can be found by using ipcs (see
sephi

ASKER

I'm talking about a busy client/server environment. Multiple clients are communicating with multiple servers. If a client or a server process crashes, the resources must be automatically cleaned up.

If there is no way to tell the operating system to clean up, a daemon is needed to track IPC resource usage.

I was hoping to get a reference to such a daemon (preferably in C++).
How do you define an IPC resource that can be removed? An IPC resource that is not currently attached can be attached later. So, obviously, we can't remove IPC resources based simply on their current attach count.
sephi

ASKER

You're right. The daemon must keep track of what resources are attached to which process. It has to detect crashed processes and remove their resources.
ASKER CERTIFIED SOLUTION
faraj

I can think of 4 ideas:
1. Write the programs so that they do not dump core.
I think you should definitely consider this option.
2. Reboot the machine!
3. If the process crashes, have a SIGSEGV handler in which you release the IPC resources. (I think this is a feasible solution.)
4. In C++, have a global exception handler in which you handle these kinds of exceptions.
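A minimal sketch of idea 3, assuming a single shared memory segment owned by the process. The names owned_shmid, crash_handler and install_crash_cleanup are made up for illustration. Note that shmctl() is not on the POSIX list of async-signal-safe functions, so calling it from a handler is a pragmatic assumption rather than a guarantee, and no handler runs on SIGKILL.

```c
#define _XOPEN_SOURCE 700   /* expose the POSIX/XSI declarations */
#include <signal.h>
#include <stddef.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/* Hypothetical global: ID of the segment this process owns, or -1. */
static volatile sig_atomic_t owned_shmid = -1;

/* On a fatal signal, mark the segment for removal, then restore the
 * default action and re-raise so the process still terminates. */
static void crash_handler(int sig)
{
    if (owned_shmid != -1)
        shmctl(owned_shmid, IPC_RMID, NULL);
    signal(sig, SIG_DFL);
    raise(sig);
}

/* Install the handler for some common crash signals.
 * Returns 0 on success, -1 on failure. */
static int install_crash_cleanup(int shmid)
{
    struct sigaction sa;

    sa.sa_handler = crash_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    owned_shmid = shmid;

    if (sigaction(SIGSEGV, &sa, NULL) == -1) return -1;
    if (sigaction(SIGBUS, &sa, NULL) == -1) return -1;
    if (sigaction(SIGABRT, &sa, NULL) == -1) return -1;
    return 0;
}
```

Call install_crash_cleanup(shmid) right after shmget() succeeds.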
to sephi:
Simply keeping track of attached processes doesn't tell you whether an IPC resource can be removed. A similar example: if a process crashes, the kernel automatically closes all its open file descriptors, but it does not delete the files. Can you remove those files simply by tracking whether some process has them open? No! For the same reason, an independent daemon can't remove an IPC resource simply because no process is attached to it: like files, IPC resources can be attached and used across process sessions. So when a process crashes, it is not that the kernel can't remove the IPC resource; it deliberately doesn't.

The best solution for you is to redesign your application to allow reuse or deferred removal of those IPC resources instead of cleaning them up immediately.
to faraj:

A process may crash without going through your signal handler at all.
Hi ,

As I said earlier, this is a basic OS feature of Unix: it does not do reference counting
of resources the way Windows does. So you have no way to detect which process is using which resource.
Even the kernel does not keep information about who has opened a resource and how many times.

So you have the following basic options:

1) If only your application or set of applications uses a particular resource,
try putting the cleanup code in application startup. When the application is restarted, it
will clean up after its previous instance.
2) Put the cleanup code in all signal handlers, although this will not help in a crash.
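Option 1 could be sketched roughly like this, assuming the application uses one well-known key for both a semaphore set and a shared memory segment. remove_stale_ipc is a made-up name, and this is only safe if no other instance is still running with the same key.

```c
#define _XOPEN_SOURCE 700   /* expose the POSIX/XSI declarations */
#include <errno.h>
#include <stddef.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <sys/shm.h>
#include <sys/types.h>

/* Remove any semaphore set and shared memory segment left under `key`
 * by a previous instance. Returns 0 if nothing stale remains,
 * -1 on an unexpected error. */
static int remove_stale_ipc(key_t key)
{
    int semid, shmid;

    semid = semget(key, 0, 0);          /* look up only, never create */
    if (semid != -1) {
        if (semctl(semid, 0, IPC_RMID) == -1) return -1;
    } else if (errno != ENOENT) {
        return -1;
    }

    shmid = shmget(key, 0, 0);          /* look up only, never create */
    if (shmid != -1) {
        if (shmctl(shmid, IPC_RMID, NULL) == -1) return -1;
    } else if (errno != ENOENT) {
        return -1;
    }
    return 0;
}
```

Call it at the very start of main(), before creating fresh resources with IPC_CREAT.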

If you are very keen to use a daemon: there is no default facility. For your needs you could code
something like this.

Use a global table of resource information in shared memory. Each record holds:
1) Resource ID
2) Global count for the resource
3) Linked list of resource data

Each resource data node holds:
<process ID of the process that opened the resource> <number of times opened (reference count)>
For all resource creation/open calls, use wrappers such as
msgget -> my_msgget
shmget -> my_shmget, etc.

In every my_xxx wrapper, do the following:
1) Open the shared memory.
2) Update or create the resource data for this process, and update the global count.
3) Call the actual msgget/shmget, etc.
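A rough sketch of such a wrapper for shmget. The struct layout, MAX_USERS and registry_add are illustrative assumptions: a fixed array stands in for the linked list, and the table entry is passed in as a pointer, whereas in the real design it would live in shared memory and be protected by a lock. The sketch also calls the real shmget() first, so that a failed call is never registered.

```c
#define _XOPEN_SOURCE 700   /* expose the POSIX/XSI declarations */
#include <stddef.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/types.h>
#include <unistd.h>

#define MAX_USERS 16

struct user { pid_t pid; int refs; };   /* refs == 0 marks a free slot */
struct resource {
    int shmid;                          /* resource ID */
    int total_refs;                     /* global count */
    struct user users[MAX_USERS];       /* per-process resource data */
};

/* Record one more open of `r` by `pid`.
 * Returns 0 on success, -1 if the table entry is full. */
static int registry_add(struct resource *r, pid_t pid)
{
    struct user *free_slot = NULL;
    size_t i;

    for (i = 0; i < MAX_USERS; i++) {
        struct user *u = &r->users[i];
        if (u->refs > 0 && u->pid == pid) {
            u->refs++;
            r->total_refs++;
            return 0;
        }
        if (u->refs == 0 && free_slot == NULL)
            free_slot = u;
    }
    if (free_slot == NULL) return -1;
    free_slot->pid = pid;
    free_slot->refs = 1;
    r->total_refs++;
    return 0;
}

/* Wrapper: call the real shmget(), then register the caller. */
static int my_shmget(struct resource *r, key_t key, size_t size, int flags)
{
    int id = shmget(key, size, flags);
    if (id == -1) return -1;
    r->shmid = id;
    if (registry_add(r, getpid()) == -1) return -1;
    return id;
}
```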


Create a process (daemon) that does the following:
1) Open the shared memory.
2) In a while(1) loop, do the following:
3) For each resource, check the validity of the PID stored in each resource data node. If the process
referred to by the PID is dead, update the global count. (You may use the pstat_getproc() routines or
the /proc structure.)
4) If the global count has dropped to zero, remove the resource.
5) Sleep for some time.
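One pass of steps 3 and 4 might look like the sketch below. It swaps in kill(pid, 0) as the liveness check instead of pstat_getproc() or /proc, which is my substitution and not part of the proposal. The struct layout and the names pid_is_dead and reap_resource are assumptions, with a fixed array standing in for the linked list.

```c
#define _XOPEN_SOURCE 700   /* expose the POSIX/XSI declarations */
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/types.h>

#define MAX_USERS 16

struct user { pid_t pid; int refs; };   /* refs == 0 marks a free slot */
struct resource {
    int shmid;                          /* resource ID */
    int total_refs;                     /* global count */
    struct user users[MAX_USERS];       /* per-process resource data */
};

/* A PID counts as dead only if kill(pid, 0) fails with ESRCH.
 * EPERM means the process exists but belongs to someone else. */
static int pid_is_dead(pid_t pid)
{
    return kill(pid, 0) == -1 && errno == ESRCH;
}

/* One daemon pass over a single table entry.
 * Returns 1 if the segment was removed, 0 if it is still in use,
 * -1 if the removal failed. */
static int reap_resource(struct resource *r)
{
    size_t i;

    for (i = 0; i < MAX_USERS; i++) {
        struct user *u = &r->users[i];
        if (u->refs > 0 && pid_is_dead(u->pid)) {
            r->total_refs -= u->refs;   /* drop the dead process's refs */
            u->refs = 0;
        }
    }
    if (r->total_refs > 0) return 0;
    return shmctl(r->shmid, IPC_RMID, NULL) == 0 ? 1 : -1;
}
```

The daemon would call reap_resource() for every table entry and then sleep. A recycled PID still looks alive to this check.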

Points to take care of:
1) The sleep period in the daemon should be short, so that PID values are not reused within that period.
2) The shared memory should be based on a key that is accessible to both the applications and the daemon.

Hope this helps. If you have any doubts, please feel free to ask.

Regards,
ufolk123
sephi

ASKER

I don't trust the signal control option proposed by faraj.

I prefer ufolk123's proposal to use a daemon.

I would like to understand, however, why the daemon needs to hold the resource list in shared memory.
Is it to protect against the daemon itself crashing?
to ufolk123:

The Unix kernel does reference-count many of its resources, including file descriptors, IPC objects, etc. In the IPC case, you can even see how many processes are attached to an IPC resource using a standardized API. The reference count is reduced either when the process releases the reference (detaches, closes, etc.) or when the process crashes.
sephi

ASKER

I know the ipcs command can show IPC usage. Is there an API for that? Does it show reference count?
It depends on the IPC resource. For shared memory, you can definitely get the number of attached processes with shmctl()'s IPC_STAT command.
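For example, a small helper along these lines reads the shm_nattch field, which holds the number of current attaches. shm_attach_count is a made-up name, and the caller needs read permission on the segment.

```c
#define _XOPEN_SOURCE 700   /* expose the POSIX/XSI declarations */
#include <stddef.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/* Return the number of current attaches to the segment,
 * or -1 if shmctl() fails. */
static long shm_attach_count(int shmid)
{
    struct shmid_ds ds;

    if (shmctl(shmid, IPC_STAT, &ds) == -1)
        return -1;
    return (long)ds.shm_nattch;
}
```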
Hi,

Regarding kejin's comment: there is no reference counting done for semaphores and message queues. Also, shmctl IPC_STAT reports the number of attaches issued, not the number of opens done with shmget. So if a process just calls shmget and dies without attaching, there is no way you can trace it.

As for sephi's question: I use shared memory or memory-mapped files to keep track of resources because the daemon's address space is different from the applications'. All apps need to register, and the daemon needs to read the information from one global place, so this is needed. There is another way of doing it: send registration messages and resource-close messages to the daemon. In that case we can use Unix domain sockets or pipes for communication between the apps and the daemon, and the daemon can keep the resource list in its local heap.

Regards,
Rajesh
Hi Sephi,

Finally, we can say there are two alternatives for implementing this daemon.

1) Use shared memory to hold the resource list. Each wrapper call (my_xxx) accesses the shared memory and updates the info, and the daemon polls the shared memory. One semaphore (with the SEM_UNDO option set) may be required to synchronize access to the resource table.
2) Keep the table local to the daemon. The wrapper calls then send a message to an endpoint the daemon listens on. This can be a named pipe or a Unix domain socket opened for listening on the daemon side. This will have problems such as communication traffic if there are many client apps, but in this case the resource table is safer (centralized control).
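The semaphore in alternative 1 could be sketched as below. With SEM_UNDO the kernel reverses a process's pending semaphore adjustments when it exits, so a process that crashes while holding the lock does not leave the table locked forever. The names table_lock_create, table_lock, table_unlock and union semun_arg are assumptions. Linux leaves the union for semctl() to the caller, hence the local definition, and the gap between semget() and SETVAL is a known initialization race that this sketch ignores.

```c
#define _XOPEN_SOURCE 700   /* expose the POSIX/XSI declarations */
#include <sys/ipc.h>
#include <sys/sem.h>
#include <sys/types.h>

/* Fourth argument for semctl(); declared here under a local name. */
union semun_arg {
    int val;
    struct semid_ds *buf;
    unsigned short *array;
};

/* Create a one-semaphore set initialised to 1 (unlocked).
 * Returns the semaphore ID, or -1 on failure. */
static int table_lock_create(key_t key)
{
    union semun_arg arg;
    int id = semget(key, 1, IPC_CREAT | IPC_EXCL | 0600);

    if (id == -1) return -1;
    arg.val = 1;
    if (semctl(id, 0, SETVAL, arg) == -1) {
        semctl(id, 0, IPC_RMID);        /* do not leak a half-made set */
        return -1;
    }
    return id;
}

/* Acquire the table lock; blocks until it is free. */
static int table_lock(int semid)
{
    struct sembuf op = { .sem_num = 0, .sem_op = -1, .sem_flg = SEM_UNDO };
    return semop(semid, &op, 1);
}

/* Release the table lock. */
static int table_unlock(int semid)
{
    struct sembuf op = { .sem_num = 0, .sem_op = 1, .sem_flg = SEM_UNDO };
    return semop(semid, &op, 1);
}
```

Both the wrappers and the daemon would bracket every table access with table_lock() and table_unlock(). semop() can fail with EINTR, so real callers should retry.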


Regards,
ufolk123
to ufolk123:

1. Regarding your statement that "there is no reference counting on semaphores and message queues": based on what? Not being able to get a reference count via a public API doesn't mean the kernel doesn't keep one. For instance, the Unix kernel does reference counting on file descriptors, but there is no portable public API to find out the current count on a given fd. Also, for semaphores, you can definitely use semctl() (the GETNCNT and GETZCNT commands) to get the number of processes waiting on a given semaphore. For message queues, the question doesn't make sense in the first place: even if a process crashes, the messages it posted before the crash shouldn't be removed, and therefore neither should the message queue itself.

2. Regarding your statement that "if a process only does a shmget() but not shmat(), there is no way you can trace it": this is exactly my point. That is, you cannot decide from the number of attached processes whether the IPC resource can be removed from the application's point of view. Not only may a process call shmget() without calling shmat() immediately; a process may not even call shmget() at the beginning, but only much later when it needs to use the shared memory segment. So a daemon can't rely on the reference count or attach count of a shared memory segment to decide whether it is proper to remove it. Even if the reference count and attach count are both zero, the daemon still has no way to know whether some process (either alive or not yet started) may get or attach this shared memory segment later.

To all of us:
If installing signal handlers is an acceptable solution, then the best solution is simply an atexit() handler.
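A minimal sketch of the atexit() approach, assuming one private segment per process. exit_shmid, remove_segment_at_exit and create_segment_with_exit_cleanup are made-up names. Keep in mind that atexit() handlers run only on exit() or a return from main(), not on _exit(), abort() or a fatal signal.

```c
#define _XOPEN_SOURCE 700   /* expose the POSIX/XSI declarations */
#include <stddef.h>
#include <stdlib.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/* Hypothetical global: segment to remove at normal exit, or -1. */
static int exit_shmid = -1;

/* Registered with atexit(): mark the segment for removal. */
static void remove_segment_at_exit(void)
{
    if (exit_shmid != -1)
        shmctl(exit_shmid, IPC_RMID, NULL);
}

/* Create a private segment and arrange for its removal at exit.
 * Returns the segment ID, or -1 on failure. */
static int create_segment_with_exit_cleanup(size_t size)
{
    int id = shmget(IPC_PRIVATE, size, IPC_CREAT | 0600);

    if (id == -1) return -1;
    exit_shmid = id;
    if (atexit(remove_segment_at_exit) != 0) {
        shmctl(id, IPC_RMID, NULL);     /* could not register: undo */
        exit_shmid = -1;
        return -1;
    }
    return id;
}
```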
Hi kejin,

I fully agree with you that Unix does keep a (typical) reference count for resources.
But my previous answer was focused on sephi's problem.

Consider case 1:

Process 1 creates a semaphore using a semget call.
Process 1 crashes.
Now the semaphore remains in memory. But had Process 1
terminated successfully, it might have issued a semctl.

Now consider case 2:

Process 1 creates a semaphore.
Process 2 accesses the semaphore created by Process 1 using a semget call.
Now even if either Process 1 or Process 2 crashes, the semaphore should
not be removed until both logical owners of the semaphore
(Process 1 and Process 2) have terminated, as either of them can access
the semaphore using the descriptor returned by the system.
So here comes the point: it is basically the access function (which returns
a valid handle/descriptor to a resource) that should be used
for reference counting in most cases.
As you know, this is what Windows does, and I feel it should
work to meet the needs of an application developer.

I do not want to prove that Unix does not do any logical ref counting.
But it does not solve the common nightmare of developers: when
all logical owners of a resource have terminated but nobody issued
the ctl command, the resource is not reclaimed.

I am sorry if I sounded critical of you in my previous answer.

Regards,
ufolk123
As I have said many times, the Unix kernel doesn't remove IPC resources in the above case not because it is incapable of doing so but because it doesn't want to. IPC resources are allowed to be used across process sessions. For example, a process can create an IPC resource and terminate without having to bring down the created IPC resource with it, even when no other process is using it. Then, long after the original creator terminates, another process may start and get this IPC resource. So, during this long time window, no process refers to this IPC resource (i.e. its reference count is 0). I am not very familiar with Windows, but please understand that the above Unix IPC behavior isn't a "problem" but a feature. Just as a normal Unix file lives across kernel sessions, IPC resources live across user process sessions by design. If developers feel this is a nightmare compared to what they enjoy in the Windows world, they should, as I suggested in one of my previous comments, redesign their applications.

You needn't bother to prove that Unix doesn't do logical ref counting. It is well documented that the Unix kernel does reference counting on file descriptors.
If you don't like the proposal
to use a signal trap,
another solution, using fork, is to run all the programs in child processes. The parent watches
for the exit status. If the exit status is not 0, the parent cleans up
the child's resources.
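The fork idea can be sketched like this, assuming the parent already knows which shared memory segment the child uses. run_supervised and the two sample workers are made-up names for illustration. Unlike a signal handler, the parent's waitpid() also sees a child that was killed by SIGKILL.

```c
#define _XOPEN_SOURCE 700   /* expose the POSIX/XSI declarations */
#include <stddef.h>
#include <stdlib.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Sample workers (hypothetical): one succeeds, one crashes. */
static int worker_ok(int shmid)    { (void)shmid; return 0; }
static int worker_crash(int shmid) { (void)shmid; abort(); }

/* Run work(shmid) in a child process. If the child is killed by a
 * signal or exits with a non-zero status, the parent removes the
 * segment. Returns 0 if the child succeeded, 1 if the parent cleaned
 * up after a failure, -1 on error. */
static int run_supervised(int shmid, int (*work)(int))
{
    int status;
    pid_t pid = fork();

    if (pid == -1) return -1;
    if (pid == 0)
        _exit(work(shmid));             /* child: report via exit status */

    if (waitpid(pid, &status, 0) == -1) return -1;
    if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
        return 0;                       /* clean exit: keep the segment */
    if (shmctl(shmid, IPC_RMID, NULL) == -1) return -1;
    return 1;
}
```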

Regards ,
 Faraj.