giligen

Managing Daemon Core File(s)

Hello all,

I have an application that I'm running as a daemon. This application crashes from time to time, and I'd like to get the core file to debug it.
My questions are:

- In which script should I put the ulimit -c unlimited command so it'll apply to the entire system (including daemons)?

- In which directory will the core file be located?

- Can I control the directory in which the core will be located?

Thanks,

-- Gilad
jlevie

> In which script should I put the ulimit -c unlimited command so it'll apply to the entire system

Short of modifying /bin/bash and /bin/sh (which are usually the same executable on Linux), there's no single place where this can be done for the entire system. Each Bash shell that is started defaults to not producing core files, since that's built in to the shell. Because the shell that starts a daemon determines whether the daemon will produce a core file when it aborts, you need to run 'ulimit -c unlimited' in that shell before starting the daemon. In the general case that's done within the /etc/rc.d/init.d script that starts the daemon at boot.
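
As an alternative to depending on the shell that starts it, a daemon can raise its own soft core limit early in main(). The following is only a minimal sketch, assuming a Linux/POSIX system; the function name raise_core_limit is made up for illustration. It cannot help if the hard limit inherited from the parent is already 0.

```c
#include <stdio.h>
#include <sys/resource.h>

/* Raise this process's soft core-size limit to its hard limit.
 * Returns 0 on success, -1 on failure. */
int raise_core_limit(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_CORE, &rl) != 0) {
        perror("getrlimit(RLIMIT_CORE)");
        return -1;
    }
    /* An unprivileged process may raise the soft limit only up to the hard limit. */
    rl.rlim_cur = rl.rlim_max;
    if (setrlimit(RLIMIT_CORE, &rl) != 0) {
        perror("setrlimit(RLIMIT_CORE)");
        return -1;
    }
    return 0;
}
```

Calling raise_core_limit() as one of the first statements in main() has roughly the effect of 'ulimit -S -c' set to the hard limit, but only for that process and its children.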

> In which directory will the core file be located?

In most cases that will be the root directory (/), but some init scripts may change to another directory. The core file is placed in the daemon's current working directory, which it inherits from the shell that started it unless the daemon changes directory itself.

> Can I control the directory in which the core will be located?

See above.
There is also a file, /etc/security/limits.conf, that can be used to specify different limits for different groups/users.
Adding the following:

* soft core 10000
* hard core 100000

will set the core size limits: a soft limit of 10000K and a hard limit of 100000K. As far as I know, the soft limit is the default in effect until it is changed with ulimit, and the hard limit is the maximum that ulimit can raise it to.

Note that a reboot is probably necessary to get these changes enforced...

As for the directory, yup, it's whatever directory the process is started in, which is probably the root directory.
giligen

ASKER

First of all thanks for trying to help.

I've tried both of the methods suggested:
- Ran ulimit -c unlimited and then ran my program.
I tried to produce a core dump in 3 different ways (assert / kill -3 / another way).
In all of them the program crashed, but there was no core dump in either the current directory (from which I started the program) or the root directory.

Then I tried the /etc/security/limits.conf approach: I added "* soft core unlimited" to the file and repeated the tests above, but I still cannot find the core dump.

My app uses the daemon command to change from a regular executable to daemon status.

Ideas?

Thanks!
I assume by that last comment that you mean that you have an init script in /etc/rc.d/init.d that contains something like:

...
    ulimit -c unlimited
    daemon /path-to/my-prog $my-args
...

You won't get a core file in that case because 'daemon' starts a subshell and it will default to no core files. You'd need to do:

...
    ulimit -c unlimited
    /path-to/my-prog $my-args
...

to have a core file produced.
giligen

ASKER

Hi,

Upon execution, my program (being started as a regular executable), changes into daemon mode by using the interface int daemon(int nochdir, int noclose);

so what I do is as you've specified
ulimit -c unlimited
./myProg

and I can't get the core dump (in the root directory or elsewhere).

Thanks
Does your OS user have write permission to the root directory? If not, and daemon() is called with nochdir=0, then the core file will not get generated, as there is no permission to create the file...
giligen
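
Since the core file lands in the process's current working directory, one way to control the location is for the daemon to chdir() to a writable directory itself. Below is a rough sketch, assuming a POSIX system; the function name use_core_dir and any directory you pass to it are illustrative, not part of the original program. It has to run after daemon() if nochdir=0 is used, because daemon() then changes to "/".

```c
#include <stdio.h>
#include <unistd.h>

/* Make `dir` the current working directory so a later core dump is
 * written there. Returns 0 if the switch worked and the directory is
 * writable, -1 otherwise. */
int use_core_dir(const char *dir)
{
    if (chdir(dir) != 0) {
        perror("chdir");
        return -1;
    }
    /* Note: access() checks with the real UID/GID, which can differ
     * from the effective IDs in a set-user-ID program. */
    if (access(".", W_OK) != 0) {
        perror("access");
        return -1;
    }
    return 0;
}
```

On Linux kernels that provide /proc/sys/kernel/core_pattern, that setting can also redirect core files system-wide, independent of the working directory.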

ASKER

It should have. Anyway, I've recompiled it with nochdir=1 and it still doesn't work.
BUT, I don't know how or why, it created a core ONCE, and I cannot reproduce it.
What caused the core file to be created the one time it was?

I just noticed you say you are trying to cause a core dump with "kill -3": Signal 3 is "Quit from keyboard", which the application might trap and hide (e.g. Sun's JVM uses "kill -3" to provide stack traces for all running threads!).

Try "kill -6", which is "Abort signal", the same as calling abort() from within the code (although this should be equivalent to assert() if NDEBUG is not defined at the #include <assert.h> stage).

Cheers,
C.
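
To see which signals actually produce a core, you can check the wait status of a child that kills itself. The sketch below assumes Linux with glibc (WCOREDUMP is not in POSIX); the function name kill_child_with is made up. Per signal(7), the default action for SIGQUIT and SIGABRT is to dump core, while signals such as SIGPIPE and SIGALRM only terminate the process.

```c
#define _DEFAULT_SOURCE   /* for WCOREDUMP on glibc */
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that sends itself `sig` with the default disposition.
 * Returns the signal that terminated the child (0 if it exited
 * normally, -1 on error). *dumped is set to 1 if the kernel reports
 * that a core was dumped, else 0. */
int kill_child_with(int sig, int *dumped)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        sigset_t set;
        signal(sig, SIG_DFL);              /* undo any inherited SIG_IGN */
        sigemptyset(&set);
        sigaddset(&set, sig);
        sigprocmask(SIG_UNBLOCK, &set, NULL);
        raise(sig);
        _exit(0);                          /* only reached if the signal did not kill us */
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    *dumped = (WIFSIGNALED(status) && WCOREDUMP(status)) ? 1 : 0;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

Whether *dumped comes back as 1 for SIGABRT still depends on the core limit and the other conditions discussed in this thread, so it is a quick way to test a given environment.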
giligen

ASKER

Thanks for the kill -3 comment.
The other two ways are:
using an assert (which causes a core dump when the program is not run as a daemon)
and a broken-pipe signal (14).

What I really don't understand is that the single time I did get a core, nothing special was done. I just used the assert to crash it, and that one time it worked. I know it sounds bad, and you're probably saying to yourself, "can't be, he must have been doing something else", but I really can't think of anything.

Thanks!
ASKER CERTIFIED SOLUTION
cjjclifford

Reading "man assert":

       If  the  macro  NDEBUG  was  defined  at the moment <assert.h> was last
       included, the macro assert() generates no code, and hence does  nothing
       at all.

Can you change the assert() call that forces the core dump to abort()? The macro might not be generating any code, as described in the man page.
giligen

ASKER

I tried your sample application and of course it worked fine, so I started looking for what we were doing wrong. After some research I found that our application uses the SUID/SGID bits. Once these bits are used, one must explicitly enable creation of the core file with the following API call:
prctl(PR_SET_DUMPABLE, 1, 0, 0, 0)

Once I've added it, it started to work fine.
Thanks for all your support and help.
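
For anyone hitting the same problem, here is a small sketch of how that call might be wrapped, assuming Linux; the function name make_dumpable is invented for illustration. According to prctl(2), the kernel resets the dumpable flag when the effective user or group ID changes, so the call belongs after any privilege changes.

```c
#include <stdio.h>
#include <sys/prctl.h>

/* Re-enable core dumps for a process whose "dumpable" flag was
 * cleared, e.g. because it runs set-user-ID/set-group-ID.
 * Returns 0 on success, -1 on failure. */
int make_dumpable(void)
{
    if (prctl(PR_SET_DUMPABLE, 1, 0, 0, 0) != 0) {
        perror("prctl(PR_SET_DUMPABLE)");
        return -1;
    }
    /* PR_GET_DUMPABLE returns the current value of the flag. */
    return prctl(PR_GET_DUMPABLE, 0, 0, 0, 0) == 1 ? 0 : -1;
}
```

The core size limit and a writable working directory are still required; this only removes the SUID/SGID restriction.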
You might also want to check out "man initscript" ... /etc/initscript is used by init to start all processes and can therefore be used to set ulimits on items in inittab.
aathan, initscript is used when starting inittab programs (from the man page: "When the shell script /etc/initscript is present, init will use it to execute the commands from inittab.")
This is not necessarily relevant here, as the author didn't state that the process being daemonized was run from inittab...