Solved

Tracking which Unix process pid just crashed

Posted on 2010-09-05
Last Modified: 2012-05-10

I got a monitoring alert saying that a Unix process had just crashed, but the alert did not
specify the pid or name of the process.

Is there any way to find out which one it was?

What about the directory /usr/ucb/... : does it hold any clue?

I recall that on Linux, /var/run holds *.pid files that record the pids of processes.
If a process is abruptly terminated or manually "killed", does the pid file
stay behind?  I thought of going through the .pid files one by one to read each pid
and check which ones are no longer listed by "ps -ef".

/var/log/messages did not give any clue.

Any good shell script or command to check this easily would be most welcome
as well.
Question by:sunhux

Author Comment

by:sunhux
ID: 33608897


What does the date stamp of those /var/run/*.pid files mean?

Accepted Solution

by:
apresence earned 470 total points
ID: 33608966
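Not all applications write pid files.  But you are right: a pid file is usually deleted when its process exits normally, so a process that crashes or is killed with SIGKILL typically leaves its pid file behind.  The date stamp (mtime) on a /var/run/*.pid file is the last time the file was written, which is normally when the process was last started.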
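
To see how a pid file's date stamp lines up with the process it names, something like the following could be used. It is only a sketch: show_pidfile_times is a made-up helper name, it assumes the pid is the first number on the first line of the file, and it relies on "ps -o etime=" to report how long the process has been running.

```shell
#!/bin/sh
# Sketch: print each pid file's listing (including its mtime) next to the
# elapsed run time of the process it names.
# show_pidfile_times is a hypothetical helper name, not an existing tool.
show_pidfile_times() {
  dir=${1:-/var/run}
  for f in "$dir"/*.pid; do
    # Skip the unexpanded pattern when the directory has no pid files.
    [ -f "$f" ] || continue
    # Assumption: the pid is the first number on the first line.
    pid=$(sed -n '1s/^[[:space:]]*\([0-9][0-9]*\).*/\1/p' "$f")
    [ -n "$pid" ] || continue
    # ls -l shows the file's mtime.
    ls -l "$f"
    # etime= prints the elapsed time without a header; empty if no such pid.
    etime=$(ps -o etime= -p "$pid" 2>/dev/null)
    echo "  pid $pid elapsed: ${etime:-not running}"
  done
}
```

Calling show_pidfile_times with no argument scans /var/run; pass a directory to scan somewhere else. A pid file whose mtime is much older than the elapsed time suggests the pid now belongs to a different, newer process.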

If you want to check those pid files to see if any of their processes are missing, the script below will do it for ya.

Sample output (I created a dummy testproc.pid file containing a pid that doesn't exist, for testing):
root@beta:~/exex/test9 $ echo 999 >/var/run/testproc.pid
root@beta:~/exex/test9 $ ./show_missing_pids.sh
PID 3146 (/var/run/atd.pid): RUNNING
PID 2334 (/var/run/auditd.pid): RUNNING
PID 3015 (/var/run/crond.pid): RUNNING
...
PID 999 (/var/run/testproc.pid): NOT RUNNING
...
root@beta:~/exex/test9 $
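
show_missing_pids.sh:

#!/bin/sh
# Report whether the pid recorded in each /var/run/*.pid file is still running.

for i in /var/run/*.pid; do
  # Skip the unexpanded pattern when there are no pid files.
  [ -f "$i" ] || continue
  # Take the leading digits of the first line that starts with a number.
  actual_pid=`perl -ne 'if (/^(\d+)/) { print "$1\n"; last }' < "$i"`
  if [ -n "$actual_pid" ]; then
    # ps -p exits non-zero when no process has that pid.
    if ps -p "$actual_pid" >/dev/null 2>&1; then
      echo "PID $actual_pid ($i): RUNNING"
    else
      echo "PID $actual_pid ($i): NOT RUNNING"
    fi
  fi
done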
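
One caveat with any check of this kind: pids get reused, so a pid file left by a crashed daemon can point at an unrelated process that is now running. A rough way to guard against that is to compare the command name of the live process with the pid file's name. The sketch below assumes the pid file is named after the daemon (crond.pid for crond, and so on), which is a convention and not a rule; check_pidfile and check_piddir are made-up helper names.

```shell
#!/bin/sh
# Sketch: classify a pid file as running, stale, or possibly reused.
# check_pidfile and check_piddir are hypothetical helper names.
check_pidfile() {
  file=$1
  # Assumption: the pid is the first number on the first line.
  pid=$(sed -n '1s/^[[:space:]]*\([0-9][0-9]*\).*/\1/p' "$file")
  if [ -z "$pid" ]; then
    echo "$file: no pid found"
    return 0
  fi
  # comm= prints only the command name, with no header line.
  comm=$(ps -o comm= -p "$pid" 2>/dev/null)
  if [ -z "$comm" ]; then
    echo "PID $pid ($file): NOT RUNNING (stale pid file)"
    return 0
  fi
  # Heuristic: expect the command name to contain the pid file's base name.
  name=$(basename "$file" .pid)
  case $comm in
    *"$name"*) echo "PID $pid ($file): RUNNING as $comm" ;;
    *) echo "PID $pid ($file): RUNNING as $comm (name differs, possible pid reuse)" ;;
  esac
}

check_piddir() {
  for f in "${1:-/var/run}"/*.pid; do
    if [ -f "$f" ]; then
      check_pidfile "$f"
    fi
  done
}
```

Run check_piddir with no argument to scan /var/run. A "name differs" line is only a hint to look closer, since some daemons write pid files under a name that does not match their command name.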


Assisted Solution

by:gheist
gheist earned 30 total points
ID: 33610611
Do you have any log or boot message entry confirming that a process crashed?
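
If it helps, a quick search like the one below may turn up such an entry. It is only a sketch: find_crash_lines is a made-up helper name, the pattern list is a guess at wording that often accompanies a crash (segfaults, core dumps, out-of-memory kills), and the default path assumes a Linux-style /var/log/messages; on Solaris the file would be /var/adm/messages.

```shell
#!/bin/sh
# Sketch: search a syslog-style file for lines that often accompany a crash.
# find_crash_lines is a hypothetical helper; the pattern list is an assumption
# and will not match every kind of crash message.
find_crash_lines() {
  log=${1:-/var/log/messages}
  grep -iE 'segfault|core dump|out of memory|killed process|general protection' "$log"
}
```

For example, find_crash_lines /var/adm/messages searches the Solaris log instead. No output means none of the patterns matched, not that nothing crashed.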