• Status: Solved
• Priority: Medium
• Security: Public
• Views: 639

killing processes when kill -9 does not work

I'd like to know how processes can be killed when commands like kill, kill -9 and killall don't work. I have this problem with an rm command executed by cron. I execute 'kill -9 pid' but the PID remains. I have also changed the priority of the process, but what I actually want is to remove it.
 
Asked by: foron
 
jlevie commented:
If "kill -9" can't get rid of the process, then it's truly hosed and your only recourse is to reboot.
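On Linux you can check whether kill -9 actually took effect by sending the signal and then watching the process state. The following is a minimal Python sketch, assuming a Linux /proc filesystem; process_state and kill_and_wait are hypothetical helper names, and the 5-second timeout is arbitrary.

```python
import os
import signal
import time
from typing import Optional


def process_state(pid: int) -> Optional[str]:
    """Return the one-letter state from /proc/<pid>/stat, or None if the PID is gone."""
    try:
        with open(f"/proc/{pid}/stat") as f:
            stat = f.read()
    except (FileNotFoundError, ProcessLookupError):
        return None
    # The command name sits in parentheses and may contain spaces or ')',
    # so locate the last ')' and take the field right after it.
    return stat[stat.rfind(")") + 2]


def kill_and_wait(pid: int, timeout: float = 5.0) -> Optional[str]:
    """Send SIGKILL, then poll until the process is gone.

    Returns None once the process has exited (a zombie counts: it only awaits
    reaping by its parent). Otherwise returns the state letter it is stuck in,
    typically 'D' for uninterruptible sleep.
    """
    try:
        os.kill(pid, signal.SIGKILL)
    except ProcessLookupError:
        return None  # already gone
    deadline = time.monotonic() + timeout
    while True:
        state = process_state(pid)
        if state in (None, "Z", "X"):
            return None
        if time.monotonic() >= deadline:
            return state
        time.sleep(0.1)
```

If kill_and_wait returns 'D', the process is in the situation discussed in this thread: the SIGKILL stays pending and is only acted on once the blocking kernel call returns.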
 
chris_calabrese commented:
I'd like to elaborate a bit here: when a process hangs like this, it's because it's in a non-interruptible system call. This is most common when the process is waiting for some kind of disk I/O on a filesystem that's hosed in some way, or on a stale NFS mount.
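On Linux, ps shows a process stuck in such an uninterruptible call in state D. Below is a hedged Python sketch, assuming the Linux /proc layout, that lists those processes together with their wait channel (wchan), which often hints at what they are blocked on; parse_stat and uninterruptible_processes are hypothetical names.

```python
import os
from typing import Iterator, Tuple


def parse_stat(stat: str) -> Tuple[int, str, str]:
    """Split a /proc/<pid>/stat line into (pid, command name, state letter)."""
    open_paren = stat.index("(")
    close_paren = stat.rindex(")")  # the command name may itself contain ')'
    pid = int(stat[:open_paren])
    comm = stat[open_paren + 1:close_paren]
    state = stat[close_paren + 2]
    return pid, comm, state


def uninterruptible_processes() -> Iterator[Tuple[int, str, str]]:
    """Yield (pid, command, wchan) for every process currently in state 'D'."""
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/stat") as f:
                pid, comm, state = parse_stat(f.read())
            if state != "D":
                continue
            # wchan names the kernel function the task is sleeping in
            # (some kernels report only "0" here).
            with open(f"/proc/{entry}/wchan") as f:
                wchan = f.read().strip()
        except (FileNotFoundError, ProcessLookupError, PermissionError):
            continue  # the process exited while we looked, or access was denied
        yield pid, comm, wchan


if __name__ == "__main__":
    for pid, comm, wchan in uninterruptible_processes():
        print(f"{pid:>7}  {comm:<16} {wchan}")
```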
 
chris_calabrese commented:
Oh yeah, the way to find out exactly what's happening is to do a system call trace on the process. I don't know what's out there in the Linux world to do this, but on the commercial Unixes the tool you'd use would be truss, trace, or tusc (depending on the flavor of Unix).
 
bernardh commented:
One alternative is the fuser command: try fuser -k /dev/device_name (e.g. /dev/tty0).
 
chris_calabrese commented:
Since we already know which process is hung, fuser probably won't help too much here.

lsof (ftp://vic.cc.purdue.edu/pub/tools/unix/lsof) can be a little more helpful, as it will tell you which files the hung process has open. But even that often won't tell you what's really going on (if the open itself is hanging, for instance, the file won't show up in lsof).
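On Linux, the core of what lsof reports for a single process can be read straight from the /proc/<pid>/fd symlinks. This is a rough Python sketch under that assumption (open_files is a hypothetical name); it shares lsof's blind spot, since an open() that is still hanging has not produced a descriptor yet. Inspecting another user's process needs root.

```python
import os
from typing import Dict


def open_files(pid: int) -> Dict[int, str]:
    """Map each open file descriptor of `pid` to the path it points at.

    Reads the /proc/<pid>/fd symlinks (Linux only). Raises PermissionError
    for processes you are not allowed to inspect.
    """
    fd_dir = f"/proc/{pid}/fd"
    result = {}
    for name in os.listdir(fd_dir):
        try:
            result[int(name)] = os.readlink(os.path.join(fd_dir, name))
        except FileNotFoundError:
            continue  # the descriptor was closed between listdir() and readlink()
    return result
```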

The only sure way is through a system call trace.
 
bernardh commented:
fuser -k /dev/tty0 (for example) will terminate all of the processes using a given filesystem or device. It works all the time. fuser only shows you what file is opened by a particular process or program; it doesn't help you kill a hung process. On other Unix flavors there are commands like /usr/lbin/tty/stty-cxma flush ttyX or /usr/sbin/strreset -M ## -m ##, but I don't think they exist on Linux.
 
aaryal commented:
You can trace system calls with strace; look up the man page for the details.
On a side note:
its cousin is ltrace, which traces library calls (for dynamic libraries).

I think you can do strace -p <PID>
to trace a running process.
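Attaching to an already-running process is what strace's -p option does. This Python sketch only assembles and launches the command; the extra flags (-f to follow forked children, -tt for microsecond timestamps, -o to write to a file) are assumptions about what is useful here, and strace_attach_cmd and start_trace are hypothetical names. Attaching usually requires root or ownership of the target, subject to the kernel's ptrace restrictions.

```python
import shutil
import subprocess
from typing import List


def strace_attach_cmd(pid: int, outfile: str) -> List[str]:
    """Build an strace command that attaches to an already-running process.

    -p PID attaches, -f also follows children it forks,
    -tt prefixes each line with a microsecond timestamp, -o writes to a file.
    """
    return ["strace", "-f", "-tt", "-o", outfile, "-p", str(pid)]


def start_trace(pid: int, outfile: str) -> subprocess.Popen:
    """Start strace in the background; call .terminate() on the result to detach."""
    if shutil.which("strace") is None:
        raise RuntimeError("strace is not installed")
    return subprocess.Popen(strace_attach_cmd(pid, outfile))
```

If the target is already stuck in an uninterruptible call, the trace may show nothing until that call returns.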
 
bernardh commented:
"fuser only shows you what file is opened by a particular process or program. It doesn't help you kill a hung process." I meant lsof.
 
chris_calabrese commented:
lsof is more useful than fuser, but as I pointed out above, it still isn't sufficient in many situations. The system call trace is definitely the way to go. As aaryal pointed out, you can do a system call trace under Linux with strace.

If you could split points in this system, I'd say give half to aaryal. But since you can't and I'm greedy, I won't ;-)
 
bernardh commented:
What was being asked here is HOW TO REMOVE the PID of a process that kill -9 has failed to terminate, not how to trace system calls or find out which file was opened by which process, blah-blah-blah... In short, an alternative to the kill -9 command.
 
chris_calabrese commented:
According to the fuser manual page at http://www.kashpureff.org/nic/linux/man.shtml/fuser(1) (and the same is true on my HP-UX 10.20 box), fuser -k sends the same signal as kill -9; therefore fuser -k cannot kill any process that kill -9 cannot.
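The point is easy to see in a rough Python sketch of fuser -k: find the processes holding a file open, then send each of them SIGKILL, which is the signal fuser -k sends by default. This assumes Linux /proc; pids_using and kill_users are hypothetical names, and unlike the real fuser the sketch only looks at open descriptors (not working directories, executables or memory maps).

```python
import os
import signal
from typing import List


def pids_using(path: str) -> List[int]:
    """Return PIDs with `path` open, found by scanning /proc/<pid>/fd symlinks."""
    target = os.path.realpath(path)
    pids = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        fd_dir = f"/proc/{entry}/fd"
        try:
            names = os.listdir(fd_dir)
        except (FileNotFoundError, PermissionError, ProcessLookupError):
            continue  # process exited, or belongs to another user
        for name in names:
            try:
                if os.readlink(os.path.join(fd_dir, name)) == target:
                    pids.append(int(entry))
                    break
            except OSError:
                continue  # descriptor vanished or is unreadable
    return pids


def kill_users(path: str) -> List[int]:
    """Send SIGKILL to every other process using `path`; returns the PIDs signalled."""
    victims = [pid for pid in pids_using(path) if pid != os.getpid()]
    for pid in victims:
        try:
            os.kill(pid, signal.SIGKILL)
        except ProcessLookupError:
            pass  # already exited
    return victims
```

Because kill_users ends in the same os.kill(pid, SIGKILL) that kill -9 performs, a process in uninterruptible sleep survives it just the same.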

If something is in a state that kill -9 cannot deal with, it's because it's blocked at a very low level in the kernel, and the only thing you can do is figure out what's blocking it.

If it's a hung NFS mountpoint, you might be able to remount it.  If it's a tape that's shoeshining, you may be able to eject the tape.  If it's a filesystem that's plain hanging, you need kernel patches to fix the hang.  Etc.

However, you first need to find out what it's hanging on. fuser may be helpful here; lsof will be more useful. There are circumstances where only a system call trace (and therefore strace) will do. Of course, you need to start the trace before it hangs, but if it's something running out of cron that hangs every time....
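On Linux, a snapshot can still be taken after the hang: /proc/<pid>/syscall reports the system call a blocked process is sitting in. Below is a minimal Python sketch, assuming a kernel that provides this file and enough privilege to read it (the same access check as ptrace attach); parse_syscall and blocked_syscall are hypothetical names. System call numbers are architecture-specific, so they have to be looked up in the syscall table for your platform.

```python
from typing import Optional


def parse_syscall(text: str) -> Optional[int]:
    """Extract the system call number from /proc/<pid>/syscall contents.

    The file holds 'running', or '-1 <sp> <pc>' if the task is blocked outside
    a system call, or '<nr> <six args> <sp> <pc>' if it is blocked inside one.
    """
    fields = text.split()
    if not fields or fields[0] == "running":
        return None
    nr = int(fields[0])
    return nr if nr >= 0 else None


def blocked_syscall(pid: int) -> Optional[int]:
    """Return the number of the system call `pid` is blocked in, if any."""
    with open(f"/proc/{pid}/syscall") as f:  # needs ptrace-level access to pid
        return parse_syscall(f.read())
```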
 
foron (author) commented:
What I want is not to trace the system call that hung the process; I want to kill the process. If you need more info, the process is using an NFS partition. Chris_calabrese answered that I should remount the NFS partition, but I have other processes running on it.
 
chris_calabrese commented:
If the process is hanging on a call to an NFS-mounted file, my guess would be that the entire NFS mountpoint is dead and that other processes attempting to access it will hang too. However, you're going to need the system call trace (strace) to figure out whether it's really an NFS problem.
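One way to test the "entire mountpoint is dead" guess without hanging your own shell is to stat() the mountpoint from a throwaway child process with a timeout. This is a hedged Python sketch using only the standard library; mount_responds is a hypothetical name and the 5-second timeout is arbitrary. It returns False both when the stat hangs and when it fails outright.

```python
import subprocess
import sys


def mount_responds(path: str, timeout: float = 5.0) -> bool:
    """Return True if os.stat(path) succeeds within `timeout` seconds.

    The stat runs in a child process: if the mount is dead, the child may sit
    in uninterruptible sleep forever, so we never wait() on it after a timeout.
    """
    child = subprocess.Popen(
        [sys.executable, "-c", "import os, sys; os.stat(sys.argv[1])", path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    try:
        return child.wait(timeout=timeout) == 0
    except subprocess.TimeoutExpired:
        child.kill()  # may not take effect until the kernel call returns
        return False
```

The sketch deliberately avoids subprocess.run(..., timeout=...), which waits for the child after killing it; with a child stuck in uninterruptible sleep, that wait would block as well.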
 
bernardh commented:
Just get straight to the point: unmount the filesystem, then try to stop and restart the NFS daemons using killall -HUP rpc.nfsd rpc.mountd or /etc/rc.d/init.d/nfs stop; /etc/rc.d/init.d/nfs start, and then remount the filesystem.
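Before unmounting, it helps to know exactly which NFS mountpoint the stuck process was using and which server it comes from. A small Python sketch, assuming Linux's /proc/mounts format (device, mountpoint, type, options, then two numbers, with octal escapes such as \040 for spaces); nfs_mounts is a hypothetical helper name.

```python
import re
from typing import List, Tuple

_OCTAL_ESCAPE = re.compile(r"\\([0-7]{3})")


def _unescape(field: str) -> str:
    """Undo the octal escapes /proc/mounts uses for spaces, tabs and backslashes."""
    return _OCTAL_ESCAPE.sub(lambda m: chr(int(m.group(1), 8)), field)


def nfs_mounts(mounts_text: str) -> List[Tuple[str, str, str]]:
    """Return (server:export, mountpoint, fstype) for each NFS entry in /proc/mounts text."""
    entries = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[2] in ("nfs", "nfs4"):
            entries.append((_unescape(fields[0]), _unescape(fields[1]), fields[2]))
    return entries
```

Typical use is to pass it the contents of /proc/mounts, for example `nfs_mounts(open("/proc/mounts").read())`.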