vinod
asked on
Wait state of CPU
The top command on our RHEL4 server shows the CPU in the WA (wait) state about 50% of the time. I suspect this is caused by one of several NFS-mounted volumes. How do I determine exactly which device and which user process is causing this wait state? BTW, why should the CPU be waiting at all in an interrupt-driven system?
Vinod
ASKER
What should I make of the following output? It says nothing about NFS volumes, and it does not tell me which PID is causing the near-100% iowait.
# iostat -x 5
Linux 2.6.9-55.0.6.ELsmp (myhost.princeton.edu) 2007-10-04

avg-cpu:  %user  %nice   %sys  %iowait  %idle
           0.25   0.00   0.40    99.35   0.00

Device:  rrqm/s  wrqm/s   r/s    w/s  rsec/s  wsec/s  rkB/s   wkB/s  avgrq-sz  avgqu-sz  await  svctm  %util
sda        0.00    1.80  0.00   0.60    0.00   19.16   0.00    9.58     32.00      0.00   1.33   1.33   0.08

avg-cpu:  %user  %nice   %sys  %iowait  %idle
           0.80   0.00   1.40    92.65   5.15

Device:  rrqm/s  wrqm/s   r/s    w/s  rsec/s  wsec/s  rkB/s   wkB/s  avgrq-sz  avgqu-sz  await  svctm  %util
sda        0.00    1.40  0.00  35.33    0.00  461.48   0.00  230.74     13.06      2.32   7.69   0.64   2.26
ASKER
nfsstat does not explain the nearly 100% iowait. Neither iostat nor nfsstat gives any hint about which process is to blame, so I do not know where to send a kill. Our cluster has multiple NFS servers and multiple client nodes, and all NFS volumes are mounted on all clients. When any NFS client or server gets stuck, it affects all users on our mail login server, which is only an NFS client. Simple commands like df hang forever, Ctrl-C does not work, %iowait goes to 100%, and all users start screaming. I need to put the system back in a usable state without rebooting.
Vinod
You can use the iostat tool. Please see http://www.linuxcommand.org/man_pages/iostat1.html
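Since iostat reports per-device figures rather than per-process ones, a complementary check is to list the processes that are currently in uninterruptible sleep (state D), which is the state a process normally sits in while blocked on I/O. The following is only a rough sketch, not a tested fix for this cluster: it assumes a Python 3 interpreter is available (the Python shipped with RHEL4 is older and would need changes), and the function names `parse_stat_line` and `blocked_processes` are made up for illustration. It reads the standard `/proc/<pid>/stat` files.

```python
import os


def parse_stat_line(line):
    """Parse one /proc/<pid>/stat line into (pid, command, state)."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or parentheses, so split on the LAST ')' instead of on spaces.
    head, _, tail = line.rpartition(")")
    pid_text, _, comm = head.partition("(")
    state = tail.split()[0]
    return int(pid_text), comm, state


def blocked_processes(proc_root="/proc"):
    """Return (pid, command, state) for every process found in state D."""
    blocked = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        try:
            with open(os.path.join(proc_root, entry, "stat")) as handle:
                pid, comm, state = parse_stat_line(handle.read())
        except (OSError, ValueError, IndexError):
            continue  # the process exited, or its stat file was unreadable
        if state == "D":
            blocked.append((pid, comm, state))
    return blocked


if __name__ == "__main__":
    for pid, comm, state in blocked_processes():
        print(pid, state, comm)
```

The same state letter appears in the STAT column of ps, so this adds nothing ps cannot show; it only filters for D. Keep in mind that a process in state D generally does not act on signals until its I/O completes, which would be consistent with Ctrl-C having no effect on a hung df, so identifying the PID may not be enough to clear the hang by itself.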