vinod (United States of America) asked:

Wait state of CPU

The top command on our RHEL4 server shows our CPU in the WA (I/O wait) state about 50% of the time. I suspect this is caused by one of several NFS-mounted volumes. How do I determine exactly which device, and which user processes, are causing this wait state? Also, why should the CPU be waiting at all in an interrupt-driven system?

Vinod
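On the "why is the CPU waiting" part: the iostat man page defines %iowait as the percentage of time the CPU was idle while the system had an outstanding disk I/O request. In other words it is idle time, accounted separately, not time the CPU spends busy. The CPU could run other work; it simply has nothing runnable. A minimal sketch, assuming the 2.6-style aggregate `cpu` line in /proc/stat (user, nice, system, idle, iowait, then irq/softirq and later fields), computes the iowait share between two saved samples. The function name `iowait_pct` is illustrative only.

```shell
#!/bin/sh
# Sketch: share of CPU time counted as iowait between two /proc/stat samples.
# Assumes the aggregate "cpu" line lists jiffies in the order
#   user nice system idle iowait [irq softirq ...]
# so iowait is awk field $6. Each input file must contain one "cpu" line.

# iowait_pct FILE1 FILE2: prints the iowait percentage of elapsed jiffies.
iowait_pct() {
  awk '
    $1 == "cpu" {
      t = 0
      for (i = 2; i <= NF; i++) t += $i   # total jiffies in this sample
      n++; tot[n] = t; io[n] = $6         # remember total and iowait
    }
    END {
      d = tot[2] - tot[1]
      if (n == 2 && d > 0) printf "%.2f\n", 100 * (io[2] - io[1]) / d
      else print "0.00"
    }
  ' "$1" "$2"
}
```

Usage: `cat /proc/stat > /tmp/stat.1; sleep 5; cat /proc/stat > /tmp/stat.2; iowait_pct /tmp/stat.1 /tmp/stat.2`. The result covers all CPUs together, which is the same aggregate figure top and iostat report.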
ASKER CERTIFIED SOLUTION by ravenpl (Poland): visible to Experts Exchange members only.

omarfarid:

Hi,

You can use the iostat tool. Please see http://www.linuxcommand.org/man_pages/iostat1.html
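iostat reports per device, not per process. Tasks blocked on I/O, including those stuck on a hard NFS mount, normally show up in uninterruptible sleep (state D in ps), so listing them is a practical way to see which processes are waiting. A minimal sketch that scans /proc for D-state tasks and prints PID, command name and the kernel wait channel; the function name `d_state_procs` and its optional directory argument (there only so it can be pointed at a test tree) are illustrative assumptions.

```shell
#!/bin/sh
# Sketch: list tasks in uninterruptible sleep ("D"), the state processes
# normally show while blocked on disk or NFS I/O.
# Output: PID <tab> command <tab> wchan ("-" if unavailable).
# The optional argument replaces /proc and is only meant for testing.

d_state_procs() {
  proc=${1:-/proc}
  for dir in "$proc"/[0-9]*; do
    [ -r "$dir/stat" ] || continue
    line=$(cat "$dir/stat" 2>/dev/null) || continue  # task may have exited
    rest=${line##*") "}          # drop "pid (comm) ", even if comm has ") "
    state=${rest%% *}            # first field after comm is the state
    [ "$state" = D ] || continue
    comm=${line#*"("}            # text after the first "("
    comm=${comm%")"*}            # ...up to the last ")"
    wchan=$(cat "$dir/wchan" 2>/dev/null)
    printf '%s\t%s\t%s\n' "${dir##*/}" "$comm" "${wchan:--}"
  done
}
```

Running `d_state_procs` a few times while %iowait is high shows which processes stay blocked; a wait channel that names NFS or RPC functions is a strong hint that a network mount, not a local disk, is involved.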
SOLUTION: visible to Experts Exchange members only.

vinod (asker):

What should I make of the following output? It says nothing about the NFS volumes, and it does not tell me which PID is causing the near-100% iowait.

# iostat -x 5
Linux 2.6.9-55.0.6.ELsmp (myhost.princeton.edu)      2007-10-04

avg-cpu:  %user   %nice    %sys %iowait   %idle
           0.25    0.00    0.40   99.35    0.00

Device:    rrqm/s wrqm/s   r/s   w/s  rsec/s  wsec/s  rkB/s  wkB/s avgrq-sz avgqu-sz await  svctm  %util
sda          0.00   1.80  0.00  0.60    0.00   19.16   0.00   9.58    32.00     0.00  1.33   1.33   0.08

avg-cpu:  %user   %nice    %sys %iowait   %idle
           0.80    0.00    1.40   92.65    5.15

Device:    rrqm/s wrqm/s   r/s   w/s  rsec/s  wsec/s  rkB/s  wkB/s avgrq-sz avgqu-sz await  svctm  %util
sda          0.00   1.40  0.00 35.33    0.00  461.48   0.00 230.74    13.06     2.32  7.69   0.64   2.26
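One thing this output does show: the only block device listed, sda, is barely used (%util of 0.08 and 2.26, await of 1.33 and 7.69 ms) while %iowait sits at 99.35 and 92.65. The local disk is therefore unlikely to account for the wait, which points at I/O that the device report does not cover, such as the NFS mounts. A small sketch that pulls device, await and %util out of `iostat -x` text so busy devices stand out; it assumes the sysstat layout shown above and locates the columns by header name. The function name `iostat_util` is illustrative.

```shell
#!/bin/sh
# Sketch: print "device await %util" for each device line of "iostat -x"
# text read from stdin. Column positions come from each "Device:" header,
# so only the header names ("await", "%util") are assumed.

iostat_util() {
  awk '
    $1 ~ /^Device:?$/ {
      aw = 0; ut = 0
      for (i = 1; i <= NF; i++) {
        if ($i == "await") aw = i
        if ($i == "%util") ut = i
      }
      inblock = 1; next
    }
    $1 == "avg-cpu:" { inblock = 0; next }   # CPU report is not a device
    NF == 0 { inblock = 0; next }            # blank line ends a device block
    inblock && aw && ut { print $1, $aw, $ut }
  '
}
```

For example, `iostat -x 5 | iostat_util` prints one line per device per interval; with the report above that would be `sda 1.33 0.08` followed by `sda 7.69 2.26`.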
SOLUTION: visible to Experts Exchange members only.

vinod (asker):

nfsstat does not explain the nearly 100% iowait either. Neither iostat nor nfsstat gives any hint about which process is to blame, so I have nothing to send a kill to. Our cluster has multiple NFS servers and multiple client nodes, and all NFS volumes are mounted on all clients. When any NFS client or server gets stuck, it affects all users on our mail login server, which is only an NFS client: simple commands like df hang forever, Ctrl-C does not work, %iowait goes to 100%, and all users start screaming. I need to get the system back into a usable state without rebooting.

Vinod
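When one NFS server stops responding, anything that queries every mounted file system (df, for example) blocks on the dead mount, and on a hard mount without the `intr` option such waits cannot be interrupted, which matches Ctrl-C having no effect. A minimal sketch for finding which mount is hung: read the NFS entries from /proc/mounts and run `stat -f` on each in the background, reporting any that do not answer within a few seconds. The function names and the 5-second default are illustrative assumptions. A probe stuck on a dead mount may itself stay in D state until the server returns, so this only identifies the mount; it does not free it.

```shell
#!/bin/sh
# Sketch: find NFS mount points that do not answer within a time limit.

# Print mount points of nfs/nfs4 entries from a mounts-format file.
nfs_mounts() {
  awk '$3 == "nfs" || $3 == "nfs4" { print $2 }' "${1:-/proc/mounts}"
}

# probe_mount DIR [SECS]: print "ok DIR" if statfs returns, else "HUNG DIR".
probe_mount() {
  stat -f "$1" >/dev/null 2>&1 &   # output redirected so a stuck probe
  pid=$!                           # cannot hold the caller's stdout open
  n=0
  while kill -0 "$pid" 2>/dev/null; do
    if [ "$n" -ge "${2:-5}" ]; then
      echo "HUNG $1"               # probe left behind, possibly in D state
      return 1
    fi
    sleep 1
    n=$((n + 1))
  done
  echo "ok $1"
}

# Check every NFS mount currently listed in /proc/mounts.
check_nfs_mounts() {
  nfs_mounts | while read -r mnt; do probe_mount "$mnt"; done
}
```

Once a hung mount is identified, `umount -f` (documented for unreachable NFS servers) or `umount -l` (lazy detach) are the usual options to try; whether the blocked processes recover without a reboot depends on the mount options, such as `intr`.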
SOLUTION: visible to Experts Exchange members only.