Wait state of CPU

The top command on our RHEL4 server shows the CPU in the WA (wait) state about 50% of the time. I suspect this is caused by one of several NFS-mounted volumes. How do I determine exactly which device, and which of the user processes, is causing this wait state? BTW, why should the CPU be waiting at all in an interrupt-driven system?

Vinod
 
ravenpl Commented:
I don't know the answer to the first question.
> BTW, why should the CPU be waiting at all in an interrupt-driven system?
The OS reports wait (iowait) time when there is pending I/O (disk/NFS access, etc.) and no other task to schedule. In other words, the CPU has nothing to do (it is effectively idle), because it has to wait for the I/O to complete first.
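
You can watch this happening with vmstat (part of the stock procps package), for example:

vmstat 5

The 'b' column counts tasks blocked in uninterruptible (I/O) sleep, and 'wa' is the same iowait percentage top reports; when 'b' stays above zero and 'wa' is high while 'id' is near zero, the CPU has nothing to run because the tasks that would otherwise run are all blocked waiting for I/O.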
 
omarfarid Commented:
Hi,

You can use the iostat tool. Please see http://www.linuxcommand.org/man_pages/iostat1.html
 
Tintin Commented:
Run

iostat -x 5

And let it run for a little while, checking the last three columns:


await
       The average time (in milliseconds) for I/O requests issued to the device to be served. This includes the time spent by the requests in queue and the time spent servicing them.

svctm
       The average service time (in milliseconds) for I/O requests that were issued to the device.

%util
       Percentage of CPU time during which I/O requests were issued to the device (bandwidth utilization for the device). Device saturation occurs when this value is close to 100%.

 
vinod (Author) Commented:
What should I make of the following output? It does not say anything about the NFS volumes, and it does not tell me which PID is causing the near-100% iowait.

# iostat -x 5
Linux 2.6.9-55.0.6.ELsmp (myhost.princeton.edu)      2007-10-04

avg-cpu:  %user   %nice    %sys %iowait   %idle
           0.25    0.00    0.40   99.35    0.00

Device:    rrqm/s wrqm/s   r/s   w/s  rsec/s  wsec/s  rkB/s  wkB/s avgrq-sz avgqu-sz await  svctm  %util
sda          0.00   1.80  0.00  0.60    0.00   19.16   0.00   9.58    32.00     0.00  1.33   1.33   0.08

avg-cpu:  %user   %nice    %sys %iowait   %idle
           0.80    0.00    1.40   92.65    5.15

Device:    rrqm/s wrqm/s   r/s   w/s  rsec/s  wsec/s  rkB/s  wkB/s avgrq-sz avgqu-sz await  svctm  %util
sda          0.00   1.40  0.00 35.33    0.00  461.48   0.00 230.74    13.06     2.32  7.69   0.64   2.26
 
omarfarid Commented:
Hi,

For NFS-mounted shares, please use the nfsstat tool.

See the output of:

nfsstat -c -n -m


The link below is the man page for nfsstat:

http://linux.die.net/man/8/nfsstat
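
As a rough rule of thumb, the RPC section of the client report is the first thing to check:

nfsstat -c

If the retrans counter keeps growing relative to calls between two runs, the client is retransmitting RPCs because one of the servers is slow or not responding. nfsstat by itself will not tell you which mount, but nfsstat -m lists each mounted share with its server and options, so you can at least see which servers are involved.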
 
vinod (Author) Commented:
nfsstat does not explain the nearly 100% iowait. Neither iostat nor nfsstat gives any hint about which process is to blame, i.e. which PID I could send a kill signal to. Our cluster has multiple NFS servers and multiple client nodes, and all NFS volumes are mounted on all clients. When any NFS client/server gets stuck, it affects all users on our main login server, which is only an NFS client. Simple commands like df hang forever, Ctrl-C does not work, %iowait goes to 100%, and all users start screaming. I need to put the system back in a usable state without rebooting.
 
Vinod
 
omarfarid Commented:
Hi,

It is better if you can monitor the NFS servers themselves. That will give you a better idea of what is going on and let you take action faster.
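
On the client itself you can also narrow things down without rebooting. A rough sketch, assuming a standard bash shell and that the NFS mounts are listed in /proc/mounts (the paths and the 5-second timeout are only examples):

# Probe every NFS mount in the background; a plain df or stat would hang on a dead mount.
for m in $(awk '$3 == "nfs" || $3 == "nfs4" {print $2}' /proc/mounts); do
    rm -f /tmp/nfsprobe.$$
    ( stat -f "$m" > /dev/null 2>&1 && touch /tmp/nfsprobe.$$ ) &
    sleep 5
    if [ -e /tmp/nfsprobe.$$ ]; then
        echo "ok:   $m"
    else
        echo "HUNG: $m"    # the probe did not come back within 5 seconds
    fi
done
rm -f /tmp/nfsprobe.$$

Probes that hang simply stay stuck in the background, but the loop keeps going, so you can see which mount point is the culprit.

To find the user processes that are stuck on it, look for processes in the D (uninterruptible sleep) state; their WCHAN usually points at an NFS/RPC wait:

ps -eo pid,user,stat,wchan,comm | awk '$3 ~ /^D/'

Those processes cannot simply be killed: on a hard NFS mount even kill -9 only takes effect once the server responds again (or the mount is interrupted), so bringing the stuck server back, or lazily unmounting the dead share with umount -l, is usually the only way to recover without a reboot.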
