• Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 8027
  • Last Modified:

Wait state of CPU

The top command on our RHEL4 server shows the CPU in the WA (I/O wait) state about 50% of the time. I suspect this is caused by one of several NFS-mounted volumes. How do I determine exactly which device, and which user process, is causing this wait state? BTW, why should the CPU be waiting at all in an interrupt-driven system?

Vinod
Asked by: vinod
4 Solutions
 
ravenplCommented:
I don't know the answer to the first question.
> BTW, why should the CPU be waiting at all in an interrupt-driven system?
The OS reports the wait state when there is pending I/O (disk, NFS access, etc.) and no other task to schedule. In other words, there is nothing to do (the CPU is idle), because it has to wait for the I/O to complete first.
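This condition is easy to observe directly (a minimal sketch; `vmstat` ships with procps, and the iowait counter in `/proc/stat` is standard on Linux 2.6 kernels):

```shell
# Two quick checks that I/O wait is real:
# 1) vmstat's 'wa' column (CPU idle while I/O is pending) and 'b' column
#    (processes blocked on I/O) -- e.g. run `vmstat 5` and watch both.
# 2) Read the raw iowait counter: it is the 5th time field on the "cpu"
#    line of /proc/stat (in jiffies since boot):
awk '/^cpu /{print "iowait jiffies:", $6}' /proc/stat
```

If the iowait counter keeps climbing while the run queue is empty, the box is idle purely because of outstanding I/O.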
 
omarfaridCommented:
Hi,

You can use the iostat tool. Please see http://www.linuxcommand.org/man_pages/iostat1.html
 
TintinCommented:
Run

iostat -x 5

and let it run for a little while, checking the last three columns:


await
       The average time (in milliseconds) for I/O requests issued to the device to be served. This includes the time the requests spend in the queue and the time spent servicing them.

svctm
       The average service time (in milliseconds) for I/O requests that were issued to the device.

%util
       Percentage of CPU time during which I/O requests were issued to the device (bandwidth utilization for the device). Device saturation occurs when this value is close to 100%.

 
vinodAuthor Commented:
What should I make of the following output? It says nothing about NFS volumes, and it does not tell me which PID is causing the near-100% iowait.

# iostat -x 5
Linux 2.6.9-55.0.6.ELsmp (myhost.princeton.edu)      2007-10-04

avg-cpu:  %user   %nice    %sys %iowait   %idle
           0.25    0.00    0.40   99.35    0.00

Device:    rrqm/s wrqm/s   r/s   w/s  rsec/s  wsec/s  rkB/s  wkB/s avgrq-sz avgqu-sz await  svctm  %util
sda          0.00   1.80  0.00  0.60    0.00   19.16   0.00   9.58    32.00     0.00  1.33   1.33   0.08

avg-cpu:  %user   %nice    %sys %iowait   %idle
           0.80    0.00    1.40   92.65    5.15

Device:    rrqm/s wrqm/s   r/s   w/s  rsec/s  wsec/s  rkB/s  wkB/s avgrq-sz avgqu-sz await  svctm  %util
sda          0.00   1.40  0.00 35.33    0.00  461.48   0.00 230.74    13.06     2.32  7.69   0.64   2.26
 
omarfaridCommented:
Hi,

For NFS-mounted shares, please use the nfsstat tool.

See the output of:

nfsstat -c -n -m


The link below is the man page for nfsstat:

http://linux.die.net/man/8/nfsstat
 
vinodAuthor Commented:
nfsstat does not explain the nearly 100% iowait. Neither iostat nor nfsstat gives any hint about which process is to blame, so I have nothing to send a kill to. Our cluster has multiple NFS servers and multiple client nodes, and all NFS volumes are mounted on all clients. When any NFS client/server gets stuck, it affects all users on our main login server, which is only an NFS client. Simple commands like df hang forever, Ctrl-C does not work, %iowait goes to 100%, and all users start screaming. I need to put the system back into a usable state without rebooting.
 
Vinod
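One general technique for finding the stuck processes (a sketch, assuming a Linux client with procps; `nfs_wait_on_request` below is just an example of what you might see): tasks hung on a dead NFS server sit in uninterruptible sleep, shown as state `D` by ps, and cannot be killed, not even with `kill -9`. Listing them together with the kernel function they are sleeping in (`wchan`) usually identifies the culprits:

```shell
# Show tasks in uninterruptible sleep ("D" state); these are blocked in
# the kernel on I/O, and a hung NFS server keeps them there indefinitely.
# The 'wchan' column names the kernel function each task sleeps in, which
# often makes the NFS involvement obvious (e.g. nfs_wait_on_request).
ps -eo pid,stat,wchan:32,comm | awk 'NR==1 || $2 ~ /^D/'
```

Since kill cannot touch D-state processes, recovering without a reboot usually means getting the server responding again, or lazily detaching the stuck mount with `umount -l /path` (behaviour also depends on NFS mount options such as `intr`/`hard`/`soft`).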
 
omarfaridCommented:
Hi,

It is better if you can monitor the NFS servers themselves. This will give you a better picture and let you take action faster.

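One simple way to do that from any client (a sketch; `nfs1` and `nfs2` are placeholder hostnames, substitute your own servers) is to poll each server's registered NFS RPC service with rpcinfo:

```shell
#!/bin/sh
# Probe the NFS service registered with each server's portmapper.
# SERVERS is a placeholder list -- replace with your real hostnames.
SERVERS="nfs1 nfs2"
for h in $SERVERS; do
    if rpcinfo -t "$h" nfs >/dev/null 2>&1; then
        echo "$h: nfs responding"
    else
        echo "$h: nfs NOT responding"
    fi
done
```

Run from cron, this flags a dead or wedged server before every client's iowait climbs to 100%.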
