Load average high, caused by high I/O?

Posted on 2006-11-07
Last Modified: 2008-01-09
I've been struggling with this problem for many months now. One of our workhorse servers is seeing some very regular load average "spikes", where the box basically freezes for a couple seconds and when it unfreezes the load average has jumped to about twice as high.

For example, our Linux box generally runs at a load average of 2-3. Every 5 minutes (it seems fairly regular), the box just hangs (I am not able to send any command to it, and nothing gets refreshed), and then about 5 seconds later it unhangs and the load average is at 5-6.

My current theory is that it is some sort of disk i/o bottleneck, hogging the HD which keeps processes from running, which leads to high load average (since that metric is based on how many processes are waiting to run). Using sar, I have correlated the hangs to 100% disk i/o util.

We have a lot of things that run on the system, mostly proprietary software that we built. I've tried to find the specific culprit for this, but have not been able to.

Some things that would help diagnose the problem would be a util that better told me what process is causing the high load average ('top' isn't cutting it), or something that told me which processes were doing the most disk writing.

I'm ssh'ing to the box, so I'm not directly on it, if that makes a difference.

If anyone has anything that could help me, I'd consider giving you my first born. Or tons of points. Whatever you'd prefer.
Question by:timdr
LVL 27

Expert Comment

ID: 17894806
Hello, timdr.
First of all, a load average of 2-3 means you do not have enough CPU power to run all the processes on this machine. It's time to think about an upgrade, unless you already have two CPUs (in which case LA=2 is OK).
Next: you say disk I/O is high and no process is hogging all the CPU time. I see at least two possible reasons:
1) You do not have enough memory and your kernel begins swapping (probably because of a memory leak).
You can check this by looking at swap space usage AND at the VSZ (virtual memory size) and RSS (resident set size) values. Just run the following periodically from a script:

ps -eopid,user,vsz,rss,pcpu,fname
swapon -s
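
A minimal sketch of such a periodic script follows. The function name, the log path, the 60-second interval and the "top 15 by RSS" cut-off are illustrative assumptions, and --sort is a procps (Linux) ps option. If swapping were the cause, the swap usage reported by swapon -s would grow around the time of each hang.

```shell
#!/bin/sh
# Print one timestamped snapshot of per-process memory use and swap usage.
snapshot() {
    echo "=== $(date '+%Y-%m-%d %H:%M:%S') ==="
    # Largest resident set first; keep the header plus the top 15 rows.
    ps -eo pid,user,vsz,rss,pcpu,fname --sort=-rss | head -n 16
    swapon -s
}

# Usage (runs until interrupted with ^C); the log path is an assumption:
#   LOG=/tmp/memwatch.log
#   while true; do snapshot >> "$LOG" 2>&1; sleep 60; done
```

Comparing snapshots taken just before and just after a hang shows whether any process's VSZ/RSS keeps growing.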

2) You have enough memory, but some process or processes really do write a lot. That may or may not be the cause, depending on your hardware.
If most I/O is performed by the DMA controller, you shouldn't have this problem. Kernel drivers prefer to use DMA when possible, and 100% disk usage SHOULD NOT lead to a high load average or a high interrupt rate. So if you have no swapping but still see high I/O and a high load average, your drivers may be working in a non-optimal mode. In my experience that can happen:
- when your IDE controller is working in PIO (not DMA) mode
- when your application calls flush() often
- when your filesystem is mounted in "sync" mode
Normal operation with plain read() or write() syscalls shouldn't lead to high I/O plus high LA. When you have an I/O bus bottleneck while writing, your kernel's I/O buffers fill up more often and most applications will be in a sleeping state (not running) and LA will rise.

Unfortunately it's not easy to determine which process is performing the most I/O (since buffer operations are quick and most of the work is done by drivers, not processes), but you can try to find which processes are in the 'D' state (uninterruptible sleep) with this command (an endless loop, interruptible with ^C; the sleep keeps the loop itself from loading the box):
while true ; do ps -eopid,user,vsz,rss,pcpu,state,fname | grep " D "; sleep 1; done
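
Since some processes pass through the 'D' state briefly all the time, a tally over many samples can be easier to read than raw output. The sketch below is one way to do that; the function name tally_d_state and the 60-sample window are assumptions, not part of the command above. It counts how many samples showed each command in a state starting with D.

```shell
#!/bin/sh
# Read "STATE COMMAND" lines on stdin and count, per command, how many
# samples showed it in uninterruptible sleep (state begins with D).
tally_d_state() {
    awk '$1 ~ /^D/ { count[$2]++ }
         END { for (c in count) print count[c], c }' | sort -rn
}

# Usage: take 60 samples one second apart, then print the tally.
#   for i in $(seq 60); do ps -e -o state= -o comm=; sleep 1; done | tally_d_state
```

Commands at the top of the tally were blocked on I/O in the most samples; ones that appear only once or twice are less likely to be the culprit.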

Also read these articles:
It's about the KProbe utility. It can be used to profile almost anything in your kernel and applications.
You should also understand how the Linux kernel works and be familiar with C.
A very good (but not easy) book about Linux driver internals is here:

LVL 27

Expert Comment

ID: 17894818
I misspelled one thing :-), it should be: ... most applications will be in a sleeping state (not running) and LA will NOT rise

Expert Comment

ID: 17895410
Have you tried iostat to monitor your I/O usage?
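
If iostat (from the sysstat package) is installed, "iostat -dx 1" prints extended per-device statistics every second. As a rough sketch of where a %util figure comes from, the function below compares two copies of /proc/diskstats. It assumes the usual 2.6 layout, where field 3 is the device name and field 13 is the number of milliseconds the device spent doing I/O; the function name and temp file paths are hypothetical.

```shell
#!/bin/sh
# Print "DEVICE UTIL%" from two /proc/diskstats snapshots taken
# INTERVAL_SECONDS apart.
disk_util() {  # usage: disk_util BEFORE_FILE AFTER_FILE INTERVAL_SECONDS
    awk -v secs="$3" '
        NR == FNR { busy[$3] = $13; next }   # first file: remember busy ms
        ($3 in busy) {
            util = ($13 - busy[$3]) / (secs * 1000) * 100
            printf "%s %.1f\n", $3, util
        }' "$1" "$2"
}

# Usage:
#   cat /proc/diskstats > /tmp/ds.before; sleep 5
#   cat /proc/diskstats > /tmp/ds.after
#   disk_util /tmp/ds.before /tmp/ds.after 5
```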

Author Comment

ID: 17900863
Thank you for the detailed suggestions. Some more info...

- Memory is not an issue, I have plenty of memory and I never see any swapping/thrashing going on.
- The system is a DL-385 (dual AMD Opteron 275, meaning 4 cores total).
- Based on the above, unlikely to be using PIO mode.
- Cannot find any reason it would be doing flush often, or mounted in sync mode.

Running your command to find D state processes, I saw sendmail and kjournald in there all the time. Other proprietary processes did show up every once in a while. What does running this command tell me? Should I only look for processes that show up in this list constantly?

NorCal: That is how I tracked down that my IO utilization was high, using "sar -d 1 1000".
LVL 27

Expert Comment

ID: 17904767
> - Memory is not an issue, I have plenty of memory and I never see any swapping/thrashing going on.

> - The system is a DL-385 (dual AMD Opteron 275, meaning 4 cores total).
Not so good. The Linux SMP kernel is still not as stable as the Solaris one, and kernel drivers can eat a lot of CPU in spinlocks and race conditions. Now it's time to check your kernel version and the modules (with versions) loaded into the kernel. Please run 'uname -a' and 'lsmod'. Also, your kernel (starting from 2.6.11) may be compiled with one of three possible preemption models, which can affect overall system performance. Please check which one you are using in the config file (if you have it) in the kernel sources directory.
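
A small sketch of those checks, under two assumptions: that the distribution installs the kernel config as /boot/config-<version> (common, but not guaranteed), and that the preemption choice shows up as CONFIG_PREEMPT_NONE, CONFIG_PREEMPT_VOLUNTARY or CONFIG_PREEMPT. The function name is hypothetical.

```shell
#!/bin/sh
# Print the preemption model line that is enabled in a kernel config file.
preempt_model() {  # usage: preempt_model [CONFIG_FILE]
    cfg="${1:-/boot/config-$(uname -r)}"
    grep -E '^CONFIG_PREEMPT(_NONE|_VOLUNTARY)?=y' "$cfg"
}

# Usage:
#   uname -a          # kernel version
#   /sbin/lsmod       # loaded modules; lsmod is often in /sbin, which may
#                     # not be on a non-root user's PATH
#   preempt_model     # or: preempt_model /usr/src/linux/.config
```

If the function prints nothing, the config file may predate the three-way choice or use different option names.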

> - Based on the above, unlikely to be using PIO mode.
Yes, but: Windows always falls back to PIO when a hard drive has errors. I don't know the exact behaviour in Linux, but a disk with errors will also slow things down a great deal. You can check your HDD (I don't know whether it is SCSI or IDE) with the mhdd boot floppy tool (scan command); it will show you slow/bad regions on the disk.

> -  Cannot find any reason it would be doing flush often, or mounted in sync mode.
Flushing is completely up to your applications: they are either written with flush() or not. Mounting in sync mode is up to the admin of that server.
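
To rule out the sync mount case quickly, the mount options the kernel is actually using can be read from /proc/mounts. This is a minimal sketch; the function name is an assumption, and it only looks for the exact option "sync" in the fourth field.

```shell
#!/bin/sh
# List mount points (and filesystem types) that carry the "sync" option.
sync_mounts() {  # usage: sync_mounts [MOUNTS_FILE]
    awk '{
        n = split($4, opt, ",")
        for (i = 1; i <= n; i++)
            if (opt[i] == "sync") print $2, $3
    }' "${1:-/proc/mounts}"
}

# Usage:
#   sync_mounts       # no output means no filesystem is mounted sync
```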

Maybe sendmail is your problem? Is it important to run it on this server? Can you stop it for test purposes?


Expert Comment

ID: 17909235
ps aux | awk '$8 ~ /D/ {print}'
LVL 11

Expert Comment

ID: 17911089
please post the output of

lsof -i -P

This will show you every process that has a network (Internet) socket open, and which ports the activity is on (-P prints port numbers instead of service names).

LVL 27

Expert Comment

ID: 17911987
timdr, I'd also like to see exactly how your CPU is loaded.
When the box is heavily loaded, run top or sar and see how the CPU time is split between user processes, system, and interrupt handlers.
All three are important.
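
If top cannot refresh during the freeze, the same split can be computed from two readings of the first line of /proc/stat. The sketch below assumes the 2.6 field order (user, nice, system, idle, iowait, irq, softirq) and ignores any later fields; the function name is hypothetical, and "us" here includes nice time.

```shell
#!/bin/sh
# Print the CPU time split between two "cpu ..." lines from /proc/stat.
cpu_split() {  # usage: cpu_split "BEFORE_LINE" "AFTER_LINE"
    printf '%s\n%s\n' "$1" "$2" | awk '
        NR == 1 { for (i = 2; i <= 8; i++) a[i] = $i; next }
        {
            for (i = 2; i <= 8; i++) { d[i] = $i - a[i]; total += d[i] }
            printf "us %.1f sy %.1f wa %.1f irq %.1f\n",
                100 * (d[2] + d[3]) / total, 100 * d[4] / total,
                100 * d[6] / total, 100 * (d[7] + d[8]) / total
        }'
}

# Usage: sample across a hang (the 5-second window is an assumption).
#   before=$(head -n 1 /proc/stat); sleep 5; after=$(head -n 1 /proc/stat)
#   cpu_split "$before" "$after"
```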

Author Comment

ID: 17918670
First of all, let me again say thank you to all you guys for helping me with this. It is one annoying problem!

- The "ps aux | awk '$8 ~ /D/ {print}'" was very helpful, even more so than the original "ps -eopid,user,vsz,rss,pcpu,state,fname" because it shows me more of the command line. So I'm using that one now to track this.

- The "lsof -i -P" command showed me exactly what I would have expected. This server makes a lot of https connections to some of our other servers, so I just saw a list of all of these connections. I can't post it because it may contain confidential information. But there were about 25 connections to different servers over port 443, plus one ssh connection.

- CPU utilization right after the load is spiked: Cpu(s): 71.4% us, 28.6% sy,  0.0% ni,  0.0% id,  0.0% wa,  0.0% hi,  0.0% si
-- I copied and pasted this from top right after it unfroze (freeze lasts about 10 seconds), and the load average jumped from 2 to about 5.

- Right after it un-froze, the command "ps aux | awk '$8 ~ /D/ {print}'" (which I run every second to watch things) showed:
root       359  0.1  0.0     0    0 ?        D    Oct05  55:52 [kjournald]
timdr   12550  0.4  0.3 36052 31796 ?       D    12:56   0:35 perl
root      4627  0.0  0.0  8548 3088 ?        Ds   15:15   0:00 sendmail: idle

- (renamed to protect the innocent) is a process that runs and does work on the system. I run about 60 copies of this process. Each one connects to remote servers, then writes some data to disk, and logs its activity in a log file. Could this whole situation be caused by these processes writing to the same log file? It doesn't write that much data overall.

- To answer the other questions:
-- Sendmail is indeed important. I did make a change to it yesterday to try to improve this situation. I set SuperSafe=False in the sendmail configuration, which from what I've read helps keep down unnecessary I/O.
-- We are using a SCSI raid 5 disk.
-- Kernel: Linux 2.6.9-42.0.2.ELsmp #1 SMP Thu Aug 17 18:00:32 EDT 2006 i686 athlon i386 GNU/Linux
-- Was not able to run "lsmod", sure you spelled that right?

- Is there any way to check for bad disk blocks without taking the server down? Are bad blocks possible with raid? Are they not automatically detected?
LVL 11

Expert Comment

ID: 17919677
ps awfux | awk '$8 ~ /D/ {print}'

may give more verbose output.
LVL 11

Expert Comment

ID: 17919679
You might also wish to run chkrootkit on this.
LVL 11

Expert Comment

ID: 17919682
Oddly enough I wonder what your filesystems look like. Can you post the contents of your /etc/fstab?

I'm wondering if you have an inode issue, or if the system has a broken /proc.

Something is definitely mysterious here.
LVL 27

Accepted Solution

Nopius earned 500 total points
ID: 17935240
timdr, hi again. I'm like your server: sometimes heavily loaded at my current job, and nobody knows why it happens :-)

Now, back to your problem. Most important are the CPU states right after the hang:
Cpu(s): 71.4% us, 28.6% sy,  0.0% ni,  0.0% id,  0.0% wa,  0.0% hi,  0.0% si

You see a very high system load (that's not normal), so the problem is _definitely_ in the kernel or kernel drivers.
But this problem may also be provoked by non-optimized I/O from an application.

Next in importance: you say you have RAID 5 on SCSI disks, but what kind of RAID? Is it hardware or software? If it is hardware, how much cache does it have?
If it's software RAID 5, then you should probably expect very high system usage when performing bulk writes (RAID 5 is very CPU intensive for software drivers).

Next: you have kernel 2.6.9, and that's not good. I _highly_ recommend that you upgrade to the latest version, because 2.6.9 used an inefficient locking mechanism inside kernel drivers. So with one driver in an uninterruptible I/O state, the system will 'hang' even on two CPUs and even on an SMP kernel (if it's below 2.6.11).

Please upgrade your kernel and see if the problem disappears. Otherwise we can only guess which driver causes your problem and why. Yes, we will guess it, but later (on the new kernel) :-)
LVL 27

Expert Comment

ID: 17935365
Network drivers may be the reason for such load under high network traffic. In that case your CPU spends most of its time processing network traffic (inside the network driver).
Can you measure network I/O as well?
One more possible reason is concurrent writes to the same log file from multiple processes, if they use some lock mechanism (if not, it's hardly a problem). A probably better solution is to serialize your concurrent writes through a pipe (writes of up to PIPE_BUF bytes are atomic; POSIX guarantees at least 512 bytes, and on Linux it is 4096) and then read this pipe from a third process, which writes the data to the log file.
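
A minimal sketch of that pipe idea, assuming a named pipe (FIFO) and one collector process. The paths, function names and line format are all hypothetical. Opening the FIFO read-write with <> works on Linux and keeps the collector from seeing end-of-file whenever the last writer closes. Each log line is assumed to be short enough to go out as a single write well under PIPE_BUF.

```shell
#!/bin/sh
# FIFO and LOG locations are illustrative assumptions.
FIFO="${FIFO:-/tmp/worker-log.fifo}"
LOG="${LOG:-/tmp/worker.log}"

# Start the single process that owns the log file.
start_log_collector() {
    [ -p "$FIFO" ] || mkfifo "$FIFO"
    # Read-write open: cat never gets EOF as writers come and go.
    cat <> "$FIFO" >> "$LOG" &
    COLLECTOR_PID=$!
}

# Called by each worker instead of appending to the log file directly.
log_line() {  # usage: log_line WORKER_NAME MESSAGE
    printf '%s %s[%s] %s\n' "$(date '+%H:%M:%S')" "$1" "$$" "$2" > "$FIFO"
}

# Usage:
#   start_log_collector
#   log_line fetcher "wrote 42 records"
#   kill "$COLLECTOR_PID"    # when shutting down
```

With this layout only the collector writes to the log file, so the workers no longer contend for a lock on it.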

Author Comment

ID: 17940715
We are going to upgrade our box soon, so I guess until that day we'll just look for the file locking issues, and the network reads.

Thank you guys for all of your assistance!!!
LVL 27

Expert Comment

ID: 17942803
Please let me know if the kernel update solves the problem.


Question has a verified solution.