Solved

Load average high, caused by high I/O?

Posted on 2006-11-07
Last Modified: 2008-01-09
I've been struggling with this problem for many months now. One of our workhorse servers is seeing some very regular load average "spikes", where the box basically freezes for a couple seconds and when it unfreezes the load average has jumped to about twice as high.

For example, our Linux box generally runs at a load average of 2-3. Every 5 minutes (it seems fairly regular), the box just hangs (I can't send it any commands and nothing refreshes), and about 5 seconds later it unhangs with the load average at 5-6.

My current theory is that it is some sort of disk i/o bottleneck, hogging the HD which keeps processes from running, which leads to high load average (since that metric is based on how many processes are waiting to run). Using sar, I have correlated the hangs to 100% disk i/o util.

We have a lot of things that run on the system, mostly proprietary software that we built. I've tried to find the specific culprit for this, but have not been able to.

Some things that would help diagnose the problem: a utility that shows more clearly which process is causing the high load average ('top' isn't cutting it), or something that shows which processes are doing the most disk writing.

I'm ssh'ing to the box, so I'm not directly on it, if that makes a difference.

If anyone has anything that could help me, I'd consider giving you my first born. Or tons of points. Whatever you'd prefer.
Question by:timdr
16 Comments
 
LVL 27

Expert Comment

by:Nopius
Hello, timdr.
First of all, a load average of 2-3 means you don't have enough CPU power to run all the processes on the current machine. It's time to think about an upgrade, unless you already have two CPUs (in which case LA=2 is OK).
Next. You say disk I/O is high and no process is hogging all the CPU time. I see at least two possible reasons:
1) You don't have enough memory and your kernel begins swapping (probably because of a memory leak).
You can check this by looking at swap space usage AND at the VSZ (virtual memory size) and RSS (resident set size) values. Just run these periodically from a script:

ps -eopid,user,vsz,rss,pcpu,fname
swapon -s
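
One way to run those two commands periodically is a small logging loop. This is only a sketch: the function names, the top-10 cutoff, and the interval and log path in the usage comment are placeholders chosen for illustration, not anything from the thread.

```shell
#!/bin/sh
# top_by_rss N: read "pid user vsz rss pcpu fname" rows on stdin and
# print the N rows with the largest RSS (column 4), biggest first.
top_by_rss() {
    sort -k4,4nr | head -n "$1"
}

# mem_snapshot: one timestamped sample of the biggest processes plus swap usage.
mem_snapshot() {
    date '+%Y-%m-%d %H:%M:%S'
    # The trailing "=" on each column suppresses the header line.
    ps -e -o pid= -o user= -o vsz= -o rss= -o pcpu= -o fname= | top_by_rss 10
    swapon -s
}

# Usage (assumed interval and path):
#   while true; do mem_snapshot >> /tmp/mem.log; sleep 60; done
```

A process whose RSS keeps growing from one snapshot to the next, together with rising swap usage, would point toward the memory-leak explanation.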


2) You have enough memory, but some process or processes really do write a lot. That may or may not be the cause, depending on your hardware.
If most I/O is performed by the DMA controller you shouldn't have this problem. Kernel drivers prefer to use DMA when possible, and 100% disk usage SHOULD NOT lead to a high load average or a high interrupt rate. So if you have no swapping but still see high I/O and a high load average, your drivers may be working in a non-optimal mode. In my experience that can happen:
- when you have an IDE controller working in PIO (not DMA) mode
- when your application calls flush() often
- when your filesystem is mounted in "sync" mode
Normal operation with plain read() or write() syscalls shouldn't lead to high I/O plus high LA. When the I/O bus is a bottleneck for writes, your kernel's I/O buffers fill up more often, most applications will be in a sleeping state (not running) and LA will rise.

Unfortunately it's not easy to determine which process is performing the most I/O (since buffer operations are quick and most of the work is done by drivers, not processes), but you can try to find which processes are in the 'D' state (uninterruptible sleep) with this command (an endless loop that samples once per second; interrupt it with ^C):
while true; do ps -eo pid,user,vsz,rss,pcpu,state,fname | grep " D "; sleep 1; done
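
To tell processes that sit in the D state constantly from ones that only pass through it, the samples can be tallied. The sketch below is one possible way to do that; the function names and the sample count in the usage comment are made up for illustration.

```shell
#!/bin/sh
# count_dstate: read "state command" rows on stdin and print, for every
# command seen in a state starting with D, the number of rows it appeared in.
count_dstate() {
    awk '$1 ~ /^D/ {
             cmd = $0
             sub(/^ *[^ ]+ +/, "", cmd)   # drop the state column and keep the command
             seen[cmd]++
         }
         END { for (c in seen) print seen[c], c }' | sort -rn
}

# sample_dstate N: take N samples one second apart and tally them.
sample_dstate() {
    i=0
    while [ "$i" -lt "$1" ]; do
        ps -e -o state= -o comm=
        sleep 1
        i=$((i + 1))
    done | count_dstate
}

# Usage (assumed sample count): sample_dstate 300
```

A command that shows up in nearly every sample is blocked on I/O most of the time; one that appears only now and then is probably doing ordinary I/O.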

Also read these articles:
http://www-128.ibm.com/developerworks/library/l-kprobes.html?ca=dgr-lnxw42Kprobe
http://uadmin.blogspot.com/2005/08/dtrace-equivalent-for-linux-only.html
They are about Kprobes, which can be used to profile almost anything in your kernel and applications.
To use it you need to be familiar with C and understand how the Linux kernel works.
A very good (but not easy) book about Linux driver internals is here: http://lwn.net/Kernel/LDD3/
 
LVL 27

Expert Comment

by:Nopius
I mistyped once :-). It should be: ... most applications will be in a sleeping state (not running) and LA will NOT rise
 
LVL 4

Expert Comment

by:NorCal2612
Have you tried iostat to monitor your I/O usage?

http://www.adminschoice.com/docs/iostat_vmstat_netstat.htm
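
iostat reports per-device figures. For a per-process view, newer kernels (2.6.20 and later, built with task I/O accounting) expose a write_bytes counter in /proc/<pid>/io; the 2.6.9 kernel discussed in this thread does not have it. The sketch below assumes such a kernel, and its function names and temporary file paths are hypothetical.

```shell
#!/bin/sh
# io_snapshot: print "pid write_bytes command" for every process whose
# /proc/<pid>/io is readable (run as root to see other users' processes).
io_snapshot() {
    for f in /proc/[0-9]*/io; do
        pid=${f#/proc/}
        pid=${pid%/io}
        wb=$(awk '/^write_bytes:/ { print $2 }' "$f" 2>/dev/null)
        [ -n "$wb" ] || continue
        echo "$pid $wb $(ps -o comm= -p "$pid")"
    done
}

# io_delta BEFORE AFTER: given two snapshot files, print
# "bytes_written pid command" for each pid present in both, largest first.
io_delta() {
    awk 'NR == FNR { before[$1] = $2; next }
         ($1 in before) { printf "%.0f %s %s\n", $2 - before[$1], $1, $3 }' "$1" "$2" | sort -rn
}

# Usage (assumed paths and interval):
#   io_snapshot > /tmp/io.before; sleep 5; io_snapshot > /tmp/io.after
#   io_delta /tmp/io.before /tmp/io.after | head
```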
 
LVL 1

Author Comment

by:timdr
Thank you for the detailed suggestions. Some more info...

- Memory is not an issue, I have plenty of memory and I never see any swapping/thrashing going on.
- The system is a DL-385 (dual AMD Opteron 275, meaning 4 cores total).
- Based on the above, unlikely to be using PIO mode.
- Cannot find any reason it would be doing flush often, or mounted in sync mode.

Running your command to find D state processes, I saw sendmail and kjournald in there all the time. Other proprietary processes did show up every once in a while. What does running this command tell me? Should I only look for processes that show up in this list constantly?

NorCal: That is how I tracked down that my IO utilization was high, using "sar -d 1 1000".
 
LVL 27

Expert Comment

by:Nopius
> - Memory is not an issue, I have plenty of memory and I never see any swapping/thrashing going on.
Good.

> - The system is a DL-385 (dual AMD Opteron 275, meaning 4 cores total).
Not so good. The Linux SMP kernel is still not as stable as the Solaris one, and kernel drivers can eat a lot of CPU in spinlocks and race conditions. Now it's time to check your kernel version and the modules (with versions) loaded into the kernel. Please run 'uname -a' and 'lsmod'. Also, your kernel (starting from 2.6.11) may be compiled with one of three possible preemption models, and this may affect overall system performance. Please check which one you are using in the config file (if you have it) in the kernel sources directory.

> - Based on the above, unlikely to be using PIO mode.
Yes, but. Windows always falls back to PIO when a hard drive has errors. In Linux I don't know the exact behaviour, but a disk with errors will also slow things down greatly. You can check your HDD (I don't know whether it's SCSI or IDE) with the mhdd boot floppy tool (scan command); it will show you slow or bad regions on the disk.

> -  Cannot find any reason it would be doing flush often, or mounted in sync mode.
Flushing is completely up to your applications: they are either written with flush() or not. Mounting in sync mode is up to the admin of that server.

Maybe sendmail is your problem? Is it important to run it on this server? Can you stop it for test purposes?

 
LVL 3

Expert Comment

by:bryanlloydharris
ps aux | awk '$8 ~ /D/ {print}'
 
LVL 11

Expert Comment

by:kblack05
please post the output of

lsof -i -P

This lists every process that has an open network socket, with port numbers shown numerically (-P), so you can see which ports the activity is on.

 
LVL 27

Expert Comment

by:Nopius
timdr, I'd also like to see exactly how your CPU is loaded.
When it is heavily loaded, run top or sar and check how CPU time is split between user processes, system, and interrupt handlers.
All three are important.
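
If top itself freezes during the spike, the same split can be computed from two readings of the aggregate "cpu" line in /proc/stat. This is a rough sketch: the function name and the 5-second interval are assumptions, and it only looks at the first seven counters (user, nice, system, idle, iowait, irq, softirq), ignoring the extra columns newer kernels add.

```shell
#!/bin/sh
# cpu_breakdown BEFORE AFTER: each argument is the aggregate "cpu" line of
# /proc/stat (user nice system idle iowait irq softirq ...). Prints the share
# of each state over the interval, using the same labels as top.
cpu_breakdown() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        NR == 1 { for (i = 2; i <= 8; i++) before[i] = $i }
        NR == 2 {
            for (i = 2; i <= 8; i++) { d[i] = $i - before[i]; total += d[i] }
            if (total <= 0) exit 1
            printf "us=%.1f sy=%.1f ni=%.1f id=%.1f wa=%.1f hi=%.1f si=%.1f\n",
                100 * d[2] / total, 100 * d[4] / total, 100 * d[3] / total,
                100 * d[5] / total, 100 * d[6] / total, 100 * d[7] / total,
                100 * d[8] / total
        }'
}

# Usage (assumed interval):
#   a=$(grep '^cpu ' /proc/stat); sleep 5; b=$(grep '^cpu ' /proc/stat)
#   cpu_breakdown "$a" "$b"
```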
 
LVL 1

Author Comment

by:timdr
First of all, let me again say thank you to all you guys for helping me with this. It is one annoying problem!

- The "ps aux | awk '$8 ~ /D/ {print}'" was very helpful, even more so than the original "ps -eopid,user,vsz,rss,pcpu,state,fname", because it shows me more of the command line. So I'm using that one now to track this.

- The "lsof -i -P" command showed me exactly what I would have expected. This server makes a lot of https connections to some of our other servers, so I just saw a list of all of these connections. I can't post it because it may contain confidential information. But there were about 25 connections to different servers over port 443, plus one ssh connection.

- CPU utilization right after the load is spiked: Cpu(s): 71.4% us, 28.6% sy,  0.0% ni,  0.0% id,  0.0% wa,  0.0% hi,  0.0% si
-- I copied and pasted this from top right after it unfroze (freeze lasts about 10 seconds), and the load average jumped from 2 to about 5.

- Right after it un-froze, the command "ps aux | awk '$8 ~ /D/ {print}'" (which I run every second to watch things) showed:
root       359  0.1  0.0     0    0 ?        D    Oct05  55:52 [kjournald]
timdr   12550  0.4  0.3 36052 31796 ?       D    12:56   0:35 perl myapp.pl
root      4627  0.0  0.0  8548 3088 ?        Ds   15:15   0:00 sendmail: mail.mysite.com.: idle

- myapp.pl (renamed to protect the innocent) is a process that does work on the system. I run about 60 copies of it. Each one connects to remote servers, writes some data to disk, and logs its activity in a log file. Could this whole situation be caused by these processes writing to the same log file? It doesn't write that much data overall.

- To answer the other questions:
-- Sendmail is indeed important. I did make a change to it yesterday to try to improve this situation. I set SuperSafe=False in the sendmail configuration, which from what I've read helps keep down unnecessary I/O.
-- We are using a SCSI raid 5 disk.
-- Kernel: Linux www.mysite.com 2.6.9-42.0.2.ELsmp #1 SMP Thu Aug 17 18:00:32 EDT 2006 i686 athlon i386 GNU/Linux
-- Was not able to run "lsmod", sure you spelled that right?

- Is there any way to check for bad disk blocks without taking the server down? Are bad blocks possible with raid? Are they not automatically detected?
 
LVL 11

Expert Comment

by:kblack05
ps awfux | awk '$8 ~ /D/ {print}'

may give more verbose output.
 
LVL 11

Expert Comment

by:kblack05
You might also wish to run chkrootkit on this.

http://www.chkrootkit.org/
 
LVL 11

Expert Comment

by:kblack05
Oddly enough, I wonder what your filesystems look like. Can you post the contents of /etc/fstab?

I'm wondering if you have an inode issue, or if the system has a broken /proc.

Something is definitely mysterious here.
 
LVL 27

Accepted Solution

by:
Nopius earned 500 total points
timdr, hi again. Like your server, I'm sometimes heavily loaded at my current job and nobody knows why :-)

Now, back to your problem. The most important thing is the CPU state right after the hang:
Cpu(s): 71.4% us, 28.6% sy,  0.0% ni,  0.0% id,  0.0% wa,  0.0% hi,  0.0% si

You see very high system load (that's not normal), so the problem is _definitely_ in the kernel or kernel drivers.
But this problem may also be provoked by non-optimized I/O from the application.

Next in importance. You say you have SCSI RAID 5, but what kind of RAID? Is it hardware or software? If it's hardware, how large is its cache?
If it's software RAID 5, then you should expect very high system usage when performing bulk writes (RAID 5 is very CPU intensive for software drivers).

Next. You have kernel 2.6.9, and that's not good. I _highly_ recommend you upgrade to the latest version,
because 2.6.9 used an inefficient locking mechanism inside kernel drivers. So with one driver in a
non-interruptible I/O state, the whole system can 'hang' even with two CPUs and an SMP kernel (if it's below 2.6.11).

Please upgrade your kernel and see if the problem disappears. Otherwise we can only guess which driver is causing your problem and why. Yes, we will guess it, but later (on the new kernel) :-)
 
LVL 27

Expert Comment

by:Nopius
Network drivers may be the cause of such load under high network traffic; in that case your CPU spends most of its time processing network traffic inside the network driver.
Can you measure network I/O as well?
Another possible cause is concurrent writes to the same log file from multiple processes, if they use some locking mechanism (if not, it's hardly a problem). Probably a better solution is to serialize your concurrent writes through a pipe (writes of up to PIPE_BUF bytes are atomic: at least 512 bytes under POSIX, 4096 on Linux) and have a separate process read the pipe and write the data to the log file.
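
The pipe idea could look roughly like the sketch below, using a named pipe (FIFO) so that unrelated worker processes can all write to it. The function names and paths are hypothetical, and the real workers in this thread are Perl scripts, so this only illustrates the layout: many writers, one reader that owns the log file.

```shell
#!/bin/sh
# start_log_collector FIFO LOGFILE: start the single process that owns the
# log file. Sets collector_pid so the caller can stop it later.
start_log_collector() {
    [ -p "$1" ] || mkfifo "$1" || return 1
    # Opening the FIFO read-write (<>) keeps the reader from seeing EOF each
    # time the last writer closes; this works on Linux but is not portable.
    cat <>"$1" >>"$2" &
    collector_pid=$!
}

# log_line FIFO MESSAGE: what each worker calls instead of appending to the
# log file directly. A short line like this is normally sent as one write,
# well under PIPE_BUF, so lines from different workers do not interleave.
log_line() {
    printf '%s %s\n' "$$" "$2" >"$1"
}

# Usage (assumed paths):
#   start_log_collector /tmp/myapp.fifo /tmp/myapp.log
#   log_line /tmp/myapp.fifo "connected to remote server"
#   kill "$collector_pid"
```

With this layout no worker ever takes a lock on the log file, so lock contention between the 60 copies goes away; whether that is what causes the spikes here remains a guess.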
 
LVL 1

Author Comment

by:timdr
We are going to upgrade our box soon, so I guess until that day we'll just look for the file locking issues, and the network reads.

Thank you guys for all of your assistance!!!
 
LVL 27

Expert Comment

by:Nopius
Please let me know if the kernel update solves the problem.

CU L8R
