Solved

Load average high, caused by high I/O?

Posted on 2006-11-07
1,039 Views
Last Modified: 2008-01-09
I've been struggling with this problem for many months now. One of our workhorse servers is seeing some very regular load average "spikes": the box basically freezes for a couple of seconds, and when it unfreezes the load average has roughly doubled.

For example, our Linux box generally runs at a load average of 2-3. Every 5 minutes (it seems fairly regular), the box just hangs (I can't send any command to it, and nothing gets refreshed); then after about 5 seconds it unhangs, and the load average is at 5-6.

My current theory is that it is some sort of disk I/O bottleneck: something hogging the disk keeps processes from running, which leads to a high load average (since that metric is based on how many processes are waiting to run). Using sar, I have correlated the hangs with 100% disk I/O utilization.
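For context: on Linux, the load average is an exponentially damped moving average of the number of tasks that are either runnable (state R) or in uninterruptible sleep (state D, typically waiting on disk I/O), so an I/O stall can push it up even while the CPUs sit idle. The current values are exposed in /proc/loadavg. A minimal Python 3 sketch that parses that file; `parse_loadavg` is a hypothetical helper name, not an existing tool:

```python
def parse_loadavg(text):
    """Parse the contents of /proc/loadavg.

    Format: '<1min> <5min> <15min> <runnable>/<total> <last pid>', for example
    '2.13 2.40 2.51 3/215 12345'. The runnable/total field counts scheduling
    entities (processes and threads) currently runnable vs. currently existing.
    """
    one, five, fifteen, sched, last_pid = text.split()
    runnable, total = sched.split("/")
    return {
        "1min": float(one),
        "5min": float(five),
        "15min": float(fifteen),
        "runnable": int(runnable),
        "total": int(total),
        "last_pid": int(last_pid),
    }
```

Sampling it once a second around a freeze (for example `parse_loadavg(open("/proc/loadavg").read())`) shows whether the 1-minute average climbs while the runnable count stays low, which would suggest the extra load comes from D-state tasks rather than from CPU demand.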

We have a lot of things that run on the system, mostly proprietary software that we built. I've tried to find the specific culprit for this, but have not been able to.

What would help me diagnose the problem is a utility that tells me which process is causing the high load average better than 'top' does, or something that tells me which processes are doing the most disk writing.
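On newer kernels there is a direct answer to "which process writes the most": with task I/O accounting (kernel 2.6.20 or later, CONFIG_TASK_IO_ACCOUNTING), each process has a /proc/<pid>/io file whose write_bytes field counts the bytes it caused to be sent to the storage layer, and tools such as iotop and `pidstat -d` are built on it. This is not available on an older kernel such as the 2.6.9 one mentioned later in this thread. A minimal Python 3 sketch under that assumption; the function names are hypothetical, and reading other users' processes generally needs root:

```python
import os

def parse_io(text):
    """Parse the 'key: value' lines of /proc/<pid>/io into a dict of ints."""
    fields = {}
    for line in text.splitlines():
        key, _, value = line.partition(":")
        if value.strip():
            fields[key.strip()] = int(value)  # int() tolerates surrounding spaces
    return fields

def snapshot_write_bytes():
    """Map pid -> write_bytes for every process whose io file we can read."""
    result = {}
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/io") as f:
                result[int(entry)] = parse_io(f.read()).get("write_bytes", 0)
        except OSError:
            continue  # process exited, no permission, or no I/O accounting
    return result

def top_writers(before, after, n=5):
    """Return the n (pid, bytes) pairs whose write_bytes grew most between snapshots."""
    deltas = {pid: after[pid] - before.get(pid, 0) for pid in after}
    return sorted(deltas.items(), key=lambda kv: kv[1], reverse=True)[:n]
```

Take two snapshots a few seconds apart, ideally spanning one of the freezes, and pass them to `top_writers()`.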

I'm ssh'ing to the box, so I'm not directly on it, if that makes a difference.

If anyone has anything that could help me, I'd consider giving you my first born. Or tons of points. Whatever you'd prefer.
Question by:timdr
16 Comments
 
LVL 27

Expert Comment

by:Nopius
ID: 17894806
Hello, timdr.
First of all, a load average of 2-3 means you don't have enough CPU power to run all the processes on the current machine. It's time to think about an upgrade, unless you already have 2 CPUs (then an LA of 2 is OK).
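The per-CPU arithmetic behind that rule of thumb, as a tiny Python 3 sketch (function names are hypothetical): dividing the load average by the number of cores gives a rough per-core figure, where values near or above 1.0 mean tasks are queuing. On a 4-core machine, for instance, a load average of 2-3 works out to 0.5-0.75 per core. Keep in mind that on Linux, tasks blocked on disk I/O also count toward the load average, so a high figure does not always mean the CPUs are busy.

```python
import os

def load_per_cpu(loadavg, ncpu):
    """Rough rule of thumb: about 1.0 per core means the cores are fully used."""
    return loadavg / ncpu

def current_load_per_cpu():
    """1-minute load average divided by the number of logical CPUs (Unix only)."""
    one_min, _five, _fifteen = os.getloadavg()
    return load_per_cpu(one_min, os.cpu_count() or 1)
```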
Next: you say disk I/O is high and no process is hogging all the CPU time. I see at least two possible reasons:
1) You don't have enough memory, and your kernel starts swapping (possibly because of a memory leak).
You can check this by looking at swap space usage AND at the VSZ (virtual memory size) and RSS (resident set size) values. Just run these periodically from a script:

ps -eo pid,user,vsz,rss,pcpu,fname
swapon -s
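The swap side of that check can also be done without parsing `swapon -s` output: /proc/meminfo reports SwapTotal and SwapFree in kB. A minimal Python 3 sketch of the periodic check; the helper names, the 5-second interval and the sample count are arbitrary choices for illustration:

```python
import time

def swap_used_kb(meminfo_text):
    """Return used swap in kB, given the contents of /proc/meminfo."""
    values = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            values[key.strip()] = int(parts[0])  # first token is the number
    return values["SwapTotal"] - values["SwapFree"]

def watch_swap(interval=5.0, samples=12):
    """Print a timestamped swap-usage line every `interval` seconds."""
    for _ in range(samples):
        with open("/proc/meminfo") as f:
            used = swap_used_kb(f.read())
        print(time.strftime("%H:%M:%S"), "swap used (kB):", used)
        time.sleep(interval)
```

If the swap figure stays flat across several freezes, swapping can most likely be ruled out.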


2) You have enough memory, but one or more processes really do write a lot. That may or may not be the cause, depending on your hardware.
If most I/O is performed by the DMA controller, you shouldn't have this problem: kernel drivers prefer to use DMA when possible, and 100% disk usage SHOULD NOT lead to a high load average or a high interrupt rate. So if you have no swapping but still see high I/O and a high load average, your drivers may be working in a non-optimal mode. From my experience, that can happen:
- when your IDE controller is working in PIO (not DMA) mode
- when your application calls flush() often
- when your filesystem is mounted in "sync" mode
Ordinary operation with plain read() or write() syscalls shouldn't lead to high I/O plus a high LA. When there is an I/O bus bottleneck during writes, your kernel's I/O buffers fill up more often, most applications will be in a sleeping state (not running), and the LA will rise.
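One way to watch the kernel's write buffers is to sample the Dirty (modified page-cache data not yet written to disk) and Writeback (data currently being written) fields of /proc/meminfo. If Dirty builds up and Writeback jumps right at a freeze, that would be consistent with the buffer-full theory, though not proof of it. A minimal Python 3 sketch; `dirty_and_writeback_kb` and `watch_writeback` are hypothetical names:

```python
import time

def dirty_and_writeback_kb(meminfo_text):
    """Return (Dirty, Writeback) in kB from the contents of /proc/meminfo."""
    values = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            values[key.strip()] = int(parts[0])
    return values["Dirty"], values["Writeback"]

def watch_writeback(interval=1.0, samples=600):
    """Print Dirty/Writeback once per interval (defaults cover about 10 minutes)."""
    for _ in range(samples):
        with open("/proc/meminfo") as f:
            dirty, writeback = dirty_and_writeback_kb(f.read())
        print(time.strftime("%H:%M:%S"), f"dirty={dirty}kB writeback={writeback}kB")
        time.sleep(interval)
```

Ten minutes of samples spans two of the roughly 5-minute freeze cycles described in the question.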

Unfortunately, it's not easy to determine which process is performing the most I/O (since buffered operations are quick and most of the work is done by drivers, not processes), but you can try to find which processes are in the 'D' state (uninterruptible sleep) with this command (an endless loop, interruptible with ^C):
while true ; do ps -eo pid,user,vsz,rss,pcpu,state,fname | grep " D "; sleep 1; done
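The same D-state check can be done by reading /proc directly instead of filtering `ps` output: the field right after the parenthesised command name in /proc/<pid>/stat is the process state. A minimal Python 3 sketch (helper names are hypothetical); it sleeps between samples so the monitor itself doesn't keep a CPU busy:

```python
import os
import time

def parse_stat(stat_line):
    """Return (comm, state) from one /proc/<pid>/stat line.

    comm is wrapped in parentheses and may itself contain spaces or ')',
    so split at the last ')' instead of on whitespace.
    """
    start = stat_line.index("(")
    end = stat_line.rindex(")")
    comm = stat_line[start + 1:end]
    state = stat_line[end + 2:].split()[0]
    return comm, state

def d_state_processes():
    """List (pid, comm) for every process currently in uninterruptible sleep."""
    found = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/stat") as f:
                comm, state = parse_stat(f.read())
        except OSError:
            continue  # the process exited between listdir() and open()
        if state == "D":
            found.append((int(entry), comm))
    return found

def watch(interval=1.0):
    """Print D-state processes once per interval until interrupted with ^C."""
    try:
        while True:
            procs = d_state_processes()
            if procs:
                print(time.strftime("%H:%M:%S"), procs)
            time.sleep(interval)
    except KeyboardInterrupt:
        pass
```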

Also read these articles:
http://www-128.ibm.com/developerworks/library/l-kprobes.html?ca=dgr-lnxw42Kprobe
http://uadmin.blogspot.com/2005/08/dtrace-equivalent-for-linux-only.html
They are about Kprobes, which can be used to profile almost anything in your kernel and applications.
To use it, you should be familiar with C and understand how the Linux kernel works.
A very good (but not easy) book about Linux driver internals is here: http://lwn.net/Kernel/LDD3/

 
LVL 27

Expert Comment

by:Nopius
ID: 17894818
Typo in my previous comment :-), it should read: ... most applications will be in a sleeping state (not running) and the LA will NOT rise
 
LVL 4

Expert Comment

by:NorCal2612
ID: 17895410
Have you tried iostat to monitor your I/O usage?

http://www.adminschoice.com/docs/iostat_vmstat_netstat.htm
 
LVL 1

Author Comment

by:timdr
ID: 17900863
Thank you for the detailed suggestions. Some more info...

- Memory is not an issue, I have plenty of memory and I never see any swapping/thrashing going on.
- The system is a DL-385 (dual AMD Opteron 275, meaning 4 cores total).
- Based on the above, it's unlikely to be using PIO mode.
- I cannot find any reason it would be flushing often, or be mounted in sync mode.

Running your command to find D-state processes, I saw sendmail and kjournald in there all the time. Other proprietary processes did show up every once in a while. What does running this command tell me? Should I only look for processes that show up in this list constantly?
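One way to make "shows up constantly" measurable is to take D-state samples at a fixed interval (for example, one per second from the loop suggested earlier) and compute the fraction of samples in which each process name appears. A minimal Python 3 sketch; `tally` is a hypothetical helper. A process that is almost always in D state, such as kjournald (the ext3 journal thread), may be doing writes on behalf of other processes, so a high fraction points at where to look rather than naming the culprit:

```python
from collections import Counter

def tally(samples):
    """Given samples (each a list of process names seen in D state),
    return [(name, fraction of samples containing it)], most frequent first."""
    counts = Counter()
    for sample in samples:
        counts.update(set(sample))  # count each name at most once per sample
    n = len(samples) or 1
    return [(name, hits / n) for name, hits in counts.most_common()]
```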

NorCal: That is how I tracked down that my IO utilization was high, using "sar -d 1 1000".
 
LVL 27

Expert Comment

by:Nopius
ID: 17904767
> - Memory is not an issue, I have plenty of memory and I never see any swapping/thrashing going on.
Good.

> - The system is a DL-385 (dual AMD Opteron 275, meaning 4 core's total).
Not so good. The Linux SMP kernel is still not as stable as the Solaris one, and kernel drivers can eat a lot of CPU in spinlocks and race conditions. Now it's time to check your kernel version and the modules (with versions) loaded into the kernel. Please run 'uname -a' and 'lsmod'. Also, your kernel (starting from 2.6.11) may be compiled with one of three possible preemption models, which may impact overall system performance. Please check which one you are using in your config file (if you have it) in the kernel sources directory.
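Both checks can be scripted. `lsmod` simply formats /proc/modules, which any user can read, and many distributions (Red Hat among them) install the running kernel's configuration as /boot/config-<release>. The sketch below (Python 3; function names are hypothetical, and the /boot path is an assumption that may not hold on every system) reports which of the three preemption options is enabled and lists the loaded modules:

```python
import platform

PREEMPT_CHOICES = (
    "CONFIG_PREEMPT_NONE",       # no forced preemption ("server")
    "CONFIG_PREEMPT_VOLUNTARY",  # voluntary kernel preemption ("desktop")
    "CONFIG_PREEMPT",            # preemptible kernel ("low-latency desktop")
)

def preemption_model(config_text):
    """Return which preemption option is set to 'y' in a kernel config, or None."""
    enabled = set()
    for line in config_text.splitlines():
        key, sep, value = line.strip().partition("=")
        if sep and value == "y":  # '# CONFIG_X is not set' lines have no '='
            enabled.add(key)
    for choice in PREEMPT_CHOICES:
        if choice in enabled:
            return choice
    return None

def loaded_modules(proc_modules_text):
    """Return module names from /proc/modules (the file lsmod reads)."""
    return [line.split()[0] for line in proc_modules_text.splitlines() if line.strip()]

def default_config_path():
    """Assumed location of the running kernel's config on many distributions."""
    return f"/boot/config-{platform.release()}"
```

Usage would be along the lines of `preemption_model(open(default_config_path()).read())` and `loaded_modules(open("/proc/modules").read())`.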

> - Based on the above, unlikely to be using PIO mode.
Yes, but: Windows always falls back to PIO when a hard drive has errors. I don't know the exact behaviour in Linux, but a disk with errors will also slow things down a lot. You can check your HDD (I don't know whether it's SCSI or IDE) with the mhdd boot floppy tool (scan command); it will show you slow/bad regions on the disk.

> -  Cannot find any reason it would be doing flush often, or mounted in sync mode.
Flushing is completely up to your applications: they are either written with flush() or not. Mounting in sync mode is up to the admin of the server.
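To rule out the sync-mount case quickly, check the mount options the kernel is actually using in /proc/mounts, whose first four fields match /etc/fstab (device, mount point, type, options). A minimal Python 3 sketch; `sync_mounts` is a hypothetical helper, and it looks only for the exact `sync` option, not `dirsync`:

```python
def sync_mounts(text):
    """Return (mount point, fs type) for each entry whose options include 'sync'.

    Works on the contents of /proc/mounts or /etc/fstab.
    """
    hits = []
    for line in text.splitlines():
        if line.lstrip().startswith("#"):
            continue  # comment line, as found in /etc/fstab
        fields = line.split()
        if len(fields) < 4:
            continue
        _device, mountpoint, fstype, options = fields[:4]
        if "sync" in options.split(","):
            hits.append((mountpoint, fstype))
    return hits
```

An empty result for `sync_mounts(open("/proc/mounts").read())` means no filesystem is currently mounted with `sync`.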

Maybe sendmail is your problem? Is it important to run it on this server? Can you stop it for test purposes?

 
LVL 3

Expert Comment

by:bryanlloydharris
ID: 17909235
ps aux | awk '$8 ~ /D/ {print}'
 
LVL 11

Expert Comment

by:kblack05
ID: 17911089
Please post the output of

lsof -i -P

This will show you all processes that have network sockets open (listening or connected), and which ports the activity is on (-P shows port numbers instead of service names).

 
LVL 27

Expert Comment

by:Nopius
ID: 17911987
timdr, I'd also like to see exactly how your CPU is loaded.
When it's heavily loaded, run top or sar and look at how CPU time is split between user processes, system (kernel), and interrupt handlers.
All three are important.
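top and sar derive these percentages from the cumulative counters on the `cpu` line of /proc/stat. A minimal Python 3 sketch that takes two samples and computes the split between user, system, interrupt (irq, softirq) and the other states; the helper names are hypothetical, and it assumes the seven-column layout of 2.6 kernels, ignoring any later columns such as steal:

```python
import time

FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq")

def cpu_counters(stat_text):
    """Return the aggregate 'cpu' line of /proc/stat as {field: jiffies}."""
    for line in stat_text.splitlines():
        if line.startswith("cpu "):
            values = [int(v) for v in line.split()[1:1 + len(FIELDS)]]
            return dict(zip(FIELDS, values))
    raise ValueError("no aggregate 'cpu' line found")

def cpu_split(before, after):
    """Percentage of elapsed CPU time spent in each state between two samples."""
    deltas = {name: after[name] - before[name] for name in FIELDS}
    total = sum(deltas.values()) or 1  # avoid dividing by zero on identical samples
    return {name: round(100.0 * d / total, 1) for name, d in deltas.items()}

def sample_cpu_split(interval=1.0):
    """Read /proc/stat twice, `interval` seconds apart, and return the split."""
    with open("/proc/stat") as f:
        before = cpu_counters(f.read())
    time.sleep(interval)
    with open("/proc/stat") as f:
        after = cpu_counters(f.read())
    return cpu_split(before, after)
```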
 
LVL 1

Author Comment

by:timdr
ID: 17918670
First of all, let me again say thank you to all of you for helping me with this. It is one annoying problem!

- The "ps aux | awk '$8 ~ /D/ {print}'" command was very helpful, even more so than the original "ps -eopid,user,vsz,rss,pcpu,state,fname", because it shows more of the command line. So I'm using that one now to track this.

- The "lsof -i -P" command showed me exactly what I would have expected. This server makes a lot of https connections to some of our other servers, so I just saw a list of all of these connections. I can't post it because it may contain confidential information. But there were about 25 connections to different servers over port 443, plus one ssh connection.

- CPU utilization right after the load is spiked: Cpu(s): 71.4% us, 28.6% sy,  0.0% ni,  0.0% id,  0.0% wa,  0.0% hi,  0.0% si
-- I copied and pasted this from top right after it unfroze (the freeze lasts about 10 seconds); the load average had jumped from 2 to about 5.

- Right after it un-froze, the command "ps aux | awk '$8 ~ /D/ {print}'" (which I run every second to watch things) showed:
root       359  0.1  0.0     0    0 ?        D    Oct05  55:52 [kjournald]
timdr   12550  0.4  0.3 36052 31796 ?       D    12:56   0:35 perl myapp.pl
root      4627  0.0  0.0  8548 3088 ?        Ds   15:15   0:00 sendmail: mail.mysite.com.: idle

- myapp.pl (renamed to protect the innocent) is a process that does work on the system. I run about 60 copies of it. It connects to remote servers, writes some data to disk, and logs its activity to a log file. Could this whole situation be caused by these processes writing to the same log file? It doesn't write that much data overall.

- To answer the other questions:
-- Sendmail is indeed important. I made a change to it yesterday to try to improve this situation: I set SuperSafe=False in the sendmail configuration, which from what I've read helps cut down on unnecessary I/O.
-- We are using a SCSI raid 5 disk.
-- Kernel: Linux www.mysite.com 2.6.9-42.0.2.ELsmp #1 SMP Thu Aug 17 18:00:32 EDT 2006 i686 athlon i386 GNU/Linux
-- I was not able to run "lsmod"; are you sure you spelled that right?

- Is there any way to check for bad disk blocks without taking the server down? Are bad blocks possible with raid? Are they not automatically detected?
 
LVL 11

Expert Comment

by:kblack05
ID: 17919677
ps awfux | awk '$8 ~ /D/ {print}'

may give more verbose output.
 
LVL 11

Expert Comment

by:kblack05
ID: 17919679
You might also wish to run chkrootkit on this.

http://www.chkrootkit.org/
 
LVL 11

Expert Comment

by:kblack05
ID: 17919682
Oddly enough, I wonder what your filesystems look like. Can you post the contents of /etc/fstab?

I'm wondering whether you have an inode issue, or whether the system has a broken /proc.

Something is definitely mysterious here.
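A quick way to check the inode theory is to compare used and total inodes per filesystem, which is what `df -i` reports. A minimal Python 3 sketch using os.statvfs; `inode_usage` is a hypothetical helper, and some filesystems report zero total inodes, in which case it returns 0%:

```python
import os

def inode_usage(path="/"):
    """Return (used, total, percent used) inodes for the filesystem holding path."""
    st = os.statvfs(path)
    used = st.f_files - st.f_ffree
    percent = 100.0 * used / st.f_files if st.f_files else 0.0
    return used, st.f_files, percent
```

A filesystem close to 100% inode usage would refuse to create new files even with free disk space.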
 
LVL 27

Accepted Solution

by:Nopius (earned 500 total points)
ID: 17935240
timdr, hi again. I'm like your server: sometimes heavily loaded at my current job, and nobody knows why :-)

Now, back to your problem. The most important thing is the CPU state right after the hang:
Cpu(s): 71.4% us, 28.6% sy,  0.0% ni,  0.0% id,  0.0% wa,  0.0% hi,  0.0% si

You see very high system (kernel) CPU usage, which is not normal, so the problem is _definitely_ in the kernel or kernel drivers.
But this problem may also be provoked by non-optimized I/O from the application.

Next in importance: you say you have SCSI RAID 5, but what kind? Is it hardware or software RAID? If it's hardware, how much cache does the controller have?
If it's software RAID 5, you would probably see very high system CPU usage when performing bulk writes (RAID 5 is very CPU intensive for software drivers).
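For background on why software RAID 5 costs CPU time: the parity block of each stripe is the byte-wise XOR of its data blocks, and a small write is usually a read-modify-write (read old data and old parity, compute new parity = old parity XOR old data XOR new data, write new data and new parity), so four disk I/Os plus XOR work per update. With software RAID that XOR runs on the host CPU inside the kernel, so it shows up as system time. A minimal Python 3 illustration of the parity math (hypothetical helper names; not how the Linux md driver is written):

```python
from functools import reduce

def xor_blocks(*blocks):
    """Byte-wise XOR of equally sized blocks: the RAID 5 parity function."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def updated_parity(old_parity, old_data, new_data):
    """Read-modify-write: new parity without reading the other data blocks."""
    return xor_blocks(old_parity, old_data, new_data)
```

The same XOR also rebuilds a lost block: XOR the parity with the surviving data blocks.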

Next: you have kernel 2.6.9, which is not good. I _highly_ recommend upgrading to the latest version,
because 2.6.9 used an inefficient locking mechanism inside kernel drivers. So with a single driver in an
uninterruptible I/O state, your system can 'hang' even with two CPUs and an SMP kernel (if it's below 2.6.11).
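To check the running kernel against the 2.6.11 threshold given in this comment, a minimal Python 3 sketch that extracts the numeric version from the release string (the same string `uname -r` prints); the helper names are hypothetical, and the threshold is simply the figure quoted here, not something the code verifies:

```python
import platform
import re

def kernel_version(release):
    """Turn a release string such as '2.6.9-42.0.2.ELsmp' into (2, 6, 9)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return tuple(int(part) if part else 0 for part in match.groups())

def older_than(release, minimum=(2, 6, 11)):
    """True if `release` is older than the `minimum` version tuple."""
    return kernel_version(release) < minimum

def running_kernel_is_older(minimum=(2, 6, 11)):
    """Apply older_than() to the running system's release string."""
    return older_than(platform.release(), minimum)
```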

Please upgrade your kernel and see if the problem disappears. Otherwise we can only guess which driver is causing your problem and why. Yes, we will guess it, but later (on the new kernel) :-)
 
LVL 27

Expert Comment

by:Nopius
ID: 17935365
Network drivers may also cause this kind of load under high network traffic: then your CPU spends most of its time processing network traffic (inside the network driver).
Can you measure network I/O as well?
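Network throughput can be measured the same way sar does it (`sar -n DEV 1` is one option): sample the per-interface byte counters in /proc/net/dev twice and divide the difference by the interval. A minimal Python 3 sketch; the helper names are hypothetical:

```python
def parse_net_dev(text):
    """Return {interface: (rx_bytes, tx_bytes)} from /proc/net/dev contents."""
    stats = {}
    for line in text.splitlines()[2:]:  # the first two lines are column headers
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        # Receive columns come first (bytes is column 0); transmit bytes is column 8.
        stats[name.strip()] = (int(fields[0]), int(fields[8]))
    return stats

def byte_rates(before, after, seconds):
    """Per-interface (rx, tx) bytes per second between two parse_net_dev() snapshots."""
    return {
        iface: ((rx - before[iface][0]) / seconds, (tx - before[iface][1]) / seconds)
        for iface, (rx, tx) in after.items()
        if iface in before
    }
```

Comparing these rates at the moment of a freeze with the quiet periods would show whether network traffic peaks along with the system time.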
One more possible reason may be concurrent writes to the same log file from multiple processes, if they use some locking mechanism (if not, it's hardly a problem). A better solution is probably to serialize your concurrent writes through a pipe (writes of up to PIPE_BUF bytes to a pipe are atomic; POSIX guarantees at least 512 bytes, and Linux uses 4096) and then read this pipe from a separate process that writes the data to the log file.
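A minimal Python 3 sketch of pipe-based log serialization, assuming every worker inherits the write end of one pipe (e.g. via fork()) and a single reader process owns the log file. `write_record` and `pipe_to_log` are hypothetical names; records larger than PIPE_BUF are rejected because only writes up to that size are guaranteed not to interleave:

```python
import os
import select

def write_record(fd, record):
    """Send one log record to a pipe in a single write() call.

    Writes of at most PIPE_BUF bytes to a pipe are atomic, so records from
    different writer processes never interleave. Larger records could be
    split, so they are rejected here.
    """
    data = record.encode("utf-8")
    if len(data) > select.PIPE_BUF:
        raise ValueError(f"record is {len(data)} bytes; PIPE_BUF is {select.PIPE_BUF}")
    os.write(fd, data)

def pipe_to_log(read_fd, log_path):
    """Single consumer: append everything arriving on the pipe to log_path.

    Returns when every writer has closed its end of the pipe (EOF).
    """
    with open(log_path, "ab") as log:
        while True:
            chunk = os.read(read_fd, 65536)
            if not chunk:
                break
            log.write(chunk)
            # flush() hands the data to the kernel page cache; it does not
            # force a disk write the way os.fsync() would.
            log.flush()
```

Each worker would call `write_record(w, line)` once per complete line; the reader runs `pipe_to_log(r, path)` and returns once every writer has closed its end.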
 
LVL 1

Author Comment

by:timdr
ID: 17940715
We are going to upgrade our box soon, so I guess until then we'll just look into the file-locking issues and the network reads.

Thank you all for your assistance!!!
 
LVL 27

Expert Comment

by:Nopius
ID: 17942803
Please let me know if the kernel update solves the problem.

CU L8R