How do I find the cause of high load on CentOS 5.4?

The server is running CentOS 5.4 (Final) with 8 GB RAM, 2 x Core Duo 2.8 GHz, and software RAID configured.

Every hour the load spikes from 0.16 up to 4.04 and locks up everything for around 5-65 minutes. There is no noticeable change in CPU or memory usage, just the increase in load. "top" does not reveal which application is causing the problem.

Any ideas on how to identify the culprit?
chrisk61 Asked:

ClintSwiney Commented:
Look at the cron jobs that are set to execute hourly. Disable them one by one and see if the problem goes away.
chrisk61 (Author) Commented:
Hi, thanks. This is everything in crontab -e; there's nothing hourly:

*/5     *       *       *       *       /usr/bin/srvmonitor >/dev/null 2>&1
57      3       *       *       *       /usr/local/xxxxxxx/bin/notifications.sh >/dev/null 2>&1
14      4       *       *       *       /usr/local/xxxxxxx/bin/suspend.sh >/dev/null 2>&1
57      4       *       *       *       /usr/local/xxxxxxx/admin/sbin/run_updater >/dev/null 2>&1
15      5       *       *       *       /usr/local/xxxxxxx/bin/dbdump >/dev/null 2>&1
12      6       *       *       *       /usr/local/xxxxxxx/admin/sbin/rmvoicefax >/dev/null 2>&1
*/2     *       *       *       *       /usr/local/xxxxxxx/livemonitor/bin/livemonitor >/dev/null 2>&1
14      5       *       *       *       /usr/local/xxxxxxx/bin/cleantmp.sh >/dev/null 2>&1
01      6       *       *       *       /usr/local/xxxxxxx/admin/sbin/mngquota --action set --all --quiet >/dev/null 2>&1
*/5     *       *       *       *       /usr/local/xxxxxxx/bin/faxpreapproved.sh >/dev/null 2>&1
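Note that crontab -e shows only the current user's table. System-wide jobs usually live elsewhere (/etc/crontab, /etc/cron.d, /etc/cron.hourly, and other users' tables under /var/spool/cron), so an hourly job can exist without appearing in the list above. A minimal sketch for listing those locations follows; the function name and the optional root-prefix argument are made up for illustration, and the paths are the usual CentOS defaults rather than anything confirmed for this server:

```shell
#!/bin/bash
# Sketch: list cron sources beyond the current user's crontab.
# list_cron_sources [ROOT]
#   ROOT is a hypothetical path prefix (normally empty) so the function
#   can be tried against a copy of /etc instead of the live system.
list_cron_sources() {
    local root="${1:-}"
    local path
    for path in "$root/etc/crontab" "$root/etc/cron.d" \
                "$root/etc/cron.hourly" "$root/var/spool/cron"; do
        if [ -f "$path" ]; then
            echo "== $path"
            # Show only active entries: drop comments and blank lines.
            grep -v -e '^[[:space:]]*#' -e '^[[:space:]]*$' "$path" || true
        elif [ -d "$path" ]; then
            echo "== $path"
            ls -1 "$path" 2>/dev/null || echo "(not readable)"
        fi
    done
}
```

Running `list_cron_sources` as root with no argument prints whatever is scheduled in those locations; anything under /etc/cron.hourly is a natural first suspect for an hourly spike.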
pawwa Commented:
Also look for those processes that could produce a lot of disk activity (maybe some log parsing, webalizer or whatever).

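In "top" you can look at "%wa", which shows the percentage of time the CPU sits idle while waiting for outstanding I/O requests. Processes blocked on I/O count toward the load average, so heavy disk activity drives the load up even when CPU usage looks low.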
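Because the Linux load average includes processes in uninterruptible sleep (state D, usually waiting on disk), listing those processes during a spike often points at the culprit more directly than top's CPU ordering. A rough sketch, with hypothetical function names, using ps from procps:

```shell
#!/bin/bash
# filter_blocked: keep only lines whose first field (process state) starts
# with D, i.e. uninterruptible sleep. Reads "STATE PID COMMAND" on stdin.
filter_blocked() {
    awk '$1 ~ /^D/ { print }'
}

# sample_blocked [SECONDS]: once per second, print a timestamp plus every
# process currently in state D. The default of 10 seconds is arbitrary.
sample_blocked() {
    local n="${1:-10}" i
    for ((i = 0; i < n; i++)); do
        ps -e -o state= -o pid= -o comm= | filter_blocked |
            awk -v ts="$(date +%T)" '{ print ts, $0 }'
        sleep 1
    done
}
```

Calling `sample_blocked 60` while the load is high and looking for names that repeat is one way to narrow things down. It shows who is waiting on I/O, not necessarily who is generating it.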

chrisk61 (Author) Commented:
Just checked %wa: it ranges from 25% to 45% when the load spikes, and it is back down to 0.0% now.
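To catch exactly when the I/O wait starts and stops, rather than watching top by hand, the same figure can be logged with timestamps. The sketch below derives it from the aggregate "cpu" line in /proc/stat, where the fifth counter after the label is iowait. The function names and the 5-second default interval are illustrative assumptions:

```shell
#!/bin/bash
# iowait_pct "<cpu line at t0>" "<cpu line at t1>"
# Each argument is the aggregate "cpu" line from /proc/stat.
# Prints the share of elapsed CPU time spent in iowait, in percent.
iowait_pct() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        {
            total = 0
            # Fields 2..9: user nice system idle iowait irq softirq steal
            for (i = 2; i <= NF && i <= 9; i++) total += $i
            t[NR] = total
            w[NR] = $6          # iowait counter
        }
        END {
            d = t[2] - t[1]
            if (d > 0) printf "%.1f\n", 100 * (w[2] - w[1]) / d
            else print "0.0"
        }'
}

# log_iowait [INTERVAL]: print one timestamped reading per interval, forever.
log_iowait() {
    local interval="${1:-5}" a b
    while true; do
        a=$(grep '^cpu ' /proc/stat)
        sleep "$interval"
        b=$(grep '^cpu ' /proc/stat)
        echo "$(date '+%F %T') iowait=$(iowait_pct "$a" "$b")%"
    done
}
```

Something like `log_iowait 5 >> /tmp/iowait.log &` left running across a spike gives start and end times that can be compared against cron schedules and application logs.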
deepak_iq Commented:

Download the attached nmon.zip to your Linux system and unzip it. It contains two files.

Copy those files to /usr/local/bin and make them executable:
#chmod 700 nmon start_nmon

Add an entry in crontab:
1 0 * * *       /usr/local/bin/start_nmon 300 300

Start it now:
#/usr/local/bin/start_nmon 300 300

If the server reboots, the above command needs to be run manually again.

Its output will be in the directory /opt/nmon11b/output

To view the output, download the nmon file(s) to your Windows desktop.
Run nmon analyser and you will get .xls output covering the whole range of metrics.

Kindly let me know in case it doesn't work out.
nmon.zip
nmon-analyser.zip
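The copy and chmod steps above can be wrapped so that a missing file is reported instead of silently skipped. This is only a sketch: the function name is invented, and it assumes the unzipped archive contains exactly the two files named in the chmod line (nmon and start_nmon).

```shell
#!/bin/bash
# install_nmon_files SRC_DIR [DEST_DIR]
# Copies nmon and start_nmon from SRC_DIR to DEST_DIR (default
# /usr/local/bin) and sets mode 700, as in the steps above.
install_nmon_files() {
    local src="$1" dest="${2:-/usr/local/bin}"
    local f
    for f in nmon start_nmon; do
        if [ ! -f "$src/$f" ]; then
            echo "missing $src/$f" >&2
            return 1
        fi
        cp "$src/$f" "$dest/$f" || return 1
        chmod 700 "$dest/$f" || return 1
    done
}
```

For example, `install_nmon_files /tmp/nmon` after unzipping into /tmp/nmon (a hypothetical location), run as root.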
chrisk61 (Author) Commented:
OK followed above and got an error:

nohup: cannot run command `/usr/local/bin/chk_tcp': No such file or directory

I can see output in "mydomain.nmon"
but not in
"mydomain.-chk_tcp-100406_2131.log" and not in "mydomain_100406_2131.nmon"

Can you confirm what I should be looking for please?
deepak_iq Commented:
Add the following file in /usr/local/bin:

vi chk_tcp
#!/bin/bash
# Log TCP connection-state counts and the httpd process count at a fixed interval.

#set -x

# Check command line arguments.
if [ "$#" -eq 0 ] ; then
   echo "Error: Missing parameter."
   # printf is used because bash's echo does not expand \n and \t by default.
   printf 'Usage:\n\t%s <sleep_seconds>\n' "$(basename "$0")"
   printf 'Example:\n\t%s 5\n' "$(basename "$0")"
   exit 1
fi

# Seconds to wait between samples.
sleep=$1

# Loop until the process is killed.
while true
do
   # [h]ttpd keeps grep from counting its own process.
   http=$(ps -ef | grep -c '[h]ttpd')

   # Take one netstat snapshot so all counts refer to the same moment.
   netstat -an >/tmp/netstat.txt
   EST=$(grep -ci ESTABLISH /tmp/netstat.txt)
   TW=$(grep -ci TIME_WAIT /tmp/netstat.txt)
   CW=$(grep -ci CLOSE_WAIT /tmp/netstat.txt)
   FIN=$(grep -ci FIN /tmp/netstat.txt)

   echo "$(date) $(hostname) EST:$EST TW:$TW CW:$CW FIN:$FIN http:$http"
   sleep "$sleep"
done

#chmod +x /usr/local/bin/chk_tcp

And then execute:

#/usr/local/bin/start_nmon 300 300

To check whether nmon is running on the server:

# ps -eaf|grep nmon
deepak_iq Commented:
Before running nmon manually, make sure you kill any previous nmon process that is already running.
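One way to do that check-and-kill in a single step is with pgrep and pkill, which match on the exact process name when given -x. A small sketch, with a hypothetical function name:

```shell
#!/bin/bash
# stop_previous NAME: terminate processes whose name is exactly NAME, if any.
stop_previous() {
    local name="$1"
    if pgrep -x "$name" >/dev/null 2>&1; then
        echo "stopping running $name: $(pgrep -x "$name" | tr '\n' ' ')"
        pkill -x "$name"
    else
        echo "no running $name found"
    fi
}
```

For example, `stop_previous nmon` followed by `/usr/local/bin/start_nmon 300 300`. Note that this sends the default TERM signal to every process named nmon, so it is only appropriate when no other nmon collection needs to keep running.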
chrisk61 (Author) Commented:
Thanks this works fine but can you confirm what I am supposed to be looking at please?
As a reminder I am trying to find the cause of the hourly server loading peaks.
deepak_iq Commented:

OK, first hurdle cleared.

Now, FTP the nmon file to your local desktop.

Open the nmon_analyser .xls sheet and choose the option "Analyse nmon data".

Once you select that, it will ask you for the nmon file; select the nmon file location. It will automatically create one .xls output file, which you can save under any name in a location of your choice on your PC.

Regarding what information we gather from it:

We can gather a whole lot of information from each section of that Excel sheet. The relevant ones in this case are which processes are consuming the most memory, what the CPU utilization is, and what the disk I/O has been on all disks during that period.

For more detail you can visit IBM's website for nmon-analyser; I have also attached a doc explaining the various terms in the output you have received.

If you still face any difficulty, send me the nmon file and I will let you know in general what information we have captured there.

Hope this resolves your query.
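For a quick look on the server itself before going through the spreadsheet, the .nmon file is comma-separated text and can be scanned with awk. The sketch below rests on an assumption about the file layout that should be checked against the actual capture: that ZZZZ lines map snapshot IDs (T0001, T0002, ...) to a time and date, and that CPU_ALL lines carry User%, Sys%, and Wait% in columns 3 to 5. The function name and the default threshold of 20 are invented for illustration.

```shell
#!/bin/bash
# high_wait_snapshots FILE [THRESHOLD]
# Prints the timestamp and Wait% of every snapshot whose CPU_ALL Wait%
# exceeds THRESHOLD (default 20).
# Assumed layout (verify against your file):
#   ZZZZ,T0002,21:36:05,06-APR-2010
#   CPU_ALL,T0002,User%,Sys%,Wait%,Idle%,...
high_wait_snapshots() {
    local file="$1" threshold="${2:-20}"
    awk -F, -v thr="$threshold" '
        $1 == "ZZZZ" { ts[$2] = $4 " " $3 }
        $1 == "CPU_ALL" && $2 ~ /^T[0-9]+$/ && $5 + 0 > thr {
            print ts[$2], "wait%=" $5
        }
    ' "$file"
}
```

If the printed times line up with the hourly load peaks, the disk sections for the same snapshots are the next place to look.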