Solved

Load at 5, no CPU, I/O or swap in use

Posted on 2010-08-19
Last Modified: 2013-12-06
Hi,

We are currently running CentOS 5 update 4 on a Dell R910 server (16 cores, 32 with hyperthreading) with 64GB of memory. It is the main Oracle 11g DB server for one of our customers and is attached to an MD 3000 storage array. The load average is sitting around 5, but no swap is in use, the CPUs are pretty much idle and there is no I/O wait. We have Oracle Data Guard turned on in transactional mode. I've checked everything I can think of, and there are no Oracle processes running that would cause a spike. Does anyone have any ideas as to what to check next?

I have another R910 configured the same way and do not see any issues with the 3 databases running on that server. The load is at .5.

Thanks
Question by:mw-hosting
20 Comments
 
LVL 26

Expert Comment

by:arober11
ID: 33483785
Stop as many of the daemons as you can on the server (e.g. Oracle and Apache), then check the load. If it is still high, try a reboot. Otherwise, bring the services back one at a time and monitor the impact on the load. If a culprit is identified, let us know.
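
A rough sketch of the "monitor the impact" part, assuming a Linux box where /proc/loadavg is readable. The function names `load1` and `sample_load` are made up for illustration, and nothing here stops or starts any service; that part stays manual.

```shell
#!/bin/sh
# Print the 1-minute load average: the first field of /proc/loadavg.
# An alternative file can be passed as $1 (handy for testing).
load1() {
    awk '{ print $1 }' "${1:-/proc/loadavg}"
}

# Take a labelled sample, e.g. after stopping or starting one service.
# The label is free text chosen by whoever runs it.
sample_load() {
    printf '%s load1=%s\n' "$1" "$(load1 "$2")"
}

# Only sample for real on systems that expose /proc/loadavg (Linux).
if [ -r /proc/loadavg ]; then
    sample_load "baseline"
fi
```

The 1-minute figure is a decaying average, so it is worth waiting a few minutes after each change before taking the next sample.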
0
 
LVL 29

Expert Comment

by:Michael Worsham
ID: 33486312
Do you have SELinux and auditd enabled?
0
 

Author Comment

by:mw-hosting
ID: 33486697
SELinux is not running.

auditd is running.
0
 
LVL 29

Expert Comment

by:Michael Worsham
ID: 33486734
Can you post a chkconfig list, just to see if you have any services enabled/running that you don't need?

"chkconfig --list"
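
To cut that list down to what is actually enabled, something like the following might help. It is only a sketch: it assumes the usual CentOS 5 `chkconfig --list` layout (service name followed by `0:off 1:off 2:on ...` columns), and `enabled_in_runlevel` is an illustrative name, not a standard tool.

```shell
#!/bin/sh
# Read "chkconfig --list" output on stdin and print the names of the
# services that are switched on in the given runlevel.
enabled_in_runlevel() {
    level=$1
    awk -v want="${level}:on" '{
        for (i = 2; i <= NF; i++)
            if ($i == want) { print $1; break }
    }'
}

# Live usage, only where chkconfig exists.
if command -v chkconfig >/dev/null 2>&1; then
    chkconfig --list | enabled_in_runlevel 3
fi
```

Saving the sorted output on each server and running `diff` on the two files is a quick way to compare two machines.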

0
 

Author Comment

by:mw-hosting
ID: 33486868
All running services from chkconfig --list are the same between both R910s. I guess we are going to have to shut down the Oracle database and see if the issue is with that. That's all we have running on it besides a few out-of-the-box CentOS services.



0
 
LVL 29

Expert Comment

by:Michael Worsham
ID: 33486937
Have you tried a basic 'top' to see which process(es) are peaking?
0
 

Author Comment

by:mw-hosting
ID: 33487033
Sure did. There is very little activity going on the server. I do see on occasion the oracle processes, scsi_eh_3 and hald-addon-stor.

 CPU is at most 3% (oracle)

These processes appear also on our other R910 so at least the last two processes appear to be normal.

0
 
LVL 29

Expert Comment

by:Michael Worsham
ID: 33487112
I can tell you that the mcstrans service should be put into a 'stopped'/'off' state. Our Oracle DBA discovered that it sometimes causes CPU spikes when running.

Also, some unnecessary services that can be shut down and uninstalled: pcsc-lite (PCSC Crypto Card detection), smartmontools (SMART drive monitoring), bluez-utils (Bluetooth). If you aren't using SELinux, make sure that setroubleshootd is also in a disabled state (or even better -- uninstalled).
0
 

Author Comment

by:mw-hosting
ID: 33487157
SELinux and the firewall are also disabled. The odd thing is we don't see any CPU spikes at all; only the load is high.



0
 
LVL 29

Expert Comment

by:Michael Worsham
ID: 33487191
It could be that your pagecache and slabcache are peaked.

Try this (as root):

echo 3 > /proc/sys/vm/drop_caches
0
 

Author Comment

by:mw-hosting
ID: 33488944
Tried that, still high load.

Maybe it is the hardware, but we have Dell's OpenManage installed and there are no alerts there.
0
 
LVL 29

Expert Comment

by:Michael Worsham
ID: 33489931
Can you post a screenshot of a 'top' output?
0
 

Author Comment

by:mw-hosting
ID: 33490003
Screen shot of top attached
screenshot.png
0
 
LVL 12

Expert Comment

by:hfraser
ID: 33490180
The top command shows an average of the 16 cores (which actually appears as 32 processors because they're hyperthreaded). If you hit the "1" key, top will display the states for each of the processors. You might find that 5 or 6 of the processors are actually busy, while the rest are idle.

The load average is a sampled measurement of the run queue. On a single-core machine, a load average of 1.0 means there was one runnable process when the sample was taken (about every 5 seconds). A value of 2.0 means there were 2 runnable processes, which of course means the CPU is overloaded. But on a dual-core system, a value of 2 typically means each of the cores has a single runnable process.

All this simply means that a load average of 5 on a 16-core machine means it's very lightly loaded (less than a third of its capacity). The rule-of-thumb is to start worrying at 70% of capacity, which translates to .7*16 or a load average of 11.2.
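
The arithmetic above can be sketched as two small shell functions. The names `load_threshold` and `load_per_core_pct` are illustrative only, and the core count is passed in by hand rather than detected, since whether to count 16 cores or 32 hyperthreads is a judgment call.

```shell
#!/bin/sh
# Rule-of-thumb threshold: fraction (default 0.7) times the number of cores.
load_threshold() {
    cores=$1
    fraction=${2:-0.7}
    awk -v c="$cores" -v f="$fraction" 'BEGIN { printf "%.1f\n", c * f }'
}

# Load average as a whole-number percentage of the core count.
load_per_core_pct() {
    load=$1
    cores=$2
    awk -v l="$load" -v c="$cores" 'BEGIN { printf "%.0f\n", 100 * l / c }'
}

load_threshold 16        # 0.7 * 16
load_per_core_pct 5 16   # 5 / 16 as a percentage
```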

So use top with the separate processor stats to see what the system really looks like. Also, keep in mind that the load average is a sampled value, and may not translate to how the system performs. The general wisdom is that the absolute number isn't as important as the change in value, which is a flag that something's happening.
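
One Linux-specific detail worth keeping in mind: the kernel's load average counts tasks in uninterruptible sleep (state D, usually waiting on a disk or device) as well as runnable ones, so a load of 5 with idle CPUs can come from a handful of blocked tasks. A minimal sketch for listing the tasks that count toward the load, assuming a `ps` that accepts `-o stat= -o pid= -o comm=`; the function name `load_tasks` is made up for illustration.

```shell
#!/bin/sh
# Keep only lines whose state starts with R (runnable) or D
# (uninterruptible sleep). Expects "STATE PID COMMAND" lines on stdin.
load_tasks() {
    awk '$1 ~ /^[RD]/ { print }'
}

# Live usage where ps is available; the empty "=" headers suppress
# the header line. The ps process itself will normally show up as R.
if command -v ps >/dev/null 2>&1; then
    ps -e -o stat= -o pid= -o comm= | load_tasks
fi
```

Running this a few times in a row shows whether the same tasks keep turning up in state D.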
0
 

Author Comment

by:mw-hosting
ID: 33490214
I would accept that, but the other R910 server (same configuration) is sitting at a .5 load and runs 3 databases, compared to one on this server. I had pressed 1 in top to show each CPU, and usage is still only at 3% on a single core, if that.



0
 
LVL 3

Expert Comment

by:ckhsu1977
ID: 33490442
You mentioned you were going to shut down Oracle to see if the load drops. Did that happen?
0
 
LVL 12

Expert Comment

by:hfraser
ID: 33491376
Download a copy of iotop (or use iostat) to see if there's a lot of I/O, particularly swapping or faulting. Your system sure isn't faulting pages out, because it has lots of free memory, but let's check to be sure.
0
 

Author Comment

by:mw-hosting
ID: 33557327
I determined this past weekend that one of the hal daemon processes was causing the issue. Is there anything I can look at to determine what may have caused the issue with that process?
0
 
LVL 29

Expert Comment

by:Michael Worsham
ID: 33567035
When the process is running, you can use 'strace' to see which system calls it is making against the server environment.

0
 

Accepted Solution

by:
mw-hosting earned 0 total points
ID: 33597390
It was the haldaemon that was taking up the load.

Anyone come across this before?
0
