Solved

Load average at 5, but no CPU, I/O or swap in use

Posted on 2010-08-19
Last Modified: 2013-12-06
Hi,

We are currently running CentOS 5 update 4 on a Dell R910 server (16 cores, 32 logical CPUs with hyperthreading) with 64GB of memory. It is the main Oracle 11g DB server for one of our customers and is attached to an MD 3000 storage array. The load average sits around 5, but no swap is in use, the CPUs are pretty much idle and there is no I/O wait. We have Oracle Data Guard turned on in transactional mode. I've checked everything that I can think of; there are no Oracle processes running which would cause a spike. Does anyone have any ideas as to what to check next?

I have another R910 configured the same way and do not see any issues with the 3 databases running on that server. The load is at .5.

Thanks
Question by:mw-hosting
20 Comments
 
LVL 26

Expert Comment

by:arober11
ID: 33483785
Stop as many of the daemons as you can on the server (e.g. Oracle and Apache), then check the load. If it is still high, try a reboot; otherwise bring the services back one at a time and monitor the impact on the load. If you identify a culprit, let us know.
0
 
LVL 29

Expert Comment

by:Michael Worsham
ID: 33486312
Do you have SELinux and auditd enabled?
0
 

Author Comment

by:mw-hosting
ID: 33486697
SELinux is not running.

auditd is running.
0
 
LVL 29

Expert Comment

by:Michael Worsham
ID: 33486734
Can you post a chkconfig listing, just to see if you have any services enabled/running that you don't need?

"chkconfig --list"

0
 

Author Comment

by:mw-hosting
ID: 33486868
All running services from chkconfig --list are the same between both R910s. I guess we are going to have to shut down the Oracle database and see if the issue is with that. That's all we have running on it besides a few out-of-the-box CentOS services.



0
 
LVL 29

Expert Comment

by:Michael Worsham
ID: 33486937
Have you run a basic 'top' to see which process(es) are peaking?
0
 

Author Comment

by:mw-hosting
ID: 33487033
Sure did. There is very little activity on the server. On occasion I do see the oracle processes, scsi_eh_3 and hald-addon-stor.

 CPU is at most 3% (oracle)

These processes appear also on our other R910 so at least the last two processes appear to be normal.

0
 
LVL 29

Expert Comment

by:Michael Worsham
ID: 33487112
I can tell you that the mcstrans service should be put into a 'stopped'/'off' state. Our Oracle DBA discovered that it sometimes causes CPU spikes when running.

Also, some unnecessary services that can be shut down and uninstalled: pcsc-lite (PCSC Crypto Card detection), smartmontools (SMART drive monitoring), bluez-utils (bluetooth). If you aren't using SELinux, make sure that setroubleshootd is also in a disabled state (or, even better, uninstalled).
0
 

Author Comment

by:mw-hosting
ID: 33487157
SELinux and the firewall are also disabled. The odd thing is we don't see any CPU spikes at all, just the load is high.



0
 
LVL 29

Expert Comment

by:Michael Worsham
ID: 33487191
It could be that your page cache and slab cache have peaked.

Try this (as root):

echo 3 > /proc/sys/vm/drop_caches
0
 

Author Comment

by:mw-hosting
ID: 33488944
Tried that, still high load.

Maybe it is the hardware, although we have Dell's OpenManage installed and it shows no alerts.
0
 
LVL 29

Expert Comment

by:Michael Worsham
ID: 33489931
Can you post a screenshot of a 'top' output?
0
 

Author Comment

by:mw-hosting
ID: 33490003
Screen shot of top attached
screenshot.png
0
 
LVL 12

Expert Comment

by:hfraser
ID: 33490180
The top command shows an average of the 16 cores (which actually appear as 32 processors because they're hyperthreaded). If you hit the "1" key, top will display the states for each of the processors. You might find that 5 or 6 of the processors are actually busy, while the rest are idle.

The load average is a sampled measurement of the run queue. On a single-core machine, a load average of 1.0 means there was one runnable process when the sample was taken (about every 5 seconds). A value of 2.0 means there were 2 runnable processes, which of course means the CPU is overloaded. But on a dual-core system, a value of 2 typically means each of the cores has a single runnable process.

All this simply means that a load average of 5 on a 16-core machine is very light (less than a third of its capacity). The rule of thumb is to start worrying at 70% of capacity, which translates to .7*16, or a load average of 11.2.

So use top with the separate processor stats to see what the system really looks like. Also, keep in mind that the load average is a sampled value, and may not translate to how the system performs. The general wisdom is that the absolute number isn't as important as the change in value, which is a flag that something's happening.
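To make the arithmetic above concrete, here is a minimal shell sketch. The function names (load_per_core, worry_threshold, count_states) are made up for illustration and are not from this thread. One caveat: on Linux the load average also counts tasks in uninterruptible sleep (state D), not only runnable ones, so a load of 5 with idle CPUs can come from processes blocked in the kernel. The last helper counts both kinds.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only.

# load_per_core LOAD CORES: load average divided by core count, two decimals.
load_per_core() {
    awk -v load="$1" -v cores="$2" 'BEGIN { printf "%.2f\n", load / cores }'
}

# worry_threshold CORES: the 70%-of-capacity rule of thumb quoted above.
worry_threshold() {
    awk -v cores="$1" 'BEGIN { printf "%.1f\n", 0.7 * cores }'
}

# count_states: reads one process state per line on stdin (the format
# printed by `ps -eo state=`) and counts runnable (R) tasks and tasks in
# uninterruptible sleep (D).
count_states() {
    awk '{ s = substr($1, 1, 1) }
         s == "R" { r++ }
         s == "D" { d++ }
         END { printf "runnable=%d blocked=%d\n", r, d }'
}

# Possible usage on the affected server:
#   load_per_core "$(cut -d' ' -f1 /proc/loadavg)" "$(grep -c ^processor /proc/cpuinfo)"
#   ps -eo state= | count_states
```

If the blocked count stays above zero across several runs while the CPUs are idle, listing those processes (for example with `ps -eo state,pid,comm`) would be a reasonable next step.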
0
 

Author Comment

by:mw-hosting
ID: 33490214
I would accept that, but the other R910 server (same configuration) is sitting at a .5 load and has 3 databases running on it. I had pressed 1 in top to show each CPU, and it is still only at 3% on a single core, if that.



0
 
LVL 3

Expert Comment

by:ckhsu1977
ID: 33490442
You mentioned you were going to shut down Oracle to see if the load drops. Did that happen?
0
 
LVL 12

Expert Comment

by:hfraser
ID: 33491376
Download a copy of iotop (or use iostat) to see if there's a lot of I/O, particularly swapping or faulting. Your system sure isn't faulting pages out, because it has lots of free memory, but let's check to be sure.
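As a rough sketch of that check without installing anything, the cumulative swap counters in /proc/vmstat can be sampled twice and compared. The helper name swap_counts is hypothetical; pswpin and pswpout are the kernel's counters for pages swapped in and out since boot.

```shell
#!/bin/sh
# swap_counts: hypothetical helper. Reads /proc/vmstat-style "name value"
# lines on stdin and prints the cumulative swap-in / swap-out page counters.
swap_counts() {
    awk '$1 == "pswpin"  { pages_in = $2 }
         $1 == "pswpout" { pages_out = $2 }
         END { printf "pswpin=%d pswpout=%d\n", pages_in, pages_out }'
}

# Possible usage: take two samples a few seconds apart and compare them.
#   swap_counts < /proc/vmstat; sleep 5; swap_counts < /proc/vmstat
# For disk activity, `iostat -x 5 3` (from the sysstat package) or
# `vmstat 5 3` give per-interval figures.
```

If both numbers are unchanged between samples, nothing was swapped during that interval.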
0
 

Author Comment

by:mw-hosting
ID: 33557327
I determined this past weekend that one of the hal daemon processes was causing the issue. Is there anything I can look at to determine what may have caused the issue with that process?
0
 
LVL 29

Expert Comment

by:Michael Worsham
ID: 33567035
When the process is running, you can use 'strace' to see which system calls the process is making on the server.
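A minimal sketch of how that might look, assuming the culprit's command name starts with "hald" (as with the hald-addon-stor process seen in top earlier in this thread). The helper pids_matching and the trace file path are made up for illustration.

```shell
#!/bin/sh
# pids_matching PREFIX: hypothetical helper. Reads "PID COMMAND" lines on
# stdin (the format printed by `ps -eo pid=,comm=`) and prints the PIDs
# whose command name starts with PREFIX.
pids_matching() {
    awk -v pat="$1" 'index($2, pat) == 1 { print $1 }'
}

# Possible usage (as root), one strace per matching process:
#   for pid in $(ps -eo pid=,comm= | pids_matching hald); do
#       strace -f -tt -p "$pid" -o "/tmp/hald.$pid.trace" &
#   done
# Stop the traces with `kill` on the strace PIDs once enough has been
# captured. `strace -c -p PID` prints a per-syscall summary when interrupted.
```

A process that spends its time in repeated ioctl or open calls on one device would point at what hald is polling.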

0
 

Accepted Solution

by:mw-hosting
ID: 33597390
It was the haldaemon that was taking up the load.

Anyone come across this before?
0
