RHEL 5.2 very slow on RAID 5 and RAID 6

Posted on 2008-10-24
Last Modified: 2012-05-05

I have production server on RHEL 5.2 (Linux 2.6.18-92.el5 #1 SMP Tue Apr 29 13:16:12 EDT 2008 i686 i686 i386 GNU/Linux) which has LOCAL disk array on 15k SAS drives in RAID 5 array, and SAN storage on same SAS 15k drives on RAID 6.
SAN is connected via HBA FC addapter, 4 MBps speed.
Machine is actually Industry Standard Server, with 8 GB ECC RAM and 2 x QuadCore XEON processor, real HW RAID controller etc...which makes it quite a beast.

Now, we are running only 1 single application on it, which utilises JAVA on TOMCAT platform, and all 8 CPU cores are most of the time just stratching the floor - under 1% CPU utilisation average.

Partitions are all configured with LVM, for the whole purpose of beeing able to expand when needed.

And teh PROBLEM?
Here: when I run "du", for example, on local RAID 5 partition (600 GB), it takes 2 hours for command to complete! While "du" running, CPU's are almost idle, under 1%, only disks I/O are running on full. Issuing "du" on SAN partition (RAID 6) it is a bit faster, but it still takes 45 minutes to finish scanning 400 GB of files.
Also, when this single application, which runs on server, is trying to reindex all the files on RAID 5 and RAID 6 array to update info of all files, it takes 2 days or more, and all services are disabled at that time, while other users of the software report this same task to be finished within few hours, not days!

I then run disk benchmark, and it shows almost 200 MB/s file copy speeds, which is great.

So I am lost between fast and mostly idle CPU's, very fast disk arrays, and on the other side very poor and actually not acceptable performance of disk intensive operations.

Looking for idea of how to find the bottleneck of the system. Suggestions welcome.
Question by:Andrej Pirman
  • 2
LVL 77

Expert Comment

ID: 22798835
The only time you seem to be encountering a bottleneck of any kind is when you run du which as you said is very disk I/O intensive.
How frequently do you need to run du?

Do other users of this application have the same scope of data processed, i.e. 1 Terabyte of data?  Are the RAID partitions healthy (no failed drives)?
Do the others have the same quantity of files?
Is the reindexing mechanism optimally configured, or event identically configured among the various users of the software?
LVL 18

Author Comment

by:Andrej Pirman
ID: 22802105

I actually do not need to run "du" at any time, because there are other methods of determining actual disk usage. "du" was just a test to determine disk speed and to illustrate slow I/O in my system.

Regarding application's internal database reindexing, I assume process is kinda optimised, since other administrators, which host many more clients as I do, do NOT report even near so long-lasting reindexing process - most of them report reindexing to be finished in couple of hours, not days. And to be even more weird, most of others do not have such a good server, which points to a conclustion, that there is a configuration mistake or some other botleneck in my system.
LVL 77

Accepted Solution

arnold earned 500 total points
ID: 22803470
You seem to go on a premise rather than a qualitative analysis.
A reindex process can take two months or longer, but as long as the recent and the just added documents were indexed first, no one will notice how long the reindexing process takes which is where the "kinda optimised" mechanism comes into play. An added mechanism could be cached queries which could expedite the indexing by including those documents in the reindex mechanism.

The question is how much time passes from the initiation of the reindex and the return of some of the functionality?

Regarding defining  your server as better than the others, as you've noted, the bottleneck is in the Disk I/O.
So a server with Dual single Core Xeon 2.8GHz processor with a similar RAID configuration will likely run the same as yours.

Do the others perhaps have a RAID 10 setup?  May be they configured their setup slightly differently i.e. instead of having two huge partitions (400GB and 600GB) may be they separated the content into more designated paritions. instead of a single 600 GB they have 5 120GB with each partition defined to include a specific set of content.

Do they also use LVM on their RAID allocated partitions?

What is the application that is being referenced here?


Featured Post

PRTG Network Monitor: Intuitive Network Monitoring

Network Monitoring is essential to ensure that computer systems and network devices are running. Use PRTG to monitor LANs, servers, websites, applications and devices, bandwidth, virtual environments, remote systems, IoT, and many more. PRTG is easy to set up & use.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

In this article we have discussed the manual scenarios to recover data from Windows 10 through some backup and recovery tools which are offered by it.
It’s 2016. Password authentication should be dead — or at least close to dying. But, unfortunately, it has not traversed Quagga stage yet. Using password authentication is like laundering hotel guest linens with a washboard — it’s Passé.
Learn how to get help with Linux/Unix bash shell commands. Use help to read help documents for built in bash shell commands.: Use man to interface with the online reference manuals for shell commands.: Use man to search man pages for unknown command…
Connecting to an Amazon Linux EC2 Instance from Windows Using PuTTY.

895 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

15 Experts available now in Live!

Get 1:1 Help Now