RHEL 5.2 very slow on RAID 5 and RAID 6


I have a production server on RHEL 5.2 (Linux 2.6.18-92.el5 #1 SMP Tue Apr 29 13:16:12 EDT 2008 i686 i686 i386 GNU/Linux) with a local disk array of 15k SAS drives in RAID 5, and SAN storage on the same 15k SAS drives in RAID 6.
The SAN is connected via an FC HBA adapter, 4 MBps speed.
The machine is an industry-standard server with 8 GB ECC RAM, 2 x quad-core Xeon processors, a real hardware RAID controller, etc., which makes it quite a beast.

Now, we are running only a single application on it, which uses Java on the Tomcat platform, and all 8 CPU cores are mostly just scratching the floor: under 1% average CPU utilisation.

Partitions are all configured with LVM, so that they can be expanded when needed.

And the PROBLEM?
Here: when I run "du", for example, on the local RAID 5 partition (600 GB), it takes 2 hours for the command to complete! While "du" is running, the CPUs are almost idle, under 1%; only disk I/O is running at full load. Running "du" on the SAN partition (RAID 6) is a bit faster, but it still takes 45 minutes to finish scanning 400 GB of files.
Also, when the single application running on this server tries to reindex all the files on the RAID 5 and RAID 6 arrays to update its file info, it takes 2 days or more, and all services are disabled during that time. Other users of the software report that the same task finishes within a few hours, not days!

I then ran a disk benchmark, and it shows file copy speeds of almost 200 MB/s, which is great.

So I am lost: on one side fast and mostly idle CPUs and very fast disk arrays, and on the other side very poor, frankly unacceptable, performance of disk-intensive operations.

I am looking for ideas on how to find the bottleneck in the system. Suggestions welcome.
Andrej Pirman Asked:
arnold Commented:
You seem to be working from a premise rather than from an analysis.
A reindex process can take two months or longer, but as long as the recent and the just-added documents are indexed first, no one will notice how long the full reindex takes. That is where the "kinda optimised" mechanism comes into play. An added mechanism could be cached queries, which could expedite the indexing by including those documents in the reindex mechanism.

The question is: how much time passes between the start of the reindex and the return of at least some of the functionality?

Regarding describing your server as better than the others: as you've noted, the bottleneck is in the disk I/O.
So a server with dual single-core Xeon 2.8 GHz processors and a similar RAID configuration will likely perform the same as yours.
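One way to check whether the disks really are the limit is to sample the kernel's per-device counters while "du" or the reindex is running. The sketch below is only an illustration, assuming the standard /proc/diskstats field layout and a Python 3 interpreter (RHEL 5 ships an older Python, so it may need adapting); the function names are made up for this example. It computes roughly what "iostat -x" reports: I/O operations per second, average time per request, and how busy the device was.

```python
def parse_diskstats(text):
    """Parse the text of /proc/diskstats into {device: counters}."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 14:
            continue  # skip lines that carry fewer counters than a whole disk
        nums = [int(x) for x in parts[3:14]]
        stats[parts[2]] = {
            "reads": nums[0],     # reads completed
            "read_ms": nums[3],   # milliseconds spent reading
            "writes": nums[4],    # writes completed
            "write_ms": nums[7],  # milliseconds spent writing
            "io_ms": nums[9],     # milliseconds the device was busy
        }
    return stats


def summarize(before, after, interval_s):
    """Compare two snapshots taken interval_s seconds apart."""
    out = {}
    for dev, b in before.items():
        a = after.get(dev)
        if a is None:
            continue
        ios = (a["reads"] - b["reads"]) + (a["writes"] - b["writes"])
        wait_ms = (a["read_ms"] - b["read_ms"]) + (a["write_ms"] - b["write_ms"])
        busy_ms = a["io_ms"] - b["io_ms"]
        out[dev] = {
            "iops": ios / interval_s,
            "avg_wait_ms": wait_ms / ios if ios else 0.0,
            "util_pct": 100.0 * busy_ms / (interval_s * 1000.0),
        }
    return out
```

To use it, read /proc/diskstats twice a few seconds apart and pass both texts through parse_diskstats and then summarize. A device that sits near 100% utilisation with only a few hundred small requests per second is limited by seek time, not by throughput, and faster CPUs will not change that.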

Do the others perhaps have a RAID 10 setup? Maybe they configured their setup slightly differently, i.e. instead of having two huge partitions (400 GB and 600 GB), maybe they separated the content into more dedicated partitions: instead of a single 600 GB partition they have five 120 GB partitions, each defined to hold a specific set of content.

Do they also use LVM on their RAID allocated partitions?

What is the application that is being referenced here?

The only time you seem to be encountering a bottleneck of any kind is when you run du, which, as you said, is very disk I/O intensive.
How frequently do you need to run du?
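Keep in mind that du does very little bulk reading: it looks up the metadata of every file and directory, so its run time depends on the number of entries far more than on the gigabytes stored, and a sequential copy benchmark says little about it. A minimal sketch of how to measure that directly, assuming Python 3 (the function names walk_stats and timed_walk are invented for this example):

```python
import os
import time


def walk_stats(root):
    """Visit every entry under root and lstat it, much as du does.

    Returns (number of entries, sum of their apparent sizes in bytes).
    """
    entries = 0
    total_bytes = 0
    stack = [root]
    while stack:
        path = stack.pop()
        try:
            with os.scandir(path) as it:
                for entry in it:
                    try:
                        st = entry.stat(follow_symlinks=False)
                    except OSError:
                        continue  # entry vanished or is unreadable
                    entries += 1
                    total_bytes += st.st_size
                    if entry.is_dir(follow_symlinks=False):
                        stack.append(entry.path)
        except OSError:
            continue  # directory is unreadable, skip it
    return entries, total_bytes


def timed_walk(root):
    """Time the walk and report entries handled per second."""
    start = time.perf_counter()
    entries, total_bytes = walk_stats(root)
    elapsed = time.perf_counter() - start
    rate = entries / elapsed if elapsed > 0 else float("inf")
    return {
        "entries": entries,
        "bytes": total_bytes,
        "seconds": elapsed,
        "entries_per_sec": rate,
    }
```

Calling timed_walk on a subdirectory of each array gives an entries-per-second figure that can be compared between the local RAID 5, the SAN RAID 6, and another site's server. Run it on a directory that has not been touched recently, since a second pass is served largely from the page cache and will look much faster.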

Do other users of this application process the same scope of data, i.e. 1 terabyte? Are the RAID partitions healthy (no failed drives)?
Do the others have the same quantity of files?
Is the reindexing mechanism optimally configured, or even identically configured, among the various users of the software?
Andrej Pirman (Author) Commented:

I actually do not need to run "du" at any time, because there are other methods of determining actual disk usage. "du" was just a test to gauge disk speed and to illustrate the slow I/O on my system.

Regarding the application's internal database reindexing, I assume the process is kinda optimised, since other administrators, who host many more clients than I do, do NOT report reindexing times anywhere near as long. Most of them report reindexing finishing in a couple of hours, not days. Even weirder, most of them do not have such a good server, which points to the conclusion that there is a configuration mistake or some other bottleneck in my system.
Question has a verified solution.
