RHEL 5.2 very slow on RAID 5 and RAID 6

Posted on 2008-10-24
Last Modified: 2012-05-05

I have a production server running RHEL 5.2 (Linux 2.6.18-92.el5 #1 SMP Tue Apr 29 13:16:12 EDT 2008 i686 i686 i386 GNU/Linux). It has a LOCAL disk array of 15k SAS drives in RAID 5, and SAN storage on the same kind of 15k SAS drives in RAID 6.
The SAN is connected via an FC HBA adapter at 4 Gbps.
The machine is an industry-standard server with 8 GB ECC RAM, 2 x quad-core Xeon processors, a real hardware RAID controller and so on, which makes it quite a beast.

Now, we are running only one application on it, which uses Java on the Tomcat platform, and all 8 CPU cores are mostly just scratching the floor: under 1% average CPU utilisation.

All partitions are configured with LVM, so that they can be expanded when needed.

And the PROBLEM?
Here: when I run "du", for example, on the local RAID 5 partition (600 GB), it takes 2 hours for the command to complete! While "du" is running, the CPUs are almost idle, under 1%; only disk I/O is running at full load. Running "du" on the SAN partition (RAID 6) is a bit faster, but it still takes 45 minutes to finish scanning 400 GB of files.
Also, when the single application that runs on this server reindexes all the files on the RAID 5 and RAID 6 arrays to update its file information, it takes 2 days or more, and all services are disabled during that time. Other users of the software report that the same task finishes within a few hours, not days!

I then ran a disk benchmark, and it shows file copy speeds of almost 200 MB/s, which is great.

So I am lost between fast and mostly idle CPUs and very fast disk arrays on one side, and very poor, actually unacceptable performance of disk-intensive operations on the other.

I am looking for ideas on how to find the bottleneck in the system. Suggestions welcome.
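The gap between the two observations can be put into rough numbers. A file copy benchmark measures sequential throughput, while "du" has to look up every directory entry and inode, so its runtime is closer to (number of entries) x (time per uncached lookup). The sketch below is only back-of-the-envelope arithmetic: the function names are made up for illustration, and the 5 ms per lookup is an assumed ballpark for a random access on a 15k drive, not a measured value from this system.

```python
def estimate_walk_seconds(entries, ms_per_lookup):
    """Seconds needed to stat `entries` items if each uncached lookup costs `ms_per_lookup` ms."""
    return entries * ms_per_lookup / 1000.0


def implied_entries(seconds, ms_per_lookup):
    """Number of uncached lookups that would account for a walk lasting `seconds`."""
    return seconds * 1000.0 / ms_per_lookup


# Assumed figure: roughly 5 ms per random metadata lookup (hypothetical).
two_hours = 2 * 60 * 60
print(implied_entries(two_hours, 5.0))  # lookups that would fill the observed 2 hours
```

Under that assumption, a 2-hour "du" corresponds to something over a million serialized random lookups, which would be plausible for a tree with a very large number of small files and says little about sequential copy speed.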
Question by:Andrej Pirman
LVL 79

Expert Comment

ID: 22798835
The only time you seem to be encountering a bottleneck of any kind is when you run du, which, as you said, is very disk I/O intensive.
How frequently do you need to run du?

Do other users of this application have the same scope of data processed, i.e. 1 Terabyte of data?  Are the RAID partitions healthy (no failed drives)?
Do the others have the same quantity of files?
Is the reindexing mechanism optimally configured, or even identically configured among the various users of the software?
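To compare like with like, it helps to have the file count and average file size for each array, not just the total in GB. A minimal sketch follows, assuming Python 3 is installed on the box (it is not the default interpreter on RHEL 5); `summarize_tree` is a hypothetical helper name, not part of any tool mentioned in this thread.

```python
import os
import time


def summarize_tree(root):
    """Walk `root` and return file/dir counts, total apparent size, and lstat rate."""
    files = dirs = total_bytes = 0
    start = time.time()
    for dirpath, dirnames, filenames in os.walk(root):
        dirs += len(dirnames)
        for name in filenames:
            try:
                info = os.lstat(os.path.join(dirpath, name))
            except OSError:
                continue  # file vanished or is unreadable; skip it
            files += 1
            total_bytes += info.st_size
    elapsed = time.time() - start
    entries = files + dirs
    return {
        "files": files,
        "dirs": dirs,
        "bytes": total_bytes,
        "seconds": elapsed,
        "entries_per_second": entries / elapsed if elapsed > 0 else float("inf"),
        "avg_file_bytes": total_bytes / files if files else 0.0,
    }
```

Calling `summarize_tree("/some/mount/point")` (path is a placeholder) on a small subdirectory first gives a feel for the rate before committing to the whole array. A second run over the same tree is usually served from the kernel's caches and will look much faster, so only the first, cold run reflects the disks.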
LVL 18

Author Comment

by:Andrej Pirman
ID: 22802105

I actually do not need to run "du" at all, because there are other methods of determining actual disk usage. "du" was just a test to gauge disk speed and to illustrate the slow I/O on my system.

Regarding the application's internal database reindexing, I assume the process is kinda optimised, since other administrators, who host many more clients than I do, do NOT report anything near such a long-lasting reindexing process. Most of them report that reindexing finishes in a couple of hours, not days. And to make it even weirder, most of the others do not have such a good server, which points to the conclusion that there is a configuration mistake or some other bottleneck in my system.
LVL 79

Accepted Solution

arnold earned 500 total points
ID: 22803470
You seem to be going on a premise rather than a qualitative analysis.
A reindex process can take two months or longer, but as long as the recent and the just-added documents are indexed first, no one will notice how long the reindexing process takes, which is where the "kinda optimised" mechanism comes into play. An added mechanism could be cached queries, which could expedite the indexing by including those documents in the reindex mechanism.

The question is how much time passes between the initiation of the reindex and the return of some of the functionality?

Regarding defining your server as better than the others: as you've noted, the bottleneck is in the disk I/O.
So a server with dual single-core 2.8 GHz Xeon processors and a similar RAID configuration will likely perform the same as yours.
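One way to check that the time really goes to waiting on disks is to sample the iowait counter in /proc/stat while the slow job runs. The sketch below assumes Python 3 and the documented field order of the aggregate "cpu" line (user, nice, system, idle, iowait, irq, softirq, steal); the function names are hypothetical.

```python
import time


def read_cpu_times(stat_text):
    """Return the first eight counters of the aggregate 'cpu' line of /proc/stat."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            return [int(v) for v in fields[1:9]]
    raise ValueError("no aggregate 'cpu' line found")


def iowait_percent(before, after):
    """Percentage of CPU time spent in iowait between two samples."""
    deltas = [b - a for a, b in zip(before, after)]
    total = sum(deltas)
    if total <= 0:
        return 0.0
    return 100.0 * deltas[4] / total  # index 4 is the iowait counter


def sample_iowait(interval=5.0, path="/proc/stat"):
    """Read /proc/stat twice, `interval` seconds apart, and return the iowait percentage."""
    with open(path) as f:
        first = read_cpu_times(f.read())
    time.sleep(interval)
    with open(path) as f:
        second = read_cpu_times(f.read())
    return iowait_percent(first, second)
```

Running `sample_iowait()` while "du" or the reindex is active should show a clearly non-zero figure if the process is blocked on I/O. On an 8-core machine a single blocked process can only account for a fraction of the aggregate, so even a modest percentage here is consistent with "CPUs idle, disks busy".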

Do the others perhaps have a RAID 10 setup? Maybe they configured their setup slightly differently, i.e. instead of having two huge partitions (400 GB and 600 GB), maybe they separated the content into more designated partitions: instead of a single 600 GB partition they have five 120 GB partitions, each defined to hold a specific set of content.

Do they also use LVM on their RAID allocated partitions?

What is the application that is being referenced here?



Question has a verified solution.
