RHEL 5.2 very slow on RAID 5 and RAID 6

Posted on 2008-10-24
Last Modified: 2012-05-05
Hi,

I have a production server on RHEL 5.2 (Linux 2.6.18-92.el5 #1 SMP Tue Apr 29 13:16:12 EDT 2008 i686 i686 i386 GNU/Linux) with a local disk array of 15k SAS drives in RAID 5, and SAN storage on the same 15k SAS drives in RAID 6.
The SAN is connected through a Fibre Channel HBA at 4 Gbit/s.
The machine is an Industry Standard Server with 8 GB of ECC RAM, 2 x quad-core Xeon processors, a real hardware RAID controller, etc., which makes it quite a beast.

Now, we are running only one application on it, a Java application on the Tomcat platform, and all 8 CPU cores are idling most of the time, averaging under 1% CPU utilisation.

All partitions are configured with LVM so that they can be expanded when needed.

And the PROBLEM?
Here it is: when I run "du", for example, on the local RAID 5 partition (600 GB), it takes 2 hours for the command to complete! While "du" is running, the CPUs are almost idle, under 1%, and only disk I/O runs at full load. Running "du" on the SAN partition (RAID 6) is a bit faster, but it still takes 45 minutes to finish scanning 400 GB of files.
Also, when the single application running on the server reindexes all the files on the RAID 5 and RAID 6 arrays to update their information, it takes 2 days or more, and all services are unavailable during that time, while other users of the software report that the same task finishes within a few hours, not days!
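Because du issues one stat() call per file and directory and reads no file data, its speed is governed by metadata lookups (mostly random seeks) rather than by sequential throughput. A minimal sketch that reproduces such a du-style walk and reports entries per second, which is more telling than MB/s for this workload, assuming Python 3.6 or later is available on the server. `scan_tree` is a hypothetical helper; it sums apparent file sizes (st_size), not allocated blocks as du does.

```python
import os
import stat
import sys
import time


def scan_tree(root):
    """Walk `root` the way du does: one lstat() per entry, no file reads.

    Hypothetical helper. Returns (files, dirs, total_bytes, seconds).
    Symlinks are not followed and are counted as files.
    """
    files = dirs = total_bytes = 0
    start = time.monotonic()
    stack = [root]
    while stack:
        path = stack.pop()
        try:
            with os.scandir(path) as it:
                for entry in it:
                    try:
                        st = entry.stat(follow_symlinks=False)
                    except OSError:
                        continue  # entry vanished or is unreadable; skip it
                    if stat.S_ISDIR(st.st_mode):
                        dirs += 1
                        stack.append(entry.path)
                    else:
                        files += 1
                        total_bytes += st.st_size  # apparent size, not blocks
        except OSError:
            continue  # directory unreadable (permissions, etc.)
    return files, dirs, total_bytes, time.monotonic() - start


if __name__ == "__main__" and len(sys.argv) > 1:
    files, dirs, size, secs = scan_tree(sys.argv[1])
    rate = (files + dirs) / secs if secs > 0 else float("inf")
    print(f"{files} files, {dirs} dirs, {size / 2**30:.1f} GiB (apparent) "
          f"in {secs:.1f} s, {rate:.0f} entries/s")
```

Pass the mount point to scan as the only argument. Running it twice in a row is also informative: if the second pass finishes far faster (served from the inode and dentry caches), that suggests the first pass was bound by disk seeks rather than by CPU.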

I then ran a disk benchmark, and it showed file copy speeds of almost 200 MB/s, which is great.

So I am caught between fast, mostly idle CPUs and very fast disk arrays on one side, and very poor, frankly unacceptable performance of disk-intensive operations on the other.

I am looking for ideas on how to find the bottleneck in this system. Suggestions welcome.
Question by:Andrej Pirman

Expert Comment

by:arnold
ID: 22798835
The only time you seem to encounter a bottleneck of any kind is when you run du, which, as you said, is very disk I/O intensive.
How frequently do you need to run du?

Do other users of this application process the same scope of data, i.e. 1 terabyte? Are the RAID arrays healthy (no failed drives)?
Do the others have the same number of files?
Is the reindexing mechanism optimally configured, or even identically configured, among the various users of the software?

Author Comment

by:Andrej Pirman
ID: 22802105
Hi,

I actually do not need to run "du" at all, because there are other ways of determining actual disk usage. "du" was just a test to gauge disk speed and to illustrate the slow I/O on my system.

Regarding the application's internal database reindexing, I assume the process is kinda optimised, since other administrators, who host many more clients than I do, do NOT report anywhere near such a long reindexing process; most of them report that reindexing finishes in a couple of hours, not days. Even more puzzling, most of the others do not have as good a server, which leads me to conclude that there is a configuration mistake or some other bottleneck in my system.

Accepted Solution

by:arnold
ID: 22803470
You seem to be working from a premise rather than a qualitative analysis.
A reindex process can take two months or longer, but as long as recent and newly added documents are indexed first, no one will notice how long the reindex takes; that is where the "kinda optimised" mechanism comes into play. An additional mechanism could use cached queries to expedite indexing by including the documents those queries touch in the reindex.
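The "recent documents first" ordering can be sketched as follows. `newest_first` is a hypothetical helper, not part of the (unnamed) application's indexer; it simply orders files by modification time so that a reindex could process the newest ones first, optionally returning only the `limit` most recent.

```python
import heapq
import os


def newest_first(root, limit=None):
    """Return file paths under `root`, most recently modified first.

    Hypothetical helper illustrating a "recent documents first" reindex
    order; the real application's indexing logic is unknown.
    """
    entries = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                mtime = os.lstat(path).st_mtime
            except OSError:
                continue  # file removed or unreadable during the walk
            entries.append((mtime, path))
    if limit is not None:
        # nlargest avoids sorting everything when only the newest N matter
        entries = heapq.nlargest(limit, entries)
    else:
        entries.sort(reverse=True)
    return [path for _mtime, path in entries]
```

Building this list still requires a stat() of every file, so on the system in question the ordering pass itself would be subject to the same slow metadata access as du.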

The question is how much time passes between the start of the reindex and the return of some of the functionality?

Regarding your server being better than the others: as you've noted, the bottleneck is disk I/O.
So a server with dual single-core 2.8 GHz Xeon processors and a similar RAID configuration will likely perform the same as yours.
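One way to confirm that the disks, not the CPUs, are the bottleneck is to sample /proc/diskstats during a du run or a reindex and compare IOPS, MB/s and the percentage of time each device was busy (the same %util figure that `iostat -x` from the sysstat package reports). A minimal sketch, assuming the standard /proc/diskstats field layout (sectors counted in 512-byte units); `read_diskstats` and `summarize` are hypothetical helper names. A device near 100% busy that moves only a few MB/s is likely seek-bound, which would be consistent with a fast sequential benchmark alongside slow metadata-heavy jobs.

```python
import sys
import time


def read_diskstats(text):
    """Parse /proc/diskstats text into
    {device: (reads, sectors_read, writes, sectors_written, ms_doing_io)}."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 14:
            continue  # older kernels list partitions with fewer fields; skip
        stats[parts[2]] = (
            int(parts[3]),   # reads completed
            int(parts[5]),   # sectors read
            int(parts[7]),   # writes completed
            int(parts[9]),   # sectors written
            int(parts[12]),  # milliseconds spent doing I/O
        )
    return stats


def summarize(before, after, interval_s):
    """Per-device IOPS, MB/s and %busy between two snapshots."""
    out = {}
    for dev, b in before.items():
        a = after.get(dev)
        if a is None:
            continue  # device disappeared between snapshots
        ios = (a[0] - b[0]) + (a[2] - b[2])
        sectors = (a[1] - b[1]) + (a[3] - b[3])
        busy_ms = a[4] - b[4]
        out[dev] = {
            "iops": ios / interval_s,
            "mb_per_s": sectors * 512 / 1e6 / interval_s,
            "busy_pct": 100.0 * busy_ms / (interval_s * 1000),
        }
    return out


if __name__ == "__main__" and len(sys.argv) > 1:
    interval = float(sys.argv[1])
    with open("/proc/diskstats") as f:
        first = read_diskstats(f.read())
    time.sleep(interval)
    with open("/proc/diskstats") as f:
        second = read_diskstats(f.read())
    for dev, s in sorted(summarize(first, second, interval).items()):
        print(f"{dev:10s} {s['iops']:8.1f} IOPS {s['mb_per_s']:8.2f} MB/s "
              f"{s['busy_pct']:6.1f}% busy")
```

Pass the sampling interval in seconds as the only argument. On a hardware RAID volume, 100% busy means at least one request was always outstanding, not necessarily that every spindle was saturated, so read it together with the IOPS figure.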

Do the others perhaps have a RAID 10 setup? Maybe they configured their setup slightly differently, i.e. instead of two huge partitions (400 GB and 600 GB), they may have split the content across more dedicated partitions: instead of a single 600 GB partition, five 120 GB partitions, each holding a specific set of content.

Do they also use LVM on their RAID allocated partitions?

What is the application that is being referenced here?
