I have a production server on RHEL 5.2 (Linux 2.6.18-92.el5 #1 SMP Tue Apr 29 13:16:12 EDT 2008 i686 i686 i386 GNU/Linux) with a LOCAL disk array of 15k SAS drives in RAID 5, and SAN storage on the same type of 15k SAS drives in RAID 6.
The SAN is connected via an FC HBA adapter at 4 Gbit/s.
The machine is an industry-standard server with 8 GB of ECC RAM, 2 quad-core Xeon processors, a real hardware RAID controller, etc., which makes it quite a beast.
We run only a single application on it, Java on the Tomcat platform, and all 8 CPU cores are mostly idle, averaging under 1% utilisation.
All partitions are configured with LVM, precisely so that they can be expanded when needed.
And the PROBLEM?
Here it is: when I run "du" on the local RAID 5 partition (600 GB), the command takes 2 hours to complete! While "du" is running, the CPUs stay almost idle (under 1%); only disk I/O runs flat out. Running "du" on the SAN partition (RAID 6) is a bit faster, but it still takes 45 minutes to scan 400 GB of files.
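For anyone wanting to reproduce the measurement: this is all I'm doing (the mount points are just examples for my layout), and timing it confirms the wall-clock numbers above.

```shell
#!/bin/sh
# Time a full "du" scan of a partition. /data is a placeholder for the
# actual RAID 5 mount point on my system.
time du -sh /data
```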
Also, when this single application reindexes all the files on the RAID 5 and RAID 6 arrays to update their file metadata, it takes 2 days or more, and all services are down during that time, while other users of the same software report that this task finishes within a few hours, not days!
I then ran a disk benchmark, and it showed file copy speeds of almost 200 MB/s, which is great.
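One thing I suspect (but haven't confirmed): the 200 MB/s figure measures sequential throughput, whereas "du" is a metadata workload, essentially one stat() per inode, so seek latency rather than bandwidth would be the limit. A rough way to see the two workloads side by side (paths are hypothetical placeholders):

```shell
#!/bin/sh
# Sequential read throughput -- roughly what the benchmark measured.
# /path/to/largefile is a placeholder for any big file on the array.
dd if=/path/to/largefile of=/dev/null bs=1M count=1000

# Metadata walk -- roughly what "du" does: stat() every file, which is
# dominated by seeks on a large tree, not by bandwidth.
find /data -type f -printf '%s\n' \
  | awk '{n++; s+=$1} END {print n" files, "s" bytes"}'
```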
So I am stuck between fast, mostly idle CPUs and very fast disk arrays on one side, and very poor, frankly unacceptable performance of disk-intensive operations on the other.
I'm looking for ideas on how to find the system's bottleneck. Suggestions welcome.
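So far the only diagnostics I know to run while the slow scan is going are the standard ones below (sysstat's iostat, plus vmstat and mount, all available on RHEL 5); I'd appreciate pointers on interpreting them or on better tools.

```shell
#!/bin/sh
# Per-device extended stats: high await/%util combined with low rkB/s
# suggests the array is seek-bound rather than bandwidth-bound.
iostat -x 5 3

# System-wide view: the "wa" column shows time spent waiting on I/O.
vmstat 5 3

# Mount options: without 'noatime', every file stat'ed during a scan
# can trigger an atime write back to the array.
mount
```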