Solved

Slow disk performance on HP DL 380G5 SAS RAID1+0

Posted on 2009-05-17
Last Modified: 2012-05-07
Dear all,

I have three HP DL 380 G5 servers, all with a RAID1+0 configuration. They all experience slow disk performance, especially when transferring large files.

No errors show up on the event log or in the HP Insight Diagnostics.

Any clues?


The details of the storage configuration are below.


Array Controller, HP P400, Slot 1 - HP P400
Model HP P400
Firmware 5.20
Configured logical drives 2
Total Memory Size 512 Mbytes
Installed Memory 512 Mbytes
Usable Cache RAM 464 Mbytes
Battery 1 OK
Serial Number PAFGK0P9VWO6XA
PCI Slot Number 1
World Wide Name 50014380028C32D8
BIOS boot device order 1
Logical Volume 0, Controller Slot 1 Bus 0 - 73.4 Gbytes RAID 1
Status Code 0
Status Description OK
Model HP LOGICAL VOLUME
Firmware 5.20
Capacity 73.4 Gbytes
Device Path \\.\SCSI2:
Volume ID 0
Fault Tolerance RAID 1
Logical Drive Parameters
Controller Drive Count 4
Volume Drive Count 2
Drive Parameter Table
Stripe Size (physical blocks) 256
Physical Hard Drive 1, Controller Slot 1 - 73.4 Gbytes 15K RPM - HP DH072ABAA6
Model HP DH072ABAA6
Firmware HPD9
Capacity 73.4 Gbytes
Controller Array Controller, HP P400, Slot 1
Rotation Rate 15K RPM
Serial Number 3PD0KL6H00009802FDZS
Temperature 32 Degrees Celsius
Cable Configuration Not available
Connector 2I
Enclosure Number 1
Enclosure Bay 1
Drive type SAS Hard Drive
Negotiated link rate 3.0 Gbps
Read Errors Hard 00000000
Read Errors Retry Recovered 00000000
Write Errors Hard 00000000
Write Errors Retry Recovered 00000000
Predictive Failure Errors 00000000
Drive Present and Operational Yes
Sectors Written 0000000153b0e20f
Physical Hard Drive 2, Controller Slot 1 - 73.4 Gbytes 15K RPM - HP DH072ABAA6
Model HP DH072ABAA6
Firmware HPD9
Capacity 73.4 Gbytes
Controller Array Controller, HP P400, Slot 1
Rotation Rate 15K RPM
Serial Number 3PD0KA1Y00009802DTV2
Temperature 31 Degrees Celsius
Cable Configuration Not available
Connector 2I
Enclosure Number 1
Enclosure Bay 2
Drive type SAS Hard Drive
Negotiated link rate 3.0 Gbps
Read Errors Hard 00000000
Read Errors Retry Recovered 00000000
Write Errors Hard 00000000
Write Errors Retry Recovered 00000000
Predictive Failure Errors 00000000
Drive Present and Operational Yes
Sectors Written 000000014b265311
Logical Volume 1, Controller Slot 1 Bus 0 - 146.8 Gbytes RAID 1
Status Code 0
Status Description OK
Model HP LOGICAL VOLUME
Firmware 5.20
Capacity 146.8 Gbytes
Device Path \\.\SCSI2:
Volume ID 0
Fault Tolerance RAID 1
Logical Drive Parameters
Controller Drive Count 4
Volume Drive Count 2
Drive Parameter Table
Stripe Size (physical blocks) 256
Physical Hard Drive 3, Controller Slot 1 - 146.8 Gbytes 10K RPM - HP DG146BAAJB
Model HP DG146BAAJB
Firmware HPD9
Capacity 146.8 Gbytes
Controller Array Controller, HP P400, Slot 1
Rotation Rate 10K RPM
Serial Number P4WV49JA
Temperature 34 Degrees Celsius
Cable Configuration 16 bit wide data path on a cable with 16 SCSI IDs supported
Connector 1I
Enclosure Number 1
Enclosure Bay 7
Drive type SAS Hard Drive
Negotiated link rate 3.0 Gbps
Read Errors Hard 00000000
Read Errors Retry Recovered 00000000
Write Errors Hard 00000000
Write Errors Retry Recovered 00000000
Predictive Failure Errors 00000000
Drive Present and Operational Yes
Sectors Written 0000000090a8c491
Physical Hard Drive 4, Controller Slot 1 - 146.8 Gbytes 10K RPM - HP DG146BAAJB
Model HP DG146BAAJB
Firmware HPD9
Capacity 146.8 Gbytes
Controller Array Controller, HP P400, Slot 1
Rotation Rate 10K RPM
Serial Number P4WUAL2A
Temperature 35 Degrees Celsius
Cable Configuration 16 bit wide data path on a cable with 16 SCSI IDs supported
Connector 1I
Enclosure Number 1
Enclosure Bay 8
Drive type SAS Hard Drive
Negotiated link rate 3.0 Gbps
Read Errors Hard 00000000
Read Errors Retry Recovered 00000000
Write Errors Hard 00000000
Write Errors Retry Recovered 00000000
Predictive Failure Errors 00000000
Drive Present and Operational Yes
Sectors Written 0000000090a8c463
Logical Disks
C: Logical Disk 1 ~ 73.37GB
Volume Name
File System NTFS
Size (Byte) 73368432640
Free Space (Byte) 41185562624
D: Logical Disk 2 ~ 146.77GB
Volume Name
File System NTFS
Size (Byte) 146774487040
Free Space (Byte) 47386877952
Question by:nd2u
26 Comments
 
LVL 42

Expert Comment

by:paulsolov
ID: 24409101
What is your source? Where are you copying from and to?  If you're doing it over the network is the switch you're using 10/100 or 1000?

Do you have a BBWC installed on the raid controller?

Have you baselined the transfer rate?
 

Author Comment

by:nd2u
ID: 24409487
It's not due to network connectivity as I'm copying from one logical drive to another.

What is BBWC ? Cache? Where do I check it?

What do you mean by "baselined the transfer rate"?
 
LVL 42

Expert Comment

by:paulsolov
ID: 24411152
Often, copying from one drive to another is slower than copying over the network because the same system is doing both the reads and the writes.

http://search.hp.com/query.html?charset=iso-8859-1&lk=1&la=en&nh=10&st=1&rf=0&qs=&hpvc=sitewide&uf=1&qt=HP+Smart+Array+P400+Battery+Attach+Kit+bbwc&ocoldqt=P400+bbwc&oc=2511182

BBWC (battery-backed write cache) is located on or near the RAID controller and provides read caching as well as write caching.



By baselining the transfer rate I meant: have you timed a large file copy and calculated the GB/minute?
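
A baseline like this can be taken with a few lines of Python. This is only a minimal sketch: the function name `time_copy` and the temporary test file are made up for illustration, and on a real server you would point it at a multi-GB file, since files smaller than RAM can be served from the file cache and give flattering numbers.

```python
import os
import shutil
import tempfile
import time


def time_copy(src, dst):
    """Copy src to dst and return (elapsed_seconds, gigabytes_per_minute)."""
    size_bytes = os.path.getsize(src)
    start = time.perf_counter()
    shutil.copyfile(src, dst)
    elapsed = time.perf_counter() - start
    # Decimal gigabytes (1 GB = 1e9 bytes) per minute of wall-clock time.
    gb_per_min = (size_bytes / 1e9) / (elapsed / 60) if elapsed > 0 else float("inf")
    return elapsed, gb_per_min


if __name__ == "__main__":
    # Demo with a small throwaway file; replace the paths with a real
    # large file and a target on the other logical drive for a real test.
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "source.bin")
        dst = os.path.join(tmp, "target.bin")
        with open(src, "wb") as f:
            f.write(b"\0" * (8 * 1024 * 1024))
        seconds, rate = time_copy(src, dst)
        print(f"copied in {seconds:.3f} s, {rate:.2f} GB/min")
```

Running the same copy on a healthy server and on a slow one gives two numbers that can be compared directly.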
 

Author Comment

by:nd2u
ID: 24411230
I checked the information on the System Homepage. I think the BBWC was installed at the factory?

Smart Array P400 Controller in Slot 1
Model:   Smart Array P400
Controller Status:   OK   CPU Usage:   6 %
Firmware Version:   5.20   Command Count:   59 /sec
Product Revision:   E   Command Latency:   138 /100000 sec
Serial Number:   PAFGK0M9VWJ9A5        
Rebuild Priority:   Medium   Expand Priority:   Medium
Internal SAS Ports:   2   External SAS Ports:   0

 Accelerator
Status:   Enabled   Battery Status:   OK
Serial Number:   PA2270J9VWH0Q1   Read Errors:   0
Total Memory:   512 MB   Write Errors:   0
Read Cache:   25%   Error Code:   None
Write Cache:   75%   Bad Data:  



Baseline transfer: I actually noticed the problem on the SQL dump. Until the end of April this took approximately 4 minutes; it now takes 75 minutes for a 7.5 GB database...

After looking at the SQL settings, I found that the problem generally occurs when transferring large files. It does not occur when transferring many smaller files...

We have a D2D2T server in place; I tried to transfer an 11 GB file to my workstation. This took about 5 minutes. I then tried to copy it to our DB server and it took 133 minutes...

No installation/configuration changes took place at the end of April and I really don't know what caused the problem. I also tried deactivating the anti-virus, but this didn't improve the transfer speed...

I'm now downloading the HP Support Pack and will install all the latest drivers etc. (the servers were installed at the beginning of March)

Any clues?
 
LVL 42

Expert Comment

by:paulsolov
ID: 24411382
How much memory do you have on the systems and how much space do you have on the drive that has your pagefile?

When you're copying to a network location you're splitting the read and write I/O between machines. If you are reading from and writing to the same spindles (same RAID set and set of disks), the reads and writes are often slower than what you would see over the network.
 

Author Comment

by:nd2u
ID: 24411544
There's 4 GB of RAM in the systems; free disk space is around 50% on all drives.

I can understand that there's a difference in speed when copying over the network, but 11 GB in 133 minutes is not acceptable in any case...
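
To put the figures quoted so far into perspective, the arithmetic can be written out as below. This is just a worked calculation on the numbers from the posts above, assuming decimal gigabytes (1 GB = 1000 MB); the function names are illustrative.

```python
def mb_per_second(gigabytes, minutes):
    """Average throughput in MB/s for a transfer of the given size and duration."""
    return gigabytes * 1000 / (minutes * 60)


def slowdown(fast_minutes, slow_minutes):
    """How many times longer the slow transfer took."""
    return slow_minutes / fast_minutes


if __name__ == "__main__":
    # 11 GB file: about 5 minutes to the workstation, 133 minutes to the DB server.
    print(f"11 GB in 5 min:    {mb_per_second(11, 5):.1f} MB/s")
    print(f"11 GB in 133 min:  {mb_per_second(11, 133):.1f} MB/s")
    # 7.5 GB SQL dump: about 4 minutes before, 75 minutes now.
    print(f"7.5 GB in 4 min:   {mb_per_second(7.5, 4):.1f} MB/s")
    print(f"7.5 GB in 75 min:  {mb_per_second(7.5, 75):.1f} MB/s")
    print(f"SQL dump slowdown: {slowdown(4, 75):.2f}x")
```

Both slow cases work out to well under 2 MB/s, which is far below what a single SAS disk can sustain, so the drop is unlikely to come from normal read/write contention alone.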
 
LVL 12

Expert Comment

by:rionroc
ID: 24419445
Hello

Look at the pin jumper settings.

Cheers!
 

Author Comment

by:nd2u
ID: 24419549
???
 
LVL 12

Expert Comment

by:rionroc
ID: 24424758
Try this on one of your servers: set the RAID 1+0 configuration to [normal]. As I understand it, RAID can only be used if more than one physical disk is active; if there is only one physical disk in the RAID set, that could slow down disk performance.

Also check your SCSI setup; maybe some jumpers need to be set on the pins. For safe handling, see your SCSI manual.

Cheers!
 

Author Comment

by:nd2u
ID: 24424873
We have 2 sets of 2 disks under RAID1+0. I don't see a problem there. All disks are in hot-plug bays and I haven't even opened the server (rack-mounted model) yet!
 
LVL 12

Expert Comment

by:rionroc
ID: 24428847
OK.

Which file manager are you using? Is it the default Windows Explorer?

I doubt the performance issue is in your setup.

If it's not the file manager either, I think you only perceive a performance drop because you compared it to another system that is not an HP DL 380 G5 server.

Try a different file manager and compare the results.

If it's not the file manager, try increasing your virtual memory size and make sure it is set to system cache.


cheers!
 

Author Comment

by:nd2u
ID: 24429106
I was indeed using the default windows explorer.

About the comparisons. The situation is rather bizarre.

I have 2 DL380G5's on site (DB1 and DB2) and one in a datacenter (DB3).

DB1 and DB2 are having the problem. DB3 not...

All machines were installed from a clean system, without too many settings changes...

Virtual memory was set to Custom Size (initial 2046, max 4092) on all servers, and performance was set to adjust for best performance of programs. There's no difference in settings between DB1 and DB3.

The main function of this server is SQL Server! On DB2 and DB3 there's an active Double Take in order to have a full system fail-over.

I updated all drivers and firmware on DB1 and DB2, which didn't help.

 
LVL 12

Expert Comment

by:rionroc
ID: 24439335
Oh well, if that's the case, I guess the servers you bought don't meet your expectations.
Maybe next time you should research more about what to buy. You should have investigated the hardware specs before buying.


Good Luck.

Cheers!
 
LVL 55

Expert Comment

by:andyalder
ID: 24439406
I can't really see how you can blame the server: the P400 is a good controller and they have 4 * 15K disks. It may be a virus checker slowing it down or just a very fragmented file system. I would run perfmon and see just how many IOPS you're getting when transferring from one logical disk to the other. You can improve I/O by making sure you have the right stripe size (64K for SQL) and ensuring you align the disk using diskpart if pre-Windows 2008, but that's only going to affect it by 10%.

If it used to be fast and now it's slow, it's most likely something else accessing the disks and slowing your test down; perfmon will probably show it.
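
The alignment point can be checked with simple arithmetic. The sketch below is illustrative only: it assumes the "physical blocks" in the controller report are 512 bytes each, and it uses the 63-sector start offset that Windows versions before 2008 apply by default; the function names are made up for this example.

```python
SECTOR_BYTES = 512  # assumption: 512-byte physical blocks


def stripe_bytes(stripe_blocks, block_bytes=SECTOR_BYTES):
    """Convert a stripe size given in physical blocks to bytes."""
    return stripe_blocks * block_bytes


def is_aligned(partition_offset_bytes, stripe_size_bytes):
    """A partition is stripe-aligned when its start offset is a multiple of the stripe size."""
    return partition_offset_bytes % stripe_size_bytes == 0


if __name__ == "__main__":
    reported = stripe_bytes(256)   # "Stripe Size (physical blocks) 256" from the report above
    legacy_offset = 63 * SECTOR_BYTES  # default start offset before Windows 2008
    one_mib_offset = 1024 * 1024       # a commonly used aligned offset
    print("reported stripe size in bytes:", reported)
    print("63-sector offset aligned to 64K stripe:", is_aligned(legacy_offset, 64 * 1024))
    print("1 MiB offset aligned to 64K stripe:", is_aligned(one_mib_offset, 64 * 1024))
    print("1 MiB offset aligned to reported stripe:", is_aligned(one_mib_offset, reported))
```

A misaligned partition makes some I/Os straddle two stripes, which costs extra disk operations; as noted above, that would explain a modest loss, not a slowdown of this size.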
 
LVL 55

Expert Comment

by:andyalder
ID: 24439410
I just saw you mention Double Take. That probably explains it. Can you suspend it for a while and test again?
 

Author Comment

by:nd2u
ID: 24439799
Hi Andy,

Already suspended Double Take, and this didn't really change anything... Double Take was activated about 10 days before the problem arose. The directories where I try to copy to/from are not replicated... E.g. today (holiday) DoubleTake has almost no activity, but still the problem shows...

I had already uninstalled NAI VirusScan on DB1 and DB2.


Some important info. The machines are only used for SQL 2000 and file server:

C: contains system + SQL database. No other activity.
D: contains our digital scan archive... meaning millions of TIFF files

I use Diskeeper on DB1. There's certainly no fragmentation. Diskeeper does not run during office hours.

DB1, DB2 and DB3 contain the same data and have the same hardware. Only difference is that SQL is not running on DB2 and DB3.


So, regarding your suggestions:

1° Changing the stripe size of the RAID controller. Already checked and it's the same on all servers... What size should I test, given our digital archive on D:?
2° About the IOPS. I attached two screenshots from the Task Manager on DB1, sorted by I/O Read bytes and I/O Write bytes. Is this sufficient information for you or do I need to run PerfMon? If so, could you perhaps give me some details about which Performance Object and Counter you need?


The weirdest thing for me is that DB3 does not have the problem. I also placed a case with HP and sent them the configuration scan of DB2 and DB3 from HP tools to check for differences...  They already asked me to run offline diagnostics, which showed no hardware errors.

TM-IO-Write.jpg
TM-IO-READ.jpg
 

Author Comment

by:nd2u
ID: 24439867
Oh yes, one important aspect: speed is fine at the beginning! The slowdown starts after +/- 1 GB.
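
One way to pin down where such a slowdown begins is to write a large test file in fixed-size intervals and log the throughput of each interval. This is a rough sketch, not a benchmark tool: the function name `write_in_intervals` and the sizes are illustrative, and the demo writes only a few MB to a temporary directory. For a real test the target would be a path on the affected drive, with a total of several GB.

```python
import os
import tempfile
import time


def write_in_intervals(path, total_bytes, interval_bytes, chunk_bytes=1024 * 1024):
    """Write total_bytes to path; return a list of (bytes_written_so_far, MB/s for that interval)."""
    chunk = b"\0" * chunk_bytes
    samples = []
    written = 0
    with open(path, "wb") as f:
        while written < total_bytes:
            interval_begin = written
            interval_end = min(written + interval_bytes, total_bytes)
            start = time.perf_counter()
            while written < interval_end:
                n = min(chunk_bytes, interval_end - written)
                f.write(chunk[:n])
                written += n
            f.flush()
            os.fsync(f.fileno())  # force the interval to disk so the timing includes the actual write
            elapsed = time.perf_counter() - start
            megabytes = (interval_end - interval_begin) / 1e6
            samples.append((written, megabytes / elapsed if elapsed > 0 else float("inf")))
    return samples


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "testfile.bin")
        for done, rate in write_in_intervals(target, 8 * 1024 * 1024, 2 * 1024 * 1024):
            print(f"{done / 1e6:8.1f} MB written, last interval {rate:.1f} MB/s")
```

If the per-interval rate collapses at roughly the same point on every run, that suggests a cache or a filter driver filling up rather than a failing disk.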
 
LVL 55

Expert Comment

by:andyalder
ID: 24439919
Remove Windows desktop search, searchindexer.exe shouldn't be on servers but someone at MS decided to push it out on an automatic update. Does DB3 have searchindexer.exe on it?
 

Author Comment

by:nd2u
ID: 24439932
Hmmm... Double Take does indeed seem to be the culprit...

I just stopped the service, and the transfer of a 7 GB file from our D2D2T backup server to DB2 went fine at full speed...

Then I reactivated it, and the transfer of a roughly equally large file slowed down again...
Transfer-of-7GB-from-archive-to-.jpg
 

Author Comment

by:nd2u
ID: 24439944
About searchindexer: doesn't this improve the speed for the clients as well if they're looking up a file on the server (often needed in the digital archive)?
 
LVL 55

Expert Comment

by:andyalder
ID: 24440190
Windows Search may improve the speed if they are searching for words within the documents, but it's not going to help with TIFFs since you aren't searching for them but presumably look up the file name under SQL. Is it installed on the server that works?
 

Author Comment

by:nd2u
ID: 24440235
Well, our users also have their working directories on D: where they store their Word files. They often use the search function on their workstation if they need to find a file that cannot be retrieved via our application.

The .doc files are converted to .tiff when they're sent out to our clients, and .doc files are deleted after 2 months. Incoming documents are also scanned & archived as .tiff.

That's the background of our archive.


Given this, does it make sense to keep searchindexer, in your opinion?

I'm in the process of opening a case with Double Take.
 
LVL 55

Accepted Solution

by:
andyalder earned 500 total points
ID: 24440728
Considering the amount of I/O it's creating, I'd take it off, at least as an experiment. I doubt it was installed when the servers were first set up since it's fairly new.
 

Author Comment

by:nd2u
ID: 24440753
Will do!
 

Author Comment

by:nd2u
ID: 24534321
Problem solved.

The culprit was the Double Take version for 2008 + 2003.

The DT Helpdesk advised me to uninstall this version and install the version for 2003.