Avatar of dqnet
dqnet

asked on

Hard Disk access speeds bad?

Hi Experts,


We have two HP servers, each with two RAID arrays: one RAID 5 and one RAID 10.
I ran CrystalDiskMark 3.0.1 on the arrays and I am getting the following results:

Test size: 5 x 1000 MB

READ (RAID 5)
--------
SEQ:      175.8 MB/s
512K:     63.48 MB/s
4K:       0.957 MB/s
4K QD:    6.839 MB/s (QD=32)


Write (RAID 5)
--------
SEQ:      35.10 MB/s
512K:     14.73 MB/s
4K:       0.524 MB/s
4K QD:    2.599 MB/s (QD=32)
=======================

READ (RAID 10)
--------
SEQ:      178.2 MB/s
512K:     62.85 MB/s
4K:         0.977 MB/s
4K QD:   9.145 MB/s (QD=32)


Write (RAID 10)
--------
SEQ:      172.5 MB/s
512K:     103.9 MB/s
4K:         4.867 MB/s
4K QD:   5.280 MB/s (QD=32)
=======================


READ (MY PC)
--------
SEQ:      127.2 MB/s
512K:     41.26 MB/s
4K:       0.489 MB/s
4K QD:    1.257 MB/s (QD=32)


Write (MY PC)
--------
SEQ:      121.7 MB/s
512K:     59.43 MB/s
4K:       0.591 MB/s
4K QD:    0.578 MB/s (QD=32)
======================


Each server also hosts virtual machines (Hyper-V), and when the above tests are run inside the VMs the results are a little worse, but I guess that is normal...

I'm extremely shocked that the RAID 5 array on 3 dual-port SAS drives is performing like this.
Is this usual? Am I missing something?

The same controller also hosts a RAID 10 array on 4 dual-port SAS drives, and that is performing only a little better than my local machine.

Is all this normal..?
Avatar of dqnet
dqnet

ASKER

I forgot to add, both servers are HP ProLiant DL380 G5s.
Xeon 3.0 GHz, 14 GB RAM, etc. (which I believe has nothing to do with the above results).


Thanks folks!
Avatar of Billy Roth
There are a lot of factors: the RPM of the drives could be a factor, as well as the onboard cache, but unless you have an excellent desktop drive and a very poor SAS drive we would expect a little better performance out of the SAS, of course. You also have to realize that there is overhead for the RAID controller, and not all RAID controllers are equal. A 3-drive RAID 5 could theoretically have 3x read performance and 1x write performance, minus overhead, so if you really want to increase reads you need more than 3 drives. This is of course also dependent on the type and quality of the drives and controller, as well as the machine itself and its buses and capabilities (in the event that there are other bottlenecks). A RAID 10 will usually provide read performance somewhere around N times a single drive, where N is the number of drives, and write performance around N/2 times. So you should definitely get better performance on the RAID 10, which you do see fairly clearly in the writes.
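As a very rough sketch of those multipliers (the per-drive figure is invented, and this ignores controller overhead, caching and stripe-size effects entirely):

# Rough, idealised RAID throughput multipliers, per the rule of thumb above.
# The per-drive MB/s figure is a made-up example value.
def raid_estimate_mbps(per_drive_mbps, drives, level):
    if level == "raid5":
        # reads can be spread over all drives; sustained writes roughly one drive's worth
        return per_drive_mbps * drives, per_drive_mbps
    if level == "raid10":
        # reads scale with all drives; writes with half (every write is mirrored)
        return per_drive_mbps * drives, per_drive_mbps * drives / 2
    raise ValueError("unsupported RAID level")

print(raid_estimate_mbps(90, 3, "raid5"))    # (270, 90)    - 3-disk RAID 5
print(raid_estimate_mbps(90, 4, "raid10"))   # (360, 180.0) - 4-disk RAID 10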

Did you shut off all of the VMs while running the tests?
So you have both a RAID 5 array and a RAID 10 array on the same controller, and your server is running Hyper-V, yes?

How much RAM has your RAID controller got? Has it got write-back or write-through cache?

With 14 GB of RAM, how many virtual machines is the server running in Hyper-V?
A single drive is faster than 3 disks in RAID 5 for writes - RAID 5 needs 4 I/Os per random write, so you get roughly 3/4 of a single drive's speed; plus, by default your PC probably has the OS cache and the disk cache turned on.

As per above, you need a battery to get any performance out of RAID 5 on an HP Smart Array controller.
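To make that write penalty concrete, a small hypothetical sketch (the per-drive IOPS figure is an assumed example, and this follows the 4-I/Os-per-write reasoning above):

# RAID 5 small-write penalty: each logical random write costs 4 physical I/Os
# (read old data, read old parity, write new data, write new parity).
def raid5_random_write_iops(per_drive_iops, drives):
    total_physical_iops = per_drive_iops * drives
    return total_physical_iops / 4      # 4 physical I/Os per logical write

print(raid5_random_write_iops(150, 3))  # ~112 logical write IOPS for a 3-disk set,
                                        # i.e. about 3/4 of one drive's 150 IOPS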
Avatar of dqnet

ASKER

Battery present
Write-back
256 MB cache
3 VMs
2 GB RAM left for the OS (12 GB distributed across the VMs)
No AV software, however the VMs were running during testing
Avatar of dqnet

ASKER

Oh, and 10k RPM drives..

Thanks guys..!
Looks like you'll have to add another disk and migrate to RAID 10. You can do that live if you've got the ACU installed (of course, the RAID level migration is more work for the disks, so it'll be even slower than it is now until it's complete).
Avatar of dqnet

ASKER

Do you really suggest that?
I mean, are the current RAID 10 results even good?

I'm so confused - that spec is slower than my PC? (My PC has 16 GB RAM, by the way.)
For the VM tests, should I try with the VMs switched off, or would that not make much difference?
There is a setting under Windows you can try: the Windows write-cache buffer is turned off by default on servers (it's probably on on your PC), called "Advanced performance" on 2008. Workstation OSs are tuned for speed, server OSs are tuned for data security by default, and that can make a big difference.

http://www.windowsreference.com/windows-server-2008/enable-disk-write-caching-to-improve-performance-in-windows-7-windows-server-2008/
Avatar of dqnet

ASKER

You're right, it was switched off, but when I went to switch it on it came up with:
"Windows could not change the write-caching setting for the device. Your device might not support this feature or changing this setting."
So you can't tick the lower of the two checkboxes - enable advanced performance?
Avatar of dqnet

ASKER

MAJOR mistake, the RAID card had no battery.
I just ran the same test on our Database Server which does have a battery and the results are as follows:

READ (RAID 10)
--------
SEQ:      275.0 MB/s
512K:     96.71 MB/s
4K:         1.59 MB/s
4K QD:   8.51 MB/s (QD=32)


Write (RAID 10)
--------
SEQ:      263.5 MB/s
512K:     245.9 MB/s
4K:         10.36 MB/s
4K QD:   10.44 MB/s (QD=32)


Are the above results in line with normal levels with these sort of servers?
Or are they still pretty low?
Avatar of dqnet

ASKER

No, unfortunately :(

I also tried on the server with the BBWC and that didn't let me either.
Battery shouldn't affect read speed (in fact it should be a fraction faster without a battery, since all the cache is read cache in that case), so it might just be the I/O load of your VMs.

4K:         1.59 MB/s - that's near enough 400 IOPS, a reasonable figure.
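For reference, that conversion is just throughput divided by I/O size - a tiny sketch:

# Convert a CrystalDiskMark-style MB/s figure at a given I/O size into IOPS.
def mbps_to_iops(mb_per_s, io_size_kb):
    # treating 1 MB as 1000 KB for a round figure
    return mb_per_s * 1000 / io_size_kb

print(round(mbps_to_iops(1.59, 4)))   # ~398 IOPS, i.e. roughly 400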
Avatar of dqnet

ASKER

But the read speeds are alright, aren't they? It's the write speeds that are the problem?


READ (RAID 5)
--------
SEQ:      175.8 MB/s
512K:     63.48 MB/s
4K:       0.957 MB/s
4K QD:    6.839 MB/s (QD=32)


Write (RAID 5)
--------
SEQ:      35.10 MB/s
512K:     14.73 MB/s
4K:       0.524 MB/s
4K QD:    2.599 MB/s (QD=32)
Your read speeds look ok; your write speeds are very slow.  Try it with another benchmark suite - ATTO: http://www.attotech.com/products/product.php?sku=Disk_Benchmark
Avatar of dqnet

ASKER

Are the read speeds in line with what they should be for a server of this kind?
I'm going to order the battery for the raid controllers in a few days time.

Are the database server results good?
READ (RAID 10)
--------
SEQ:      275.0 MB/s
512K:     96.71 MB/s
4K:         1.59 MB/s
4K QD:   8.51 MB/s (QD=32)


Write (RAID 10)
--------
SEQ:      263.5 MB/s
512K:     245.9 MB/s
4K:         10.36 MB/s
4K QD:   10.44 MB/s (QD=32)
Your read speeds look normal, and your RAID 10 looks normal, except for the 512K read being slower than the 512K write - that is counterintuitive.
Yes, the DB server's results look pretty good. The part number for the battery is in the QuickSpecs - http://h18004.www1.hp.com/products/quickspecs/12477_na/12477_na.html#Additional%20Options - note that you don't need the 24" cable.

Writes are faster than reads since they're buffered by the cache; if you run the test for longer that should fall off a bit as the cache gets full.
Avatar of dqnet

ASKER

I'm a little confused - I can't seem to find the battery pack for the 256 MB version of our P400. All I can find is the one for the 512 MB version. There is one article saying that the battery pack also upgrades the cache to 512 MB? What has the battery pack add-on got to do with the cache?
Battery-backed write cache upgrade
NOTE: Contains battery only. For Smart Array P400 Controller.       383280-B21
Avatar of dqnet

ASKER

Thanks!

Just trying out the software you recommended. I'll get back to you shortly.
The SSD rules the IOPS world.

Some HDD theory points:
A 10k RPM drive allows about 166 platter rotations per second, which gives an average of roughly 330 IOPS (whatever the I/O size is).
4 KB I/O size x 330 IOPS = 1.3 MB/s
512 KB I/O size x 330 IOPS = 168 MB/s ... but that is almost incoherent with the sequential throughput (about 160 MB/s), which means a 10k RPM drive cannot deliver an average 330 IOPS for a 512 KB I/O stream... I would expect about half of it; 70-80 MB/s should be a good throughput.
Striped drives let you add up each drive's performance, less a small (<5%) management penalty.
RAID 10 with 4 drives should roughly double write performance, and may more than double read performance if the controller is smart enough to distribute read I/O across all the drives.
RAID 5 is always terrible at random writes; this is normal and expected. Even a battery-backed write-back cache won't transform a random stream into a sequential one... it will mainly delay the write I/O and deliver an apparently good random I/O throughput.

Most VM usage mainly needs random I/O, where an SSD can sustain 40,000 4 KB IOPS!
It's time for a move...
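A quick sketch of that back-of-the-envelope arithmetic (idealised: it assumes the transfer itself is instantaneous and ignores seek time, as above):

# Rotational-only figures for a 10k RPM drive - optimistic upper bounds.
rpm = 10_000
rotations_per_s = rpm / 60               # ~166.7
avg_latency_s = 0.5 / rotations_per_s    # half a rotation on average
iops = 1 / avg_latency_s                 # ~333 IOPS

for io_kb in (4, 512):
    mb_per_s = iops * io_kb / 1024
    print(f"{io_kb} KB x {iops:.0f} IOPS = {mb_per_s:.1f} MB/s")
# 4 KB   -> ~1.3 MB/s
# 512 KB -> ~167 MB/s (capped in practice by the ~160 MB/s sequential limit)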
Avatar of dqnet

ASKER

Haha! Yep, when the disks fail, that is...

Do you think it's not worth buying the battery backup? I've already put in quotes for it.
I was going to do that and then expand the array to RAID 10.

P.S. I can mix dual-port drives with single-port drives, right?
I have absolutely no idea if you can buy a non-HP SSD, install it and still be HP supported...

Regarding dual-port drives (some SAS SSDs exist with dual ports... more expensive than SATA SSDs): they allow MPIO/multipath capability but are useless for anything else, and YES, you should be able to have both dual-port and single-port drives in your HP server.

If you already have some battery pack running, buying a new one will, at worst, be considered replacing an old battery pack too early... not a big deal. It will, though, effectively enhance your parity array (RAID 5/6/50/60) performance for sure... even on SSD.
The server's backplane is single ported so the second port doesn't get used, no problem mixing.
Avatar of dqnet

ASKER

The server hasn't got a battery - it's discussed already above; that's why I'm getting quotes for it.

When we bought the server it came with dual-port drives, so surely the backplane does support dual-port drives? Maybe the base system you're looking at doesn't?

I have a few single-port drives that I'm going to put inside to expand the RAID 5 to RAID 10. That's why I am asking if I should go ahead with the battery backup - to take advantage of the RAID 10 benefits. And even if I can't expand the RAID 5, the second RAID 10 array can make use of it, no?
At worst, the new battery pack may be used to replace the DB server's one, even if you think that one still works...

RAID 10 arrays won't really get a huge benefit from the write-back cache enabled by the battery pack...
Avatar of dqnet

ASKER

OK, I'm extremely confused here :s

I thought the battery would help with those awful write speeds on the RAID 5 array (scored above)? Now the battery won't play any role in improving performance???
RAID 10 does not suffer the RAID 5 read-XOR-write penalty.
Avatar of dqnet

ASKER

So what exactly are we saying here? If we look at the thread from the top up till now, various comments contradict other comments. It's all getting way too confusing compared to the original question.. :)
SOLUTION
Avatar of Member_2_231077
(This solution is only available to members.)
Avatar of dqnet

ASKER

That answer was awesome, it tackled every question... except one.

Are you sure the server doesn't have a dual-port backplane? Why would they sell them with dual-port drives then? Maybe our specific configuration has got one?

I will update you with the test results from the other application tomorrow too.

Thanks mate, we are nearly there.. :)
It did cover that in the last sentence - the disk manufacturers don't make single ported SAS disks any more. A dual ported disk is always acceptable in a single port environment so why bother to make both types?
You can have dual-ported backplanes of course, but your server doesn't have one. Your backplane is a fairly simple thing with very little electronics on it*; the physical connectors have 4 SAS lanes each and there are two of them, so a single lane is wired to each disk. To make it dual-ported they could add some tracks and another couple of four-lane cables/sockets, or add some intelligence on the backplane so that each of the current two connectors connects to all the disks through a SAS expander chip - that would make it cost about $100 more. Compaq did that many years ago with an early ProLiant, well before HP bought them, but the added cost didn't make it sell well. There's not much point in building disk controller redundancy into a server since it's only got one motherboard.

*The intelligent chip on your server's disk backplane is a PSoC and is used to drive the LEDs.
@BigSchmuh - 330 IOPS from a single disk?
@connollyg: I just computed theoretical figures which clearly show that a 10k RPM drive allows 166 platter rotations per second which, assuming the read/write I/O is an instantaneous operation, means an average of 330 IOPS... In fact, using a reasonable queue depth, this figure is quite fair for a 10k RPM drive... Anyway, we should not spend too much time on those legacy drives when a single SSD can deliver more IOPS than 50x 15k RPM drives.
You've missed out something: 166 revs per second = 6 ms per revolution, but on average the data is only half a revolution away, so 3 ms for half a spin; then you have to add 3-4 ms for the average seek time - about 6.5 ms total, and 1000/6.5 = about 150 IOPS.
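Written out as a quick sketch (the 3-4 ms average seek is an assumed typical figure for 10k RPM drives):

# Seek-adjusted random IOPS estimate for a 10k RPM drive, following the
# reasoning above: half a rotation of latency plus an average seek.
rpm = 10_000
half_rotation_ms = 0.5 * 60_000 / rpm     # 3 ms
avg_seek_ms = 3.5                         # assumed typical 10k RPM seek time
service_time_ms = half_rotation_ms + avg_seek_ms
print(round(1000 / service_time_ms))      # ~154 IOPS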
@andyalder: I wrote "average 330 IOPS" - didn't you notice that 166 x 2 ~= 330? Don't you also think that the head is moving at the SAME time the platter is rotating? The lower real IOPS figure is NOT due to head seeking but to the head reading/writing... and we can evaluate that because an average 160 MB/s sequential throughput (example) means about 1 MB per rotation, i.e. 512 KB per half rotation.

If you go just an inch further, you may conclude that there are only 2 "stripe strategies":
1/ To maximize IOPS, you MUST use a very small stripe, allowing for many seek moves per drive rotation.
Defining your stripe as the largest I/O size usable by your I/O client is a key driver for top IOPS performance.
2/ To maximize random MB/s, you MUST use a small multiple of the largest stripe usable by your I/O client.
Defining your stripe as a multiple of the largest I/O size usable by your I/O client is a key driver for top random I/O MB/s performance.

Corollary:
- You should know the largest I/O size that your I/O clients can use before defining your RAID arrays.
- You should preferably use a single I/O size instead of all of the I/O client's available I/O sizes.
- You should define different stripe sizes for different usages.
The platter is indeed spinning while the head is seeking; however, once the head reaches the correct track it still has to wait half a revolution on average for the data to come under the head.
Avatar of dqnet

ASKER

On that note, what stripe size should one use on database servers that are heavy on both reads and writes, and on VMs that are heavy on the write side? In my case, the RAID 5 is for VMs and the RAID 10 for the DB?

Yet to try that software, Andy :)
Lowering global I/O completion times is exactly what the firmware + TCQ/NCQ are about: synchronizing I/O queries, head moves and the drive surface, plus reordering I/O to lower the total delay... so I expect to wait for LESS than an average half platter rotation per I/O. A third of a rotation would be a pretty realistic goal IMHO.

Regarding our host's question, I would expect most VMs to suffer from the RAID 5 read-XOR-write penalty and would switch to RAID 10 for the OS/Apps/Swap/Temp. I always try to get away from parity RAID (5/6/50/60) when I expect some (e.g. more than 5%) random I/O.
I would leave the stripe size at the default; it's fairly good for most things (which is why it is set that way) and it makes data recovery easier, since if you accidentally delete an array and then create it again your data is still there - but only if you used the same parameters.
I would NOT leave the stripe size at the default for a parity (5/6/50/60) array...
I would keep the default stripe for a RAID 1/10 array.

That is also why I dislike parity RAID: it requires a strong technical team and, relative to the drive costs, provides low added value.
Unsubscribing. Obviously BigSchmuh knows more than the controller's designers.
Avatar of dqnet

ASKER

What is the default exactly?
I'm going to check what I did for the RAID 10, as I'm sure I changed that value. I'll check and get back to you. The RAID 5 is at the default; I'll check what that is tomorrow, but what is the default according to your advice above?
@andyalder: The controller's designers offer a way to set the stripe size... Don't you think they chose to let us set this stripe size BECAUSE they just can't optimize the size for every usage? I am just trying to make use of the choice they give us...

Many HW RAID controllers use a 64 KB default stripe size...
Avatar of dqnet

ASKER

RAID 10 is just so damn expensive.
We only recently received 8 drives from a server decommissioning 3 weeks ago. They are single-port 10k drives, which I suppose we could use to expand our RAID 5 arrays.
I guess that will help a little.

We're still going with the battery, right?

Results will be up tomorrow, experts! :)
Avatar of dqnet

ASKER

When you say they can't optimize for every use, does that mean that changes to the RAID controller stripe size should go hand in hand with the file system cluster size? I've read a few articles now explaining that small stripe sizes are a legacy of earlier days... I'll dig up the sites.
There are only 2 "stripe strategies":
1/ To maximize IOPS, you MUST use a very small stripe, allowing for many seek moves per drive rotation.
Defining your stripe as the largest I/O size usable by your I/O client is a key driver for top IOPS performance.

2/ To maximize random MB/s, you MUST use a small multiple of the largest stripe usable by your I/O client.
Defining your stripe as a multiple of the largest I/O size usable by your I/O client is a key driver for top random I/O MB/s performance.

If your "I/O client" is NTFS - knowing that most NTFS volumes use a 4 KB cluster size backed by a 64 KB buffer, and that most files are >64 KB and accessed sequentially - you can use any stripe size from 64 KB (my choice for RAID 5) to 512 KB (my choice for RAID 10).
If your "I/O client" is MS SQL - knowing that all pages are 8 KB and that a lot of random access is used - you may build an 8 KB-stripe RAID 5 array (not my choice) or a 64-128 KB-stripe RAID 10.
If your "I/O client" is a video streaming app running on a Linux box, you should go with a 512 KB-stripe RAID 5.

Parity RAID 5/6/50/60 can't seriously be used without a write-back cache backed by a battery.
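A small sketch that just encodes those rules of thumb (the helper name and the multiple are made up for illustration, not a vendor recommendation):

# Hypothetical helper reflecting the guidance above; values are illustrative only.
def suggest_stripe_kb(largest_client_io_kb, raid_level):
    if raid_level in ("raid5", "raid6"):
        # parity arrays: stripe = largest client I/O size (strategy 1)
        return largest_client_io_kb
    # mirror/stripe arrays: a small multiple of the client I/O size (strategy 2);
    # the factor of 8 simply reproduces the 64 KB -> 512 KB NTFS example above
    return largest_client_io_kb * 8

print(suggest_stripe_kb(64, "raid5"))    # NTFS on RAID 5  -> 64 KB
print(suggest_stripe_kb(64, "raid10"))   # NTFS on RAID 10 -> 512 KB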
@andyalder - "You've missed out something: 166 revs per second = 6 ms per revolution, but on average the data is only half a revolution away, so 3 ms for half a spin; then you have to add 3-4 ms for the average seek time - about 6.5 ms total, and 1000/6.5 = about 150 IOPS."

I'm with you on that one!

@BigSchmuh - latency of a 1/3 rotation might be an aim, but the stats tell us that 1/2 is the average. All that the write collapsing taking place in the OS, the controller and the disk electronics does is reduce the IOPS demand, but it's always limited by the maximum IOPS the drive is capable of, and 150 is a reasonable number to work with.
@connollyg & @andyalder: I can agree that without the NCQ/TCQ I/O reordering feature, a 10k RPM HDD can't go beyond 150 IOPS... but in real life you have some queue depth that slightly delays some I/O to deliver a higher IOPS figure.
==> The Velociraptor 600GB 10k RPM benchmark from Anandtech shows up to 322 IOPS for the WD VR 600GB... with an average of 3.6 I/Os queued.
Are they measuring application IOPS or real disk IOPS? (We have already discussed that write gathering/collapsing can occur in the OS/controller/drive.)
Avatar of dqnet

ASKER

OK, I'm going to buy the battery.
This is going to help with the write-back on the RAID 10 results, right?
This site says small stripe sizes are old-school logic and that the stripe size should be 1 MB plus. I'll be building this array next week. Can I get your final opinions on this, as I will go with your opinions rather than with the site below:

http://www.webhostingtalk.com/archive/index.php/t-1046069.html

"Personally, I do at least a 1MB stripe, preferably 2MB if it's available, whenever possible. There's little or no downside to having the stripe be that big in most or all access patterns, and in many typical access patterns, a stripe that big will give you a huge performance boost compared to, say, a 64-256k stripe size. Now this is where I fall short in my knowledge, why do a lot of places do stripes in the size of 64k-256k and not just use 2MB? Is there a reason behind it or is it something from years ago when that was "normal" and "good" that has just since stuck around?"

Further down:

1MB stripe, preferably 2MB if it's available
Yes, both are available but you are sure that is not too much, i won't host big file on my server !


Now this is where I fall short in my knowledge, why do a lot of places do stripes in the size of 64k-256k and not just use 2MB? Is there a reason behind it or is it something from years ago when that was "normal" and "good" that has just since stuck around?

and the list goes on and on...


By the way, what are hard faults?
ASKER CERTIFIED SOLUTION
(This solution is only available to members.)
BigSchmuh, you have stretched this thread so far from the original problem that I've unsubscribed; if dqnet wants recommendations for stripe size they can ask another question, IMHO.
@andyalder: I do not understand why you left the thread... If a 10k RPM HDD can't theoretically sustain more than 150 IOPS but a benchmark shows that it can sustain 320 IOPS, IMHO you should stay and debate. The stripe size design goals are well commented and you can still debate them.
At 57 comments I don't think I will stretch the thread any more, thank you. The initial question was why performance is so poor, and it was answered in the first few posts - because they don't have a cache battery. Everything else is secondary.
The stripe size accounts for under-performance as well... I don't think this is a secondary point.
Avatar of dqnet

ASKER

@andyalder, I'm just learning. Every post going up is actually giving me a deeper insight into my problem and how I can improve. Nothing wrong with it getting this long, is there?
Points assigned, it's become extremely interesting to learn...
Well, once you've fitted the cache battery you can change the stripe size on the fly to experiment. That will slow things down during the re-striping, of course, but you can set the expansion priority to low so it doesn't have much effect on production.

The stripe element size should match the application block size, e.g. MS SQL likes 64K stripe elements, but multiples of that size are also just about as fast. The one exception is when something is writing or reading very large blocks, in which case you want a single huge write or read split over all available spindles in a RAID 3 type way; there the full-width stripe should match the application block size, but that's not very common. When you're running a general application server such as a VMware host, though, you can't match the application block size because there are multiple apps all using different sizes, which is why I would leave it at the default. Set it too big and you'll run out of cache.

Now take a look at http://h20000.www2.hp.com/bc/docs/support/SupportManual/c02249094/c02249094.pdf and see whether you can make sense of whether HP are referring to the stripe element size or the full-width stripe under the section headed "Disk Striping and Performance" - it's pretty badly worded to me.
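For clarity, the two terms relate roughly like this (the drive counts are example values only):

# Stripe element = the per-disk chunk; full-width stripe = one pass across all
# data-bearing disks. Drive counts below are illustrative.
def full_width_stripe_kb(element_kb, total_drives, level):
    if level == "raid5":
        data_drives = total_drives - 1      # one drive's worth of parity
    elif level == "raid10":
        data_drives = total_drives // 2     # mirrored pairs
    else:
        raise ValueError("unsupported level")
    return element_kb * data_drives

print(full_width_stripe_kb(64, 3, "raid5"))   # 3-disk RAID 5  -> 128 KB full stripe
print(full_width_stripe_kb(64, 4, "raid10"))  # 4-disk RAID 10 -> 128 KB full stripe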
BigSchmuh et al, as an aside please post on https://www.experts-exchange.com/questions/27669737/RAID-what-does-strip-size-mean.html - it's all about stripping   ;)
@andy  :-)

Stripping as in taking off clothes

 - or -

striping as in across spindles

:-)
Avatar of dqnet

ASKER

haha! :)

OK, sure. One last thing before we wrap up...
Is it actually possible to change the stripe size after fitting the battery without affecting the current data? Or do I have to re-create the array? I'm a little confused as to what you mean by:
"Change the stripe size on the fly.."

And yes, I will be expanding those RAID 5 arrays to RAID 10 once I receive the battery.
It does it live without affecting the data. It has to shuffle the data about quite a bit, so it uses the cache to temporarily store data off the disks, which is why it is only enabled with battery/flash-backed cache.
I think it might be a good idea to back it up beforehand though!
Avatar of dqnet

ASKER

You guys are the best!
Just on a mobile device.. Will assign points tomorrow morning..

 Top quality answers!!
Thanks again!
Avatar of dqnet

ASKER

By the way, if you've got a few minutes to spare:
How come the option for write-back caching is disabled on my servers?
They won't allow me to turn it on, even on the server WITH the battery-backed cache?

Promise that's it :)
It should be temporarily disabled until the battery is charged which takes about 3 hours.
Avatar of dqnet

ASKER

Yes, but even on our G7 and our G5, which has the battery-backed controller, the option isn't available. Every time you try to switch it on it says:

"Windows could not change the write-caching setting for the device. Your device might not support this feature or changing this setting."
That's correct - Windows has no control over the setting; it's configured through the ACU. As far as Windows is concerned it thinks caching is off, although it's really on. I'd rather the driver allowed Windows to change the setting (and then simply ignore it), just so that you could change the Windows cache setting checkbox below the disk cache setting.
Sometimes, using "Rescan Disks" from the Action menu in the Disk Management console (diskmgmt.msc) may refresh the Windows parameters and allow you to set your write cache strategy correctly.
Hi dqnet,

I am currently in the same predicament as you were in when you posted this question.


Did you get the BBWC in the end ?

What difference did it make to the RAID 10 write and read speeds ?

Was it easy to fit and then switch on write caching on the controller ?

Hope you can help.

Regards
Avatar of dqnet

ASKER

Hello Zoltan,

Yes, I did indeed purchase the BBWC for my servers that run both a RAID 5 and a RAID 10 array set. The speeds have definitely increased, and I can post the results for you tomorrow if you are still interested.


My DB server (RAID 10 array) already had a BBWC and those results are above
(275 MB/s read and 263 MB/s write sequential). So no change on that front.


Kind Regards,
dqnet
@ zoltan9992000,

The Windows setting is irrelevant; on some versions it can be changed, but it doesn't actually change anything on the controller - the real cache settings are done through the ACU.