AXISHK
asked on
VMware Performance
The report shows that disk throughput goes up to 12,000 KBps (12 MB/s). Is that reasonable on 8 x 146 GB 10K drives in RAID-5? The RAID controller is a P400i. What is the theoretical throughput?
Thanks.
Writes are very slow on RAID-5 because they involve reading the stripe from all disks and writing the stripe back to two disks.
VMware prefers a 64K block size, so a 4K/8K stripe size will be optimal (but still slow) for an 8-disk array.
If write performance is a concern, use RAID-1+0 or something like that. The ~300 GB of space lost that way should be worth it.
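The space cost mentioned above can be sketched with nominal drive sizes (assumption: no hot spare, raw rather than formatted capacity); by this arithmetic the difference is closer to ~440 GB than 300 GB:

```python
# Rough usable-capacity comparison for 8 x 146 GB drives
# (nominal sizes, no hot spare assumed).
N, DRIVE_GB = 8, 146

raid5_gb = (N - 1) * DRIVE_GB    # one drive's worth of capacity goes to parity
raid10_gb = (N // 2) * DRIVE_GB  # half the drives mirror the other half

print(f"RAID-5 usable:  {raid5_gb} GB")
print(f"RAID-10 usable: {raid10_gb} GB")
print(f"Space given up by moving to RAID-10: {raid5_gb - raid10_gb} GB")
```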
ASKER
The speed is about 10 MB/s per drive. For an 8-drive RAID-5, can I say the total throughput can be estimated as 7 * 10 MB/s = 70 MB/s, since at most 7 drives can provide data?
Hence, the max peak of 12 MB/s means the disks shouldn't have a problem. However, I have the following logged on the HP:
2014-01-29T18:03:57.575Z cpu3:32788)WARNING: ScsiDeviceIO: 1223: Device mpx.vmhba1:C0:T0:L0 performance has deteriorated. I/O latency increased from average value of 31890 microseconds to 958539 microseconds.
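As a sanity check of the arithmetic in this post (the 10 MB/s per drive is the asker's assumption, not a measurement), and to put the logged latency numbers in context:

```python
# With 8 drives in RAID-5, at most 7 carry data, so sequential read
# throughput tops out around 7 x the per-drive rate.
DATA_DRIVES = 8 - 1
PER_DRIVE_MBPS = 10          # assumed figure from the thread
OBSERVED_PEAK_MBPS = 12      # 12,000 KBps from the performance report

est_max_mbps = DATA_DRIVES * PER_DRIVE_MBPS
print(f"Estimated max read throughput: {est_max_mbps} MB/s")
print(f"Observed peak: {OBSERVED_PEAK_MBPS} MB/s "
      f"({OBSERVED_PEAK_MBPS / est_max_mbps:.0%} of the estimate)")

# The logged warning is about latency, not throughput: average latency
# jumped from 31,890 us to 958,539 us, i.e. from ~32 ms to ~0.96 s per I/O.
old_us, new_us = 31890, 958539
print(f"Latency increase: {new_us / old_us:.0f}x")
```

Throughput being well under the estimate does not rule out a problem; the latency jump in the log is the real symptom.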
1 s is a hell of a lot of latency.
33,333 µs (i.e. 30 IOPS) is the maximum latency VMware considers acceptable per VM.
A single 10K drive can do about 160 writes per second; 8 of them in RAID-0 could do 8 times that.
An 8-disk RAID-5 can write only about 20 IOPS, unless it cheats with write caching.
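The write-IOPS arithmetic above can be sketched with the textbook RAID write penalties (1 physical I/O per logical write for RAID-0; 4 for RAID-5: read data, read parity, write data, write parity). The 160 writes/s per drive is the figure quoted in the thread; the 20 IOPS figure presumably assumes worst-case behaviour without caching, whereas the textbook penalty gives an upper bound:

```python
# Back-of-envelope random-write IOPS using the standard write penalties.
PER_DISK_IOPS = 160   # 10K-drive figure quoted in the thread
N = 8

raid0_write_iops = N * PER_DISK_IOPS / 1   # all spindles, no penalty
raid5_write_iops = N * PER_DISK_IOPS / 4   # RAID-5 write penalty of 4

print(f"RAID-0 write IOPS: {raid0_write_iops:.0f}")
print(f"RAID-5 write IOPS: {raid5_write_iops:.0f}")
```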
What disks are you using? Does the P400i have battery- (or flash-) backed cache, and is the battery working? These die after about 4 years, and RAID-5 performance is horrendous with a dead battery.
I would disagree with using a (very small) 8K strip size. If something requests data in 64K chunks then you should use a 64K or larger stripe element size. Splitting it up so that all 8 disks are involved in every I/O may speed up reading the first chunk, but if you request two blocks you have to wait until the first has been read before the disks can fetch the second. If you used 64K instead, one disk would do the first read, leaving the others free to fetch the second block at the same time.
RAID 5 writes don't involve reading all disks, the controller reads the data and parity disks, XORs this with the new data to generate new parity, then writes the new data and parity. Therefore 4 physical I/Os for one logical one.
VMware writes in 64K blocks; that should be 1 + 2/10 of an 8K stripe on disk (or a waste of I/O if the stripe is bigger).
ASKER CERTIFIED SOLUTION
8 disks 8K stripe = 6xdata+1xcsum+1xspare=48K
64K stripe=... you guess
See http://blogs.technet.com/b/evand/archive/2004/10/14/242127.aspx for another example: Exchange 2003 writes in 32K blocks, so Microsoft recommends a 32K stripe element size for it.
(I put "RAID 10 and RAID 10 writes" above; I meant "RAID 10 and RAID 5 writes". Both of those RAID levels tie up two disks to do a single write.)
RAID-5 reads the stripe from all but one of the disks and then writes to 2.
No it doesn't. It reads 2 then writes 2. The RAID 5 write penalty is 4 physical I/Os for one logical I/O. If you need an explanation as to why it does not read all the data disks then you can ask a new question about it (you don't need to know what's on the data disks, only what the result of XORing them together is, and you can work that out from the parity disk because it already holds a previous XOR calculation).
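The partial-stripe update described above (2 reads + 2 writes) can be demonstrated with a toy XOR example; the block contents here are made up:

```python
# RAID-5 partial-stripe write: to update one data block, the controller
# reads the old data block and the old parity block, then computes
#   new_parity = old_parity XOR old_data XOR new_data
# and writes the new data and new parity: 2 reads + 2 writes total,
# without touching the other data disks.
d0, d1, d2 = 0b1010, 0b0110, 0b1100   # toy data blocks (made up)
parity = d0 ^ d1 ^ d2                 # parity as stored on the parity disk

new_d1 = 0b0001                       # the block being overwritten
new_parity = parity ^ d1 ^ new_d1     # needs only old data + old parity

# The shortcut agrees with recomputing parity from all the data disks:
assert new_parity == d0 ^ new_d1 ^ d2
print("partial-stripe parity update is consistent")
```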
So you get 20K RPM -> 300 IOPS... that should make it 20 MB/s write...
Indeed, and 20MB/s is pretty conservative. He's probably got 7.2K disks and a dead cache battery. One of the disks may even be faulty with latency suddenly jumping up to 1 second.
He gets less....
Asker:
Please boot from SPP 2014.02, upgrade the firmware, and run the Array Configuration Utility (you can mount the ISO from iLO).
Check whether the RAID battery is working. Remember that it charges for 6 hours, and you have no write cache after the server has been off for an hour.
Set the RAID background check priority to lowest.
OK, migrate the RAID strip size to the smallest (8K or 64K depending on the array); this sometimes kicks in a background parity recalculation. If you select very wrong options, like a RAID level change that provides less space, it says it will wipe the data; yes, it will if you accept, so be very careful. Strip resizing is not possible on all array controllers.
Boot, check the ESXi hardware status, and when the battery is charged, redo the benchmark.
If starting at 8K, go up a step at a time: reboot, wait, benchmark...
(Probably the optimum is 32K or 64K, but if you have small random writes it must be less.)
ASKER
Sorry, I need some more clarification. Indeed, there is a lot of useful information here which I haven't considered yet.
#1 My server is an HP DL380 G5. Does this one support it?
http://h17007.www1.hp.com/us/en/enterprise/servers/products/service_pack/spp/index.aspx
Or should I use this one?
http://h20565.www2.hp.com/portal/site/hpsc/template.PAGE/public/psi/swdDetails/?javax.portlet.begCacheTok=com.vignette.cachetoken&javax.portlet.endCacheTok=com.vignette.cachetoken&javax.portlet.prp_bd9b6997fbc7fc515f4cf4626f5c8d01=wsrp-navigationalState%3Didx%253D%257CswItem%253DMTX_9ed665a89aba447d925937f38b%257CswEnvOID%253D4115%257CitemLocale%253D%257CswLang%253D%257Cmode%253D%257Caction%253DdriverDocument&javax.portlet.tpst=bd9b6997fbc7fc515f4cf4626f5c8d01&sp4ts.oid=1157693&ac.admitted=1395381536056.876444892.492883150
#2 Is there a way to check the RAID battery? Using the HP diagnostics CD?
#3 Can we tell the max KBps for my current 8 x 10K RAID-5?
#4 Some clarifications on the discussion above:
"A single 10K drive can do 160 writes per second. 8 in RAID0 could do 8 times" ->
8 * 160 = 1,280 IOPS, am I correct?
"8 disk RAID5 can write 20 IOPS except if it cheats with write caching." -> Why does RAID-5 drop to 20 IOPS?
"33333 microseconds is equivalent to 30 IOPS" -> Is there a relationship between IOPS and microseconds?
"vmware writes in 64K blocks, that should be 1+2/10 of a 8K stripe on disk" -> What does this mean?
"8 disks 8K stripe = 6xdata+1xcsum+1xspare=48K" -> What does this mean?
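On the IOPS-vs-microseconds part of #4: if I/Os complete strictly one after another (queue depth 1 assumed), IOPS is just the reciprocal of the per-I/O latency, which is where the 33,333 µs <-> 30 IOPS figure comes from. A minimal sketch:

```python
def serial_iops(latency_us: float) -> float:
    """IOPS achievable if I/Os are issued one at a time (queue depth 1)."""
    return 1_000_000 / latency_us

print(f"{serial_iops(33333):.0f} IOPS at the 33,333 us VMware threshold quoted above")
print(f"{serial_iops(958539):.1f} IOPS at the logged 958,539 us latency")
```

With deeper queues the array can overlap I/Os, so real IOPS can exceed this serial figure; the reciprocal is a lower bound, not a law.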
SOLUTION
If you want to upgrade the firmware then use the firmware DVD that you have posted a long URL for.
Boot from SmartStart CD and run the ACU to get the battery status. You can also generate a diagnostic report and upload it here **as an attachment** and I'll read through it.
Ignore gheist on the strip size: you need to optimise for multiple random I/Os, yet he is giving you the optimisation for single-thread streaming such as video. It isn't even right that VMware uses 64K I/Os; it stores the data in 64K sub-blocks, but the actual read and write requests are whatever the application asks for. That can be seen from http://www.vmware.com/pdf/esx3_partition_align.pdf
"The size of the data transfer depends on the application and is often a range rather than a single value. For Microsoft Exchange, the I/O size is generally small (from 4KB to 16KB), Microsoft SQL Server database random read and write accesses are 8KB, Oracle accesses are typically 8KB, and Lotus Domino uses 4KB. On the Windows platform, the I/O transfer size of an application can be determined using Perfmon."
You can also see it in the results graph at the bottom of that whitepaper.
There is very little penalty for using a strip size greater than the optimum (you might run out of write cache on the controller quicker but I can't verify that), there is a large penalty for using a strip size smaller than the optimum since one I/O gets split across disks instead of being served from a single disk. The controller default values are normally pretty good on Smart Array controllers.
OK, but still, a strip size >64K makes little sense and wastes controller RAM.
Do you have that in writing anywhere about it wasting cache? I had wondered about that in the past but couldn't find anything to verify whether the controller reserved a whole strip size of RAM when it has been asked to do a 512 byte write or not. I suspect it manages the cache better than that.
ASKER
Tks
The controller always reads/writes in full strips.
I don't think so and since you've been wrong on most other points in this thread I'll assume you're wrong on this one unless you can find a few URLs to back yourself up.
- Read speed is much faster than write speed on RAID-5. If most operations are writes, RAID-5 may not fit.
- With 2 or more drives failed, RAID-5 means total loss.
Consider RAID-6 if possible; you can have 2 drives failed with only a little loss of throughput.
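A quick sketch of the RAID-5 vs RAID-6 trade-off on this array (nominal figures; whether the P400i supports RAID-6/ADG should be verified against its documentation before planning on it):

```python
# RAID-5 vs RAID-6 on 8 x 146 GB drives, nominal capacities.
# RAID-6 gives up one more drive's worth of space to a second parity block
# but survives two simultaneous drive failures; its random-write penalty
# is typically 6 physical I/Os per logical write versus 4 for RAID-5.
N, DRIVE_GB = 8, 146

raid5_gb = (N - 1) * DRIVE_GB
raid6_gb = (N - 2) * DRIVE_GB

print(f"RAID-5 usable: {raid5_gb} GB, survives 1 failed drive")
print(f"RAID-6 usable: {raid6_gb} GB, survives 2 failed drives")
```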