Solved

VMware Performance

Posted on 2014-03-21
Last Modified: 2014-04-01
The report shows that disk throughput goes up to 12,000 KBps (about 12MB/s). Is that reasonable for 8 x 146GB 10K drives in RAID 5? The RAID controller is a P400i. What is the theoretical throughput?

Thanks
Question by:AXISHK
22 Comments
 
LVL 19

Expert Comment

by:Miguel Angel Perez Muñoz
According to http://wintelguy.com/raidperf.pl, and assuming about 10MB/s per drive (very conservative), throughput may be about 50MB/s. But bear in mind:

- Reads are much faster than writes on RAID 5. If most operations are writes, RAID 5 may not be a good fit.
- If 2 or more drives fail, RAID 5 loses everything.

Consider RAID 6 if possible; it survives 2 failed drives at the cost of a little throughput.
 
LVL 61

Expert Comment

by:gheist
Writes are very slow on RAID-5 as they involve reading of stripe from all disks and writing stripe back to two disks.

VMware prefers a 64K block size, so a 4K/8K stripe size will be optimal (but still slow) for an 8-disk array.
If write performance is a concern, use RAID 1+0 or something like that. The 300GB of space lost that way should be worth it.
 

Author Comment

by:AXISHK
At about 10MB/s per drive, for an 8-drive RAID 5, can I estimate the total throughput as 7 * 10MB/s = 70MB/s, since at most 7 drives provide data?

The observed peak is only 12MB/s, so the disks shouldn't have a problem. However, I have the following logged on the HP server.


2014-01-29T18:03:57.575Z cpu3:32788)WARNING: ScsiDeviceIO: 1223: Device mpx.vmhba1:C0:T0:L0 performance has deteriorated. I/O latency increased from average value of 31890 microseconds to 958539 microseconds.
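For anyone who wants to pull these numbers out of a log automatically, here is a minimal sketch. The regular expression is written only against the wording of the warning quoted above; the function name and the pattern are illustrative assumptions, and other ESXi builds may word the message differently.

```python
import re

# The warning line quoted above, copied from the post.
LOG_LINE = (
    "2014-01-29T18:03:57.575Z cpu3:32788)WARNING: ScsiDeviceIO: 1223: "
    "Device mpx.vmhba1:C0:T0:L0 performance has deteriorated. "
    "I/O latency increased from average value of 31890 microseconds "
    "to 958539 microseconds."
)

# Pattern written for this one message wording only (an assumption).
LATENCY_RE = re.compile(
    r"Device (?P<device>\S+) performance has deteriorated\. "
    r"I/O latency increased from average value of (?P<avg_us>\d+) microseconds "
    r"to (?P<now_us>\d+) microseconds"
)


def parse_latency_warning(line):
    """Return (device, average_ms, current_ms), or None if the line does not match."""
    match = LATENCY_RE.search(line)
    if match is None:
        return None
    avg_ms = int(match.group("avg_us")) / 1000.0
    now_ms = int(match.group("now_us")) / 1000.0
    return match.group("device"), avg_ms, now_ms


device, avg_ms, now_ms = parse_latency_warning(LOG_LINE)
print(f"{device}: average {avg_ms:.1f} ms, now {now_ms:.1f} ms "
      f"({now_ms / avg_ms:.0f}x the average)")
```

For the line above this works out to roughly 32 ms on average against roughly 959 ms at the time of the warning, about a 30-fold jump.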
 
LVL 61

Expert Comment

by:gheist
1s is a hell of a lot of latency.
33333 microseconds (i.e. 30 IOPS) is the maximum acceptable latency per VM for VMware.

A single 10K drive can do 160 writes per second. 8 in RAID0 could do 8 times
8 disk RAID5 can write 20 IOPS except if it cheats with write caching.
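The link between latency and IOPS can be sketched as below, assuming I/Os are issued one at a time (a queue depth of 1) unless stated otherwise. The function names and the `outstanding_ios` parameter are illustrative, not taken from any VMware tool.

```python
def iops_from_latency_us(latency_us, outstanding_ios=1):
    """I/Os per second sustained when each I/O takes latency_us microseconds."""
    return outstanding_ios * 1_000_000 / latency_us


def latency_us_from_iops(iops, outstanding_ios=1):
    """Average per-I/O latency in microseconds at a given I/O rate."""
    return outstanding_ios * 1_000_000 / iops


print(round(iops_from_latency_us(33_333)))      # 30
print(round(iops_from_latency_us(958_539), 2))  # 1.04
```

So the 958539 microsecond figure in the warning corresponds to about one I/O per second per outstanding request.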
 
LVL 55

Expert Comment

by:andyalder
What disks are you using? Does the P400i have battery-backed (or flash-backed) cache, and is the battery working? These die after about 4 years. RAID 5 performance is horrendous with a dead battery.

I would disagree with using a (very small) 8K strip size. If something requests data in 64K chunks then you should use a 64K or larger stripe element size. Splitting it up so that all 8 disks are involved in any I/O may speed up reading the first chunk, but if you request two blocks you have to wait until the first has been read before the disks can get the second. If you used 64K instead, then 1 disk will do the first read, leaving the others free to get the second block at the same time.

RAID 5 writes don't involve reading all disks. The controller reads the old data and the old parity, XORs these with the new data to generate the new parity, then writes the new data and parity. That makes 4 physical I/Os for one logical one.
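The read-modify-write described above can be checked with a toy model. This is only a sketch with made-up 4-byte strips and hypothetical helper names, not how any particular controller is implemented; it shows that the new parity can be derived from the old data and old parity alone.

```python
from functools import reduce


def xor_bytes(a, b):
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def full_parity(strips):
    """Parity computed the long way, by XORing every data strip."""
    return reduce(xor_bytes, strips)


def small_write_parity(old_data, old_parity, new_data):
    """New parity from 2 reads (old data, old parity) plus the new data."""
    return xor_bytes(xor_bytes(old_parity, old_data), new_data)


# 7 data strips plus parity, as in an 8-disk RAID 5 stripe (toy 4-byte strips).
strips = [bytes([i, i + 1, i + 2, i + 3]) for i in range(7)]
parity = full_parity(strips)

# Overwrite strip 2 without touching the other six data strips.
new_data = b"\xde\xad\xbe\xef"
new_parity = small_write_parity(strips[2], parity, new_data)
strips[2] = new_data

# Same result as recomputing parity from all seven strips.
assert new_parity == full_parity(strips)
print("2 reads + 2 writes were enough:", new_parity.hex())
```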
 
LVL 61

Expert Comment

by:gheist
vmware writes in 64K blocks, that should be 1+2/10 of a 8K stripe on disk (or a waste of IO if stripe is bigger)
 
LVL 55

Accepted Solution

by:
andyalder earned 250 total points
It's definitely not a good idea to use a stripe element size that's smaller than the I/O size; you want a whole block to fit on a single disk if possible.

See http://support.dell.com/support/systemsinfo/document.aspx?s=biz&~file=/systems/md3000/en/cli/html/scriptcm.htm for example -

"the virtual disk is in an environment where a single user is transferring large units of data (such as multimedia), performance is maximized when a single data transfer request is serviced with a single data stripe (a data stripe is the segment size that is multiplied by the number of physical disks in the volume group that are used for data transfers). In this case, multiple physical disks are used for the same request, but each physical disk is accessed only once. For optimal performance in a multiuser database or file system storage environment, set your segment size to minimize the number of physical disks that are required to satisfy a data transfer request."

That's from a Dell doc but it applies to any storage. Your suggestion is for the first case, which is large file transfers; it's not what you use with small random data. VMware data is in the same random format as the second case, much like a multiuser database. The segment size (strip size in HP lingo, note strip not stripe; stripe element size in more precise lingo) should not be smaller than the size of the typical I/Os. http://www.dell.com/downloads/global/products/pvaul/en/powervault-md3200-array-tuning.pdf says the same thing and I can find loads more if you want.

Once you see the error in "Writes are very slow on RAID-5 as they involve reading of stripe from all disks and writing stripe back to two disks" you may realise why the stripe element size should match the I/O size: it's so that only one single disk (or pair in the case of RAID10 and RAID 10 writes) is tied up with a single I/O, leaving the other disks to satisfy other requests.
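To make the strip size point concrete, here is a small sketch that counts how many strips a single I/O spans. Each strip lives on one disk, so this is the number of disks tied up by that I/O (capped by the width of the array). The function name and the example sizes are illustrative assumptions.

```python
KB = 1024


def strips_touched(offset_bytes, io_bytes, strip_bytes):
    """Number of strips that one I/O of io_bytes starting at offset_bytes spans."""
    first = offset_bytes // strip_bytes
    last = (offset_bytes + io_bytes - 1) // strip_bytes
    return last - first + 1


# An aligned 64K request against different strip sizes.
for strip_kb in (8, 64, 128):
    count = strips_touched(0, 64 * KB, strip_kb * KB)
    print(f"{strip_kb:>3}K strip: 64K I/O spans {count} strip(s)")

# The same request starting 4K into a 64K strip spills onto a second disk,
# which is why partition alignment also matters.
print(strips_touched(4 * KB, 64 * KB, 64 * KB))
```

With an 8K strip the one request spans 8 strips, so every spindle is busy with it; with a 64K or larger strip it stays on one disk and the rest are free for other requests.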
 
LVL 61

Expert Comment

by:gheist
8 disks 8K stripe = 6xdata+1xcsum+1xspare=48K
64K stripe=... you guess
 
LVL 55

Expert Comment

by:andyalder
See http://blogs.technet.com/b/evand/archive/2004/10/14/242127.aspx for another example: Exchange 2003 writes in 32K blocks, so Microsoft recommend a 32K stripe element size for it.

(I put "RAID 10 and RAID 10 writes" above; I meant "RAID 10 and RAID 5 writes". Both those RAID levels tie up two disks to do a single write.)
 
LVL 61

Expert Comment

by:gheist
RAID5 reads the stripe from all but one disk and then writes to 2
 
LVL 55

Expert Comment

by:andyalder
No it doesn't. It reads 2 then writes 2. The RAID 5 write penalty is 4 physical I/Os for one logical I/O. If you need an explanation as to why it does not read all the data disks then you can ask a new question about it (you don't need to know what's on the data disks, only what the result of XORing them together is, and you can work that out from the parity disk because it already has a previous XOR calculation residing on it).
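As a rough sketch of what the write penalty means for this array: divide the raw spindle IOPS by the number of physical I/Os per logical write. The 160 IOPS per disk is the figure quoted earlier in this thread; the function name is hypothetical, and the model assumes small random writes with no help from the write cache.

```python
def raid_write_iops(disks, per_disk_iops, write_penalty):
    """Estimated random-write IOPS with no write cache assistance."""
    return disks * per_disk_iops / write_penalty


PER_DISK_IOPS = 160  # figure quoted earlier in the thread for one 10K drive

print("RAID 0 :", raid_write_iops(8, PER_DISK_IOPS, 1))  # 1 physical write per write
print("RAID 10:", raid_write_iops(8, PER_DISK_IOPS, 2))  # 2 physical writes per write
print("RAID 5 :", raid_write_iops(8, PER_DISK_IOPS, 4))  # 2 reads + 2 writes per write
```

Under those assumptions the 8-disk RAID 5 set comes out at about 320 random writes per second.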
 
LVL 61

Expert Comment

by:gheist
So you get 20k rpms -> 300 IOPS... should make it 20MB/s write...
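The step from IOPS to MB/s is just a multiplication by the I/O size. The sketch below assumes 64K I/Os, which is an assumption carried over from earlier in the thread and not a measured value; the function names are illustrative.

```python
def throughput_mb_s(iops, io_kb):
    """Throughput in MB/s (taking 1 MB = 1000 KB) for a given IOPS and I/O size."""
    return iops * io_kb / 1000


def iops_for_throughput(kb_per_s, io_kb):
    """I/O rate implied by an observed throughput at a given I/O size."""
    return kb_per_s / io_kb


IO_KB = 64  # assumed I/O size

print(throughput_mb_s(300, IO_KB))         # 19.2, i.e. roughly 20MB/s
print(iops_for_throughput(12_000, IO_KB))  # 187.5 for the reported 12,000 KBps peak
```

If the real I/Os are smaller than 64K, the same 12,000 KBps would correspond to proportionally more IOPS.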
 
LVL 55

Expert Comment

by:andyalder
Indeed, and 20MB/s is pretty conservative. He's probably got 7.2K disks and a dead cache battery. One of the disks may even be faulty with latency suddenly jumping up to 1 second.
 
LVL 61

Expert Comment

by:gheist
He gets less....
Asker:
Please boot from spp_2014.02, upgrade the firmware and run the Array Configuration Utility (you can mount the ISO from iLO).

Check if the RAID battery is working. Remember that it takes 6h to charge, and you have no write cache after the server has been off for an hour.

Set the RAID background check priority to lowest.

OK, migrate the RAID strip to the smallest size (8K or 64K depending on the array). This sometimes kicks off a background parity recalculation. If you select very wrong options, like changing to a RAID level that provides less space, it says it will wipe the data... and yes, it will if you accept. Be very careful; strip resize is not possible on all array controllers.
Boot, check the ESXi hardware status and, when the battery is charged, redo the benchmark.
If starting at 8K, go up one step at a time: reboot, wait, benchmark...
(It is probably 32K or 64K, but if you have small random writes it must be less.)
 

Author Comment

by:AXISHK
Sorry, I need some more clarification. There is a lot of useful information here that I hadn't considered yet.

#1 My server is an HP DL380 G5. Does this one support it?
http://h17007.www1.hp.com/us/en/enterprise/servers/products/service_pack/spp/index.aspx

Or should I use this one?

http://h20565.www2.hp.com/portal/site/hpsc/template.PAGE/public/psi/swdDetails/?javax.portlet.begCacheTok=com.vignette.cachetoken&javax.portlet.endCacheTok=com.vignette.cachetoken&javax.portlet.prp_bd9b6997fbc7fc515f4cf4626f5c8d01=wsrp-navigationalState%3Didx%253D%257CswItem%253DMTX_9ed665a89aba447d925937f38b%257CswEnvOID%253D4115%257CitemLocale%253D%257CswLang%253D%257Cmode%253D%257Caction%253DdriverDocument&javax.portlet.tpst=bd9b6997fbc7fc515f4cf4626f5c8d01&sp4ts.oid=1157693&ac.admitted=1395381536056.876444892.492883150

#2 Is there a way to check the RAID battery? Using the HP diagnostics CD?

#3 Can we tell the max KBps for my current 8-disk RAID 5 of 10K drives?

#4 Some clarification on the discussion above.

"A single 10K drive can do 160 writes per second. 8 in RAID0 could do 8 times" -> 8 * 160 = 1280 IOPS, am I correct?

"8 disk RAID5 can write 20 IOPS except if it cheats with write caching." -> Why does RAID 5 drop to 20 IOPS?

33333 microseconds is equivalent to 30 IOPS -> Is there a relationship between IOPS and microseconds?

"vmware writes in 64K blocks, that should be 1+2/10 of a 8K stripe on disk" -> What does this mean?

"8 disks 8K stripe = 6xdata+1xcsum+1xspare=48K" -> What does this mean?
 
LVL 61

Assisted Solution

by:gheist
gheist earned 250 total points
1) Yes, just boot the manual FW update to see which components get updated. It certainly includes an update for iLO3 for compatibility with the latest JDK, and Broadcom network card firmware that adds some offload functionality to the card.
2) Boot the SPP in "manual mode" and it will let you run the Array Configuration Utility, where you can see the RAID status, or run Insight Diagnostics for most system components including RAID.
3) A single 10K drive can do one write per revolution, i.e. 277 writes/s.
RAID5 does 4 or n+1 accesses per single write, so it is 4 or n+1 times slower.

IOs per second:
1 IO/s means it takes 1s to write a sector
30 IO/s ... 0.033s or 33ms or 33000us
--
It means that there are two defining parameters that impact disk performance:
1) Disk revolutions per second, which equals how many random writes it can do in that time
2) Physical media speed, i.e. how fast it can rip data from the disk

64K is the minimal disk block that VMware reads. You don't need to optimize for smaller.
If the RAID block size is bigger, that would make the RAID read more.
If it is smaller, the read can be done from multiple disks.
In the first case you waste a lot of RAID controller memory that is usually usable at least as a read cache.

This means that the RAID strip you select in ACU is per disk. It makes the RAID access the disks in blocks of this size, and the smallest block size of 8K sums up to 56K (no spare) or 48K (1 spare disk) when all disks are involved in serving your command.
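As a sanity check on the "one write per revolution" rule of thumb, here is the arithmetic as a short sketch (the function name is illustrative). Note that 10,000 rpm works out to about 167 revolutions per second, which is closer to the 160 writes per second quoted earlier in the thread than to 277. The rule itself is only a rough approximation, since it ignores seek time.

```python
def revolutions_per_second(rpm):
    """Platter revolutions per second for a drive spinning at rpm."""
    return rpm / 60


for rpm in (7_200, 10_000, 15_000):
    rps = revolutions_per_second(rpm)
    ms_per_rev = 1000 / rps
    print(f"{rpm:>6} rpm: {rps:6.1f} rev/s, {ms_per_rev:.2f} ms per revolution")
```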
 
LVL 55

Expert Comment

by:andyalder
If you want to upgrade the firmware then use the firmware DVD that you have posted a long URL for.

Boot from the SmartStart CD and run the ACU to get the battery status. You can also generate a diagnostic report and upload it here **as an attachment** and I'll read through it.

Ignore gheist on the strip size; you need to optimise for multiple random I/Os, yet he is giving you the optimisation for single-thread streaming such as video. It isn't even right that VMware uses 64K I/Os; it stores the data in 64K sub-blocks but the actual read and write requests are whatever the application asks for. That can be seen from http://www.vmware.com/pdf/esx3_partition_align.pdf

"The size of the data transfer depends on the application and is often a range rather than a single value. For Microsoft Exchange, the I/O size is generally small (from 4KB to 16KB), Microsoft SQL Server database random read and write accesses are 8KB, Oracle accesses are typically 8KB, and Lotus Domino uses 4KB. On the Windows platform, the I/O transfer size of an application can be determined using Perfmon."

You can also see it in the results graph at the bottom of that whitepaper.

There is very little penalty for using a strip size greater than the optimum (you might run out of write cache on the controller quicker, but I can't verify that). There is a large penalty for using a strip size smaller than the optimum, since one I/O gets split across disks instead of being served from a single disk. The controller default values are normally pretty good on Smart Array controllers.
 
LVL 61

Expert Comment

by:gheist
OK, OK. Still, a strip size >64K makes little sense and wastes controller RAM.
 
LVL 55

Expert Comment

by:andyalder
Do you have that in writing anywhere about it wasting cache? I had wondered about that in the past but couldn't find anything to verify whether the controller reserved a whole strip size of RAM when it has been asked to do a 512 byte write or not. I suspect it manages the cache better than that.
 

Author Closing Comment

by:AXISHK
Thanks
 
LVL 61

Expert Comment

by:gheist
Controller always reads/writes in full strips
 
LVL 55

Expert Comment

by:andyalder
I don't think so and since you've been wrong on most other points in this thread I'll assume you're wrong on this one unless you can find a few URLs to back yourself up.
