RAID stripe size recommendation for VMware ESXi 5.1?


I am creating a RAID 5 logical disk for VMware ESXi 5.1 (I will convert it to RAID 6 later).

Anyway, I wanted to get a recommendation for how large I should set the stripe size.

The logical disk will hold virtual machines and will initially be made up of 3 x 4 TB SATA disks.

So I am looking for people's ideas on the recommended setting.


David (President) commented:
No, the controller default could be anything.  The optimal stripe size is the one that results in the least amount of physical disk I/O, since physical I/O is measured in milliseconds while cached I/Os take microseconds to nanoseconds, depending on the cache.

ESXi makes I/O requests of size X based on the block count of the logical volume.   The larger the volume size, the larger the request.   Get the number too big and you could easily waste 10X or more of your I/O.

If you have a 4-disk RAID5 (three data disks per stripe) where the controller reads n blocks at a time, then an I/O request for an even multiple such as 4n blocks is always going to be inefficient, because you have to read three disks once and one of them a second time.

But if you had a 5-disk RAID5 (four data disks per stripe), that same 4n-block read touches each disk only once, so you can get it in one pass.  With a chunk size of 2n, the same I/O request on a 4-drive RAID6 only has to read data from 2 drives.
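As a toy model of the pass-count argument above (hypothetical layout; real controllers rotate parity across disks, which this ignores):

```python
def disk_passes(request_chunks, data_disks):
    """Toy model: count chunk-reads per data disk for a sequential
    request of `request_chunks` chunks, round-robin striped."""
    reads = [0] * data_disks
    for i in range(request_chunks):
        reads[i % data_disks] += 1
    # Number of passes = the busiest disk's read count
    return max(reads)

# 4-disk RAID5 (3 data disks): a 4-chunk read needs 2 passes
assert disk_passes(4, 3) == 2
# 5-disk RAID5 (4 data disks): the same request finishes in 1 pass
assert disk_passes(4, 4) == 1
```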

If you had 2 x RAID1, then it doesn't matter what n is, because you can always read n, or half n, or anything else, and either mirror can service the request.

Now for writes; here is where you take the big hit, because low-end controllers do chunk-level parity on RAID5.  If you have to write 1 byte of data with a 64KB chunk size and 4 disks, then no matter what, you HAVE to read 64x3 KB and then write 64x2 KB.  But if this were a RAID1, you could very well end up writing only 2x2 KB, and the second write would always happen in the background.
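Taking the numbers above at face value, the write amplification can be sketched like this (a toy model of the chunk-level-parity behaviour described, not any specific controller's firmware):

```python
def raid5_small_write_io(chunk_kb, disks):
    """Per the low-end-controller model above: to change even 1 byte,
    read every data chunk in the stripe, then write the modified
    data chunk plus the recomputed parity chunk.
    Returns (kb_read, kb_written)."""
    data_disks = disks - 1
    kb_read = chunk_kb * data_disks
    kb_written = chunk_kb * 2          # new data chunk + new parity chunk
    return kb_read, kb_written

def raid1_small_write_io(block_kb):
    """RAID1: write the changed block to both mirrors, nothing to read."""
    return 0, block_kb * 2

assert raid5_small_write_io(64, 4) == (192, 128)   # the 64x3 / 64x2 case above
assert raid1_small_write_io(2) == (0, 4)           # the 2x2 KB case above
```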

Knowing true I/O characteristics and understanding the ramifications of the RAID level, topology, and controller is much more involved than most people think.  But then again, most people don't develop RAID controller software/firmware/diagnostics like I do, either.

The RAID controller default can't possibly be optimal because it is unaware of whether your I/O is more random, sequential, large-block, or small-block.  An ESXi system is pretty much large-block random I/O.  That means little benefit from cache, so you want to optimize for as little physical I/O as possible.  Multiple RAID1s will be best for all types of I/O, but the chunk size needs to be tuned to match ESXi.
Larry Struckmeyer (MVP) commented:

Three disks in RAID5 or 4 disks in RAID6 is really no more fault tolerant than RAID1.  You gain a little write speed, but you lose a little read speed.  Yes, with RAID6 you could lose 2 disks without suffering any data loss, but RAID1 with a hot spare would do the same.

Worse, the loss of one disk in RAID5, or two disks in RAID6, would make the system react very slowly as it recreates the data from parity.  RAID1 has no such limitation.

The stripe size is best left at the default for the controller you have, but if you must modify it, it depends on the use.  Frequent loading and saving of an encyclopedia is different from high-volume transactional SQL data entry.  AFAIK, for VM use it won't make any difference.
Bottom line: no matter what stripe size you use, performance will be awful if you have any significant number of writes on RAID5.  The worst possible case for RAID5 is a 4-disk array.  If you so much as write 1 byte in a file, each disk drive has to go through a minimum of 2 I/Os.

Do yourself a favor: buy one more disk and go to RAID10.  In general, your I/O will be at least twice as fast.  Why?  Every byte of data is always in two places.  With RAID5, not only is your data guaranteed to be in only one place, but the XOR parity holds up I/O requests with additional reads and writes on 100% of your I/Os.

Respectfully, fl_flyfishing is profoundly wrong in assessing the fault tolerance of RAID5 vs RAID6 vs RAID1.

Proof?  Let's say you have a RAID1 with a hot spare vs a RAID6.  You lose 1 disk, then 5 minutes into the rebuild of the RAID1 you lose the other disk in the mirror.

That is 100% data loss with the RAID1 + spare config, and 100% data loss if you had a RAID5 config.

Yet if you lost 1 disk in the RAID6 and then another, you survive with no data loss.
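That failure scenario can be sketched as follows (a toy model; `rebuild_done` is a hypothetical flag standing in for whether the hot spare has finished rebuilding):

```python
def raid6_survives(failed):
    """RAID6 tolerates the loss of any two disks."""
    return len(failed) <= 2

def raid1_plus_spare_survives(failed, rebuild_done=False):
    """2-disk mirror + hot spare: losing both mirror members before
    the spare finishes rebuilding loses everything."""
    mirror = {0, 1}
    if mirror <= set(failed):
        return rebuild_done   # both originals gone: safe only if spare already rebuilt
    return True

# Both mirror members fail 5 minutes into the rebuild -> total loss
assert not raid1_plus_spare_survives([0, 1], rebuild_done=False)
# The same double failure on a RAID6 -> no data loss
assert raid6_survives([0, 1])
```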

But now let's look at the more likely real-world scenario: you hit an unrecoverable read error while degraded (due to a drive loss), or the parity wasn't consistent.

In RAID1 and RAID5 situations, a single HDD failure plus an unrecoverable read error means partial data loss.  RAID6 goes on with no data loss.
Larry Struckmeyer (MVP) commented:
I will concede that if you lose both of the original pair of RAID1 drives before the rebuild completes, you would lose the entire set.  But that is most likely because the drives are the same age or there has been some power issue.  The same age- or power-related problems can, and probably will, affect a set of RAID6 drives, where more than one of the original set goes offline at the same time.

However, one should do whatever makes them most comfortable within budget constraints.  You could, after all, do RAID60 with clustered servers.
No, most likely is an unrecoverable read error.  Next most likely is parity inconsistency before the rebuild.  Also, during a rebuild you put extreme stress on the surviving disks, more stress than they have likely seen since initial installation.
Andrew Hancock (VMware vExpert / EE MVE^2), VMware and Virtualization Consultant, commented:
Also do yourself a favour and install ESXi on an SD card or USB flash drive, and leave all your disks for the datastore.

Here is the VMware KB on installing 5.0 on USB/SD:
Robert Saylor (Senior Developer) commented:
RAID 0 is the fastest but offers no protection.

RAID 1 is just a mirror.  This is common for the OS.

RAID 5 writes stripes and parity across multiple disks and has better recovery than RAID 1.

How fast is your I/O controller?  If it's LSI or Adaptec you should be fine with RAID 5.  I am about to deploy a 6-drive 600GB SAS array and will most likely go with RAID 6 myself, on a VMware ESX/i server on a new IBM 3650 M4 that we just got.  The I/O controller I will be using is a ServeRAID "MegaRAID" controller, rated at 6 Gbps.  So with a high-speed controller, going with the more fault-tolerant RAID should be fine.
RAID1 is faster than RAID0 in reads (which is the bulk of most I/O with a decent controller).

The reason is simple: in 100% of the I/Os you have the data in two places, so you get read load balancing.  In RAID0, all of your data is in just one place.  If you need block n and block n+100 on a RAID0, then whatever application needs block n+100 suspends until the first read is done.  But in RAID1, both I/Os happen at the same time.  In RAID0, only in a perfect world can half the I/O be split between 2 drives.  Real-world, statistically speaking, you will do more adjacent I/Os, so there is a higher probability that more of the I/O lands on any one given disk at a time.
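The load-balancing effect can be illustrated with a toy simulation (a contrived hot-spot workload; real access patterns vary, and real controllers use smarter mirror-selection policies than this):

```python
import random

random.seed(1)
# Contrived workload: 80% of reads hit even-numbered blocks
requests = [2 * random.randrange(5_000) + (0 if random.random() < 0.8 else 1)
            for _ in range(1_000)]

# RAID0: block b lives only on disk b % 2, so the hot disk takes the hit
raid0_load = [0, 0]
for b in requests:
    raid0_load[b % 2] += 1

# RAID1: every block is on both disks, so each read can go to the
# mirror with the shorter queue
raid1_load = [0, 0]
for b in requests:
    raid1_load[min((0, 1), key=lambda d: raid1_load[d])] += 1

assert max(raid0_load) > 700                     # RAID0 piles up on one disk
assert abs(raid1_load[0] - raid1_load[1]) <= 1   # RAID1 stays balanced
```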

As for recovery, RAID5 is MUCH slower than RAID1 when degraded, at a minimum, best case, it will take twice as long to recover, but in real world, even a 3-drive array will typically take 4X longer to recover.  If you are doing a lot of I/O, it could easily take 8X longer.   A degraded RAID1 has NO performance hit.

Assuming a controller with cache & read load balancing, then traditionally, you would do a RAID1 for the O/S, swap, and scratch table space, then put your databases on a RAID6.

(But better still would be a pair of SSDs in RAID1 for that, then a RAID6 for the database files.)
Larry Struckmeyer (MVP) commented:
All of the comments so far have been right on, assuming you have more than three spindles in RAID5 or 4 in RAID6.  I was only making the point that you gain nothing in usable space over RAID1, and little in fault tolerance, until you have more spindles than that.  Yes, 4 spindles in RAID6 might offer a bit more since you can lose two, but since they will all be subject to the same age and power conditions, at least at first, I think installing 4 drives to get the usable space of 2 is a bit expensive.  And performance in a degraded RAID6 would be awful.

But I think we are all agreed that the stripe size, which is what the original question was really about, is best left at the controller default?
Larry Struckmeyer (MVP) commented:
@dlethe: Good stuff.  I knew it was more complicated than what I was saying, but I also knew we did not have enough information to advise.  Your explanation is perfect up to the point where the OP tries to implement it.  Perhaps he can start with the default, apply some testing, then increase the size until the results fall off?

Or perhaps you can direct him to some formula?  Or maybe the manufacturer of the controller can best advise him?

So far we have been of more help (I think) on his choice of RAID type than on his choice of stripe size, which is really what he asked about.
The formulas are too complicated, and you need controller-specific information that is only available under non-disclosure to nail it, so I can't share them.

So the best advice is to look at ESXi; they have tables, somewhere, that define native I/O sizes based on the volume size and the version of your hypervisor.  Start with that and set the RAID controller so that you can get all the data with as few disk I/Os as possible.  Set NTFS (assuming Windows) so that it matches the physical disk I/O size: if each disk ends up reading 64KB at a time, then set the NTFS allocation unit to 64KB.
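That matching rule reduces to a one-line calculation (hypothetical request size; the real number comes from VMware's tables for your volume size and hypervisor version):

```python
def per_disk_io_kb(request_kb, disks, raid_level):
    """How much each data disk transfers for one host request,
    assuming the request is spread evenly across the stripe."""
    parity_disks = {"raid5": 1, "raid6": 2}[raid_level]
    data_disks = disks - parity_disks
    return request_kb / data_disks

# e.g. a 256 KB ESXi request on a 5-disk RAID5 (4 data disks):
assert per_disk_io_kb(256, 5, "raid5") == 64.0
# ...so you would set the chunk size and the NTFS allocation unit to 64 KB
```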

That is a good start regardless of RAID level.  But if you are going to run a database and need index files with lots of seeks, then I suggest putting those on a RAID1, using smaller disks if necessary.  RAID1 doesn't require parity I/O, so it is going to be much, much faster in most real-world cases.
whorsfall (Author) commented:

Thanks for the excellent responses.  What if I said I was using an Adaptec 7805 and the disks were 4 TB Seagate SATA Enterprise disks?

I think, from memory, the default block size is 256K.

Does that sound OK, or does that change any of your answers?


You should call Adaptec and ask them for the specific RAID level you are using.  Also look at ESXi for the specific usable capacity you desire, and see how many blocks each I/O request is.

Then it is an equation they can solve for you:
1. You know that with a X GB logical device, ESXi will do I/Os of Y KB.
2. Now that you know each I/O is Y KB, you can ask Adaptec what size I/O will be done on each of the physical disks for your given RAID level.  The most efficient answer is that if ESXi asks for 256KB at a time, then you want each HDD to also read or write 256KB at a time.