Solved

RAID stripe size recommendation for VMware ESXi 5.1?

Posted on 2013-06-20
14
9,273 Views
Last Modified: 2013-10-25
Hi,

I am creating a RAID 5 logical disk for VMware ESXi 5.1. (I will convert it to RAID 6 later.)

Anyway, I wanted to get a recommendation for how large I should set the stripe size.

The logical disk would hold virtual machines and would initially be made up of 3 x 4 TB SATA disks.

So I am looking for people's ideas on the recommended setting.

Thanks,

Ward
0
Comment
Question by:whorsfall
14 Comments
 
LVL 21

Expert Comment

by:Larry Struckmeyer MVP
ID: 39264760
Hi:

Three disks in RAID5 or 4 disks in RAID6 is really no more fault tolerant than RAID1.  You do gain a little better write speed, but you lose a little read speed.  Yes, with RAID6 you could lose 2 disks without suffering any data loss, but RAID1 with a hot spare would do the same.

Worse, loss of one disk in RAID5, or two disks in RAID6, would cause the system to respond very slowly as it recreates the data from parity.  RAID1 does not have that limitation.

The stripe size is best left at the default for the controller you have, but if you must modify it, it depends on the use.  Frequent loading and saving of something like an encyclopedia is a different workload from high-volume SQL transactional data entry.  AFAIK, for VM use it won't make any difference.
0
 
LVL 47

Expert Comment

by:dlethe
ID: 39264776
Bottom line: no matter what stripe size you use, performance will be awful if you have any significant number of writes on a RAID5.  The worst possible case for RAID5 is also a small array like a 4-disk RAID 5.  If you so much as write 1 byte in a file, each disk drive involved has to perform a minimum of 2 I/Os.

Do yourself a favor and buy one more disk and go to a RAID10.  In general, your I/O will be at least twice as fast.  Why?  Every byte of data is always going to be in two places.  With RAID5, not only is your data guaranteed to be in only one place, but the XOR parity adds extra reads and writes to 100% of your writes.
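To put rough numbers on that write penalty, here is a minimal back-of-the-envelope sketch in Python. It is illustrative only; it ignores controller cache and full-stripe write optimizations.

# Minimum physical disk I/Os to commit one small (sub-stripe-unit) write.

def raid5_small_write_ios():
    # Read-modify-write: read the old data chunk and the old parity chunk,
    # then write the new data chunk and the new parity chunk.
    reads, writes = 2, 2
    return reads + writes

def raid10_small_write_ios():
    # The block is simply written to both mirror members; nothing is read back.
    reads, writes = 0, 2
    return reads + writes

print("RAID 5 small write :", raid5_small_write_ios(), "physical I/Os")   # 4
print("RAID 10 small write:", raid10_small_write_ios(), "physical I/Os")  # 2

That 4-vs-2 ratio is where the "at least twice as fast" claim comes from, before read load balancing is even considered.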
0
 
LVL 47

Expert Comment

by:dlethe
ID: 39264780
Respectfully, fl_flyfishing is profoundly wrong in assessing the fault tolerance of RAID5 vs RAID6 vs RAID1.

Proof? Let's say you have a RAID1 with a hot spare, vs a RAID6.  You lose 1 disk, then 5 minutes into the rebuild of the RAID1 you lose the other disk in the RAID1.

100% data loss with the RAID1 + spare config. 100% data loss if you had a RAID5 config.

Yet if you lost 1 disk in the RAID6 and then another, you survive with no data loss.

But now let's look at the more likely real-world scenario: you hit an unrecoverable read error while you are degraded (due to a drive loss), or the parity wasn't consistent.

In RAID1 and RAID5 situations, a single HDD failure + an unrecoverable read error means partial data loss.  RAID6 carries on with no data loss.
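A minimal sketch of that comparison, using the layouts discussed in this thread (Python, illustrative only):

# Redundancy remaining once each layout is already rebuilding from one
# failed disk. Anything below 1 means a second failure or an unrecoverable
# read error (URE) during the rebuild causes data loss.

layouts = {
    "RAID 1 + hot spare": 1,   # one redundant copy
    "RAID 5 (3 disks)":   1,   # one parity chunk per stripe
    "RAID 6 (4 disks)":   2,   # two parity chunks per stripe
}

for name, redundancy in layouts.items():
    remaining = redundancy - 1            # one disk has already failed
    if remaining >= 1:
        print(f"{name}: survives a second failure or URE during rebuild")
    else:
        print(f"{name}: a second failure or URE during rebuild loses data")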
0
 
LVL 21

Expert Comment

by:Larry Struckmeyer MVP
ID: 39264792
I will concede that if you lose both of the original pair of RAID1 drives before the rebuild completes, you would lose the entire set.  But that is most likely because the drives are the same age or there has been some power issue.  The same age or power-related problems can, and probably will, affect a set of RAID6 drives, where more than one of the original set could go offline at the same time.

However, one should do whatever makes them most comfortable within budget constraints.  You could, after all, do RAID60 with clustered servers.
0
 
LVL 47

Expert Comment

by:dlethe
ID: 39264795
No, most likely is an unrecoverable read error.  Next most likely is parity inconsistency before the rebuild.  Also, during a rebuild you put extreme stress on the surviving disks, more stress than the disks have likely had since initial installation.
0
 
LVL 117
ID: 39265113
Also, do yourself a favour and install ESXi on an SD card or USB flash drive, and just leave all your disks for the datastore.


Here is the VMware KB on installing 5.0 on USB/SD:
http://kb.vmware.com/kb/2004784
0
 
LVL 6

Expert Comment

by:Robert Saylor
ID: 39265508
RAID 0 is the fastest but has no protection.

RAID 1 is just a mirror. This is common for the OS.

RAID 5 writes stripes of data and parity across multiple disks and has better recovery than RAID 1.

How fast is your I/O controller? If it's LSI or Adaptec you should be fine with RAID 5. I am about to deploy a VMware ESX/i server on a new IBM 3650 M4 that we just got, with 6 x 600GB SAS drives, and I will most likely go with RAID 6 myself. The I/O controller that I will be using is a ServeRAID "MegaRAID" controller, which is supposed to run at 6Gbps. So with a high-speed controller, the more fault-tolerant RAID level should be fine.
0
 
LVL 47

Expert Comment

by:dlethe
ID: 39265667
RAID1 is faster than RAID0 in reads (which are the bulk of most I/O with a decent controller).

The reason is simple: in 100% of the I/Os you have the data in two places, so you get read load balancing.  In RAID0 all of your data is in just one place. If you need block n and block n+100, and they land on the same drive in a RAID0, whatever application needs block n+100 waits until the first read is done.  But in RAID1, both I/Os can happen at the same time.  In RAID0, in a perfect world, only half the I/O can be split between the 2 drives.  In the real world, statistically speaking, you will do more adjacent I/Os, so there is a higher probability that more of the I/O will land on any one given disk at a time.

As for recovery, RAID5 is MUCH slower than RAID1 when degraded.  At a minimum, best case, it will take twice as long to recover, but in the real world even a 3-drive array will typically take 4X longer to recover.  If you are doing a lot of I/O, it could easily take 8X longer.   A degraded RAID1 has NO performance hit.

Assuming a controller with cache and read load balancing, traditionally you would do a RAID1 for the O/S, swap, and scratch table space, then put your databases on a RAID6.

(Better yet, go with a pair of SSDs in RAID1, then a RAID6 for the database files.)
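A toy illustration of the read load-balancing argument (Python; a sketch that assumes a 128-block stripe unit and pairs of nearby reads issued together, not a model of any real controller):

# Two reads issued together can only overlap if different disks serve them.
# RAID 1 holds every block on both members, so a pair can always be split;
# in RAID 0 it depends on which stripe unit each block falls into.
import random

random.seed(0)
STRIPE_BLOCKS = 128                        # assumed stripe unit, in blocks
pairs = [(b, b + random.randrange(1, 64))  # mostly nearby block numbers
         for b in random.sample(range(1_000_000), 10_000)]

same_member = sum(
    1 for a, b in pairs
    if (a // STRIPE_BLOCKS) % 2 == (b // STRIPE_BLOCKS) % 2   # same RAID 0 member
)

print(f"RAID 0: {same_member}/{len(pairs)} pairs queue behind each other")
print(f"RAID 1: 0/{len(pairs)} pairs queue (each read can go to a different mirror)")

With nearby blocks, most pairs land on the same RAID0 member and must be serviced one after the other, which is the effect described above.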
0
 
LVL 21

Expert Comment

by:Larry Struckmeyer MVP
ID: 39265794
All of the comments so far have been right on - assuming you have more than three spindles in RAID5 or 4 in RAID6.  I was only making the point that you gain nothing in usable space over RAID1, and little in fault tolerance, until you have more spindles than that.  Yes, 4 spindles in RAID6 might offer a bit more since you can lose two, but since they will all be subject to the same age and power conditions, at least at first, I think that installing 4 drives to get the usable space of 2 is a bit expensive.  And performance in a degraded RAID6 would be awful.

But I think we are all agreed that the stripe size, which is what the original question was really about, is best left at the controller default?
0
 
LVL 47

Accepted Solution

by:
dlethe earned 500 total points
ID: 39265868
No, the controller default could be anything.   The optimal stripe size is the one that results in the least amount of physical disk I/O, since physical I/O is measured in milliseconds while cached I/Os take microseconds to nanoseconds, depending on the cache.

ESXi makes I/O requests of size X based on the block count of the logical volume.   The larger the volume size, the larger the request.   Get the number too big and you could easily waste 10X or more of your I/O.

If you have a 4-disk RAID 5 where the controller reads n blocks at a time, and your I/O request is for an even multiple such as 4n blocks, then no matter what, it is always going to be inefficient, because you have to read all the data disks once and then one disk a second time.

But if you had a 5-disk RAID5, then reading 4n blocks touches each data disk only once, so you can get it in one pass.   If you had a chunk size of 2n and the same I/O request on a 4-drive RAID6, then you would only have to read that data from 2 drives.

If you had 2 x RAID1, then it doesn't matter what n is, because you can always read n, or 1/2 n, or anything else, and the reads can be split across both copies.

Now for writes, here is where you get the big hit, because the low-end controllers do chunk-level parity on RAID5.  So if you have to write 1 byte of data, the chunk size is 64KB, and you have 4 disks, then no matter what, you HAVE to read 64 x 3 KB and then write 64 x 2 KB.    But if this were a RAID1, then you could very well end up having to write only 2 x 2 KB, and the 2nd write would always happen in the background.

Knowing true I/O characteristics and understanding the ramifications of the RAID level, topology, and controller is much more involved than most people think.  But then again, most people don't develop RAID controller software/firmware/diagnostics like I do, either.

The RAID controller default can't possibly be optimal because it is unaware of whether your I/O is more random, sequential, large block, or small block.  An ESXi system is pretty much large-block random I/O.  That means little benefit from cache, so you want to optimize for as little physical I/O as possible.  Multiple RAID1s are going to be best for all types of I/O, but the chunk size needs to be tuned to match ESXi.
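To make the arithmetic above concrete, here is a minimal sketch in Python. It is illustrative only: the 256 KB request size and the candidate chunk sizes are assumptions taken from this thread, not measured ESXi values, and the read portion is adapted to the asker's 3-disk RAID 5 while the write portion mirrors the 4-disk example above.

import math

# --- Reads: chunk reads needed for one aligned host request ---
REQUEST_KB = 256            # assumed ESXi request size
DATA_DISKS = 2              # 3-disk RAID 5 = 2 data chunks + 1 parity per stripe

for chunk_kb in (16, 32, 64, 128, 256):
    chunks = math.ceil(REQUEST_KB / chunk_kb)      # chunk reads needed
    passes = math.ceil(chunks / DATA_DISKS)        # passes over the array
    print(f"{chunk_kb:>3} KB chunk: {chunks} chunk reads, {passes} pass(es)")

# --- Writes: the chunk-level parity example from the comment above ---
chunk_kb = 64               # 4-disk RAID 5, host writes a single byte
reads_kb  = 3 * chunk_kb    # read the rest of the stripe to recompute parity
writes_kb = 2 * chunk_kb    # write the new data chunk and the new parity chunk
print(f"1-byte write: read {reads_kb} KB, write {writes_kb} KB of physical I/O")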
0
 
LVL 21

Expert Comment

by:Larry Struckmeyer MVP
ID: 39265896
@dlethe:  Good stuff.  I knew it was more complicated than what I was saying, but I also knew we did not have enough information to advise.  Your explanation is perfect up to the point where the OP will try to implement it.  Perhaps he can start with the default, apply some testing, and then increase the size until the results fall off?

Or perhaps you can direct him to some formula?  Or maybe the manufacturer of the controller can best advise him?

So far we have been of more help (I think) with his choice of RAID type than with his choice of stripe size, which is really what he asked about.
0
 
LVL 47

Expert Comment

by:dlethe
ID: 39265922
The formula is too complicated, and you need information specific to the controllers that is not available except under non-disclosure to nail it, so I can't share it.

So the best advice is to look at ESXi; they have tables, somewhere, that define native I/O sizes based on the volume size and the version of your hypervisor.  Start with that and set the RAID controller so that you can get all the data with as few disk I/Os as possible.   Set NTFS (assuming Windows) so that it matches the physical disk I/O size.  If each disk ends up reading 64KB at a time, then set up NTFS to be 64KB at a time.

That is a good start regardless of RAID level.   But if you are going to be doing database work and need index files with lots of seeks, then I suggest putting those on a RAID1, using smaller disks if necessary.  RAID1 doesn't require parity I/O, so it is going to be much, much, much faster in most real-world cases.
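A minimal sketch of that matching rule (Python; the 64 KB figure is just the example from the comment, not a measured value for this controller):

per_disk_io_kb = 64            # what each physical disk ends up reading per I/O

# The advice above: make the guest filesystem allocation unit the same size,
# so one filesystem cluster never straddles two per-disk I/Os.
ntfs_allocation_unit_kb = per_disk_io_kb
print(f"Format the NTFS volume with a {ntfs_allocation_unit_kb} KB allocation unit")

# On Windows this is chosen at format time, e.g.:  format D: /FS:NTFS /A:64K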
0
 

Author Comment

by:whorsfall
ID: 39266247
Hi,

Thanks for the excellent responses - what if I said I was using an Adaptec 7805 and the disks were 4 TB Seagate SATA Enterprise disks?

http://www.adaptec.com/en-us/support/raid/sas_raid/sas-7805/

I think, from memory, the default block size is 256K.

Does that sound ok - change any of your answers?

Thanks,

Ward
0
 
LVL 47

Expert Comment

by:dlethe
ID: 39600527
You should call Adaptec and ask them about the specific RAID level you are using.  Also look at ESXi for the specific usable capacity you desire, and see how many blocks each I/O request is.

Then it is an equation they can solve for you.
1. You know that with an X GB logical device, ESXi will do I/Os of Y KB.
2. Now that your I/O is Y KB each, you can ask Adaptec the size of each I/O that is going to be done on each of the physical disks for your given RAID level.   The most efficient answer is that if ESXi asks for 256KB at a time, then you want each HDD to also read or write 256KB at a time.
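A worked example of those two steps (Python; the 256 KB figure is the Adaptec default mentioned above, treated here as an assumption rather than a vendor-confirmed ESXi 5.1 value):

esxi_request_kb = 256          # step 1: what ESXi asks for per I/O (assumed)

# Step 2, per the rule above: make each physical disk read or write the same
# amount per I/O, so one aligned ESXi request maps to a single disk I/O.
stripe_unit_kb = esxi_request_kb
print(f"Configure a {stripe_unit_kb} KB stripe unit so each "
      f"{esxi_request_kb} KB ESXi request is a single physical disk I/O")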
0
