whorsfall
asked on
RAID Stripe Size recommendation for VMware ESXi 5.1?
Hi,
I am creating a RAID 5 logical disk for VMware ESXi 5.1 (I will convert to RAID 6 later).
Anyway, I wanted to get a recommendation for how large I should set the stripe size.
The logical disk would hold virtual machines and would initially be made up of 3 x 4 TB SATA disks.
So I am looking for people's ideas on the recommended setting.
Thanks,
Ward
Bottom line: no matter what stripe size you use, performance will be awful if you have any significant number of writes on a RAID5. The worst possible scenario for a RAID5 is also a 4-disk RAID5. If you so much as write 1 byte in a file, each affected disk drive has to go through a minimum of 2 I/Os.
Do yourself a favor, buy one more disk, and go to a RAID10. In general, your I/O will be at least twice as fast. Why? Every byte of data is always in two places. With RAID5 not only is your data guaranteed to be in only one place, but the XOR parity holds up your requests with additional reads and writes on every single write.
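To put rough numbers on that write penalty, here is a minimal Python sketch of the usual back-of-the-envelope IOPS estimate; the ~75 IOPS per SATA spindle and the 50/50 read/write mix are assumptions, not measurements:

# Back-of-the-envelope small-random-I/O estimate.
# Classic write penalties: RAID0 = 1, RAID1/10 = 2, RAID5 = 4, RAID6 = 6
# (a RAID5 small write = read data + read parity + write data + write parity).
WRITE_PENALTY = {"raid0": 1, "raid1": 2, "raid10": 2, "raid5": 4, "raid6": 6}

def effective_iops(disks, iops_per_disk, read_fraction, level):
    """Approximate host-visible IOPS for a given RAID level and read/write mix."""
    raw = disks * iops_per_disk
    write_fraction = 1.0 - read_fraction
    return raw / (read_fraction + write_fraction * WRITE_PENALTY[level])

# Assumed numbers: ~75 IOPS per 7200 rpm SATA drive, 50% writes.
print("3-disk RAID5 :", round(effective_iops(3, 75, 0.5, "raid5")))    # ~90
print("4-disk RAID10:", round(effective_iops(4, 75, 0.5, "raid10")))   # ~200

With those assumptions the 4-disk RAID10 comes out a bit more than twice as fast as the 3-disk RAID5 for a mixed workload.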
Respectfully, fl_flyfishing is profoundly wrong in assessing the fault tolerance of RAID5 vs RAID6 vs RAID1.
Proof? Let's say you have a RAID1 with a hot spare versus a RAID6. You lose one disk, then 5 minutes into the rebuild of the RAID1 you lose the other disk in the RAID1.
That is 100% data loss with the RAID1 + spare config, and likewise 100% data loss if you had a RAID5 config.
Yet if you lost one disk in the RAID6 and then another, you survive with no data loss.
But now let's look at the more likely real-world scenario: you hit an unrecoverable read error while you are degraded (due to a drive loss), or the parity wasn't consistent.
In RAID1 and RAID5 situations, a single HDD failure plus an unrecoverable read error means partial data loss. RAID6 carries on with no data loss.
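To make that "more likely real-world scenario" concrete, here is a minimal Python sketch of the standard URE-during-rebuild estimate; the 1-in-10^14-bit error rate is a typical desktop/near-line SATA datasheet figure and is an assumption, not a property of the OP's specific drives:

# Chance of hitting at least one unrecoverable read error (URE) while reading
# the surviving disks during a rebuild, for a given datasheet bit error rate.
def p_ure_during_rebuild(bytes_read, bit_error_rate=1e-14):
    bits = bytes_read * 8
    return 1.0 - (1.0 - bit_error_rate) ** bits

# Rebuilding a 3 x 4 TB RAID5 after one failure means reading the other ~8 TB.
print(round(p_ure_during_rebuild(8e12), 2))   # about 0.47 at a 1-in-1e14 URE rate

At that error rate and capacity it is roughly a coin flip per rebuild.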
I will concede that if you lose both of the original pair of RAID1 drives before the rebuild completes, you would lose the entire set. But that is most likely because the drives are the same age or there has been some power issue. The same age or power-related problems can, and probably will, affect a set of RAID6 drives, where more than one of the original set goes offline at the same time.
However, one should do whatever makes them most comfortable within their budget constraints. You could, after all, do RAID60 with clustered servers.
No, the most likely failure is an unrecoverable read error. The next most likely is an inconsistency that was already there before the rebuild. Also, during a rebuild you put extreme stress on the surviving disks, more stress than the disks have likely had since initial installation.
Also, do yourself a favour and install ESXi on an SD card or USB flash drive, and just leave all your disks for the datastore.
Here is the VMware KB on installing 5.0 on USB/SD:
http://kb.vmware.com/kb/2004784
RAID 0 is the fastest but offers no protection.
RAID 1 is just a mirror. This is common for the OS.
RAID 5 writes stripes and parity across multiple disks and has better recovery than RAID 1.
How fast is your I/O controller? If it's LSI or Adaptec you should be fine with RAID 5. I am about to deploy a VMware ESX/i server on a new IBM 3650 M4 that we just got, with six 600 GB SAS drives, and I will most likely go with RAID 6 myself. The I/O controller I will be using is a ServeRAID "MegaRAID" controller, supposedly rated at 6 Gbps. So with a high-speed controller, the more fault-tolerant RAID should be fine.
RAID1 is faster than RAID0 for reads (which are the bulk of most I/O with a decent controller).
The reason is simple: in 100% of the I/Os the data is in two places, so you get read load balancing. In RAID0 all of your data is in just one place. If you need block n and block n+100 and they sit on the same RAID0 drive, whatever application needs block n+100 waits until the first read is done; in RAID1 both I/Os can happen at the same time. In RAID0, even in a perfect world only half the I/O can be split between the two drives, and in the real world you statistically do more adjacent I/Os, so there is a higher probability that more of the I/O lands on any one disk at a time.
As for recovery, RAID5 is MUCH slower than RAID1 when degraded. At a minimum, best case, it will take twice as long to recover; in the real world even a 3-drive array will typically take 4x longer, and if you are doing a lot of I/O it could easily take 8x longer. A degraded RAID1 has NO performance hit.
Assuming a controller with cache and read load balancing, then traditionally you would do a RAID1 for the O/S, swap, and scratch table space (better still, a pair of SSDs in RAID1), then put your databases on a RAID6.
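The load-balancing argument is easy to play with in a toy model. This is only an illustrative Python sketch, not a benchmark; the two-disk layout, strip size, and burst length are all assumptions:

import random

# Toy model of the read load-balancing argument above (a sketch, not a benchmark).
# In RAID1 either copy can serve any read, so the controller hands each read to
# the less-busy disk; in RAID0 a read must go to the disk that owns the block,
# and bursts of adjacent reads tend to pile up on one disk.
DISKS = 2
STRIP_BLOCKS = 64            # assumed blocks per RAID0 strip

def deepest_queue(reads, mirrored):
    queues = [0] * DISKS
    for block in reads:
        if mirrored:
            disk = queues.index(min(queues))          # RAID1: pick the idler copy
        else:
            disk = (block // STRIP_BLOCKS) % DISKS    # RAID0: block lives on one disk
        queues[disk] += 1
    return max(queues)       # depth of the busiest queue ~ completion time

random.seed(0)
reads = []
for _ in range(125):                         # 125 bursts of 8 adjacent block reads
    start = random.randrange(100_000)
    reads.extend(range(start, start + 8))

print("RAID1 busiest queue:", deepest_queue(reads, mirrored=True))    # exactly 500
print("RAID0 busiest queue:", deepest_queue(reads, mirrored=False))   # typically higher

The mirrored case always splits the queue evenly; the striped case drifts whenever a run of adjacent reads happens to live on one disk.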
All of the comments so far have been right on - assuming you have more than three spindles in RAID5 or four in RAID6. I was only making the point that you gain nothing in usable space over RAID1, and little in fault tolerance, until you have more spindles than that. Yes, four spindles in RAID6 might offer a bit more since you can lose two, but since they will all be subject to the same age and power conditions, at least at first, I think that installing 4 drives to get the usable space of 2 is a bit expensive. And performance in a degraded RAID6 would be awful.
But I think we are all agreed that the stripe size, which is what the original question was really about, is best left at the controller default?
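For anyone following the usable-space arithmetic, here is a trivial Python sketch using the standard capacity formulas; the 4 TB drive size is simply the OP's:

# Usable capacity for n identical disks of s TB each (standard formulas):
def usable_tb(n, s, level):
    return {"raid1": s,              # a mirror set exposes one disk's capacity
            "raid5": (n - 1) * s,    # one disk's worth of parity
            "raid6": (n - 2) * s,    # two disks' worth of parity
            "raid10": (n // 2) * s}[level]

# The point above: 4 x 4 TB in RAID6 leaves 8 TB usable, i.e. the space of 2 drives.
print(usable_tb(4, 4, "raid6"))   # 8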
@dlethe: Good stuff. I knew it was more complicated than what I was saying, but I also knew we did not have enough information to advise. Your explanation is perfect up to the point where the OP will try to implement it. Perhaps he can start with the default, apply some testing, and then increase the size until the results fall off?
Or perhaps you can direct him to some formula? Or maybe the manufacturer of the controller can best advise him?
So far we have been of more help (I think) with his choice of RAID type than with his choice of stripe size, which is really what he asked about.
The formula is too complicated, and you need controller-specific information that is only available under non-disclosure to nail it, so I can't share the details.
So the best advice is to look at ESXi; they have tables, somewhere, that define native I/O sizes based on the volume size and the version of your hypervisor. Start with that and set the RAID controller so that you can get all the data with as few disk I/Os as possible. Then set NTFS (assuming Windows) so that it matches the physical disk I/O size: if each disk ends up reading 64 KB at a time, then set up NTFS to use 64 KB at a time.
That is a good start regardless of RAID level. But if you are going to be running databases and need index files with lots of seeks, then I suggest putting those on a RAID1, using smaller disks if necessary. RAID1 doesn't require parity I/O, so it is going to be much, much faster in most real-world cases.
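To see the "as few disk I/Os as possible" goal in numbers, here is a minimal Python sketch that just counts how many per-disk strips an aligned host read touches; the 256 KB host I/O size is an assumption for illustration, and it ignores parity rotation and any coalescing the controller may do:

import math

# Rough count of per-disk transfers ("strips touched") for one aligned host read
# of io_kb, given a per-disk strip of strip_kb. Ignores parity rotation and any
# coalescing by the controller; the 256 KB host I/O size is an assumption.
def strips_touched(io_kb, strip_kb):
    return math.ceil(io_kb / strip_kb)

for strip_kb in (64, 128, 256):
    print(f"strip {strip_kb:>3} KB -> {strips_touched(256, strip_kb)} strip transfers")
# 64 KB strips: 4 transfers spread across the data disks; 256 KB strips: 1 transfer on 1 disk.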
ASKER
Hi,
Thanks for the excellent responses - what if I said I was using an Adaptec 7805 and the disks were 4 TB Seagate SATA Enterprise disks?
http://www.adaptec.com/en-us/support/raid/sas_raid/sas-7805/
I think from memory the default block size is 256 KB.
Does that sound OK, or does it change any of your answers?
Thanks,
Ward
You should call Adaptec and ask them about the specific RAID level you are using. Also check ESXi for the specific usable capacity you want, and see how many blocks each I/O request is.
Then it is an equation they can solve for you.
1. You know that with an X GB logical device, ESXi will do I/Os of Y KB.
2. Now that you know your I/O is Y KB each, you can ask Adaptec how large the I/O done on each physical disk will be for your RAID level. The most efficient answer is that if ESXi asks for 256 KB at a time, then you want each HDD to also read or write 256 KB at a time (see the sketch below on what "256 KB" refers to).
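One caveat worth double-checking when applying that rule: controller vendors are not consistent about whether the "stripe size" setting means the per-disk chunk or the full row across all data disks. The arithmetic relating the two is trivial; this Python one-liner just makes it explicit, and the 256 KB figure is simply the default the asker mentioned:

# Per-disk strip vs full stripe: they differ by the number of data disks.
# For the 3-disk RAID5 here there are 2 data disks per stripe row.
def full_stripe_kb(per_disk_strip_kb, data_disks):
    return per_disk_strip_kb * data_disks

print(full_stripe_kb(256, 2))   # a 256 KB per-disk strip makes a 512 KB full stripe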
Three disks in RAID5 or four disks in RAID6 is really no more fault tolerant than RAID1. You do gain a little better write speed, but you lose a little read speed. Yes, with RAID6 you could lose two disks without suffering any data loss, but RAID1 with a hot spare would do the same.
Worse, the loss of one disk in RAID5, or two disks in RAID6, would cause the system to respond very slowly as it recreates the data from parity. RAID1 does not have that limitation.
The stripe size is best left at the default for the controller you have, but if you must modify it, it depends on the use. Frequent loading and saving of an encyclopedia is different from high-volume SQL transactional data entry. AFAIK, for VM use it won't make any difference.