dev775 asked:
RAID 1, 10 or other? A Special Situation with a Slightly Older Server Model Requiring Two VMs.

We are migrating an older Windows Server 2016 Standard machine to a Dell R710, same OS, and are using RAID to help with server stability, speed, and reliability, among other advantages (and disadvantages). The server configuration currently uses 8x 146 GB 15K RPM SAS HDDs. The server will run Hyper-V with two VMs, one for SQL Server and the other for the application software, which is operated by well under a hundred users. The software itself is not extraordinarily taxing, and the system will soon back up to a NAS and the cloud, so we won't rely on RAID as a backup, but rather as a cushion against drive failure to keep uptime uninterrupted. The server itself has very low running time because it was bought from a company that discarded its machines before they ever went into production. The users and software require very little disk space, but we want to give the server the advantages of mirroring and striping. We originally thought RAID 1 with a hot spare would work, but on further research RAID 10 appears to be the better solution considering how many HDDs we have. Which RAID configuration would best suit this situation?

Using all 8 drives in RAID 10 will give a 33% performance boost over using just 6 drives. There isn't really a reason to have hot spares sitting in the system contributing nothing to performance, unless the system will be particularly difficult to service or get parts to, such as on an oil rig, a ship, or some other very remote location.
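
To put a number on that (a quick sanity check in Python; it just restates the spindle arithmetic, all else assumed equal):

# RAID 10 performance scales roughly linearly with spindle count,
# so going from 6 member drives to 8 is a ~33% theoretical gain.
drives_all, drives_minus_spares = 8, 6
gain = drives_all / drives_minus_spares - 1
print(f"{gain:.0%}")  # -> 33%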
"There isnt really a reason to have hot spares in a system not contributing to performance"

That depends on whether you are happy running the server with unprotected drives for an extended period. And by extended period, I mean more than an hour or so.

As a DBA, my job is to protect the data assets, so I am less keen on this! Disks do fail, and I'd rather not have to get up and go into the office at 2 AM on a Sunday morning to replace a failed disk.

And yes - adding the additional 2 drives will give a small but measurable performance increase: in theory around 33%, but probably a little less in practice.

If the usable space requirement is higher than 50% of total capacity, you will need to consider other options.
...so we won't rely on just RAID as a backup

RAID is not a backup

...RAID 10 appears to be the better solution considering how many HDDs we have.

If you can afford the space overhead, then go for it.
You'll have about 584 GB usable if using all 8 drives, or roughly 438 GB with two hot spares, so look at your capacity planning to see whether you can live with the smaller space. If you need the capacity of all 8 drives, I'd suggest buying a couple of additional drives as cold spares that you can swap in immediately.
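
If it helps, here is the capacity math as a small Python sketch (146 GB is the nominal drive size; formatted capacity comes in a little lower):

drive_gb = 146  # nominal size; formatted capacity is slightly lower

def raid10_usable_gb(n_drives):
    # RAID 10 mirrors drive pairs, so usable space is half the members
    return (n_drives // 2) * drive_gb

print(raid10_usable_gb(8))  # 584 GB using all 8 drives
print(raid10_usable_gb(6))  # 438 GB with 6 drives plus 2 hot spares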


As the linked article shows, RAID 1 and RAID 10 provide the highest reliability, which is why they are recommended for Hyper-V storage.

Hyper-V Storage Best Practices

https://www.nakivo.com/blog/hyper-v-storage-best-practices/
We would set up as follows:
 Array 0: RAID 6, all disks
  Logical Disk 0: 95 GB for the OS
  Logical Disk 1: remainder (~780 GB) for the Hyper-V VMs

Get the most out of the spinning rust by setting the drives up in one array. What's not mentioned is the RAID controller (PERC) model; that may have some influence on the final setup.

We won't do RAID 10. With RAID 6 we may see a decrease in performance during a rebuild, but at least we won't lose the server if a failed drive's mirror partner also fails. BTDT, won't do it again.

EDIT: While IOPS will not be ideal at about 150 per disk, throughput should hit about 800 MB/second in our experience with this setup.
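
For anyone who wants to sanity-check those figures, a rough Python sketch (the per-spindle numbers are assumptions typical of 15K SAS, not measurements from this box):

n_disks = 8
iops_per_disk = 150   # assumed random IOPS for one 15K SAS spindle
mbps_per_disk = 130   # assumed sequential MB/s per spindle

read_iops = n_disks * iops_per_disk        # ~1200 random reads/second
write_iops = read_iops / 6                 # RAID 6 write penalty of 6 -> ~200
seq_mbps = (n_disks - 2) * mbps_per_disk   # ~780 MB/s streaming, parity excluded
print(read_iops, round(write_iops), seq_mbps)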

I have two very thorough EE articles on all things Hyper-V:
Some Hyper-V Hardware and Software Best Practices
Practical Hyper-V Performance Expectations
RAID 6 is a valid solution that dedicates two disks' worth of the array to parity, rotating which disks hold the parity blocks. Consequently it can survive the failure of any two drives, and it provides the capacity of 6 of the 8 drives (~860 GB). It rose to popularity after a much-read article worked out the probability that a RAID 5 array, reconstructing itself after a failure, would encounter an inevitable unrecoverable read error when built from large consumer-grade disks (>2 TB SATA), based on the drives' unrecoverable read error rating.

These drives are described as 146 GB SAS, which carry an unrecoverable read error rating at least an order of magnitude better than consumer-grade drives (typically 1 error in 10^16 bits read versus 1 in 10^14), and they are an order of magnitude smaller than the multi-terabyte drives considered in that article. The likelihood of encountering an unrecoverable read error during an array rebuild therefore drops below 1%.
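
A back-of-the-envelope version of that argument in Python (the 1-in-10^16 URE rating is a typical enterprise SAS figure and an assumption here; check the actual datasheet):

drive_bytes = 146e9
ure_per_bit = 1e-16                # typical enterprise SAS; consumer SATA is ~1e-14
bits_read = 7 * drive_bytes * 8    # rebuilding one failed drive reads the other 7
# For small probabilities, P(at least one URE) is approximately rate * bits read
p_ure_during_rebuild = ure_per_bit * bits_read
print(f"{p_ure_during_rebuild:.2%}")  # ~0.08%, comfortably below 1%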

From a data access perspective, RAID 6 is one of the slowest solutions for writes, because every logical write requires a minimum of 3 physical writes (the data plus 2 parity blocks) and, depending on the implementation, up to 5 additional reads to recalculate the parity. As you have tagged this question with "SQL Server" I am going to assume there is a database in this solution, and write performance on a large RAID 6 array will be significantly below a RAID 10 solution. Databases live and die on their storage solutions.
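
The usual rule-of-thumb write penalties make that gap concrete; a quick sketch (again assuming ~150 IOPS per 15K spindle):

def effective_write_iops(n_disks, per_disk_iops, penalty):
    # each logical write costs `penalty` physical I/Os spread across the array
    return n_disks * per_disk_iops / penalty

for level, penalty in (("RAID 10", 2), ("RAID 5", 4), ("RAID 6", 6)):
    print(level, round(effective_write_iops(8, 150, penalty)))
# RAID 10: 600, RAID 5: 300, RAID 6: 200 logical writes/second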

RAID-10 with 6 drives can THEORETICALLY survive 3 drive failures, as long as they are the "right" drives, i.e. never both sides of the same mirror pair. And with 2 hot spares the array will rebuild automatically when a drive fails and resilience is restored, hopefully without human intervention. Replacing the consumed spare then becomes an urgent but non-critical task.

My alternative, if you need the capacity, would be RAID-50: two RAID-5 arrays of 4 drives each, each providing 3 drives' worth of capacity, striped together. Performance will comfortably outstrip a RAID-6 array. You can withstand up to 2 drive failures (again, they need to be the right drives, but compromises are compromises) and you get ~860 GB of usable storage.
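
To quantify the "right drives" caveat, a small sketch of the odds that a second, randomly chosen failure is fatal (idealized; it ignores rebuild windows and spares kicking in):

# After one drive has already failed:
# RAID 10 on 6 drives (3 mirror pairs): only the dead drive's partner is fatal.
p_fatal_raid10 = 1 / 5   # 1 of the 5 survivors -> 20%
# RAID 50 on 8 drives (2 x 4-drive RAID 5 legs): any second loss in the same leg is fatal.
p_fatal_raid50 = 3 / 7   # 3 of the 7 survivors -> ~43%
print(f"{p_fatal_raid10:.0%} {p_fatal_raid50:.0%}")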

RAID-6 would be my last choice, purely because the write performance is so compromised, but it does offer resilience against any 2 drives failing almost simultaneously.
Point of clarification:
 * RAID 5 = one disk's worth of parity data distributed across all member drives
 * RAID 6 = two disks' worth of parity data distributed across all member drives

As far as RAID 5 goes, a simple search for "punctured stripe" will suffice. We saw enough arrays lost over the years to jump on RAID 6 as soon as controllers offered the feature. Usually it was an "add-on license" back then, but we paid the fee.

A hot spare, whether assigned to a specific array or globally so that any array that experiences a failure can pick it up, is no guarantee that the array in question won't be lost. The rebuild process stresses the entire disk subsystem, and if there are any other weak members in the array they can die or drop out during the rebuild; in a RAID 5 configuration the array's data would then be lost.
dev775 (Asker):
For this purpose, we are likely going to implement RAID 10 with 2 hot spares. That said, we have 3 more R710s and R720s that will be almost identical, if not identical, in setup to this one. Judging by the responses, we may try RAID 6 on the next R710. Since both machines will go through rigorous testing environments before production, we can stress test them and run fire drills with simulated failed-disk scenarios to see how much they can withstand. RAID 50 looks very promising, too.

This machine does indeed run SQL Server on one of the VMs, but the next server may have different roles depending on which server in the rack we have to replace next. The next server we migrate may be much more data-heavy, so we won't be able to sacrifice as much of the 8 drives' storage space as we did on this one.

Thank you very much for the information, we will be back with more scenario-driven questions soon! Have a great rest of your week and enjoy your weekend.
@smilieface - just be aware that Andrew, Kevin, Seth and Philip are the experts on this stuff - they have been there, done that, got the t-shirt!
I've been playing with (I mean using in large commercial production environments) RAID hardware (and software) for 30 years and have run everything from large SANs (EMC, Sun, Compaq and HP) through to local NVMe storage. I've written papers and blog posts (back in 2000) on the difference between RAID-10 and RAID-01. I don't spend that much time answering storage posts, because they (inevitably) end up in flame-wars with someone's pet solution being pushed, and I'm bored of that.

Like I said above, databases live and die on the performance and resilience of their storage. As a DBA I got fed up with snake oil, and decided to learn this stuff!

I had t-shirts... they got holes in them! No offense taken (or intended).

EDIT: That said, I hope I treated them with the respect they feel is due. Mostly I agreed with what they said (or they agreed with what I said), and the differences were largely around emphasis.
@smilieface just letting you know the pitfalls of trying to teach your grandmother to suck eggs



Storage is indeed the #1 source of pain that we see in standalone hosts, asymmetric clusters (nodes plus direct-attached JBOD(s)), hyper-converged infrastructure (HCI) clusters, and others.