DonKwizote (United Kingdom) asked:

Server RAID advice

Hi,
I have six 1TB hard drives that I intend to use in a RAID 5 (so 6TB minus 1TB for parity leaves 5TB of usable space).
On that 5TB, I am planning to install Windows Server 2012/2016 and use the rest of the space to store Hyper-V VM servers.
My question is, are there any likely problems with this disk configuration?
If so, what should I do instead to avoid problems?
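For reference, a minimal sketch of the capacity arithmetic in play here, using the question's six 1TB drives; the RAID 6 and RAID 10 figures anticipate the alternatives discussed below, and everything else (rounding, names) is illustrative only.

# Illustrative only: usable capacity of 6 x 1 TB drives under common RAID levels.
# Uses raw "marketing" terabytes and ignores filesystem/metadata overhead.

DRIVES = 6
SIZE_TB = 1.0

def usable_tb(level, n=DRIVES, size=SIZE_TB):
    if level == "RAID 5":        # one drive's worth of parity
        return (n - 1) * size
    if level == "RAID 6":        # two drives' worth of parity
        return (n - 2) * size
    if level == "RAID 10":       # mirrored pairs, half the raw space
        return n * size / 2
    raise ValueError(level)

for level in ("RAID 5", "RAID 6", "RAID 10"):
    print(f"{level}: {usable_tb(level):.0f} TB usable of {DRIVES * SIZE_TB:.0f} TB raw")
# RAID 5: 5 TB, RAID 6: 4 TB, RAID 10: 3 TB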
ASKER CERTIFIED SOLUTION
Member_2_231077

SOLUTION
arnold (United States)

http://www.zdnet.com/article/why-raid-5-stops-working-in-2009/

The article discusses the deficiencies of RAID 5 with large disks.
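To put rough numbers on that argument, here is a minimal sketch of the chance of hitting an unrecoverable read error (URE) during a RAID 5 rebuild; the 1-in-1e14-bits URE rate is the consumer-class figure the linked article leans on, the drive count and size are the asker's, and the result is illustrative only.

URE_RATE_BITS = 1e14          # assumed: 1 unrecoverable error per 1e14 bits read
SURVIVING_DRIVES = 5          # a 6-drive RAID 5 with one failed member
DRIVE_BYTES = 1e12            # 1 TB per drive

bits_to_read = SURVIVING_DRIVES * DRIVE_BYTES * 8
p_clean_rebuild = (1 - 1 / URE_RATE_BITS) ** bits_to_read
print(f"Chance of at least one URE during the rebuild: {1 - p_clean_rebuild:.1%}")
# Roughly 33% with these assumptions; it climbs quickly as drives get bigger,
# which is the article's point. Drives rated at 1e15 fare far better.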
I have an EE article that explains how we set up the host: Some Hyper-V Hardware & Software Best Practices.

Suffice it to say, we always use RAID 6 with a RAID controller that has at least 1GB of non-volatile cache.

There are two logical disks set up:
75GB for host OS
Balance GB/TB for data

With the above array and logical disk configuration we can blow away the host and import the VMs in very short order. The methodology is in the article.
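As a sketch of what that layout would look like on the asker's hardware, assuming a 6 x 1TB RAID 6 set and the 75GB host-OS logical disk mentioned above; the exact figures will vary with controller metadata.

n_drives, drive_tb = 6, 1.0
usable_tb = (n_drives - 2) * drive_tb        # RAID 6 gives up two drives to parity
host_os_gb = 75                              # first logical disk: host OS only
data_gb = usable_tb * 1000 - host_os_gb      # balance for Hyper-V VM storage

print(f"RAID 6 usable capacity: {usable_tb:.0f} TB")
print(f"Logical disk 1 (host OS): {host_os_gb} GB")
print(f"Logical disk 2 (VM data): {data_gb:.0f} GB")
# Roughly 3,925 GB left for VMs with these assumptions.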
General advice is to set up your fault tolerance as high as possible within your budget.
Sure, it would be great for us all to run RAID 10, but in reality that isn't always affordable or practical.
That said, if this is for a production environment I agree with the other posters that you should try to use at least RAID 6; if this configuration is for a test/development environment then your RAID 5 configuration is probably sufficient.

Really, what it boils down to is the level of tolerance and the funds your organization has. If downtime on the environment you're setting up is tolerable to some extent, you may be OK with RAID 5 (it does still provide fault tolerance); however, if you have a 99.99% OLA you may need to go with a higher RAID level.

I recommend you look at your business needs and what you have available for hardware and make your decision based on that.
For example, if you only have those six drives and no way to add more (either financially or physically), you may not have the option to increase the RAID level if your plan needs all 5TB of space.
What's the controller?
Is it a hardware caching controller?
Do not use software/fake RAID or you will be sorry.
As for disks, use SAS instead of SATA if the controller supports it.
SAS has a much higher level of ECC than SATA, and the pricing is just about the same on 7,200 rpm disks.
SOLUTION
DonKwizote (Asker)

Thanks everyone. So RAID 10 seems like the popular choice.
RAID 10 has a higher cost on the hardware side, but it gives double the read rate and a quicker rebuild: since it is a stripe of mirrors, a failed disk, when replaced, rebuilds at the speed of a RAID 1 member-to-member copy without impacting the other drives.

Using an SD card/USB stick to boot the system while the entirety of the storage is dedicated to the VM environment, as was pointed out, avoids disk contention between the host OS and the VMs.
SOLUTION
SOLUTION
RAID 6, with its second parity stripe, has the same foundational deficiency as RAID 5:
http://www.zdnet.com/article/why-raid-6-stops-working-in-2019/
The main issue is not performance in the optimal state; the issue is the impact on performance when the RAID array is degraded, and how long that lasts (the rebuild duration).
@Lee Ingalls That's per solid-state drive? ~32K IOPS with a mid-to-high endurance and performance SSD.

@Arnold That's why we deploy Storage Spaces in our cluster settings. We still deploy RAID in standalone servers, though with SFF (Small Form Factor) 2.5" drives rather than LFF (Large Form Factor) 3.5", so we are not too worried about rebuild times. With current densities a 10K SAS drive can sustain writes of around 250MB/second. That gives us a two-to-four-hour window for rebuilds, depending on server load at the time.

As an FYI: an LFF disk failure in Storage Spaces can be rebuilt into a pool's free space (the #TB of one disk + ~200GB). So, if there are 54x LFF disks, a failed 10TB disk gets rebuilt into free space at 54x ~200MB/second. In a 3-way mirror this maintains two-disk redundancy once the process completes. When the bad disk is replaced it gets rebuilt, with the pool storage being freed up after that completes.
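A back-of-the-envelope comparison of the two rebuild paths described above, using the throughput figures from the comment (250MB/s for an SFF SAS drive, ~200MB/s per LFF pool disk); the 1.8TB SFF drive size is an assumption for illustration, and real rebuilds will be slower under load.

def hours(bytes_to_move, mb_per_s):
    # Best-case rebuild time: capacity divided by sustained throughput.
    return bytes_to_move / (mb_per_s * 1e6) / 3600

sff_drive_bytes = 1.8e12   # assumed 1.8 TB 10K SAS SFF drive (illustrative size)
print(f"Classic RAID rebuild of one SFF drive: {hours(sff_drive_bytes, 250):.1f} h")

failed_lff_bytes = 10e12   # the 10 TB LFF disk from the comment
pool_disks = 54            # rebuild writes spread across the pool in parallel
print(f"Storage Spaces repair across {pool_disks} disks: {hours(failed_lff_bytes, 200 * pool_disks):.1f} h")
# ~2.0 h vs ~0.3 h under these assumptions; real-world load stretches both.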
Member_2_231077

Depending on the controller, they may be able to do 3-way mirroring in hardware; it gets pretty expensive in disks, though.
Official request made to have Rob (eenookami) deleted as administrator.
@Philip,

Not sure I understand the issue you addressed to me. The point of the RAID 5 article, and subsequently the article dealing with RAID 6, is to project the effect of ever-increasing storage capacities.
The transition of servers to 2.5" (SFF) drives merely extended that timeline.
The prevalence of SSDs, and their ever-declining costs, will extend the use of RAID 5/6 beyond the initial projection, which was based on the 3.5" hard drives of the time.
Either way, the declining cost of the technology (SSDs) or of hard drives themselves will eventually lead to wider use of RAID 10.

The density of drive bays in servers is also a factor... of course, if one transitions to 1.8" drives,...
@Arnold I don't understand the response?

Suffice it to say, the risk versus reward for RAID 6 is a reasonable one in SFF drive setups. SSDs do indeed make rebuild times a fraction of what spindles are capable of today.

RAID 10 is just too costly at a 50% usable-storage-to-raw-capacity ratio.
Depends a lot on what the array is being used for (summed up in the sketch after this list).

If it is a heavily used SQL database, then RAID 10 would be the obvious choice; speed and availability are paramount.
For a workstation doing complex mathematical simulations, which can run for an hour, RAID 0 is fine (the data gets kept on a server).
For a system drive on a server, I usually use RAID 1; performance is not important, as the drive is often not accessed much once booted.
For a disk array as part of a disk-to-disk-to-tape backup solution, I would look at RAID 5; huge amounts of cheap space are needed.
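Here is that list condensed into a lookup, purely as a rule of thumb; the "general default" fallback is an assumption based on the RAID 6 recommendations elsewhere in this thread, not part of the original list.

# The workload-to-RAID rules of thumb from the list above, as a simple lookup.
RAID_BY_WORKLOAD = {
    "heavily used SQL database":   "RAID 10",  # speed and availability paramount
    "simulation scratch space":    "RAID 0",   # results are kept on a server anyway
    "server system drive":         "RAID 1",   # barely touched once booted
    "disk-to-disk-to-tape target": "RAID 5",   # lots of cheap space needed
}

def recommend(workload):
    # Fall back to RAID 6, the general recommendation elsewhere in this thread.
    return RAID_BY_WORKLOAD.get(workload, "RAID 6 (general default)")

print(recommend("heavily used SQL database"))   # RAID 10
print(recommend("Hyper-V host storage"))        # RAID 6 (general default)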
@PE - RAID 6 the only way to go! Your example just shows it's limited by bus speed, not outright performance. Spinning rust is always going to be limited by IOPS vs RAID 10, and as @arnold said, performance when degraded is going to be much slower using RAID 6.
@Mal Osborne, with that argument for D2D2T wouldn't RAID 0 be a better bet?
@Philip

I do not fully understand your analysis of utilization and cost.
For n disks:
Yes, RAID 10 has a usable capacity of n/2 times the per-drive capacity; it can also tolerate up to n/2 drive failures (one per mirror pair).
On failure of a single drive in a RAID 1 pair, there is no performance hit.
Restoring the RAID to its optimal state consumes resources only to sync the replaced drive with its mirror counterpart; it does not impact the rest of the array, and the rebuild is faster.

Choosing RAID 10 over RAID 5 required extreme justification when low-capacity drives cost a significant amount of money; with the increase in storage capacity, while at the same time costs have declined on a per-gigabyte basis, that calculus has changed.
As the density of magnetic recording has grown and head sizes have shrunk, SFF drives have increased in capacity, and they have seen increasing use as the number of drives that fit in a system has grown...
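A small sketch of that fault-tolerance point for a six-drive set: RAID 6 survives any two failures, while RAID 10 survives a second failure only if it misses the first failed drive's mirror partner. The pairing scheme below is illustrative.

import random

N = 6                      # six drives, i.e. three mirror pairs in RAID 10
TRIALS = 100_000

def raid10_survives_two_failures():
    # Pick two distinct failed drives; pairs are (0,1), (2,3), (4,5).
    a, b = random.sample(range(N), 2)
    return a // 2 != b // 2

survived = sum(raid10_survives_two_failures() for _ in range(TRIALS))
print(f"RAID 10: survives a random second failure ~{survived / TRIALS:.0%} of the time")
print("RAID 6 : survives any second failure (100%)")
# Analytically the RAID 10 figure is 1 - 1/(N-1) = 80% for six drives.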
@Arnold Rebuild is based on the drive's sustained throughput. Therefore, the same drive in a RAID 6 or RAID 10 array would rebuild at the same rate.

We've dealt with many a failed disk over the years in RAID 6. Performance degradation was negligible to the workloads running on the system. In the days of old, disk limitations and RAID-on-Chip (RoC) horsepower were huge bottlenecks to performance; that is not the case today.
@Philip please clarify "rebuild rate"

A rebuild in a RAID 10 is similar to a rebuild in a RAID 1.

A rebuild of a RAID 6 is computational, having to read every disk in the array to reconstruct the information on the new/replacement drive.

Compare the performance impact of a single disk failure on a RAID 10 volume versus a RAID 6 volume.

Let's try it this way: which is easier/faster, copying data from a single sheet, or aggregating information from four other pages to write the fifth one? One is a one-for-one copy/clone...
Even with all the advances in controllers and their performance: take the same equipment, one configured as RAID 10 with 6 drives and the other as RAID 6 with 6 drives. Pull one drive and replace it. Which will finish rebuilding first?
The point of the articles on both RAID 5 and RAID 6 is that, while technological advances can extend the timeline, the issue remains the same as drive density expands: the amount of time needed to rebuild an array versus the probability of a second drive failure during that window.
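To make that trade-off concrete, a rough sketch of the chance of a fatal additional failure during the rebuild window; the 2% annual failure rate and the 2-hour/12-hour rebuild windows are assumptions for illustration, not measured figures.

import math

AFR = 0.02                          # assumed 2% annual failure rate per drive
RATE_PER_HOUR = AFR / (24 * 365)

def p_fatal_second_failure(drives_whose_loss_is_fatal, rebuild_hours):
    # Probability that at least one array-killing drive fails before the
    # rebuild finishes, assuming independent, constant-rate failures.
    return 1 - math.exp(-RATE_PER_HOUR * drives_whose_loss_is_fatal * rebuild_hours)

# Degraded RAID 10: only the failed drive's mirror partner is fatal, and the
# mirror-to-mirror rebuild window is short.
print(f"RAID 10 (1 fatal drive,  2 h window): {p_fatal_second_failure(1, 2):.4%}")
# Degraded RAID 5: any of the five remaining drives is fatal, and the rebuild
# reads all of them end to end, so the window is longer.
print(f"RAID 5  (5 fatal drives, 12 h window): {p_fatal_second_failure(5, 12):.4%}")
# Roughly a 30x difference here; RAID 6 keeps one extra drive of headroom
# during its first rebuild, which is the usual argument for it over RAID 5.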
I concur on the second drive failure window.

There is definitely additional duress on the array during the rebuild process that can induce a further failure as a result. BTDT in both RAID 10 and RAID 6 arrays. In one case, it was the RAID 10's mirror partner that failed and we lost the entire server as a result.

We have a setup at the shop that we can run this test on. It will go on the To Do list as this is something we've not tested against in a long time.
The possibility is much the same, given that the ages of the drives are commonly the same; but as noted, in RAID 5/6 all remaining drives in the array bear a load during the rebuild, while in a RAID 10 the load is borne by the single mirror-member drive.

At times, the load on the system during a rebuild can cause a remaining drive to be kicked out. Forcing that last drive back online is sometimes sufficient (or request an emergency outage window); 30 minutes to an hour, depending on the drive size, would commonly be enough to get a RAID 10 back to the optimal state, including potentially replacing the just-kicked-out member and having that rebuild complete...
Thanks everyone for your input
No comment has been added to this question in more than 21 days, so it is now classified as abandoned.

I have recommended this question be closed as follows:

Split:
-- andyalder (https:#a42263693)
-- arnold (https:#a42263695)
-- Mal Osborne (https:#a42264182)
-- Philip Elder (https:#a42266662)
-- Lee Ingalls (https:#a42266698)


If you feel this question should be closed differently, post an objection and the moderators will review all objections and close it as they feel fit. If no one objects, this question will be closed automatically the way described above.

seth2740
Experts-Exchange Cleanup Volunteer