Matt Kendall

Best way to replace drives in RAID5 array

Hi,

I'm running Windows Server 2008 R2 on my Dell PowerEdge. I've got a Perc300 controller and 3 WD500 RE4 drives in a RAID5 array. Drive 1 just failed. I know it's a good rule of thumb to replace the other hard drives too, as they may fail soon; the server and drives are about 5 years old. I needed to get the server back online, and the only drive I had available as a temporary replacement was a WD 1TB Red. I've installed this drive and successfully rebuilt the RAID5 array, but I'm not comfortable leaving it in for long and I'd like to go back to WD500 RE4 drives, so I've ordered 3 WD500 RE4s to replace all three drives.

What's the best way to do this? I was thinking I'd remove one drive at a time and rebuild onto its replacement: swap the first drive and rebuild, then the second, then the third. The server is in service 7 am to 8 pm, 6 days a week, so I don't have enough downtime on a normal day to take it offline for hours at a time, which is why I was considering this approach. Will this work, or is there a better way to get this done? Thanks for your help!
ASKER CERTIFIED SOLUTION
schaps

Jim_Nim

I would avoid doing numerous rebuilds on a RAID5 for this purpose. If there are any bad blocks on the old drives that are encountered during rebuilds, you're going to end up with RAID stripes being "punctured" which can lead to data corruption.

I'd recommend getting a full validated backup of all the data, then create a completely new RAID set on the new drives to restore to.
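One way to make "validated" concrete for a file-level backup is to hash every file on the source and compare it with the copy. The following is only a minimal sketch in Python, assuming the backup is a plain directory tree that mirrors the source; the function names `sha256_of` and `validate_backup` are made up for illustration and are not part of any backup product.

```python
import hashlib
from pathlib import Path
from typing import List


def sha256_of(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def validate_backup(source_root: Path, backup_root: Path) -> List[str]:
    """Return one line per source file that is missing from, or differs in, the backup."""
    problems = []
    for source_file in sorted(source_root.rglob("*")):
        if not source_file.is_file():
            continue  # directories carry no data of their own
        relative = source_file.relative_to(source_root)
        backup_file = backup_root / relative
        if not backup_file.is_file():
            problems.append(f"missing: {relative.as_posix()}")
        elif sha256_of(source_file) != sha256_of(backup_file):
            problems.append(f"differs: {relative.as_posix()}")
    return problems
```

An empty list means every source file has a byte-identical copy. Files that change while the check runs (open databases, logs) will show up as differences, so this is best run against a quiesced server or a snapshot.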

You might want to go ahead and replace the failed drive with one of those new ones just to get the RAID5 in a healthy / non-degraded state, but only if you don't have the option to get a full backup and take the downtime needed to restore to a new RAID set.
The process you're suggesting should work fine *if* after each rebuild you verify that all hard disk drive members are properly accounted for, fully functional individually, and the RAID controller posts correctly and without errors.

Plan around the degraded performance you'll get during the rebuild. And think about adding an additional drive to that array as an online spare - some additional peace of mind for the future.

Good luck.
David
A rebuild on 5-year-old disks when one of them has already died is too high a risk.  You are at risk of losing everything and are on borrowed time. You need to get a 1TB drive and do an image backup of the RAID5 (boot to Unix and just use dd, or your favorite bare-metal backup tool).

Then yank the drives and replace them with all new drives.  Initialize the RAID, then do an image restore from the 1TB drive.   This will not put ANY data at risk.
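For anyone unsure what the dd step amounts to, here is a rough Python sketch of the same idea: read the source block by block, write each block to the image, and keep a checksum so the image can be verified before the old drives are pulled. It assumes the array shows up in the live environment as a single readable block device (that depends on the controller and its driver) and that the target is at least as large as the virtual disk. The names `copy_image` and `image_matches` are invented for this example.

```python
import hashlib


def copy_image(source_path: str, image_path: str, block_size: int = 4 * 1024 * 1024) -> str:
    """Copy source to image block by block, roughly what `dd if=SRC of=IMG bs=4M` does.

    Returns the SHA-256 of the bytes read from the source.
    """
    digest = hashlib.sha256()
    with open(source_path, "rb") as source, open(image_path, "wb") as image:
        while True:
            block = source.read(block_size)
            if not block:
                break  # end of the source device or file
            image.write(block)
            digest.update(block)
    return digest.hexdigest()


def image_matches(image_path: str, expected_sha256: str, block_size: int = 4 * 1024 * 1024) -> bool:
    """Re-read the finished image and compare its hash with the one taken during the copy."""
    digest = hashlib.sha256()
    with open(image_path, "rb") as image:
        for block in iter(lambda: image.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest() == expected_sha256
```

Unlike a dd run that is told to continue past errors, this sketch simply stops with an `OSError` on the first unreadable block, which at least tells you the old drives already have bad sectors before you rely on the image.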
To clarify why some of us warn against the RAID5 rebuild... 2 big risks:
1. The additional I/O load on the remaining two drives may end up triggering a failure on one of them, causing the RAID set to fail completely
2. Any bad blocks encountered on the two remaining drives during the rebuild will cause a "puncture" at minimum (which, strangely, seems to become contagious and spread bad blocks to other drives, triggering drive failures and additional corruption), or might even cause the S300 (not the best RAID card around - a cheap model that's prone to problems) to fail the rebuild, or even panic and fail the RAID set.
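To illustrate the second risk, here is a toy Python model of one RAID5 stripe on a three-drive set. Parity is the XOR of the data blocks, so a missing block can be recomputed from the two survivors, but only if both survivors are readable. This is a simplified sketch of the general RAID5 idea, not a description of how the S300 or any other controller actually records a puncture; `xor_blocks` and `rebuild_block` are names made up for the example.

```python
from functools import reduce
from typing import List, Optional


def xor_blocks(*blocks: bytes) -> bytes:
    """XOR equal-length blocks byte by byte, as RAID5 parity does."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_block(surviving_blocks: List[Optional[bytes]]) -> Optional[bytes]:
    """Recompute the replaced drive's block for one stripe.

    A surviving block of None models an unreadable (bad) sector. With one
    drive already out of the set there is no redundancy left, so the stripe
    cannot be reconstructed and None is returned: the "puncture".
    """
    if any(block is None for block in surviving_blocks):
        return None
    return xor_blocks(*surviving_blocks)


# One stripe: two data blocks plus their parity.
data_a = b"\x0f\xf0"
data_b = b"\x33\x55"
parity = xor_blocks(data_a, data_b)

# The drive holding data_a is replaced; the survivors are healthy.
assert rebuild_block([data_b, parity]) == data_a

# Same rebuild, but data_b sits on a bad sector: data_a is unrecoverable.
assert rebuild_block([None, parity]) is None
```

Three back-to-back rebuilds read every block of the surviving drives three times over, which is why a latent bad sector on a 5-year-old disk is much more likely to be hit this way than in normal use.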

If your #1 priority is the safety of the data and you don't have a validated backup yet, you should do that before attempting anything else. Then if data availability is the next biggest priority, you could go ahead w/ trying the rebuild, and trust the card to do its job... if it works fine, maybe you can do the whole process online (though I don't know if the S300 supports expanding the RAID size live like the H-series controllers do) with more purposeful failure/rebuilds. Typically it's not recommended to try it that way anyway though, and live drive upgrades are only done with H-series controllers using the "replace disk" function that mirrors data to the replacement drive without putting the RAID set in a degraded state.

Good luck!
Matt Kendall

ASKER

After three nights of rebuilding, it worked great.  Now I have a server with new RAID5 hard drives.  Thanks!