Best way to replace drives in RAID5 array


I'm running Windows Server 2008 R2 on my Dell PowerEdge.  I've got a PERC S300 controller and 3 WD500 RE4 drives in a RAID 5 array.  I just had drive 1 fail.  I know that it's a good rule of thumb to replace the other hard drives too, as they may be failing soon.  The server and drives are about 5 years old.

I needed to get the server back online, and the only drive I had available as a temporary replacement was a WD 1TB Red drive.  I've installed this drive and successfully rebuilt the RAID5 array.  I'm not comfortable leaving this drive in for long and I'd like to go back to WD500 RE4 drives, so I've ordered 3 WD500 RE4s to replace all three of the drives.

What's the best way to do this?  I was thinking that I would remove one drive at a time and rebuild onto its replacement.  When that's done, I'll take out the next drive and rebuild again.  Then finally I'll remove the third drive, replace it, and do another rebuild.  This server is in service 7 am to 8 pm, 6 days a week, so I don't have enough downtime during a normal day to take the server down for hours at a time, which is why I was thinking about this solution.  Will this work, or is there a better way to get this done?  Thanks for your help!
Matt Kendall, Tech / Business owner operator, Asked:
I assume you mean one disk at a time overnight? I would take the time to do a full backup before starting the replacement in case something goes wrong, but it's probably the best way to get the job done without having to spend a lot of after-hours time onsite.
Otherwise, plan to spend a few hours one night to do that full backup, replace all the drives, rebuild the array, and restore the data. However, doing it one drive at a time is pretty safe; it's just doing what the RAID system is designed to do. You can swap a drive each night and go home when the rebuild starts, and you can check the data integrity between those overnight swaps.
If you have a fourth slot, you might consider adding a fourth drive if you do the all-at-once method. And, in that case, consider whether RAID10 would do the job better for you (it depends on your needs, really). Even if you stick with RAID5, I like a four-drive array better than three.
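To put numbers on the three-drive versus four-drive comparison, here is a rough sketch of nominal usable capacity. It assumes identical drives and ignores formatting and controller overhead; the function names are made up for this illustration.

```python
def raid5_usable_gb(drive_count: int, drive_gb: int) -> int:
    """RAID 5 spends one drive's worth of space on parity."""
    if drive_count < 3:
        raise ValueError("RAID 5 needs at least 3 drives")
    return (drive_count - 1) * drive_gb


def raid10_usable_gb(drive_count: int, drive_gb: int) -> int:
    """RAID 10 mirrors every drive, so half the raw space is usable."""
    if drive_count < 4 or drive_count % 2 != 0:
        raise ValueError("RAID 10 needs an even number of drives, at least 4")
    return (drive_count // 2) * drive_gb


if __name__ == "__main__":
    # 500 GB drives, as in the question.
    print("3-drive RAID 5 :", raid5_usable_gb(3, 500), "GB")
    print("4-drive RAID 5 :", raid5_usable_gb(4, 500), "GB")
    print("4-drive RAID 10:", raid10_usable_gb(4, 500), "GB")
```

With 500 GB drives, a four-drive RAID10 gives the same nominal space as the current three-drive RAID5, while a four-drive RAID5 adds another 500 GB.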
Jim_Nim, Senior Engineer, Commented:
I would avoid doing numerous rebuilds on a RAID5 for this purpose. If there are any bad blocks on the old drives that are encountered during rebuilds, you're going to end up with RAID stripes being "punctured" which can lead to data corruption.

I'd recommend getting a full validated backup of all the data, then creating a completely new RAID set on the new drives to restore to.
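One way to "validate" a file-level backup is to compare checksums between the live data and the backup copy. The sketch below is only an illustration of that idea, not a feature of any particular backup product; `find_backup_mismatches` and the directory layout are assumptions, and it presumes the backup is a plain copy of the folder tree.

```python
import hashlib
from pathlib import Path
from typing import List


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files do not have to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_backup_mismatches(source_root, backup_root) -> List[str]:
    """Return a list of files that are missing from, or differ in, the backup."""
    source_root = Path(source_root)
    backup_root = Path(backup_root)
    problems = []
    for source_file in sorted(source_root.rglob("*")):
        if not source_file.is_file():
            continue
        relative = source_file.relative_to(source_root)
        backup_file = backup_root / relative
        if not backup_file.is_file():
            problems.append(f"missing: {relative.as_posix()}")
        elif sha256_of(source_file) != sha256_of(backup_file):
            problems.append(f"differs: {relative.as_posix()}")
    return problems
```

An empty list means every source file has a byte-identical copy in the backup. Files that change while the check runs will show up as differences, so it is best run while the server is out of service.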

You might want to go ahead and replace the failed drive with one of those new ones just to get the RAID5 in a healthy / non-degraded state, but only if you don't have the option to get a full backup and take the downtime needed to restore to a new RAID set.
Glenn M Commented:
The process you're suggesting should work fine *if* after each rebuild you verify that all hard disk drive members are properly accounted for, fully functional individually, and the RAID controller posts correctly and without errors.

Plan around the degraded performance you'll get during the rebuild. And think about adding another drive to that array as an online spare, for some additional peace of mind in the future.

Good luck.

A rebuild on 5-year-old disks, when one of them has already died, is too high a risk.  You are on borrowed time and at risk of losing everything. You need to get a 1TB drive and do an image backup of the RAID5 (boot to Unix and just use dd, or use your favorite bare-metal backup tool).

Then yank the drives and replace them with all new drives.  Initialize the RAID, then do an image restore from the 1TB drive.   This will not put ANY data at risk.
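For anyone unsure what the dd step amounts to, here is a minimal Python sketch of the same idea: copy the source byte for byte in chunks, then re-read the copy and compare checksums. It is a stand-in for illustration, not a replacement for dd or a bare-metal backup tool; the function names are hypothetical, and on a real system the source would be the array's block device, read with root privileges while the volume is not in use.

```python
import hashlib
import os


def copy_image(source_path: str, dest_path: str, chunk_size: int = 4 * 1024 * 1024) -> str:
    """Copy source to dest in chunks and return the SHA-256 of the bytes read."""
    digest = hashlib.sha256()
    with open(source_path, "rb") as source, open(dest_path, "wb") as dest:
        while True:
            chunk = source.read(chunk_size)
            if not chunk:
                break
            dest.write(chunk)
            digest.update(chunk)
        dest.flush()
        os.fsync(dest.fileno())  # make sure the image is on disk before trusting it
    return digest.hexdigest()


def file_sha256(path: str, chunk_size: int = 4 * 1024 * 1024) -> str:
    """Re-read a file and return its SHA-256."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def image_is_verified(source_path: str, dest_path: str) -> bool:
    """Take the image, then confirm the written copy hashes to the same value."""
    return copy_image(source_path, dest_path) == file_sha256(dest_path)
```

The point of the second read is that the image is only useful if it can be read back intact before the old drives are pulled.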
Jim_Nim, Senior Engineer, Commented:
To clarify why some of us warn against the RAID5 rebuild... 2 big risks:
1. The additional I/O load on the remaining two drives may end up triggering a failure on one of them, causing the RAID set to fail completely
2. Any bad blocks encountered during the rebuild on the two drives are going to cause a "puncture" at minimum (which, strangely, can seem to become contagious and spread bad blocks to other drives, triggering drive failures and additional corruption), or might even cause the S300 (not the best RAID card around; it's a cheap model that's prone to problems) to fail the rebuild, or even panic and fail the RAID set.
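The second risk follows from how RAID 5 parity works. The sketch below is a deliberately simplified model of a single stripe on a three-drive array (real controllers rotate parity across drives and work on much larger blocks), with names invented for this illustration. The missing block is the XOR of the surviving blocks, so a rebuild only works if every surviving block in the stripe is readable.

```python
from functools import reduce
from typing import List, Optional


def xor_blocks(blocks: List[bytes]) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(stripe: List[Optional[bytes]]) -> bytes:
    """Rebuild the one unreadable block (None) in a stripe from the others.

    Raises ValueError when more than one block is unreadable, which is the
    situation that leaves a real array with a punctured stripe.
    """
    missing = [index for index, block in enumerate(stripe) if block is None]
    if len(missing) != 1:
        raise ValueError(f"cannot rebuild: {len(missing)} unreadable blocks in stripe")
    return xor_blocks([block for block in stripe if block is not None])


if __name__ == "__main__":
    data_a = b"INVOICES"
    data_b = b"PAYROLL!"
    parity = xor_blocks([data_a, data_b])

    # Drive holding data_a was replaced: the rebuild recovers it from the rest.
    print(rebuild_missing([None, data_b, parity]))

    # Same rebuild, but data_b sits on a bad block: nothing left to XOR against.
    try:
        rebuild_missing([None, None, parity])
    except ValueError as error:
        print(error)
```

A degraded three-drive array has no redundancy left, so a single unreadable sector on either old drive during the rebuild is enough to lose that stripe.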

If your #1 priority is the safety of the data and you don't have a validated backup yet, you should do that before attempting anything else. Then, if data availability is the next biggest priority, you could go ahead with trying the rebuild and trust the card to do its job. If it works fine, maybe you can do the whole process online with more purposeful failures and rebuilds (though I don't know if the S300 supports expanding the RAID size live like the H-series controllers do). Typically it's not recommended to try it that way anyway, though, and live drive upgrades are only done with H-series controllers using the "replace disk" function, which mirrors data to the replacement drive without putting the RAID set in a degraded state.

Good luck!
Matt Kendall, Tech / Business owner operator, Author Commented:
After three nights of rebuilding, it worked great.  Now I have a server with new RAID5 hard drives.  Thanks!
Question has a verified solution.
