  • Status: Solved
  • Priority: Medium
  • Security: Public

2 drives fail in RAID 5 array - will they rebuild?

Hi,
I have a PowerEdge 2850, and this morning (at approximately 11:32) not one but 2 drives failed. This is not a good thing - I know.

I have replaced single failed drives without issue, but am concerned about two.

When I replace the drives, will they 'rebuild' (automatically or through the BIOS), or am I in for more stressful drama?

If they won't rebuild, what is the best (and fastest) way to get the server back up and functioning properly?

Thank you for your prompt reply. This server houses our Exchange server, so minimal downtime is a must.

*peace*
angel
Asked by: angel35
1 Solution
 
John Hurst (Business Consultant, Owner) commented:
How many drives are in the array? If only 3 drives, you are in for a rough ride and will probably need to fix the machine, reinstall the OS and recover from backups. If more than 3 drives, I would call Dell support and get their advice. ... Thinkpads_User
 
angel35 (Author) commented:
There are 3 drives in the array - 146 GB each.
Well... piffle!! Is an OS reinstall definitely going to be needed? Or is it possible I might get incredibly lucky? lol

I could light a couple thousand candles tonight...
 
John Hurst (Business Consultant, Owner) commented:
I don't think there is any way RAID can rebuild a 3-drive array with only one drive to start with. In effect, one drive's worth of space holds a checksum type of thing (parity, spread around all the drives), and the array needs two out of 3 drives to rebuild.
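
As a rough illustration of why two out of three drives are needed, RAID 5 parity can be modelled as a byte-wise XOR of the data blocks in a stripe. This is only a sketch: the block contents and the `xor_blocks` helper are made up for the example, and a real controller also rotates parity across the drives, which is ignored here.

```python
from functools import reduce
from operator import xor


def xor_blocks(*blocks: bytes) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    return bytes(reduce(xor, column) for column in zip(*blocks))


# One stripe of a hypothetical 3-drive set: two data blocks plus their parity.
data_a = b"EXCHANGE"
data_b = b"DATABASE"
parity = xor_blocks(data_a, data_b)

# One drive lost: XOR of the two surviving blocks restores the missing one.
rebuilt_a = xor_blocks(data_b, parity)
rebuilt_b = xor_blocks(data_a, parity)

# Two drives lost: a lone surviving block is consistent with many originals.
# Here a completely different pair of data blocks yields the same parity.
impostor_a = bytes(len(parity))  # all zero bytes
impostor_b = parity
same_parity = xor_blocks(impostor_a, impostor_b) == parity

print(rebuilt_a == data_a, rebuilt_b == data_b, same_parity)
```

With any two blocks of a stripe the third can be recomputed. With only one block left, nothing distinguishes the real data from the impostor pair, which is why a genuine double failure cannot be rebuilt.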

At the risk of a dumb question, are you absolutely sure two drives are gone? Both at the same time? .... Thinkpads_User
 
angel35 (Author) commented:
Well... both LEDs are flashing amber (they weren't like that when I left the office Friday afternoon) and in the BIOS both say 'fail' - so unless I am missing something (and I might be, since I am brain-dead these days) I believe they both have.
Drive 3 I had to replace about 3 months ago - it also went with no warning or errors in the logs.
I have determined I hate that brand of drive (but I will be nice and not name it here lol)
 
tigermatt commented:

RAID 5 can recover from a single drive failure, but not from multiple drive failures. In losing two drives you have lost some of the data to the point where it cannot be recovered. It must be restored from backup or the system reinstalled.

This is a prime example of where having a Global Hot Spare (GHS) drive installed in the server would have saved you a lot. A drive failure would cause the array to immediately start rebuilding onto the spare drive. If the two drives failed at different times, with enough time in between for that rebuild to complete, you could now replace the two failed drives to restore redundancy and still have a working array.

-Matt
 
angel35 (Author) commented:
OK - I appreciate your information (although it really isn't what I wanted to hear lol).
Which drive should I try replacing first, on the off chance that both drives didn't fail?
Since it isn't booting up, should I try drive0 first?
 
John Hurst (Business Consultant, Owner) commented:
You can try drive0 first, but it will still likely not boot. ... T
 
tigermatt commented:

There is no way to know which drive to try first, as they all have an equal part to play in the RAID array. There isn't one "master" drive as such, since that wouldn't give redundancy.

You would want to consider installing a new drive with the failed ones still in place. If one of them hasn't actually failed, installing a new drive should prompt the controller to rebuild onto the new drive.

If the controller is reporting both drives as failed, though, that is what I would trust. I also doubt it would even try to rebuild, since it will be able to determine that the array has failed beyond repair.

-Matt
 
qualchoice-it commented:
Just wanted to add my two cents: it looks like you will need to rebuild the machine and do a bare-metal restore, since RAID 5 will only handle a single drive failure (as stated). Why not rebuild it, add one more drive, and just mirror it? This will save you in the long run from multiple drive failures.  :)   Good luck.
 
andyalder (Saggar makers bottom knocker) commented:
It all depends on whether the second drive to fail actually died or just went offline. If you know which one it was, you can try forcing it online in the BIOS; hopefully it is in sync with the remaining disk. Then you might get it to boot, in which case back up very quickly. Assuming they didn't fail at exactly the same time, the first disk to fail will be out of sync even though you may be able to force it online; I wouldn't bother trying with that one.
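
The reasoning above amounts to picking the drive that dropped out of the array most recently, since it missed the fewest writes. A minimal sketch, assuming the controller log gives a drop-out time for each drive; the slot names and timestamps are invented and `drive_to_force_online` is a hypothetical helper, not a controller feature.

```python
from datetime import datetime
from typing import Dict


def drive_to_force_online(dropped_at: Dict[str, datetime]) -> str:
    """Return the drive that left the array last; it missed the fewest writes."""
    return max(dropped_at, key=lambda drive: dropped_at[drive])


# Invented example times, not taken from the thread.
dropped_at = {
    "slot A": datetime(2000, 1, 1, 9, 15),
    "slot B": datetime(2000, 1, 1, 11, 32),
}

candidate = drive_to_force_online(dropped_at)
print(candidate)  # the later drop-out, "slot B"
```

If the log shows both drives going offline at effectively the same moment, this ordering tells you nothing and either drive may be the better candidate.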
 
sifuedition commented:
Andyalder is spot on. With that particular controller, I actually recovered (and lost) many multiple-drive failures while I worked for Dell. The controller is designed to be as secure with the data as it can be. Because of that, if one drive is having an issue, it may cause communication failures with other drives on the chain. When that happens, the controller may remove drives from the array as a precaution. These drives are honestly just in an "offline" status, but for some reason LSI decided to label it "failed". Bear in mind, this is not a diagnostic result. It is simply a statement of whether or not the drive is participating in the array it is configured for.

You can force ONE drive online (not both) with the other removed. If the system boots and appears stable and valid, then test/replace the removed drive. If the system will not boot, remove that drive, put the removed drive back in, and force it online instead. Test the boot with this combination. If it still will not boot, then the data is beyond simple help. If you do find a combination that works, BACK UP. Then you can rebuild and test the consistency of the array.
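
The try-one-then-the-other procedure above can be written down as a short decision loop. This is a sketch only: `boots_with` stands in for the manual work of pulling the other offline drive, forcing one online in the controller BIOS, and attempting a boot, and the drive labels and simulated result are hypothetical.

```python
from typing import Callable, Optional, Sequence


def find_working_drive(
    offline_drives: Sequence[str],
    boots_with: Callable[[str], bool],
) -> Optional[str]:
    """Try each offline drive on its own; return the first that gives a stable boot.

    Only one offline drive is ever forced online at a time, never both.
    """
    for drive in offline_drives:
        if boots_with(drive):
            return drive  # stop here and back up before touching anything else
    return None  # neither combination boots: beyond simple help


# Simulated outcome for illustration only: pretend "drive B" is the good one.
result = find_working_drive(["drive A", "drive B"], lambda d: d == "drive B")
print(result)
```

A `None` result corresponds to the point where the last-resort step, or professional recovery, is the only option left.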

If the system will not boot with either combination, you can take one last step. Keep all the original drives in the system. Go to the RAID BIOS and write down all of the RAID configuration settings. Delete the array. Then create a new array with exactly the same settings. Be sure that Fast Initialization is disabled in the array configuration before creating the array. Do NOT initialize the array or the data is gone. Sometimes this "retag" of the metadata on the array will allow you to recover the data. If it reaches this point and the drives stay online, I would try to boot with all three drives. If it doesn't play well, remove the drive you most believe to be failed and boot with just two. If that doesn't work, remove the other drive that had been offline first. Put back the drive that had been out and force it online.
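
Because the recreated array has to match the old one exactly, it can help to treat the written-down settings as a record and compare it field by field before going further. A minimal sketch: the field names and values below are assumptions chosen for illustration, so record whatever your RAID BIOS actually displays.

```python
from dataclasses import dataclass, fields
from typing import List, Tuple


@dataclass(frozen=True)
class ArrayConfig:
    # Illustrative field names, not the controller's own terminology.
    raid_level: int
    stripe_size_kb: int
    member_order: Tuple[str, ...]
    fast_init: bool


def mismatches(recorded: ArrayConfig, recreated: ArrayConfig) -> List[str]:
    """List every setting that differs between the notes and the new array."""
    return [
        f.name
        for f in fields(recorded)
        if getattr(recorded, f.name) != getattr(recreated, f.name)
    ]


# Invented values for illustration.
recorded = ArrayConfig(5, 64, ("0", "1", "2"), fast_init=False)
recreated = ArrayConfig(5, 128, ("0", "1", "2"), fast_init=False)

print(mismatches(recorded, recreated))  # lists "stripe_size_kb" only
```

Any non-empty result means the new array does not match the notes and should be corrected before attempting to boot.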

If this data is critical, it is important to consider forensic recovery before any of this is attempted. These steps can all affect the success rate of those services.
 
angel35 (Author) commented:
Thank you all for your extremely helpful information!
I was extremely lucky!!
sifuedition, you rock!!
I followed your advice step by step, and was able to restore BOTH drives! Thank you, thank you, thank you!!
Hindsight is 20/20, and this little drama has driven home to my $$ tight boss that additional space is required for proper and complete backups.
I was fortunate enough to not lose any data, did not have to rebuild the server from scratch, and most importantly did not have to cancel my long-planned mini-vacation to Nashville this weekend for my momma's birthday!
*peace*  to you all, and all those you hold dear...
angel
 
angel35 (Author) commented:
sifuedition - bless you!!!!
