cntmis

RAID 5 DATA RECOVERY QUESTION

I am not an expert with RAID 5, so someone looking at this may need more information, but here goes...

We had a server (Windows 2000 Advanced Server) with a SQL Server 2000 database on it. It had six 70 GB drives: five in a RAID 5 array plus one online hot spare. Apparently one drive failed in April and the hot spare became active. A month later another drive failed and the entire array went offline. Our Technology department ordered two replacement hard drives and inserted both (emphasis on both) into the server. The rebuild process started but failed at 22%. My understanding is that replacing just the one bad drive should have produced a complete rebuild and restoration, since that is how RAID 5 is supposed to work, but that inserting two new drives at once may have confused the controller and caused data loss. Does anyone know the answer? Did we lose all of the data, and any possibility of recovering it?
Callandor

Even with one bad hard drive, the array should still have been usable in a degraded state until you replaced the drive and it rebuilt itself. You may be able to recover the data using RAID Reconstructor from www.runtime.org.
No, not if they inserted two drives at once. I could not get a three-drive RAID 5 to rebuild after inserting one new drive, so with five drives, inserting two new ones leaves only three originals. That is not enough to reconstruct the data from the parity spread across the drives, because there will be gaps in the parity sequence. The rebuild would fail at about 22%, which is where it first encountered the drive with none of the expected parity data.
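The argument above rests on how RAID 5 parity works: each stripe stores one XOR parity block, which is enough to rebuild exactly one missing member and no more. Here is a minimal Python sketch of that idea. It assumes a single stripe of four data blocks plus parity, with parity in a fixed slot; real controllers rotate parity across the members and use their own on-disk format. The helper names are hypothetical.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte strings together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def make_stripe(data_blocks):
    """Return the data blocks followed by their XOR parity block."""
    return list(data_blocks) + [xor_blocks(data_blocks)]

def reconstruct(stripe):
    """Rebuild a stripe whose missing members are marked as None.

    The XOR of all surviving members equals the one missing member, so a
    single gap can be filled; two or more gaps cannot.
    """
    missing = [i for i, block in enumerate(stripe) if block is None]
    if len(missing) > 1:
        raise ValueError(f"{len(missing)} members missing; parity can rebuild only one")
    rebuilt = list(stripe)
    if missing:
        survivors = [block for block in stripe if block is not None]
        rebuilt[missing[0]] = xor_blocks(survivors)
    return rebuilt

# Five-member stripe: four data blocks plus parity, as on a 5-drive RAID 5.
stripe = make_stripe([b"AAAA", b"BBBB", b"CCCC", b"DDDD"])

one_lost = list(stripe)
one_lost[1] = None
print(reconstruct(one_lost)[1])  # b'BBBB'

two_lost = list(stripe)
two_lost[1] = two_lost[3] = None
try:
    reconstruct(two_lost)
except ValueError as err:
    print(err)  # 2 members missing; parity can rebuild only one
```

This only models what parity can and cannot recover. It says nothing about how a particular controller reacts when two blank drives are inserted.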

So yes, inserting two drives at once was a mistake. But I find this typical of RAID 5; in my experience it fails on even the most trivial change. I now recommend that people use four drives: two as RAID 0, and the other two as RAID 1, mirroring the first two.
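The four-drive layout recommended above stripes data across two drives and keeps an identical copy of each on the other two. The sketch below shows where each logical block ends up. It assumes drives are numbered 0 to 3, with 0 and 1 striped and 2 and 3 as their mirrors, and a stripe unit of one block. Real controllers use larger stripe units and their own metadata, and `locate` is a hypothetical helper, not a controller API.

```python
def locate(logical_block, stripe_width=2):
    """Return (primary_drive, mirror_drive, offset) for a logical block.

    Hypothetical layout: drives 0..stripe_width-1 are striped (RAID 0),
    and drive d + stripe_width holds an identical copy of drive d (RAID 1).
    """
    member = logical_block % stripe_width   # which striped drive holds it
    offset = logical_block // stripe_width  # block position on that drive
    return member, member + stripe_width, offset

for block in range(6):
    primary, mirror, offset = locate(block)
    print(f"logical block {block}: drive {primary} (copy on drive {mirror}), offset {offset}")
```

Every block has two physical copies on different drives, so losing any single drive still leaves one copy of everything.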

So go to www.runtime.org and make sure you get their RAID RECONSTRUCTOR utility; this will get the RAID back for you, if anything can. Then copy the whole thing to an IDE drive. Now REMAKE your RAID as RAID 10, which is two drives as RAID 0 and the next two drives as a RAID 1 mirror of the first two. This is MUCH more stable than RAID 5 and allows for replacing two drives without problem. You can set up drives 5 and 6 as the failover for either RAID. It works!! Then, of course, copy the data back from the IDE drive once the RAID 10 is created and formatted.
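Before copying the data back, it is worth checking that the rebuilt array is big enough, because mirroring gives up more space than RAID 5 parity does. The following back-of-the-envelope sketch uses the 70 GB drives from the question and ignores formatting overhead and the controller's own metadata. The function names are hypothetical.

```python
DRIVE_GB = 70  # size of each drive in the server from the question

def raid5_usable_gb(members, drive_gb):
    """RAID 5 spends one member's worth of space on distributed parity."""
    if members < 3:
        raise ValueError("RAID 5 needs at least 3 members")
    return (members - 1) * drive_gb

def raid10_usable_gb(members, drive_gb):
    """RAID 10 keeps two copies of everything, so half the space is usable."""
    if members < 4 or members % 2:
        raise ValueError("RAID 10 needs an even number of members, at least 4")
    return members // 2 * drive_gb

print(raid5_usable_gb(5, DRIVE_GB))   # 280: old layout, 5 members + 1 hot spare
print(raid10_usable_gb(4, DRIVE_GB))  # 140: suggested layout, 4 members + 2 spares
```

If the recovered database does not fit in that space, one option is a six-drive RAID 10 with no spares, which `raid10_usable_gb(6, DRIVE_GB)` puts at 210 GB.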
The question is why the array went offline when one disk failed. In my view that shouldn't have happened, even with one member of the array broken. One possible reason is that another disk was also about to go bad: perhaps it reported a SMART error warning that it was about to fail, and the array went offline to keep the data safe. Then, during the rebuild, that other disk may have actually failed, which would make the rebuild fail. This shows that when a drive in an array goes bad, you should always act immediately and replace it, because the others can go bad too. Maybe the problem was temporary, though, so you can try rebuilding again, and if it still won't work, try Cal's RAID Reconstructor.
ASKER CERTIFIED SOLUTION
scrathcyboy

Normally, with good controllers, it isn't a problem. Also, the hot spare doesn't belong to the array as long as it is just a spare, and since the rebuild started fine, the hot spare wouldn't have come into the picture at all. Something must have happened during the rebuild to stop it, like another disk failing or maybe a bad cable. Of course, if two drives that are members of an array die, that is a different situation: the array is gone. But the hot spare isn't a member yet. As I said earlier, if a drive goes bad, it is absolutely essential to replace it immediately to reduce the chance that another failing disk breaks the array. The array is always at risk while it is rebuilding.
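The distinction between array members and an idle hot spare can be made concrete with a toy state model. This sketch assumes a single RAID 5 array whose hot spare is promoted and finishes rebuilding before the next failure. It deliberately ignores controller-specific behavior such as taking an array offline on a SMART warning, and the class and method names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Raid5Array:
    """Toy model: counts missing members and idle hot spares."""
    members: int         # drives that belong to the array
    hot_spares: int = 0  # idle drives, not yet members
    missing: int = 0     # members currently failed and not replaced

    def fail_member(self):
        if self.hot_spares:
            # Assumption: the spare becomes a member and its rebuild
            # completes before anything else fails.
            self.hot_spares -= 1
        else:
            self.missing += 1

    @property
    def state(self):
        if self.missing == 0:
            return "optimal"
        if self.missing == 1:
            return "degraded"  # still readable, but no redundancy left
        return "failed"        # parity cannot cover two missing members

# Replaying the timeline from the question:
array = Raid5Array(members=5, hot_spares=1)
array.fail_member()  # April: the hot spare takes over
print(array.state)   # optimal
array.fail_member()  # a month later
print(array.state)   # degraded
```

Under these assumptions, only a third member failure would take the array down.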
I agree with rindi, and the quality of the controller can play an important part.  There's a world of difference in performance (and price) between a top quality controller and a budget one.
cntmis

ASKER

Thank you to everyone who answered here. Our drives are currently on their way, via first class, to our in-house forensics team. Hopefully, as mentioned, the array went offline to protect the data and it is still on the drives. However, it seems the rebuild attempt with two drives inserted at the same time may not have been the best move. We are keeping our hopes up, and as we learn our fate I will report back here. At a minimum, we learned that RAID 5 is not the most reliable member of the RAID family when it comes to fault tolerance. Once we get our server back, with or without the data, we will definitely request a much better controller and a different RAID configuration, as well as regular tape backups, which, by the way, were not being performed and should have been. If they had been, we wouldn't be in this situation in the first place. Thank you again, everyone.
YES, there is a better configuration. You can use RAID 1 to mirror drives, which is very reliable, and then use RAID 0 to stripe across many drives to enlarge the array. Or you can do it the other way around: use RAID 0 to build a larger striped set, and a second set of drives to mirror it with RAID 1. Which way you do it depends on the controller. On an HP LP2000r, for example, you make two containers of, say, three drives each, make one container a RAID 0 stripe, and make the other a RAID 1 mirror of the first. This is called RAID 10. RAID 5 turns out to be fairly reliable as long as you never have to pull a drive; once you do, it is not.
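The two orderings described above (mirror, then stripe; or stripe, then mirror) hold the same data but do not tolerate a second drive failure equally well. The sketch below counts which two-drive failures each one survives with six drives. It uses a simplified model in which a striped set is lost as soon as any of its drives fails. The drive numbering and pairing are assumptions for illustration, not the HP LP2000r's actual layout.

```python
from itertools import combinations

DRIVES = range(6)

def survives_0_plus_1(lost):
    """Mirror of two 3-drive stripe sets: A = drives 0-2, B = drives 3-5.
    A stripe set dies if it loses any drive; data survives while one set is whole."""
    set_a_intact = not any(d < 3 for d in lost)
    set_b_intact = not any(d >= 3 for d in lost)
    return set_a_intact or set_b_intact

def survives_1_plus_0(lost):
    """Stripe across three mirrored pairs: (0, 3), (1, 4), (2, 5).
    Data is lost only if both drives of the same pair fail."""
    return not any(d in lost and d + 3 in lost for d in range(3))

failures = [set(pair) for pair in combinations(DRIVES, 2)]
for name, survives in [("RAID 0+1", survives_0_plus_1), ("RAID 1+0", survives_1_plus_0)]:
    ok = sum(survives(lost) for lost in failures)
    print(f"{name}: survives {ok} of {len(failures)} two-drive failures")
```

In this model a stripe of mirrors survives 12 of the 15 possible two-drive failures, while a mirror of stripes survives only 6. When a controller offers both orderings, striping across mirrored pairs is the more fault-tolerant choice under these assumptions.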