TimFarren

RAID 5 - unique situation and question

Hey guys.. time is critical on this one.. any help appreciated.

Without too much explanation, here's what I have:

Dell Perc Raid Controller.  2 drives from a RAID 5 array that are perfectly intact....

EXCEPT...

The second drive (drive 3 is totally toast) was imaged from the original drive sector by sector with no errors, so it's an exact copy.

In theory, I should be able to use those 2 drives - and I have already been able to do so, but only in a limited fashion - the software I have to rebuild the array only lets me save files, or make an image file, but will not let me clone the array directly onto another drive for booting purposes.  The problem is that image creation takes about 14 hours.  I just finished a 14 hour stretch and after that was done, the destination disk that contained the image went belly up on me (it was a 3 TB drive).  We're talking about a large amount of data, so imaging and reimaging is just too time consuming and this server needs to be up by Monday morning!

I wanted to put the good drives back into the server, but the problem is that the RAID controller sees the perfectly good copy of the failed drive as a drive that doesn't belong to the array.  It's a different model and serial number drive than the original, and my guess is that the raid config is sensitive to this.  

IS THERE ANY WAY... to trick the raid controller into using this newly copied raid member and allow me to boot this server and perform a backup?  Hex editor on the drive to fix a serial number problem possible?

Please don't suggest I image it again.  I don't have that many hours left to do that.  Suggestions are welcome but please hurry!

Thanks!!!
Lee W, MVP

So first question (and I suspect I know the answer but I'll ask it anyway), why not restore from backups to a new RAID set?

Second, how many drives were in the original RAID 5 config?  3?  If so, I don't understand why you're going through all these hoops.  You should be running fine in a degraded state.  Replace the drive with ANY drive that's the same size or larger than the original and tell the controller to rebuild if it doesn't do it automatically.

Third, this data is CLEARLY important... understand that you're asking for tips and ideas concerning a LARGE amount of important data from the internet... we're good... but we're NOT THERE and I find it doubtful that ANY true expert is going to make promises that their recommendations have ZERO risk.  Anything you try that we might suggest COULD very well cause a catastrophic data loss.  If this data is as important as you say, there should be backups, and if there aren't, then clearly once you have a method that works you should recover it using that method, because any other method could lose EVERYTHING.  Once you recover everything, if you want to spend the time finding a better way, go right ahead, just in case this happens again.

So, third having been stated, I've had great success using the PAID version of RAID reconstructor to rebuild the data off failed RAID 5 sets onto a disk that was perfectly readable in NON-RAID form.

Lastly, to reiterate, you have a known successful method - use it... you may not get sleep tonight, but most experienced admins have had those nights (myself included).  Accept this and tomorrow (or the day after) take the lessons learned and plan for such failures in a way that will let you get sleep at night.

(I'm really not trying to be offensive - I've been where you are - not EXACTLY the same circumstances, but similar ones with a software RAID and Microsoft... and I know how VITAL data can be.  So I'm VERY VERY conservative advising anyone to do anything that could permanently lose the data.  And I'm not there, I don't know the RAID controller firmware, the EXACT RAID config and frankly, I've not used many CURRENT Dell controllers for anything other than setup (they've been really good since old style SCSI died)).
Ok, I missed some details in my reading of that question in the email... It's a 3 drive RAID...OK.  BUT, why are you using software to recover the RAID... why are you not just letting the controller run the array in a degraded state?  And why are you using a duplicated drive (and HOW did you duplicate it?  What software did it?  dd?  Acronis? imagex? something else?  (Some of those MAY NOT WORK!))
Avatar of TimFarren
TimFarren

ASKER

Let me clarify a little:

1.  3 Disk array.
2.  Sequence of events:  Out of drives 0, 1, and 2 - disk 1 failed.  We replaced the drive with a new one and began a rebuild.  During the rebuild, disk #2 died before the rebuild could complete.  This left us with an inconsistent array.  
3.  NO BACKUP.. I know.. I know.. moving on...
4.  I used a high end disk imager to perform an exact sector copy from one drive to another.  I was able to recover all sectors to a new identical disk.
5.  I was able to rebuild the array using software and correct parameters, and I see good data - but I have no bootable server.  
6.  I attempted imaging the array onto an image file for later reimaging back to a fresh array, but that process takes about 14 hours.
7.  I did attempt to put drives 0 and 2 back into the server (leaving out the partially rebuilt disk 1) but the PERC controller refuses to acknowledge the duplicated disk as the original (probably due to the serial number on the drive).
8.  I attempted to rebuild the raid array on the controller side, but I can't get the controller to cooperate. Is there a way to trick the controller into accepting the replacement?  If I can get the controller to do the heavy lifting of raid, I can start a backup onto a usb drive and then do a restore.
Assuming you have a forensic copy of all the disks that you can fall back to if needed, I would check whether you can import the cloned disk using the import foreign option. In the BIOS I think that's found by highlighting the controller and pressing F2, but they do hide it quite well. The timestamp on the metadata for this disk will be wrong, so it should appear as foreign. This will not work if the OS has been run between the disk being removed and cloned and being put back, since the data on it will then be stale.

A 2nd drive failing during a rebuild isn't at all unique by the way although it is lucky you managed to make a copy of it presumably sector-by-sector. As leew says RAID reconstructor can recover the data because it will ignore the metadata, but presumably you have used something similar which took the 14 hours.
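
For what it's worth, the reason that kind of software can work without the controller's metadata is that RAID 5 parity is plain XOR: in a 3 disk set, any one member's block in a stripe is the XOR of the other two. Here is a minimal sketch of that idea only. The block contents are made up for illustration, and this is not a claim about how RAID Reconstructor or the PERC firmware is implemented.

```python
def xor_blocks(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length blocks byte by byte."""
    if len(a) != len(b):
        raise ValueError("blocks must be the same length")
    return bytes(x ^ y for x, y in zip(a, b))


# One stripe of a 3 disk RAID 5: two data blocks plus their parity.
# The contents below are invented sample bytes, not real disk data.
d0 = b"ACCOUNTS"
d1 = b"PAYROLL!"
parity = xor_blocks(d0, d1)

# Pretend the disk holding d1 is gone: rebuild it from the survivors.
recovered = xor_blocks(d0, parity)
print(recovered == d1)
```

The same XOR works whichever of the three blocks is missing, which is why two good members are enough to read everything.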
I'm trying to avoid this (I've already tried it and have wasted over 28 hours):

Rebuild w/software RAID--> Image FILE --> Image file back to new RAID Array.

I'd rather do this:
Drives attached to server, boot with Usb or CD using software capable of dumping the rebuilt array onto a new array - image rebuilt array directly onto new array...

I have been unsuccessful in finding any software which will dump the recovered array to a disk rather than to an image - it seems like such a short distance between either method programmatically.  I'm willing to pay for software that will do it - but can't find it.
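
To show what I mean by a short distance, here is a rough sketch of the destriping loop, assuming a left-symmetric parity rotation and a stripe size I already know. Both of those are assumptions; I have not confirmed that they match what this PERC actually wrote. `destripe`, `xor_all` and the parameter names are hypothetical, made up for this post. The point is that the destination is just a writable binary stream, so an image file and a raw disk look the same to the loop.

```python
import io
from typing import BinaryIO, Optional, Sequence


def xor_all(blocks: Sequence[bytes]) -> bytes:
    """XOR any number of equal-length blocks together."""
    acc = 0
    for block in blocks:
        acc ^= int.from_bytes(block, "big")
    return acc.to_bytes(len(blocks[0]), "big")


def destripe(members: Sequence[Optional[BinaryIO]], dest: BinaryIO,
             stripe_size: int, stripes: int) -> None:
    """Write the logical volume to dest; at most one member may be None."""
    n = len(members)
    if sum(m is None for m in members) > 1:
        raise ValueError("RAID 5 tolerates only one missing member")
    for s in range(stripes):
        # Read this stripe's chunk from every member that is present.
        chunks = [m.read(stripe_size) if m is not None else None
                  for m in members]
        if None in chunks:
            # Rebuild the missing chunk as the XOR of the survivors.
            missing = chunks.index(None)
            chunks[missing] = xor_all([c for c in chunks if c is not None])
        # ASSUMED layout: left-symmetric. Parity starts on the last disk
        # and moves back one disk per stripe; data starts just after it.
        parity_disk = (n - 1) - (s % n)
        for k in range(n - 1):
            dest.write(chunks[(parity_disk + 1 + k) % n])
```

In principle `dest` could be an image file opened with `open(path, "wb")` or a raw target disk opened the same way from a boot CD, so only the open call differs. I would only ever try this against copies, never the original members.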

The suggestion about pulling the drive in as foreign - no, the controller doesn't see it as foreign.  My guess is that the drive has contradictory data on it (if the RAID info on the drive contains drive serial numbers, then it doesn't even have its own serial number included in the config - I wondered if a hex editor would allow me to alter the raid config stored on the drives and fix this issue?)

anyway thanks for the continued help.  I'm running out of time, but I do appreciate everyone's input.
What does the controller see the drive as then? Can you post a screenshot (or photo) from BIOS with the drive visible.
It just sees it as a new disk.  It doesn't associate it with any array.
That is correct, the drive does not have a record about being part of the RAID at the very first track of the drive. And there is unfortunately no way to put this record there.
Normally there are two approaches of storing such info. First and rare one is that RAID controller has its own memory and stores the information about RAID group mates in it.
Second and which is widely spread - the info is stored on HDD itself. Thus if controller fails you can move the drives to another controller of the same model and you are good to go.
In your situation the only way out would be complete restore from the image.
I assume you were using RAID Reconstructor - right?
A forensic copy will copy the metadata as well as the data though, and a PERC 5 or 6 stores the config in the metadata and on the controller.
ASKER CERTIFIED SOLUTION
TimFarren

My solution got the job done.