Valde_Edius

Raid-5 Array - rebuild gone wrong?

Hello all,

Last night I rebuilt my RAID-5 array after one of the disks reported "error". I took the following steps:
1. Replaced the offending disk with a similar one.
2. Rebuilt the array using the MediaShield utility in the BIOS (NVIDIA chipset motherboard, hardware-configured RAID).
Right here things get a little iffy. When I installed the new disk and went into my RAID utility, it showed two arrays, each with one of the disks from the failed 3-disk RAID-5 array. Each had the option to rebuild by pressing 'r', but pressing 'r' kicked me back to the previous screen with no confirmation.

I deleted one of the arrays and kept the one that held the first disk in the series, then I hit rebuild and selected both of my other disks (in order). Once in the operating system, I could confirm that the MediaShield utility (the software in the OS, not the BIOS) reported that my drive was rebuilding. I left it to run overnight, and when I came in this morning it claimed it had successfully rebuilt the array.

I tried to open the drive, which is now displayed in Explorer, but it won't open. I opened the Disk Management snap-in and noticed that 1) it is a 2-partition basic disk, and 2) one partition reports "unallocated" (one of the 1TB HDDs) while the other reports RAW (another of the 1TB HDDs).

I have no doubt I can format them and have a functional array, but then my data would be kaput. Is there any way I can still recover my data? Where did I go wrong that I have this current setup?

Thank you,

Valde Edius
ASKER CERTIFIED SOLUTION
David

ASKER

Raid Reconstructor (RR) keeps giving me the error "This result is not significant" after the analysis. I did some research on it, and it seems that unless I can manually determine all the correct parameters, I would need to pay the makers of RR $300 USD to determine them for me. Any advice on how to manually determine the block size, starting sector, and drive order?
In a degraded condition it is difficult because you just can't verify whether parity is OK; there is no complete parity set to check against. So what you have to do is painful. If this is confusing, then pay them or somebody else.

1. Get a binary editor that can display the physical blocks in hex and ASCII.
2. Get a programmer's calculator with XOR capability, plus an ASCII table.

The technique is to look at the border between blocks for each possible block size. For example, if the block size is 4, then blocks 0-3 on each physical disk are a stripe, blocks 4-7 are a stripe, and so on.

Now you look for a string of ASCII text characters that starts on one stripe and ends on another. This tells you where the strips start and end, so you know the block size.
Then you have to look for patterns to get the drive order.

But the issue now is that, because of the missing disk, you have to calculate parity, do the ASCII lookup, and remap the missing drive. You need to figure out the ordering, left to right, and where the hole is.
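
For anyone following along, the "calculate parity and remap the missing drive" step boils down to XORing the surviving strips of a stripe. A minimal sketch on toy data follows; the helper name `xor_blocks` and the 8-byte strips are made up for illustration and have nothing to do with how RR or the MediaShield controller works internally.

```python
def xor_blocks(*blocks: bytes) -> bytes:
    """XOR any number of equal-length byte strings together."""
    length = len(blocks[0])
    if any(len(b) != length for b in blocks):
        raise ValueError("all blocks must be the same length")
    out = bytearray(length)
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)


# Toy 3-disk stripe: two data strips plus one parity strip.
d0 = b"HELLO RA"
d1 = b"ID WORLD"
parity = xor_blocks(d0, d1)

# Pretend the disk that held d1 is gone and rebuild its strip
# from the two survivors.
rebuilt = xor_blocks(d0, parity)
print(rebuilt)
```

The catch described above still applies: this only gives readable text if you already know which surviving strip is data, which is parity, and which position the hole is in.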

Then you also have some bad blocks so you have to take statistical samples.  

Bottom line: if the info I gave you doesn't make you say, "Piece of cake, I'll crank out a program to automate this, thanks," then pay somebody.

The data isn't worth $300 to send off to someone else; it's all media. I can rescan my entire DVD collection again, and all the music and pictures are backed up. However, since RR uses a 'brute force' approach to figure this info out, would it be worth trying to plug the old failed HDD back in, or is my parity already trash, so that it would produce false positives if anything? I determined my block size to be 64k by repeating the process of creating the RAID array: I know I used the default size, which is 64k in this case. RR could not give me a positive result even though I can guarantee that and 1-2-3 ordering. The only unknown is the start block.
I have found Raid Reconstructor to be pretty effective at evaluating the configuration. This is with limited experience (around 5 different RAID sets), though.
I presume that it is doing pretty much what dlethe is suggesting, as this shouldn't be too tough for a good programmer to accomplish. RR appears to try all sorts of combinations and then see if any turn up data that "makes sense". In the one instance where it was unsuccessful for me, others were similarly unsuccessful.

If I read the original post correctly, there are three disks in the original array with one failed.  The fact that the original controller doesn't even see the two disks as part of the same array sounds pretty bad to me.  I'd bet that something happened during rebuild to trash one of the two original disks.
The reason that RR fails to identify the configuration is a combination of the two.  I also somewhat oversimplified the algorithm out of professional courtesy as efficient algorithms to determine the topology are unpublished and considered proprietary intellectual property.  But I'll give you a little more and explain what is going on ...

1. Identifying topology is MUCH easier in non-degraded mode. That is because you have redundant blocks, and can then use the parity blocks to ensure that any given block of 512 bytes at offset #n has not been corrupted. When you XOR the same block across all the disks in the array, all 512 x 8 bits of the result will be all 1s or all 0s. This provides a sanity check and tells you whether you can trust the data in the first place.
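
A minimal sketch of that sanity check, assuming you have already read one 512-byte sector at the same offset from every member disk. The function names are invented for the example. With plain XOR parity the result should be all zero bits; the all-ones case mentioned above is accepted here as well.

```python
from functools import reduce

SECTOR = 512  # bytes per sector, as in the post


def xor_sectors(sectors):
    """XOR a list of 512-byte sectors into a single 512-byte result."""
    if any(len(s) != SECTOR for s in sectors):
        raise ValueError("every sector must be exactly 512 bytes")
    as_ints = (int.from_bytes(s, "big") for s in sectors)
    return reduce(lambda a, b: a ^ b, as_ints).to_bytes(SECTOR, "big")


def parity_consistent(sectors):
    """True if the XOR across all members is all 0x00 or all 0xFF."""
    result = xor_sectors(sectors)
    return result == bytes(SECTOR) or result == b"\xff" * SECTOR


# Toy 3-disk stripe: two data sectors and their parity sector.
d0 = bytes(range(256)) * 2
d1 = b"\xa5" * SECTOR
p = xor_sectors([d0, d1])

print(parity_consistent([d0, d1, p]))         # intact stripe

corrupted = bytes([d1[0] ^ 0x01]) + d1[1:]    # flip one bit in d1
print(parity_consistent([d0, corrupted, p]))  # damaged stripe
```

The first call prints `True` and the second `False`. In a degraded array one of the three inputs is simply not available, which is why this check cannot be run there.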

W/o parity, you must read a heck of a lot of data and take an average.

2. No way to determine if a stripe is looking at metadata or filesystem data w/o parity, unless they do some things that I won't get into.

3. W/o parity, you can't easily identify the proper drive ordering, because it is difficult to figure out which strip holds the parity data for any given slice. Parity moves from disk to disk, starting at an unknown disk #, rotating to the next disk, going left or right, at the RAID block size, which is also unknown.
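
To make the rotation concrete, here is a small sketch of where parity lands per stripe under the two usual directions, plus a count of how many combinations a brute-force search has to try. The "left" and "right" formulas follow the common convention (the same one Linux md uses for its left and right layouts); which one the NVIDIA controller uses is not known here, and the list of candidate block sizes is only an example.

```python
from itertools import permutations, product


def parity_disk(stripe, n_disks, rotation):
    """Index of the disk holding parity for a given stripe number."""
    if rotation == "left":
        # Parity starts on the last disk and moves toward the first.
        return n_disks - 1 - (stripe % n_disks)
    if rotation == "right":
        # Parity starts on the first disk and moves toward the last.
        return stripe % n_disks
    raise ValueError("rotation must be 'left' or 'right'")


def candidate_layouts(n_disks, block_sizes_kib):
    """Every (drive order, rotation, block size) combination to test."""
    orders = permutations(range(n_disks))
    return product(orders, ("left", "right"), block_sizes_kib)


for rotation in ("left", "right"):
    print(rotation, [parity_disk(s, 3, rotation) for s in range(6)])

# 3 disks, 2 rotations, 4 example block sizes
print(sum(1 for _ in candidate_layouts(3, (16, 32, 64, 128))))
```

For 3 disks this prints `left [2, 1, 0, 2, 1, 0]`, `right [0, 1, 2, 0, 1, 2]`, and 48 combinations, before even considering the unknown start offset.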

Bottom line, the more sophisticated algorithms kick in here. If you have the option, tell RR to search MUCH, MUCH longer, several GB for example. Or, if you are 100% sure of the drive ordering and the block size, then just "teach it". I.e., your RAID controller should still be alive, so just look at the config and tell RR the block size, and you certainly know which disk failed.

You won't know the start/end of the metadata, but if you teach it correctly, then you can figure out where the filesystem partition begins by looking at the raw devices with a binary editor for the file system header. You have a 60% probability of it being in a human-readable layout (i.e. not XORed and needing reconstruction). Then, if you get lucky, teach RR where the partition begins and you are recovered.
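
If you would rather not page through a hex editor by hand, the header search can be scripted. This is only a sketch, and it assumes the volume was NTFS (the thread never says which filesystem was used). An NTFS boot sector carries the OEM ID "NTFS" followed by four spaces at byte offset 3 and the 0x55 0xAA signature at offset 510. The function names are invented for the example.

```python
import io

SECTOR = 512
NTFS_OEM_ID = b"NTFS    "      # "NTFS" plus four spaces, at offset 3
BOOT_SIGNATURE = b"\x55\xaa"   # last two bytes of a boot sector


def looks_like_ntfs_boot_sector(sector):
    """Check the two fixed markers of an NTFS boot sector."""
    return (
        len(sector) == SECTOR
        and sector[3:11] == NTFS_OEM_ID
        and sector[510:512] == BOOT_SIGNATURE
    )


def find_ntfs_boot_sectors(stream, max_sectors):
    """Return sector indexes (from the stream start) that match."""
    hits = []
    for index in range(max_sectors):
        sector = stream.read(SECTOR)
        if len(sector) < SECTOR:
            break
        if looks_like_ntfs_boot_sector(sector):
            hits.append(index)
    return hits


# Demo on an in-memory fake disk with a boot sector planted at sector 5.
boot = bytearray(SECTOR)
boot[3:11] = NTFS_OEM_ID
boot[510:512] = BOOT_SIGNATURE
fake_disk = bytearray(SECTOR * 8)
fake_disk[5 * SECTOR:6 * SECTOR] = boot

print(find_ntfs_boot_sectors(io.BytesIO(bytes(fake_disk)), 8))
```

On the fake disk this prints `[5]`. Against a real member disk you would pass a file object opened read-only in binary mode on the raw device or an image of it, and a hit only helps if that sector sits on a surviving data strip rather than on the missing disk.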
Ended up unable to actually recover my data.