Solved

Raid-5 Array - rebuild gone wrong?

Posted on 2011-03-25
7
826 Views
Last Modified: 2013-11-05
Hello all,

Last night I rebuilt my raid 5 array after one of the disks claimed "error" I took the following steps
Replaced offending disk with a similar one
Using the MediaShield utility in the BIOS (NVIDIA chipset mobo, hw configured RAID) I rebuilt the array.
Right here things get a little iffy. When I installed the new disk and went into my raid utility it showed two array, each with one of the disks from the 3-disk raid-5 array that had failed. Each had the option to rebuild by pressing 'r' but pressing 'r' would kick me back to the previous screen with no confirmation.

I deleted one of the arrays and kept the array that held my first disk in the series, then I hit rebuild and selected both my other disks (in order). In the operating system I can confirm that the MediaShield utility (software on OS not BIOS now) reports that my drive is rebuilding. I left it to run overnight and when I came in this morning it claimed it successfully rebuilt the array.

I try and open the drive which is now displayed in explorer, won't open. I open up the Disk Management snap-in and notice that 1) it is a 2-partition basic disk 2) one partition reports "unallocated" (one of the 1TB hdd's), the other partition reports RAW (another of the 1TB hdd's).

I have no doubt I can format them and have a functional array, but then my data would be kaput. Is there any way I can still recover my data? Where did I go wrong that I have this current setup?

Thank you,

Valde Edius
0
Comment
Question by:Valde_Edius
  • 3
  • 3
7 Comments
 
LVL 47

Accepted Solution

by:
dlethe earned 500 total points
Comment Utility
First RAID5 with NVIDIA ... you're just looking for trouble, and if you aren't using the enterprise-class SATA disks that have the proper firmware mods to deal with TLER, then you can pretty much expect to lose a lot of data, as it will never work right.

To get going, go to runtime.org download raid reconstructor.  free to try, pay to buy.    You will need a NON-RAID adapter to use it to see the individual disks.

But it is a lost cause if you continue to use that hardware combo, it will just happen again.  You can count on it.
0
 

Author Comment

by:Valde_Edius
Comment Utility
Raid Reconstructor (RR) keeps giving me the error "This result is not significant" after the analysis. I did some research on it, and unless I can manually determine all the correct parameters, I would need to pay the makers of RR $300 USD to determine this information for me it seems. Any advice on how to manually determine block size, starting sector, and order?
0
 
LVL 47

Expert Comment

by:dlethe
Comment Utility
In a degraded condition it is difficult because you just cant verify if parity is OK, there is no parity.  So what you have to do is painful.  If this is confusing then pay them or somebody else.

1. Get a binary editor that can look at the physical blocks in hex & ASCII.
2. Programmers calculator that has XOR capability + ASCII table.

The technique is to look at border between each possible block size, i.e say block size is 4, then blocks 0-3  on each physical disk are a stripe, blocks 4-7 are a stripe, then so on.

Now you look for a string of ASCII text characters that will start on one stripe, and end on another.   This will tell you where the strips start and end so you know block size.  
Then you have to see patterns to get the drive order.

But issue is now that due to missing disks you have to calculate parity and ASCII lookup, and remap the missing drive.  you need to figure out the ordering left, to right, where the hole is.  

Then you also have some bad blocks so you have to take statistical samples.  

Bottom line, if Info I gave you doesn't make you say, piece-o-cake, I'll crank out a write a program to automate this, thanks .... then pay somebody.

0
What Security Threats Are You Missing?

Enhance your security with threat intelligence from the web. Get trending threat insights on hackers, exploits, and suspicious IP addresses delivered to your inbox with our free Cyber Daily.

 

Author Comment

by:Valde_Edius
Comment Utility
The data isn't worth the $300 to send it off to someone else, its all media. I can rescan my entire DVD collection again and all the music/pictures are backed up. However, since RR attempts to use a 'brute force' approach to figuring this info out, would it be worth it to try and plug in the old HDD that failed and put that back or is my parity already trash and that would produce false positives if anything? I found my block size to be 64k by repeating the process of creating the raid array because I know I used the default size which is 64k in this case. RR could not provide me with any positive even given that I can guarantee that and 1-2-3 ordering. The only thing that is an unknown is the start block.
0
 
LVL 20

Expert Comment

by:CompProbSolv
Comment Utility
I have found Raid Reconstructor to be pretty effective at evaluation the configuration.  This is with limited (around 5 different RAID sets) experience, though.
I presume that it is doing pretty much what dlethe is suggesting as this shouldn't be too tough for a good programmer to accomplish.  RR appears to try all sorts of combinations and then see if any turn up with data that "makes sense".  In the one instance where it was unsuccessful for me, others were similarly unsuccessful.

If I read the original post correctly, there are three disks in the original array with one failed.  The fact that the original controller doesn't even see the two disks as part of the same array sounds pretty bad to me.  I'd bet that something happened during rebuild to trash one of the two original disks.
0
 
LVL 47

Expert Comment

by:dlethe
Comment Utility
The reason that RR fails to identify the configuration is a combination of the two.  I also somewhat oversimplified the algorithm out of professional courtesy as efficient algorithms to determine the topology are unpublished and considered proprietary intellectual property.  But I'll give you a little more and explain what is going on ...

1. Identifying topology is MUCH easier in non-degraded mode.  That is because you have redundant blocks, and can then utilize the parity blocks to insure that any given block of 512 bytes at offset #n has not been corrupted.   When you XOR the same block across the 5 disks, then every 512 x 8 bits will equal to all 1s or all 0s.    This provides a sanity check and tells you if you can trust the data in the first place.

W/o parity, then you must read a heck of a lot of data and take an average.

2. No way to determine if a stripe is looking at metadata or filesystem data w/o parity, unless they do some things that I won't get into.

3. W/O parity, then you can't easily identify the proper drive ordering, because it is difficult to figure out which stripe is the parity data for any given slice.   Parity moves from disk to disk, starting at an unknown disk # rotating to the next disk, going left or right, at the raid block size, also known.

Bottom line, the more sophisticated algorithms kick in here.   If you have the option, tell RR to search MUCH, MUCH longer, Several GB for example.   Or, if you are 100% sure of the drive ordering, and the blocksize, then just "teach it".    I.e.  Your raid controller should still be alive, so just look at the config and tell it the block size, and you certainly know which disk failed

You won't know the start/end of metadata, but if you teach it correctly, then figuring out where the filesystem partition begins by looking at the raw devices with a binary editor for the file system header.   You have a 60% probability of it being in a human-readable layout (i.e. not XORed reconstructed).  then if you get lucky, teach RR where the partition begins and you are recovered.
0
 

Author Closing Comment

by:Valde_Edius
Comment Utility
Ended up unable to actually recover my data.
0

Featured Post

Why You Should Analyze Threat Actor TTPs

After years of analyzing threat actor behavior, it’s become clear that at any given time there are specific tactics, techniques, and procedures (TTPs) that are particularly prevalent. By analyzing and understanding these TTPs, you can dramatically enhance your security program.

Join & Write a Comment

Storage devices are generally used to save the data or sometime transfer the data from one computer system to another system. However, sometimes user accidentally erased their important data from the Storage devices. Users have to know how data reco…
This article is an update and follow-up of my previous article:   Storage 101: common concepts in the IT enterprise storage This time, I expand on more frequently used storage concepts.
This tutorial will walk an individual through the steps necessary to configure their installation of BackupExec 2012 to use network shared disk space. Verify that the path to the shared storage is valid and that data can be written to that location:…
This tutorial will show how to configure a single USB drive with a separate folder for each day of the week. This will allow each of the backups to be kept separate preventing the previous day’s backup from being overwritten. The USB drive must be s…

771 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

11 Experts available now in Live!

Get 1:1 Help Now