Raid-5 Array - rebuild gone wrong?

Posted on 2011-03-25
Last Modified: 2013-11-05
Hello all,

Last night I rebuilt my RAID 5 array after one of the disks reported "error". I took the following steps:
1. Replaced the offending disk with a similar one.
2. Using the MediaShield utility in the BIOS (NVIDIA chipset motherboard, hardware-configured RAID), I rebuilt the array.

Right here things get a little iffy. When I installed the new disk and went into my RAID utility, it showed two arrays, each containing one of the disks from the failed 3-disk RAID 5 array. Each had the option to rebuild by pressing 'r', but pressing 'r' just kicked me back to the previous screen with no confirmation.

I deleted one of the arrays and kept the array that held my first disk in the series, then hit rebuild and selected both of my other disks (in order). In the operating system, the MediaShield utility (the OS software this time, not the BIOS) confirms that my drive is rebuilding. I left it to run overnight, and when I came in this morning it claimed it had successfully rebuilt the array.

I tried to open the drive, which now shows up in Explorer, but it won't open. I opened the Disk Management snap-in and noticed that 1) it is a basic disk with 2 partitions, and 2) one partition reports "unallocated" (one of the 1TB HDDs) while the other reports RAW (another of the 1TB HDDs).

I have no doubt I can format them and have a functional array, but then my data would be kaput. Is there any way I can still recover my data? Where did I go wrong to end up with this setup?

Thank you,

Valde Edius
Question by:Valde_Edius
 

Accepted Solution

by:dlethe
ID: 35216850
First, RAID 5 with NVIDIA ... you're just looking for trouble. If you aren't using enterprise-class SATA disks whose firmware supports TLER (time-limited error recovery), then you can pretty much expect to lose a lot of data, as it will never work right.

To get going, go to runtime.org and download RAID Reconstructor. It's free to try, pay to buy. You will need a NON-RAID adapter so it can see the individual disks.

But it is a lost cause if you continue to use that hardware combo; it will just happen again. You can count on it.
 

Author Comment

by:Valde_Edius
ID: 35218797
RAID Reconstructor (RR) keeps giving me the error "This result is not significant" after the analysis. From what I've read, unless I can determine all the correct parameters manually, it seems I would need to pay the makers of RR $300 USD to determine them for me. Any advice on how to manually determine the block size, starting sector, and disk order?
 

Expert Comment

by:dlethe
ID: 35218956
In a degraded condition it is difficult because you just can't verify whether parity is OK; with a member missing, there is no parity left to check against. So what you have to do is painful. If this is confusing, then pay them or somebody else.

1. Get a binary editor that can show the physical blocks in hex and ASCII.
2. Get a programmer's calculator with XOR capability, plus an ASCII table.

The technique is to look at the borders at each possible block size. For example, if the block size is 4, then blocks 0-3 on each physical disk are one stripe, blocks 4-7 are the next stripe, and so on.

Now look for a string of ASCII text characters that starts in one stripe and ends in another. This tells you where the strips start and end, so you know the block size.
Then you have to recognize patterns to get the drive order.
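The text-spanning-a-boundary check can be sketched in Python. This is a minimal illustration under stated assumptions, not how RR works: it assumes consecutive logical chunks in the same stripe row sit at the same offset on two different disks, and the names `looks_like_text`, `boundary_text_score`, and `window` are hypothetical.

```python
# Bytes treated as "text": printable ASCII plus tab, LF, CR (an assumption).
PRINTABLE = set(range(0x20, 0x7F)) | {0x09, 0x0A, 0x0D}


def looks_like_text(buf: bytes) -> bool:
    """True if buf is non-empty and every byte is printable ASCII/whitespace."""
    return len(buf) > 0 and all(b in PRINTABLE for b in buf)


def boundary_text_score(disk_a: bytes, disk_b: bytes, chunk: int, window: int = 8) -> int:
    """Count stripe rows where chunk r on disk_a ENDS in text and chunk r on
    disk_b STARTS with text, i.e. a string that may spill from a onto b.

    Hypothetical heuristic: a high count at one candidate chunk size (and one
    ordered disk pair) hints at the block size and the drive order.
    """
    if not 0 < window <= chunk:
        raise ValueError("window must be between 1 and chunk")
    rows = min(len(disk_a), len(disk_b)) // chunk
    score = 0
    for r in range(rows):
        start = r * chunk
        tail = disk_a[start + chunk - window : start + chunk]  # last bytes of chunk r on a
        head = disk_b[start : start + window]                  # first bytes of chunk r on b
        if looks_like_text(tail) and looks_like_text(head):
            score += 1
    return score
```

In practice you would run it for each candidate size (e.g. 16k, 32k, 64k, 128k) and each ordered pair of disks, and compare against the score at non-boundary offsets, since disks full of text will score high everywhere.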

The issue now is that, because a disk is missing, you have to calculate parity, do the ASCII lookup, and remap the missing drive. You need to figure out the ordering, left to right, and where the hole is.
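Calculating the missing drive's data is plain XOR. A minimal Python sketch, assuming standard XOR parity and that all surviving blocks are taken from the same offset (`rebuild_missing` is a hypothetical name):

```python
def rebuild_missing(surviving: list[bytes]) -> bytes:
    """Recreate the missing member's block at one offset.

    With XOR parity (assumed), the block that was on the missing disk, whether
    data or parity, equals the XOR of the blocks at the same offset on every
    surviving disk.
    """
    sizes = {len(b) for b in surviving}
    if len(sizes) != 1:
        raise ValueError("need one or more blocks, all the same size")
    size = sizes.pop()
    acc = 0
    for blk in surviving:
        acc ^= int.from_bytes(blk, "big")  # XOR the whole block at once
    return acc.to_bytes(size, "big")       # keep leading zero bytes
```

On a 3-disk array, if the row holds data blocks d0, d1 and parity p, and the disk holding d1 is gone, `rebuild_missing([d0, p])` gives d1 back. The hard part is knowing which blocks belong to the same row, which is what the block size, ordering, and start offset determine.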

Then you also have some bad blocks so you have to take statistical samples.  

Bottom line: if the info I gave you doesn't make you say "piece of cake, I'll crank out a program to automate this, thanks", then pay somebody.


Author Comment

by:Valde_Edius
ID: 35219132
The data isn't worth the $300 to send it off to someone else; it's all media. I can rescan my entire DVD collection, and all the music/pictures are backed up. However, since RR uses a 'brute force' approach to figuring this info out, would it be worth plugging the old HDD that failed back in, or is my parity already trash, so that it would only produce false positives? I found my block size to be 64k by repeating the process of creating the RAID array, since I know I used the default size, which is 64k in this case. RR could not give me any positive result even though I can guarantee that block size and 1-2-3 ordering. The only unknown is the start block.
 

Expert Comment

by:CompProbSolv
ID: 35220315
I have found RAID Reconstructor to be pretty effective at evaluating the configuration. This is with limited experience (around 5 different RAID sets), though.
I presume that it is doing pretty much what dlethe is suggesting as this shouldn't be too tough for a good programmer to accomplish.  RR appears to try all sorts of combinations and then see if any turn up with data that "makes sense".  In the one instance where it was unsuccessful for me, others were similarly unsuccessful.

If I read the original post correctly, there are three disks in the original array with one failed.  The fact that the original controller doesn't even see the two disks as part of the same array sounds pretty bad to me.  I'd bet that something happened during rebuild to trash one of the two original disks.
 

Expert Comment

by:dlethe
ID: 35222815
The reason that RR fails to identify the configuration is a combination of the two. I also somewhat oversimplified the algorithm out of professional courtesy, as efficient algorithms to determine the topology are unpublished and considered proprietary intellectual property. But I'll give you a little more and explain what is going on ...

1. Identifying the topology is MUCH easier in non-degraded mode. That is because you have redundant blocks, and you can use the parity blocks to ensure that any given 512-byte block at offset #n has not been corrupted. When you XOR the block at the same offset across all the member disks (data plus parity), every one of the 512 x 8 bits comes out 0. This provides a sanity check and tells you whether you can trust the data in the first place.

Without parity, you must read a heck of a lot of data and take an average.

2. There is no way to determine whether a stripe holds metadata or filesystem data without parity, unless you do some things that I won't get into.

3. Without parity, you can't easily identify the proper drive ordering, because it is difficult to figure out which block holds the parity for any given stripe. Parity moves from disk to disk: it starts on an unknown disk number and rotates to the next disk, going left or right, one RAID block deep at a time, and that block size is also unknown.
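The parity sanity check from point 1, combined with the "read a lot of data" idea, can be sketched in Python. This assumes a healthy (non-degraded) set of equally offset disk images and XOR parity; the function names and the 512-byte default are illustrative assumptions. In the degraded case from the question this check is exactly what you lose.

```python
def stripe_is_consistent(blocks: list[bytes]) -> bool:
    """True if the same-offset blocks from ALL members (data plus parity)
    XOR to all-zero bits, which holds for XOR parity (assumed)."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("blocks must be non-empty and the same size")
    acc = 0
    for blk in blocks:
        acc ^= int.from_bytes(blk, "big")
    return acc == 0


def fraction_consistent(disks: list[bytes], block_size: int = 512, start: int = 0) -> float:
    """Check every block_size block at matching offsets from `start` onward and
    return the fraction that pass; a candidate offset or member set that scores
    near 1.0 is one you can trust."""
    length = min(len(d) for d in disks)
    checked = ok = 0
    for off in range(start, length - block_size + 1, block_size):
        checked += 1
        if stripe_is_consistent([d[off : off + block_size] for d in disks]):
            ok += 1
    return ok / checked if checked else 0.0
```

Taking the fraction over many blocks, rather than trusting a single block, is what makes the check tolerant of the occasional bad block.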

Bottom line, the more sophisticated algorithms kick in here. If you have the option, tell RR to search MUCH, MUCH longer, several GB for example. Or, if you are 100% sure of the drive ordering and the block size, then just "teach it". I.e., your RAID controller should still be alive, so just look at its config and tell RR the block size; and you certainly know which disk failed.
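"Teaching" a tool the topology amounts to fixing a mapping from logical address to (disk, offset). A minimal Python sketch, assuming one of the four rotating-parity layouts as named in Linux md (left/right, symmetric/asymmetric); which layout NVIDIA MediaShield actually uses is not stated in this thread and would have to be verified. `data_start` is a hypothetical stand-in for the unknown start sector/metadata offset.

```python
LAYOUTS = ("left-symmetric", "left-asymmetric", "right-symmetric", "right-asymmetric")


def locate_chunk(logical: int, n_disks: int, layout: str = "left-symmetric") -> tuple[int, int, int]:
    """Map a logical chunk number to (data_disk, stripe_row, parity_disk)."""
    if layout not in LAYOUTS:
        raise ValueError(f"unknown layout: {layout}")
    side, kind = layout.split("-")
    row, k = divmod(logical, n_disks - 1)  # k = data slot within the row
    if side == "left":
        parity = (n_disks - 1) - (row % n_disks)  # parity walks right to left
    else:
        parity = row % n_disks                    # parity walks left to right
    if kind == "symmetric":
        disk = (parity + 1 + k) % n_disks  # data starts just after parity, wraps
    else:
        disk = k + 1 if k >= parity else k  # data in disk order, skipping parity
    return disk, row, parity


def logical_to_physical(byte_offset: int, chunk_size: int, n_disks: int,
                        layout: str = "left-symmetric", data_start: int = 0) -> tuple[int, int]:
    """Return (disk, byte offset on that disk) for a logical byte offset."""
    chunk, within = divmod(byte_offset, chunk_size)
    disk, row, _ = locate_chunk(chunk, n_disks, layout)
    return disk, data_start + row * chunk_size + within
```

With the poster's 64k block size and 3 disks, for example, `logical_to_physical(2 * 65536 + 10, 65536, 3)` lands on disk 2 at offset 65546 under the assumed left-symmetric layout; changing `layout` or `data_start` is how you would try the remaining unknowns.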

You won't know the start/end of the metadata, but if you teach it correctly, you can then figure out where the filesystem partition begins by looking at the raw devices with a binary editor and searching for the file system header. You have a 60% probability of it being in a human-readable layout (i.e. not needing XOR reconstruction). Then, if you get lucky, teach RR where the partition begins and you are recovered.
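Searching the raw devices for the file system header can be automated. A minimal Python sketch, assuming the volume is NTFS (likely on a Windows machine, but the thread doesn't say so): an NTFS boot sector carries the OEM ID `"NTFS    "` at byte offset 3 and the 0x55 0xAA signature at offset 510. The function name is hypothetical, and it works on a bytes buffer, e.g. one read from an image file of a member disk.

```python
NTFS_OEM_ID = b"NTFS    "  # "NTFS" plus four spaces, at offset 3 of the boot sector
BOOT_SIGNATURE = b"\x55\xaa"  # last two bytes of a 512-byte boot sector


def find_ntfs_boot_sectors(image: bytes, sector_size: int = 512) -> list[int]:
    """Return the byte offsets of sector-aligned candidate NTFS boot sectors."""
    if sector_size < 512:
        raise ValueError("sector_size must be at least 512")
    hits = []
    for off in range(0, len(image) - sector_size + 1, sector_size):
        sector = image[off : off + sector_size]
        if sector[3:11] == NTFS_OEM_ID and sector[510:512] == BOOT_SIGNATURE:
            hits.append(off)
    return hits
```

A hit gives a candidate partition start to feed back to RR; if it only turns up on the surviving disks after XOR reconstruction, that is the other 40% case.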
 

Author Closing Comment

by:Valde_Edius
ID: 35245126
Ended up unable to actually recover my data.
