Raid-5 Array - rebuild gone wrong?

Posted on 2011-03-25
Last Modified: 2013-11-05
Hello all,

Last night I rebuilt my RAID 5 array after one of the disks reported an "error". I took the following steps:
1. Replaced the offending disk with a similar one.
2. Using the MediaShield utility in the BIOS (NVIDIA chipset mobo, hw configured RAID), I rebuilt the array.
Right here things get a little iffy. When I installed the new disk and went into my RAID utility, it showed two arrays, each with one of the disks from the 3-disk RAID-5 array that had failed. Each had the option to rebuild by pressing 'r', but pressing 'r' would kick me back to the previous screen with no confirmation.

I deleted one of the arrays and kept the array that held my first disk in the series, then I hit rebuild and selected both my other disks (in order). In the operating system I can confirm that the MediaShield utility (software on OS not BIOS now) reports that my drive is rebuilding. I left it to run overnight and when I came in this morning it claimed it successfully rebuilt the array.

I tried to open the drive, which is now displayed in Explorer, but it won't open. I opened the Disk Management snap-in and noticed that 1) it is a 2-partition basic disk and 2) one partition reports "unallocated" (one of the 1TB HDDs), while the other partition reports RAW (another of the 1TB HDDs).

I have no doubt I can format them and have a functional array, but then my data would be kaput. Is there any way I can still recover my data? Where did I go wrong that I have this current setup?

Thank you,

Valde Edius
Question by:Valde_Edius
LVL 47

Accepted Solution

dlethe earned 500 total points
ID: 35216850
First, RAID5 with NVIDIA: you're just looking for trouble. If you aren't using enterprise-class SATA disks that have the proper firmware mods to deal with TLER, then you can pretty much expect to lose a lot of data, as it will never work right.

To get going, download RAID Reconstructor. It is free to try, pay to buy. You will need a NON-RAID adapter so that it can see the individual disks.

But it is a lost cause if you continue to use that hardware combo; it will just happen again. You can count on it.

Author Comment

ID: 35218797
Raid Reconstructor (RR) keeps giving me the error "This result is not significant" after the analysis. I did some research on it, and it seems that unless I can manually determine all the correct parameters, I would need to pay the makers of RR $300 USD to determine this information for me. Any advice on how to manually determine block size, starting sector, and order?
LVL 47

Expert Comment

ID: 35218956
In a degraded condition it is difficult because you just can't verify whether parity is OK; there is no parity. So what you have to do is painful. If this is confusing, then pay them or somebody else.

1. Get a binary editor that can look at the physical blocks in hex & ASCII.
2. Get a programmer's calculator that has XOR capability, plus an ASCII table.

The technique is to look at the border between blocks for each possible block size. For example, say the block size is 4: then blocks 0-3 on each physical disk are a stripe, blocks 4-7 are a stripe, and so on.

Now you look for a string of ASCII text characters that will start on one stripe, and end on another.   This will tell you where the strips start and end so you know block size.  
Then you have to see patterns to get the drive order.
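
As a rough illustration of the boundary hunt described above, here is a minimal Python sketch. It is not how RAID Reconstructor works (that is not documented here); the function names, the 16-byte window, and the "printable ASCII" test are all assumptions made for the example. It counts, for a candidate block size, how many boundaries on one disk have readable text that stops exactly at the boundary, which hints that the text continues on another disk.

```python
import string

# Bytes treated as "text": ASCII letters, digits, punctuation and whitespace.
TEXT_BYTES = set(string.printable.encode("ascii"))


def looks_like_text(data):
    """Return True if every byte of a non-empty slice is printable ASCII."""
    return len(data) > 0 and all(b in TEXT_BYTES for b in data)


def count_text_breaks(disk, chunk_size, window=16):
    """Count chunk boundaries on one disk where text stops at the boundary.

    `disk` is the raw content of a single member disk (bytes).
    `chunk_size` is the candidate block size in bytes (hypothetical parameter).
    """
    if chunk_size < window:
        raise ValueError("chunk_size must be at least as large as window")
    breaks = 0
    for boundary in range(chunk_size, len(disk) - window + 1, chunk_size):
        before = disk[boundary - window:boundary]
        after = disk[boundary:boundary + window]
        # Text right up to the boundary, then something else: a likely seam.
        if looks_like_text(before) and not looks_like_text(after):
            breaks += 1
    return breaks
```

Sizes that evenly divide the true block size see the same seams, so the count alone does not single out one value; a reasonable reading is to take the largest candidate before the count collapses. On a real disk the sample would need to cover a lot of data to be meaningful.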

But the issue now is that, due to the missing disk, you have to calculate parity, do the ASCII lookup, and remap the missing drive. You need to figure out the ordering, left to right, and where the hole is.
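
The parity calculation mentioned here is plain XOR: in a RAID-5 stripe, any one missing block equals the XOR of all the other blocks in that stripe (data and parity alike). A minimal sketch follows; the function name `xor_blocks` is made up for the example.

```python
def xor_blocks(*blocks):
    """XOR equally sized byte blocks together and return the result.

    Passing every surviving block of a stripe (data and parity)
    yields the content of the one missing block.
    """
    if not blocks:
        raise ValueError("need at least one block")
    length = len(blocks[0])
    if any(len(block) != length for block in blocks):
        raise ValueError("all blocks must be the same length")
    out = bytearray(length)
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)
```

For a 3-disk array, `xor_blocks(block_from_disk_a, block_from_disk_b)` at the same offset gives what the third disk held at that offset, whether that was data or parity. This is the same operation the programmer's calculator does by hand, one byte at a time.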

Then you also have some bad blocks so you have to take statistical samples.  

Bottom line: if the info I gave you doesn't make you say, "Piece-o-cake, I'll crank out a program to automate this, thanks" ... then pay somebody.

Author Comment

ID: 35219132
The data isn't worth the $300 to send it off to someone else; it's all media. I can rescan my entire DVD collection again, and all the music/pictures are backed up. However, since RR attempts to use a 'brute force' approach to figuring this info out, would it be worth it to plug the old HDD that failed back in, or is my parity already trash, so that it would produce false positives if anything? I found my block size to be 64k by repeating the process of creating the RAID array, because I know I used the default size, which is 64k in this case. RR could not give me a positive result even though I can guarantee the block size and 1-2-3 ordering. The only unknown is the start block.
LVL 21

Expert Comment

ID: 35220315
I have found Raid Reconstructor to be pretty effective at evaluating the configuration.  This is with limited experience (around 5 different RAID sets), though.
I presume that it is doing pretty much what dlethe is suggesting as this shouldn't be too tough for a good programmer to accomplish.  RR appears to try all sorts of combinations and then see if any turn up with data that "makes sense".  In the one instance where it was unsuccessful for me, others were similarly unsuccessful.

If I read the original post correctly, there are three disks in the original array with one failed.  The fact that the original controller doesn't even see the two disks as part of the same array sounds pretty bad to me.  I'd bet that something happened during rebuild to trash one of the two original disks.
LVL 47

Expert Comment

ID: 35222815
The reason that RR fails to identify the configuration is a combination of the two.  I also somewhat oversimplified the algorithm out of professional courtesy as efficient algorithms to determine the topology are unpublished and considered proprietary intellectual property.  But I'll give you a little more and explain what is going on ...

1. Identifying topology is MUCH easier in non-degraded mode.  That is because you have redundant blocks, and can then utilize the parity blocks to ensure that any given block of 512 bytes at offset #n has not been corrupted.   When you XOR the same block across all the member disks (three in your case), then all 512 x 8 bits will equal all 1s or all 0s.    This provides a sanity check and tells you if you can trust the data in the first place.
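
A minimal sketch of that sanity check, assuming you can read the same 512-byte block from every member disk of a healthy array. The function name is hypothetical; accepting both all-0s and all-1s simply follows the description above.

```python
def stripe_parity_ok(blocks):
    """Return True if the blocks at one offset XOR to all 0s or all 1s.

    `blocks` holds one equally sized bytes object per member disk,
    all read from the same offset (data blocks plus the parity block).
    """
    length = len(blocks[0])
    acc = 0
    for block in blocks:
        if len(block) != length:
            raise ValueError("all blocks must be the same length")
        # Treat each block as one big integer so XOR covers every bit.
        acc ^= int.from_bytes(block, "big")
    all_ones = (1 << (8 * length)) - 1
    return acc == 0 or acc == all_ones
```

In a degraded array this check cannot be run at all, because one of the inputs is gone; that is the point being made in this list.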

W/o parity, then you must read a heck of a lot of data and take an average.

2. No way to determine if a stripe is looking at metadata or filesystem data w/o parity, unless they do some things that I won't get into.

3. W/o parity, then you can't easily identify the proper drive ordering, because it is difficult to figure out which stripe is the parity data for any given slice.   Parity moves from disk to disk, starting at an unknown disk #, rotating to the next disk, going left or right, at the raid block size, also known.
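
To make the rotation concrete, here is a small sketch of how the parity position could be computed for a given stripe, and how many combinations a brute-force search has to consider. The parameter names and the simple left/right model are assumptions for illustration; the layout the NVIDIA controller actually uses is not stated anywhere in this thread.

```python
from itertools import permutations


def parity_disk(stripe, n_disks, direction, start):
    """Return the index of the disk holding parity for a stripe number.

    `start` is the disk holding parity in stripe 0; `direction` is
    "left" (index decreases each stripe) or "right" (index increases).
    """
    if direction not in ("left", "right"):
        raise ValueError("direction must be 'left' or 'right'")
    step = -1 if direction == "left" else 1
    return (start + step * stripe) % n_disks


def candidate_layouts(n_disks):
    """Yield every (drive order, direction, start disk) combination."""
    for order in permutations(range(n_disks)):
        for direction in ("left", "right"):
            for start in range(n_disks):
                yield order, direction, start
```

For three disks this enumerates 36 combinations per candidate block size, which is an upper bound since some of them describe the same physical layout. Knowing the drive order and block size for certain removes most of them.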

Bottom line, the more sophisticated algorithms kick in here.   If you have the option, tell RR to search MUCH, MUCH longer, several GB for example.   Or, if you are 100% sure of the drive ordering and the block size, then just "teach it".    I.e., your RAID controller should still be alive, so just look at the config and tell it the block size, and you certainly know which disk failed.

You won't know the start/end of metadata, but if you teach it correctly, then you can figure out where the filesystem partition begins by looking at the raw devices with a binary editor for the file system header.   You have a 60% probability of it being in a human-readable layout (i.e. not XORed reconstructed).  Then, if you get lucky, teach RR where the partition begins and you are recovered.
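
The header search can also be scripted instead of paging through a binary editor. The sketch below assumes the volume was NTFS (the thread only says it was a Windows machine, so that is a guess) and looks for the NTFS boot sector markers: the OEM ID "NTFS" padded with four spaces at byte 3, and the 0x55 0xAA signature at bytes 510-511. The function name and the in-memory `image` argument are made up for the example.

```python
SECTOR_SIZE = 512


def find_ntfs_boot_sectors(image):
    """Return the sector numbers in `image` that look like NTFS boot sectors.

    `image` is raw bytes read from one member disk (or a dump of it).
    """
    hits = []
    for offset in range(0, len(image) - SECTOR_SIZE + 1, SECTOR_SIZE):
        sector = image[offset:offset + SECTOR_SIZE]
        # OEM ID at bytes 3-10 and boot signature at bytes 510-511.
        if sector[3:11] == b"NTFS    " and sector[510:512] == b"\x55\xaa":
            hits.append(offset // SECTOR_SIZE)
    return hits
```

For a real 1TB disk you would read the device or image file in pieces rather than load it whole. A hit may also be a backup copy of the boot sector rather than the start of the partition, so treat each result as a candidate to check, not as an answer.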

Author Closing Comment

ID: 35245126
Ended up unable to actually recover my data.

Question has a verified solution.
