Cerekay


How do I identify the parity blocks in an HP Delayed Parity RAID 5 image?

I have some data from an old HP server that I want to recover. I was able to image 3 of the 4 disks, and I am working from those images. I am working under the assumption that it is a RAID 5 with a 64 KB block size, 512-byte sectors, and HP backward Delayed Parity. Downloading or buying recovery software is not an option. I am writing a C++ program to recover the data. I found some good blog posts with code for recovering data from a normal RAID 5, and I modified it to work with Delayed Parity, provided you know all of the variables that describe the RAID. What I am trying to figure out now is what the 1st delay and delay values are, as well as how to identify the parity blocks. Once those are figured out, I need the code to write a new file that drops the parity blocks and writes the data blocks into a single image, which I can then try to make mountable so I can recover the files.

My question has a couple parts, but the core problem is that I can read and write the blocks, but I don't know which blocks are parity blocks.

Since this is an HP RAID and HP likes to do things its own way, how do I find the 1st delay and delay values? I am assuming the drives are formatted as NTFS under the RAID, and that the data is not corrupt, so the data blocks should contain valid information. I have read posts on other sites about doing a manual inspection, but I am not an expert on the NTFS on-disk layout and I don't know what to look for to tell data blocks from parity blocks.
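One way to tell data from parity without deep NTFS knowledge is to look for two well-known markers that only survive in data. The NTFS boot sector carries the OEM ID "NTFS" followed by four spaces at byte offset 3 and the 0x55 0xAA signature at offsets 510-511, and MFT file records begin with the ASCII signature "FILE" (records are usually 1024 bytes). XOR-ing such a chunk with chunks from other disks destroys those signatures, so a chunk that contains them is very likely data, and the chunk holding the boot sector marks where the volume starts. This is a minimal sketch under those assumptions; the function names are hypothetical and reading chunks out of the image files is left to the caller. It checks every 512-byte sector boundary because an old partition may not start on a 1024-byte boundary relative to the chunk.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: true if the buffer looks like an NTFS boot sector,
// i.e. OEM ID "NTFS    " at byte offset 3 and 0x55 0xAA at offsets 510-511.
bool looksLikeNtfsBootSector(const std::vector<std::uint8_t>& sector) {
    if (sector.size() < 512) return false;
    return std::memcmp(sector.data() + 3, "NTFS    ", 8) == 0 &&
           sector[510] == 0x55 && sector[511] == 0xAA;
}

// Hypothetical helper: counts MFT file records ("FILE" signature) found at
// sector boundaries inside one chunk. A non-zero count suggests a data chunk.
std::size_t countMftRecords(const std::vector<std::uint8_t>& chunk,
                            std::size_t step = 512) {
    assert(step >= 4);
    std::size_t count = 0;
    for (std::size_t off = 0; off + 4 <= chunk.size(); off += step) {
        if (std::memcmp(chunk.data() + off, "FILE", 4) == 0) ++count;
    }
    return count;
}
```

A chunk with no markers proves nothing on its own (most file content carries no signature), so this is best used to confirm which chunks are data in the MFT region, not to classify every chunk.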

If there is no easy way to figure out the delay by manual inspection, is there a good way to find out programmatically which blocks are parity? I would like to write a small program that analyzes some of the data and tries to determine which blocks are parity, so that I can run it over several stripes and calculate the delay values. I could also use it to find the end of the header, by looking for parity and assuming the first stripe with parity is the first stripe past the header. Is there some clever math that can test for parity, or can I visually inspect a few lines of data?
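For visual inspection, a small helper can pull out the readable text at the very start and very end of each chunk. XOR-ing text from several disks rarely leaves readable words (even if many of the resulting bytes happen to be printable), so a readable run that ends one chunk and another that starts a chunk on a different disk hints that those two chunks are consecutive data. This is a sketch only; the names and the minimum run length of 8 are hypothetical choices, and it works on chunks already read into memory.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Printable ASCII: space (0x20) through '~' (0x7E).
static bool isPrintable(std::uint8_t b) { return b >= 0x20 && b <= 0x7E; }

// Readable run at the very start of a chunk (empty if byte 0 is not printable).
std::string leadingText(const std::vector<std::uint8_t>& chunk) {
    std::string s;
    for (std::uint8_t b : chunk) {
        if (!isPrintable(b)) break;
        s.push_back(static_cast<char>(b));
    }
    return s;
}

// Readable run at the very end of a chunk.
std::string trailingText(const std::vector<std::uint8_t>& chunk) {
    std::size_t i = chunk.size();
    while (i > 0 && isPrintable(chunk[i - 1])) --i;
    return std::string(chunk.begin() + static_cast<std::ptrdiff_t>(i), chunk.end());
}

// Heuristic: chunk `a` may be followed by chunk `b` in the logical volume when
// `a` ends in readable text and `b` begins with readable text.
bool textContinues(const std::vector<std::uint8_t>& a,
                   const std::vector<std::uint8_t>& b, std::size_t minLen = 8) {
    assert(minLen > 0);
    return trailingText(a).size() >= minLen && leadingText(b).size() >= minLen;
}
```

Printing `trailingText` and `leadingText` for a few hundred chunks per disk and eyeballing which pairs join into sentences is usually enough to spot the chunk size and disk order.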

Lastly, if there is a better way to figure out those values without buying or downloading software, and I am not asking the right question, what would be a good way to approach this problem? I reiterate that, for the scope of this question, downloading or purchasing software is not an option.
David

As somebody who has actually written the very software you desire to code, let me first just be brutally blunt and tell you that if you don't already know the answers to these questions, then you simply do not have the skill set to undertake the task at hand.

You are not considering, or even asking about, unrecoverable read errors or the location of controller metadata, which is kept in reserved areas of the HDDs that are not exposed as user data. Nor are you considering the possibility that the array may not be consistent, because the XOR parity might not be correct for a given range of blocks in the first place. You are ignoring consistency errors. (Remember, the HDDs did not start out as all zeros.)

You also have to use pass-through on a non-RAID controller to access all the physical blocks. This means the array is offline, of course.
Without addressing these as well, your recovery is guaranteed to fail.

Now that I am off the soapbox: don't worry about the delay. You do not need to care about NTFS at all. The controller certainly doesn't know the file system settings, and neither do the disk drives. Ignore all of that.

To determine parity, I personally look for ASCII text strings at 4 KB boundaries on each physical disk to see where a string begins and ends. Do this over a large range and a statistical sample will show you not only the stripe size but also how the disks are organized. In RAID 5, for example, you might have (P = parity):
P 0 1 2
3 P 4 5
6 7 P 8

or
0 1 2 P
P 3 4 5
6 P 7 8
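Both layouts above follow one rule: the parity chunk moves one disk to the right on each stripe, and the data chunks fill the remaining disks left to right. This sketch maps a logical data chunk number to a disk and a chunk index on that disk under that assumption. `firstParityDisk` (hypothetical name) is 0 for the first layout and `numDisks - 1` for the second. It does not model HP's delayed parity ordering, and the byte offset on a disk would be `diskChunk * chunkSize` (e.g. 64 KB) plus whatever start offset you find.

```cpp
#include <cassert>

// Where a logical data chunk lives: which member disk, and which chunk on it.
struct ChunkLocation {
    int disk;
    long long diskChunk;
};

// Parity starts on `firstParityDisk` for stripe 0 and moves one disk right
// per stripe, wrapping around.
int parityDisk(long long stripe, int numDisks, int firstParityDisk) {
    return static_cast<int>((stripe + firstParityDisk) % numDisks);
}

// Maps a logical data chunk to (disk, chunk index on that disk), assuming the
// non-parity disks are filled left to right within each stripe.
ChunkLocation locateDataChunk(long long logicalChunk, int numDisks, int firstParityDisk) {
    assert(numDisks >= 3 && logicalChunk >= 0);
    assert(firstParityDisk >= 0 && firstParityDisk < numDisks);
    const int dataPerStripe = numDisks - 1;
    const long long stripe = logicalChunk / dataPerStripe;
    const int index = static_cast<int>(logicalChunk % dataPerStripe);
    const int p = parityDisk(stripe, numDisks, firstParityDisk);
    // Disks left of the parity disk keep their index; disks right of it shift by one.
    const int disk = index < p ? index : index + 1;
    return {disk, stripe};
}
```

For the first layout, `locateDataChunk(7, 4, 0)` gives disk 1, stripe 2, matching the "6 7 P 8" row.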

or any other combination. Take the XOR of every HDD at each offset and check whether the total is all zeros or all ones (FFFF), depending on whether (assuming this is RAID 5) the array uses even or odd parity.
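The XOR test can be sketched as follows, assuming every member's chunk at the same offset is already in memory. The names are hypothetical. Note the limitation: the check needs all members. With only 3 of 4 images, the XOR of the three present chunks simply equals the missing disk's chunk, so a RAID 5 with one member gone has no redundancy left and this check cannot confirm anything. It is still useful on a complete test array, such as a software RAID built for testing.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// XORs the same-offset chunk from every member disk together.
std::vector<std::uint8_t> xorChunks(const std::vector<std::vector<std::uint8_t>>& chunks) {
    assert(!chunks.empty());
    std::vector<std::uint8_t> acc(chunks[0].size(), 0);
    for (const auto& c : chunks) {
        assert(c.size() == acc.size());  // all chunks must be the same size
        for (std::size_t i = 0; i < c.size(); ++i) acc[i] ^= c[i];
    }
    return acc;
}

// Even (plain XOR) parity: all members XOR to zero.
// Odd (inverted) parity: all members XOR to all ones (0xFF bytes).
bool stripeIsConsistent(const std::vector<std::vector<std::uint8_t>>& chunks,
                        bool oddParity = false) {
    const std::uint8_t expected = oddParity ? 0xFF : 0x00;
    for (std::uint8_t b : xorChunks(chunks))
        if (b != expected) return false;
    return true;
}
```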

Once you know, for any given physical block number, whether it holds parity, you know when to output a chunk as-is, when to output the XOR of the other disks in the stripe instead (the odd/even parity you calculate, standing in for the missing disk), and when to move on to the next chunk as if it came from the next drive in the recovered data.
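The per-stripe output step described above can be sketched like this: parity is dropped, present data chunks are copied as-is, and the chunk on the missing disk is rebuilt as the XOR of every present member of that stripe (data and parity). It assumes the whole stripe is in memory, the parity disk for the stripe is already known, and data order is left to right on the non-parity disks as in the diagrams; HP's delayed parity ordering is not modeled, and the names are hypothetical. Requires C++17 for `std::optional`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

using Chunk = std::vector<std::uint8_t>;

// Rebuilds the one missing member's chunk as the XOR of all present members.
Chunk rebuildMissing(const std::vector<std::optional<Chunk>>& stripe) {
    std::size_t size = 0;
    int missing = 0;
    for (const auto& c : stripe) {
        if (c) size = c->size(); else ++missing;
    }
    assert(missing == 1);  // RAID 5 can rebuild at most one member
    Chunk out(size, 0);
    for (const auto& c : stripe) {
        if (!c) continue;
        assert(c->size() == size);
        for (std::size_t i = 0; i < size; ++i) out[i] ^= (*c)[i];
    }
    return out;
}

// Returns the data chunks of one stripe in logical order: parity is skipped,
// and the missing disk's chunk (if it holds data) is rebuilt.
std::vector<Chunk> stripeData(const std::vector<std::optional<Chunk>>& stripe, int parityDisk) {
    assert(parityDisk >= 0 && parityDisk < static_cast<int>(stripe.size()));
    std::vector<Chunk> data;
    for (int d = 0; d < static_cast<int>(stripe.size()); ++d) {
        if (d == parityDisk) continue;  // parity never goes into the output image
        const auto& c = stripe[static_cast<std::size_t>(d)];
        data.push_back(c ? *c : rebuildMissing(stripe));
    }
    return data;
}
```

Appending the result of `stripeData` for each stripe to an output file produces the flat image the asker wants to mount; if the missing disk holds parity for a stripe, nothing needs rebuilding.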

There is more to this, but I really just want to discourage the effort; there are so many corner cases I haven't brought up that will cause you problems later on.
Cerekay

ASKER

If I had the skill set to undertake the problem beforehand, I wouldn't be wasting my money asking questions on a paid forum which implies some type of information exchange by experts to people asking questions.  I would have simply written the code already and been done with it.  
Secondly, asking these questions is how you find out the information to build the skill set.  That is why my last question was a general, "Am I asking the wrong questions" question.  Even if I can't get all my data I will still develop a deeper knowledge of file structures than I had previously.
I am trying to write a one-off program that recovers a single data set from hard drive images. I don't care about writing software that handles attaching any random hard drive to any random controller and covers every use case of a recovery program.
Lastly, I already found open source code that can do most of the work. I was able to compile and test it on a software RAID, and to both reconstruct and mount the resulting image. I merely need to modify it to work with the specific layout of the drives I want to use it on. If I run into other challenges I will address those as well, but I'm not going to quit trying to understand how this works because it's "too hard".
Trying to discourage someone from building a skillset is not helpful, a useful reply would be to call out the challenges you are referring to and set me on the right track to researching the solution.
ASKER CERTIFIED SOLUTION
David