dead raid 5

Not sure anyone can help on this one. What I am about to post goes against anything that I was trained to do.

I got a call from a new (in a panic) client about their dead HP Proliant ML350 server. I go to the site and this server was configured with a RAID 5 built around 4 18.2 GB SCSI hard drives.

The bad news is that 2 of the hard drives appear to be dead. The worse news is that the client pulled the drives out and forgot the order they were originally set up in (!).

Any suggestions on where I might be able to begin in repairing the array? Or at least some suggestions on how to pull the data from the drives?

And the icing on the cake.... no one at this site has been running the backup tape!

bnrtechAsked:
jakethecatukCommented:
What happens when you put the two 'dead' drives back into the array?  Do they spin up, or are they still showing as 'dead'?

If memory serves, it shouldn't make any difference where you put them because the raid controller will look at the disk signatures to work out what position that disk has in the array.
jakethecatukCommented:
BTW - if both drives are 'dead' - then you've lost all  your data as a raid 5 array can only sustain the loss of one drive.
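
To make the one-drive limit concrete, here is a minimal Python sketch of the XOR parity idea behind RAID 5. The block contents and the helper name `xor_blocks` are made up for illustration; a real controller works on much larger stripes and rotates the parity block across the drives.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# One stripe on a hypothetical 4-drive RAID 5: three data blocks plus parity.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# One drive lost: XOR of the three surviving blocks gives back the missing one.
recovered = xor_blocks([data[0], data[2], parity])
assert recovered == data[1]

# Two drives lost: the survivors only yield the XOR of the two missing blocks,
# which is one equation with two unknowns, so neither block can be rebuilt.
mixed = xor_blocks([data[2], parity])
assert mixed == xor_blocks([data[0], data[1]])
```

This is why the second "dead" drive matters so much: if either of the two can be read again, the array is back to the one-missing-drive case.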

You could try sticking the drives in a fridge overnight to cool them down to a very low temperature, this may allow them to spin up for about 30 mins or so to give you a slim (very slim) chance of getting a backup.
StinkyPeteCommented:

If you can identify and confirm as dead the two drives, try the old favourite, Spinrite at GRC.com

It's file system independent, and has saved many a dead drive for us - this is your last option before sending to data recovery guys.

Expect anything up to 5 days or so for Spinrite to complete (It tries up to 1000 reads per sector)



Paul MacDonaldDirector, Information SystemsCommented:
As [jakethecatuk] mentions, I'd try just putting the drives back in to see what happens.  At this point you have nothing to lose.  
Then I'd ask the customer how long the alarm had been going off telling them one of the drives in the array was faulty.  You know both drives didn't die at the same time...
If you can't get the array up at all, you may be able to send the disks off for recovery.  On the up side, this is a good time to talk to the customer about value added services like maintenance and backups.  Both are cheaper than losing all your data.
ClintSwineyCommented:
Not to be all doomy and gloomy.... But I have worked with several situations that are almost exactly described to a "T" as you explained. I have spent hours working with and attempting to recover the data. I have only been partially successful in the past.

One lesson I learned through all this was that while the data recovery is important it should be put on the back burner and the server should be brought back up or another server be brought in and setup so the users can work. Even with a total data loss at least they can start somewhere, which will have to happen anyway. Once that is completed then begin diagnosing the Data Loss.

If the RAID card holds the configuration of the drives and you don't know the order they were originally installed I do not know of any way to reassemble them other than by trial and error. As long as you don't erase the RAID config in the card you can mix and match till you get it right. But with 2 drives bad it may not be possible.
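
For a sense of how much trial and error that is: four drives in four bays have 4! = 24 possible arrangements. A small Python sketch that lists them, using made-up drive and bay labels (nothing here talks to a controller):

```python
from itertools import permutations
from math import factorial

# Hypothetical labels; write your own on the drives with a marker first.
drives = ["drive A", "drive B", "drive C", "drive D"]
bays = ["bay 0", "bay 1", "bay 2", "bay 3"]

def slot_assignments(drives, bays):
    """Yield every possible drive-to-bay assignment as a dict."""
    for order in permutations(drives):
        yield dict(zip(bays, order))

assignments = list(slot_assignments(drives, bays))
assert len(assignments) == factorial(len(drives))  # 4! = 24

# Print a checklist so no arrangement is tried twice.
for number, assignment in enumerate(assignments, start=1):
    print(number, assignment)
```

A written checklist like this at least keeps the shuffling systematic, though with 2 drives bad even the correct arrangement may not bring the array up.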

Now it's time to contact the Data Recovery Specialists.

It will be expensive to recover the data but it may be your only hope to ever see that data again. I do not believe that there are proper tools available to do this in the field. Although, there may be a way to read the beginning sectors of the drives and somehow extract the bit of data that contains the drive number. That being said, the Data Recovery Specialists would be your best bet even if you did have the drives in the proper order. If there is more than one failed drive, in my opinion, it should be left to the professionals. If not you could FUBAR the data forever.
DavidPresidentCommented:
First, the firmware supports drive roaming, so the metadata on the disks and in the NVRAM on the controller take care of drives that get moved around. Do you have any access to event log info?

Most likely one drive failed, the other had some errors, and so the controller took it offline to keep things from getting worse. If the customer is willing to spend $1000+ then contact somebody like gillware or ontrack or seagate recovery, get a quote, and let them just handle it. If you are in a major city, you may even have a local company where you can drop the disks off 24x7, and typically get a reconstructed volume back within 24 hours.

There are software tools (runtime.org reconstructor), but since these are 18.2GB SCSI disks, they were probably manufactured around 2002-2003, so there would be a lot of useful diagnostic info in the log pages that would assist with a more professional recovery process. Brute-force rebuilds on ancient drives when 2 drives have failed are not a good idea; this is a high-risk recovery for such software. Pay the money (at least it is not yours), and I have little doubt that your customer will be online with minimal data loss in 24-48 hours.
bnrtechAuthor Commented:
Thanks for all the good feedback. More tidbits I can offer....

- I cannot gain access to any logs. The POST process starts up but there is no access beyond the POST process.

- During the POST process it warns that the order of the drives has changed. I have shuffled the drives and this prompt continues.

- 2 of the 4 hard drives have red LED indicators that remain on during the entire POST process

- I am trying jakethecatuk's suggestion of putting the drives in the fridge to see where that might get me (if anywhere)

- Also researching stinkypete's suggestion on Spinrite at GRC.com. Very interesting tool that I never used.

- In the end I think it will be dlethe's suggestion about contacting a disaster recovery firm. We are in Baltimore, Maryland. So if anyone has any suggestions local to our area please let me know
DavidPresidentCommented:
TURN OFF COMPUTER . Every minute disks are powered up risks further damage (unless you have the software and equipment to assess the nature of the failure, and it confirms that powering disks is "safe")

  Do not run spinrite - it is not an appropriate solution for a RAID environment.  
StinkyPeteCommented:
dlethe - Very interested to hear about your comment on the inappropriateness of Spinrite in a RAID environment. I understand the tool works at a fairly low level, below the file system; I suppose the implication is that the RAID controller tracks, or is aware of, the low order of bits (flux) on the drive? Please can you point me to a source which explains why this is not appropriate? If you want points I shall turn this into a question.
DavidPresidentCommented:
Here is a generalization, w/o going into vendor/controller specifics.

It is vital for a controller to know which blocks and disks are unreadable, so that there are no surprises. Spinrite will gladly "repair" an area of the HDD reserved for both filesystem data and metadata. In a failure scenario, you could be running in degraded mode on a now 1-disk RAID1 and rewriting data. If you take out the failed disk and cure that particular bad block, you restore it by recovering the stale data. Now the same logical block has different values on the 2 physical disks. The RAID does not see any errors, because the software fixed them. Which disk is correct?

What if you are doing a rebuild and you had some bad blocks that are now magically good? The low-end controllers don't have non-volatile memory or enough space to maintain a table of unreadable disks/blocks; such a controller learns that it has a bad block, and that it needs to calculate the correct data from parity, by encountering a read error. If Spinrite gets rid of the read errors, the controller thinks the data is good. Before, with a read error, it knew it had to repair the data and rewrite the parity. If the disk fixes itself, then stale data will get introduced.

In a disaster situation where you have bad blocks, rebuilds, timeouts, restriping, etc, then the controller has to rely on reading the nature of a I/O error from the hardware, so it can react properly. If spinrite "restores" data so that a disk is no longer "dead" or degraded in the eyes of the controller, then stale data can make its way into production data.  If a drive was bad, and a controller was running in degraded mode because of that, and spinrite "repairs" the disk .. then you can see how this would hurt.
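
The stale-data problem can be shown with a toy Python sketch on one RAID 5 stripe. Everything here (block contents, the helper names `xor_blocks` and `stripe_is_consistent`) is invented for illustration and says nothing about how any particular controller behaves.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def stripe_is_consistent(blocks):
    """A stripe is consistent when data and parity XOR to all zeros."""
    return not any(xor_blocks(blocks))

# Healthy stripe: three data blocks plus parity.
d0, d1, d2 = b"old0", b"old1", b"old2"
parity = xor_blocks([d0, d1, d2])
assert stripe_is_consistent([d0, d1, d2, parity])

# The disk holding d1 drops out. In degraded mode a write to that block
# can only be recorded through the parity block.
new1 = b"NEW1"
parity = xor_blocks([d0, d2, new1])

# Degraded read: the controller rebuilds the block from the survivors.
assert xor_blocks([d0, d2, parity]) == new1

# The dropped disk is "repaired" outside the controller and returns with
# its old contents. A plain read succeeds with no error, but it is stale.
stale_read = d1
assert stale_read != new1

# A parity check shows the stripe is wrong, but not which member is wrong.
assert not stripe_is_consistent([d0, stale_read, d2, parity])
```

Without the read error, nothing tells the controller to prefer the parity-derived value over the block that now reads back cleanly.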

StinkyPeteCommented:
bnrtech any update/progress ?
babbahotepCommented:
try r-studio

http://www.r-studio.com/

they have a free trial and i was able to fully recover a defunct 3 TB raid 5 with this software in a day or so...

it even works if one or more drives are missing (you won't be able to recover the data that's on the stripe of the second drive though)
bnrtechAuthor Commented:
Sorry for the delay. I proceeded with the advice that dlethe offered and sent it off to ontrack. I had heard good things about ontrack.

However, so far they do not yet have a resolution and the case management process with OnTrack has been (to say the least) a bit of a nightmare.

Hopefully I will have a confirmed update shortly.
DavidPresidentCommented:
If ontrack can't get it ... then rest assured it is gone forever.
bnrtechAuthor Commented:
Unfortunately I see your point. Last time I spoke to an OnTrack tech the guy was asking me about the two separate arrays that he noticed set up on the hard drives and why I didn't send in the 5th drive.

To which I replied there were only 4 drives and they were configured as a single array (RAID 5).

I am guessing the outcome is not going to be pretty.
DavidPresidentCommented:
you never know .. perhaps this one disk was once part of a RAID5, and he is looking at some old metadata that is on a parity stripe, so it may not have changed.
So if they can get back the 2nd drive to fail, plus the 2 disks that didn't fail are good, you are home free.   Maybe they are just trying to find out which of these 2 "failed" drives has the stale data (the one he asked about probably had it), to use as the disk to rebuild the RAID.

Don't give up yet.
babbahotepCommented:
If ontrack fails I would hook up all drives to a sata controller and run r-studio. depending on the controller you can even leave them on the original raid controller.

for me r-studio figured out the block size, stripe order and disk order so I could recover all the files...
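
As a rough illustration of the search such a tool has to perform, here is a toy Python sketch that brute-forces disk order and block size over in-memory "images". It is not a description of R-Studio's internals: the rotating-parity layout, the function names and the candidate block sizes are all assumptions, and the known marker string stands in for the filesystem structures a real tool would look for.

```python
from functools import reduce
from itertools import permutations

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

def parity_position(row, n_disks):
    """Assumed layout: parity starts on the last disk and moves left each row."""
    return (n_disks - 1) - (row % n_disks)

def stripe(data, n_disks, block):
    """Build toy disk images from data (used here only to create test input)."""
    per_row = (n_disks - 1) * block
    data += bytes(-len(data) % per_row)          # zero-pad to whole rows
    disks = [bytearray() for _ in range(n_disks)]
    for row, start in enumerate(range(0, len(data), per_row)):
        chunks = [data[start + i * block:start + (i + 1) * block]
                  for i in range(n_disks - 1)]
        pending = iter(chunks)
        for d in range(n_disks):
            if d == parity_position(row, n_disks):
                disks[d] += xor_blocks(chunks)
            else:
                disks[d] += next(pending)
    return [bytes(d) for d in disks]

def destripe(images, block):
    """Reassemble the data blocks from images taken in the given order."""
    n_disks = len(images)
    out = bytearray()
    for row in range(len(images[0]) // block):
        for d in range(n_disks):
            if d != parity_position(row, n_disks):
                out += images[d][row * block:(row + 1) * block]
    return bytes(out)

def candidate_layouts(images, marker, block_sizes=(4, 8, 16)):
    """Return every (order, block size) whose reassembly contains the marker."""
    hits = []
    for block in block_sizes:
        for order in permutations(range(len(images))):
            if marker in destripe([images[i] for i in order], block):
                hits.append((order, block))
    return hits

# Toy input: stripe a known payload, then shuffle the images.
payload = bytes(range(1, 97))
true_disks = stripe(payload, n_disks=4, block=8)
shuffled = [true_disks[2], true_disks[0], true_disks[3], true_disks[1]]
hits = candidate_layouts(shuffled, marker=payload)
```

With the whole payload as the marker, the true arrangement `((1, 3, 0, 2), 8)` is among the hits. A short marker can match several arrangements, which is presumably why real tools check many structures, and a search like this belongs on copies of the disks rather than the originals.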
bnrtechAuthor Commented:
So in the end OnTrack was able to restore most data. But some data was damaged beyond repair.

End of the day, if these jokers would have simply maintained the tape backup they would not have been in this mess. I have since set them up with mozy.com because they have proven they are undependable to rotate tapes (sad but true)
DavidPresidentCommented:
curious ... what did ontrack charge?  it would be great to establish what they typically charge for a small raid reconstruction
babbahotepCommented:
i asked local recovery companies in the bay area and they usually had free inspection and did an upfront estimate on my description: $700 - $1000 for 2TB.

what I did was extend the raid and then the new hard drive failed....rendering the raid unusable
babbahotepCommented:
i did it myself then with r-studio... took a while but worked...
bnrtechAuthor Commented:
ontrack cost was a little higher than what I expected (I was expecting around $2500-3000).

In the end they charged $500 for the discovery/analysis process. Once we agreed to recover the data it was an additional $3500. If I understood correctly, the charge is based upon level of effort and size of data.