need advice - recovering raid 5 array member disk

Hi All -
For everyone who wants to point out the obvious, I'll start with that.  I'm an idiot, I'm a bonehead.  I thought it could never happen to me.  I have no excuses and I'm trying to dig myself out of a hole of my own making.


At this point I'm waiting to hear from as to the options they can provide.

ESXi hypervisor 4.0 U1  

3ware 9650SE-4LPML controller
RAID 5 - 4 drives, 1 TB each
VMFS3 file system
1 important VM - Windows 2003 SBS, 2 VMDKs (207 GB and 465 GB); ~850 GB of data in total on the drive array
All files except the VMDKs have been recovered

Situation -
1 drive in the array failed and the array went into recovery mode.  Recovery was not successful: it paused at 6%, 9%, and 62%, in that order.
The controller shows the drive as OK, but moves it to WARNING or DEVICE-ERROR during recovery of the RAID or attempts to copy data

Datastore and File Directory visible by:
- Mounting the data store in ESXi 5.0
- Mounting the datastore using the tw_cli 3ware command line tool and the Open Source VMFS Driver: I can see the file structure and was able to download all files except the VMDKs
- SMART data says all drives are OK.  The drives are Seagate Barracuda 7200 RPM; I believe the SMART data is suspect

IMHO - 1 drive in the remaining array is suspect.  I believe if the drive can be recovered, the data can be recovered.  There are potentially also issues with the RAID controller.

Attempted, in order (rebuilding the array and copying files through the Open Source VMFS Driver were attempted throughout):
- Rebuild of the array from degraded mode through the array rebuild process via tw_cli, by adding a new 4th drive.
- Recovery of data from within ESXi: copy the VMDK from the datastore. The copy of the VMDK halts. Attempt abandoned.
- Mounting the datastore using the tw_cli 3ware command line tool and the Open Source VMFS Driver: I can see the file structure and was able to download all files except the VMDKs.
- Recovery of the RAID via Raid Reconstructor.  Automatic settings could not determine the RAID layout; it said manual configuration was needed. Attempt abandoned.
- Repair of the drive with SpinRite. SpinRite would not attempt it; it says the partition size reported by the BIOS is different from the one reported by the drive. Attempt abandoned.
- Clone of the drive via Clonezilla with VMFS support. Clonezilla would not attempt it. Attempt halted.
- Recovery of the VMDK / files via DiskInternals VMFS Recovery.  After 6 hours it was still running and still showing CPU activity, but the progress bar had halted.
David (President) Commented:
Here is the deal.
1.  It isn't that data is suspect on at least some of the surviving disks, it is that the disks are in deep recovery.   But that recovery is working because the rebuild isn't failing.  So good news.
2. Spinrite absolutely will result in at least partial data corruption in event it recovers any blocks that were already marked bad.  NEVER run that on disks in a RAID array.
3.  You have made things worse by what you have attempted.  
4. Those 'cuda drives aren't enterprise class, which is the root cause. They are consumer disks that are designed to go into deep recovery rather than give up in 1-2 seconds so that the RAID controller can extrapolate the correct data and resector the bad block, rewrite and move on.
5. Those 'cuda drives CAUSED the predicament you are in right now, because deep recovery causes drives to lock up and the 3ware firmware failed the disk.
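
For anyone reading along who wonders what "extrapolate the correct data" in point 4 means, here is a minimal Python sketch of RAID 5 parity arithmetic. It is only an illustration, not the 3ware firmware's actual logic; the function name `xor_blocks` and the tiny 4-byte blocks are made up for the example.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# One stripe on a hypothetical 4-drive RAID 5: three data blocks plus parity.
data = [b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\xaa\xbb\xcc\xdd"]
parity = xor_blocks(data)

# Suppose the drive holding data[1] quickly reports an unreadable sector.
# The missing block can be recomputed from the surviving members:
rebuilt = xor_blocks([data[0], data[2], parity])
assert rebuilt == data[1]
```

The catch is that this only works if the drive reports the bad sector promptly. A drive that goes quiet for a long internal retry looks, to the controller, like a drive that has dropped out entirely.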

1. If you don't want this to happen any more, you are just going to have to replace ALL disks with enterprise class, or go to RAID10 (lesser evil than RAID5, but at least RAID10 is lower risk).

2. Pay a pro to get this data back. You do not have the software to recover from partial rebuilds, and such software is not available retail anyway.  

3. If you are willing to live with partial recovery, and risk 100% data loss due to continued use of the disks, then just copy what you can while it is degraded and prioritize your most important files first.

Bottom line, I am not going to sugar coat it: you are in over your head and do not have the experience and the software necessary to recover data, or to assess the likelihood that the act of recovering with your plan would cause irrevocable data loss.
I have successfully used Raid Reconstructor from

The process is that you install the RAID-5 drives in a non-RAID controller and analyze them there.  You can do them all at once or individually.  It attempts to reconstruct the proper order of drives and stripe size.

If you have enough disk space, the next step is to have the program make an image of each of the drives to your local hard drive.  From those images, the program will allow you to recover data using the organizational information from the first step.

What I think is especially important to your issue is the step of copying the drives individually.  If a second drive is failing (as you suspect), you will find out during the copy process.
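
To make the "you will find out during the copy process" point concrete, here is a rough Python sketch of imaging a source chunk by chunk while recording which regions fail to read. It is not how Raid Reconstructor works internally; the function name, the chunk size, and the zero-fill policy are all assumptions for illustration, and purpose-built imaging tools such as GNU ddrescue handle retries and error maps far more carefully.

```python
import io

def image_with_error_map(src, dst, size, chunk_size=64 * 1024):
    """Copy `size` bytes from src to dst, zero-filling chunks that fail to read.

    Returns the list of byte offsets whose chunk could not be read.
    """
    bad_offsets = []
    offset = 0
    while offset < size:
        want = min(chunk_size, size - offset)
        try:
            src.seek(offset)
            chunk = src.read(want)
        except OSError:
            chunk = None
        if not chunk or len(chunk) != want:
            # Unreadable (or short) chunk: note it and keep the image aligned.
            bad_offsets.append(offset)
            chunk = b"\x00" * want
        dst.write(chunk)
        offset += want
    return bad_offsets

# Example with in-memory stand-ins for a drive and an image file.
source = io.BytesIO(b"A" * 10)
image = io.BytesIO()
bad = image_with_error_map(source, image, size=10, chunk_size=4)
print(bad)  # a healthy source yields an empty list
```

A drive whose list of bad offsets keeps growing between passes is the one to stop touching.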

There is a free trial version and the full program runs about $100.

As always with hard drive problems, be aware that the longer you work on the drive attempting to recover data, the more likely the failure will become worse and be more difficult (=more expensive) for a professional recovery shop to recover your data.

Depending on the value of the data, you may want to shut it all off and have the pros deal with it.  Otherwise, I'd try Raid Reconstructor.
dpedersen13 (Author) Commented:
Thanks for the detail and help on understanding what is happening right now, especially the details on the 'cuda drives and the reason for the RAID card lockup.  It makes sense; trying to figure out that root cause was really bothering me.  I've stopped all attempts and I am impatiently awaiting a call back from Ontrack.

In the meantime I'm recovering data files from other sources.  Fortunately most of the data is in week-old backups.

Thanks again,

No problem, hope I didn't rub you the wrong way.  If I did, accept my apology.  Some drives are totally unacceptable for RAID5, and those disks are among them. The firmware is effectively incompatible.

In fact, dig up the WD data sheets and they even tell you that those disks are not warrantied for use on RAID5.
P.S. RAID reconstructor would not have helped in this scenario, and would have likely made things worse.
"would have made things worse"
What I suggested should have been read-only.  How would it have made things worse?
When you read a HDD you run the risk of stressing the drive to the point of failure, as well as turning what a controller thinks is an unrecoverable read error into a recovered block.

Depending on the specifics, you could corrupt data by turning a known bad block which is already being handled into a repaired block which would invalidate all data on a stripe.
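
One way to picture this risk, as a hedged Python sketch rather than a model of any particular controller: parity reconstruction trusts every surviving member, so a block that reads back "successfully" but no longer matches what the parity was computed from yields a wrong rebuilt block, and nothing raises an error. The block values and names below are invented for the example.

```python
def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)

# One stripe: three data blocks and the parity computed from them.
d0, d1, d2 = b"\x11\x22", b"\x33\x44", b"\x55\x66"
parity = xor_blocks([d0, d1, d2])

# The drive holding d2 has failed. A marginal sector on another drive now
# reads back without error but with altered contents (hypothetical values).
d1_altered = b"\x33\x40"

rebuilt_d2 = xor_blocks([d0, d1_altered, parity])
print(rebuilt_d2 == d2)  # False
```

With a known bad block the controller at least knows that stripe is in trouble; with a silently altered one it does not.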
dpedersen13 (Author) Commented:
Let's hope I didn't do that during the recovery process on the controller.  Disks go out tomorrow.  Thanks again!

Thanks for the info.  It is appreciated.

As far as "run the risk of stressing..." I did try to cover that with my comment about "..  the more likely the failure will become worse".

I had not considered the conversion of the unrecoverable read error into a recovered block and am curious about this.  I recognize how the enterprise drives have shorter timeouts for proper use with RAID configurations, but am not aware of differences in reallocation of bad blocks.  If I read your comments correctly, an enterprise drive will not do the reallocation but will rely on the RAID controller to perform that function.  Do I have this correct?

If that is correct, does this mean that an enterprise drive used with a non-RAID controller will never reallocate bad blocks or is it only when instructed not to do so by the RAID controller?

If you had any references I could use to get better educated on this I'd appreciate links to them.