Dahoe

asked on

RAID0 stripe Drive failed how to recover files?

I have a desktop, a Dell XPS Studio, that came with two 1TB hard drives set up as RAID 0. One of them is now failing and it won't boot; I get the error message "AHCI Port0 Device Error Press F2 to Continue". F2 brings you to the BIOS setup and it's just stuck in that loop.
The client said it boots up every now and then, but I've tried it at least 20 times with no joy, so I reckon it has gotten worse.
Is there any way of extracting the files from this, as they have no backup?
ASKER CERTIFIED SOLUTION
Joseph Hornsey
A hard drive recovery service might be able to do something for you, but if this is a home PC it probably isn't worth the expense.

For that matter, even a true RAID won't always allow you to recover data. If something was deleted or corrupted, RAID doesn't always save you.

Sounds like this is something you are working on for a client? This is your chance to put them on to the idea of proper backups.
Dahoe

ASKER

Hi Joseph, thanks for the quick reply.
I understand RAID, but I was wondering whether there is any procedure I can go through with any software to recover the stripe, as both drives are still detected by Windows and so are not mechanically dead yet.
When I have a single drive that fails, I can recover the files with software if I catch it in time. Is there any similar software for rebuilding RAID arrays to recover files?
Note, the runtime.org software will NOT work to clone disks with I/O errors. It doesn't have the intelligence to deal with them and will just cause further data loss.

If you won't pay for professional recovery and are willing to risk losing all hope of getting your data back, your immediate concern is to first do a bit-level clone of your disks to scratch drives, and then work only with the copies.
Dahoe,

David's comments are interesting, and I'm tempted to do some of that in a lab... if it works, that'd be really cool.

Sorry about the explanation of RAID... we don't always know the skill level of those who post.  I didn't mean to be condescending.

Good Luck!
Well, understand that what I wrote is not a tutorial. There are more things that need to be done to mitigate risk. I skipped them because there is no way you have the necessary hardware and software, so it is moot.

But for the benefit of others, here are a few other things that one needs to be aware of:

1. Don't even clone the disk unless you have decent test bench hardware and software. You need to assess the likelihood that the source drive will survive the copy. Techniques vary, but conceptually, if the disk is in a degrading condition for whatever reason, you need to get the bunny suit out and put it on your $20K test bench. If the problem is related to motors, voltage, or something like that, then it is reasonably safe.

2. Whether SAS, SATA, SCSI, or Fibre Channel, there are numerous programmable parameters that deal with read/write error retry counters, thresholds, and automated actions. One needs to reprogram the source drive for a lot of things. Consider: how aggressively do you re-read before you give up? How big is the native I/O size? What if your filesystem is built on 4KB chunks, you are reading 8KB chunks to clone, and you hit a read error in the second half of an 8KB chunk, a half you would throw away anyway?

If you read 4KB at a time you would be successful; with an 8KB read the whole I/O fails, so you lose the good 4KB as well. (Or did you get really good software that starts with the 8KB read and, if the read fails, tries to break it up?)
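To make the 4KB-versus-8KB point concrete, here is a minimal sketch of the "start big, break it up on failure" idea. It is an illustration only, not how any particular product works: `read_with_split` and the simulated `FlakyDisk` are hypothetical names, the 512-byte sector size is an assumption, and a real tool would also manage drive retry settings and timeouts, which this ignores. Note too that every retry is extra I/O on a stressed drive.

```python
import io


class FlakyDisk(io.BytesIO):
    """Simulated disk (assumption for the demo): any read touching a bad sector fails."""

    def __init__(self, data, bad_sectors, sector=512):
        super().__init__(data)
        self.bad = set(bad_sectors)
        self.sector = sector

    def read(self, size=-1):
        start = self.tell()
        first = start // self.sector
        last = (start + size - 1) // self.sector
        if any(s in self.bad for s in range(first, last + 1)):
            raise OSError("simulated unrecoverable read error")
        return super().read(size)


def read_with_split(src, offset, size, min_size=512):
    """Read `size` bytes at `offset`; if the read fails, halve it and retry.

    Returns (data, bad_ranges). Regions that still fail at `min_size` are
    zero-filled in `data` and recorded as (offset, length) in `bad_ranges`.
    Assumes the requested range lies entirely inside the device.
    """
    try:
        src.seek(offset)
        data = src.read(size)
        if len(data) != size:
            raise OSError("short read")
        return data, []
    except OSError:
        if size <= min_size:
            # Smallest unit still fails: give up on it and remember where.
            return b"\x00" * size, [(offset, size)]
        half = size // 2
        left, bad_left = read_with_split(src, offset, half, min_size)
        right, bad_right = read_with_split(src, offset + half, size - half, min_size)
        return left + right, bad_left + bad_right


# One 8KB request over a disk whose sector 12 is unreadable.
payload = bytes(range(256)) * 32
disk = FlakyDisk(payload, bad_sectors=[12])
data, bad = read_with_split(disk, 0, 8192)
print(bad)
```

In this simulation the single 8KB read fails outright, while the split version keeps everything except the one 512-byte sector that is actually bad.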

When cloning a disk you want to minimize stress. Clonezilla is better than some at varying I/O size to minimize I/O count, but it doesn't give you a lot of flexibility when you have partially successful reads. (That's why those of us in the business write in-house code.)

3. You need to map out bad block locations before you start, so you know which chunks of data contain known bad information. That can then be factored into filesystem recovery, and if anything uses those blocks you can deal with it manually.

Conversely, your recovery might repair blocks that the RAID system already wrote off as bad years ago. If you repair them, your filesystem recovery may interpret those blocks as live data and mess you up. (Behavior differs depending on RAID level.)

4. Knowing the I/O chunk sizes of the filesystem and the RAID controller (or software RAID settings), and the native I/O sizes of the disks, is important too.
Even if you successfully get past the errors, there is a whole lot of hurt, and a lot of steps, in recombining the data. Take that 4K chunk: was it in the middle of a file or in unused space, or did I just "recover" data that happens to be wrong, so that it mucks up a file?

5. All the consumer stuff risks total meltdown of the disk because it reads parts of the disk that you aren't using. Worse, those parts probably haven't been read for years, and they are the ones with lots of ECC errors. The cloning software is going to clone 100% of the drive; it won't clone only the parts associated with a file.

The best way to have a successful recovery is to do the bare minimum amount of I/O on a drive under stress. We read the blocks associated with files (unless the nature of the problem has to do with the components that move the heads to a new location) rather than read the whole drive and then recover the filesystem. Good software has no problem determining whether a chunk of data that is known bad even needs to be read.

Not only that, depending on the drive, one can query it and see where the bad blocks are before you start.  Then recovery can skip over them entirely...
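As a rough sketch of the "read only what files use, skip what is known bad" idea: given a list of file extents and a list of known bad blocks, build the shortest read plan. Both inputs are hypothetical here. How you obtain extents depends on the filesystem, and how you obtain the bad block list depends on the drive; `plan_reads` is an illustrative name, not an existing tool.

```python
def plan_reads(file_extents, bad_blocks):
    """Return (start_block, length) runs to read.

    `file_extents` is a list of (start_block, length) tuples for blocks that
    belong to files; `bad_blocks` is an iterable of block numbers already
    known to be unreadable. Blocks outside any extent are never read, and
    known bad blocks are skipped rather than retried.
    """
    bad = set(bad_blocks)
    runs = []
    for start, length in sorted(file_extents):
        run_start = None
        for block in range(start, start + length):
            if block in bad:
                # Close the current run just before the bad block.
                if run_start is not None:
                    runs.append((run_start, block - run_start))
                    run_start = None
            elif run_start is None:
                run_start = block
        if run_start is not None:
            runs.append((run_start, start + length - run_start))
    return runs


# Two file extents, three known bad blocks.
plan = plan_reads([(100, 8), (10, 4)], bad_blocks=[12, 103, 104])
print(plan)
```

The skipped blocks still have to be handled afterwards, but at least the drive is never asked to grind through sectors already known to fail, or through space no file uses.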


There isn't anything you can buy that is filesystem aware and will make correct decisions about partial bad blocks and known bad blocks. These are the reasons why you pay somebody thousands of dollars for RAID recoveries, and most of the time you get nearly 100% recovery.

Off the soapbox now. As I said before, if the data is worth a lot of money, take it to a pro. RAID0 recovery can be highly successful because it is easy to figure out exactly which blocks need to be read and where to put them, so your damage is going to be limited to specific files. (Unless the disk that died holds the physical block that logical block 0 was mapped to, along with all the filesystem and directory entries. If that is the case, recovery prospects are grim.)

P.S. I forgot about the RAID controller. Whether it is software RAID, hardware RAID, or fake RAID, you're going to have to deal with the actual physical/logical mapping of where the first block and subsequent blocks of data are. Block #0, where the GPT/EFI/whatever partitioning map lives, is NOT necessarily going to be at physical block 0 on either of the drives. But that is filesystem/volume related recovery, and you're not even there yet.
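For anyone wondering what "figure out exactly which blocks need to be read and where to put them" looks like for a plain two-disk stripe, here is a minimal sketch. It assumes simple round-robin striping with a known disk order; the stripe size and the `data_offset` (where array data starts on each member) are placeholders you would have to determine for the actual controller, and `raid0_locate` is an illustrative name only.

```python
def raid0_locate(logical_block, stripe_blocks, n_disks=2, data_offset=0):
    """Map a logical array block to (disk_index, physical_block).

    Assumes round-robin RAID0: stripe 0 on disk 0, stripe 1 on disk 1, and
    so on. `stripe_blocks` is the stripe size in blocks and `data_offset` is
    the first physical block of array data on each member disk; both are
    assumptions that must match the real array.
    """
    stripe_index, within = divmod(logical_block, stripe_blocks)
    row, disk = divmod(stripe_index, n_disks)
    physical = data_offset + row * stripe_blocks + within
    return disk, physical


# Example: 256-block stripes (128 KiB with 512-byte blocks), two disks.
for lba in (0, 255, 256, 512, 700):
    print(lba, raid0_locate(lba, stripe_blocks=256))
```

With a mapping like this, a bad block on one member translates directly into the logical blocks it affects, which is why the damage can often be narrowed down to specific files.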