epichero22

Need help solving a hard drive failure in RAID 1

Hi,

Running 2 x WD1003FBYZ-010FB in a RAID 1 configuration, using the onboard RAID controller in a Gigabyte GA-970A-D3P.  Installed both hard drives, configured them in RAID 1, installed Windows 7, and everything was working fine (WD Lifeguard found no errors).

Recently, one of the hard drives malfunctioned and it broke the RAID.  Booting into Windows now shows two physical hard drives.  I ran WD Lifeguard again, and the first of the two has bad sectors, while the other one tested fine.  But here's the problem: the one with the bad sectors is the computer's primary boot disk and is what the client has been using for both programs and data (both of which still work just fine).

I'm thinking about simply imaging the first disk to a known good disk, rebuilding the array, and then running SFC once it's loading again, unless there's a better approach you can recommend.  After that I will RMA the hard drive and have an extra one for future local backups.

Your thoughts?  Can you recommend software for this?  I was thinking about using GParted.
Joseph O'Loughlin

If the good drive is the second one, that's the one to clone.
Check the RAID controller's documentation for any nuances to the rebuild process.
You should not need to clone it. You would just need to boot with the good drive and then replace the bad drive with a new one.

Rebuilding an Array:
Rebuilding is the process of restoring data to a hard drive from other drives in the array. Rebuilding applies
only to fault-tolerant arrays such as RAID 1, RAID 5, and RAID 10 arrays. To replace the old drive, make sure
to use a new drive of equal or greater capacity. The procedures below assume a new drive is added to replace
a failed drive to rebuild a RAID 1 array.

While in the operating system, make sure the Chipset driver has been installed from the motherboard driver disk.

Then install the AMD RAID Utility (go to Application Software\Install Application Software and select AMD RAID Utility to install). Then launch the AMD RAIDXpert from All Programs in the Start Menu.

Step 1:
Enter the login ID and password (default: "admin"),
and then click Sign in to launch AMD RAIDXpert.

Step 2:
Select the RAID array to be rebuilt under Logical
Drive View and click the Rebuild tab in the Logical
Drive Information pane.

Step 3:
Select one available drive and click Start Now to
start the rebuilding process.

Step 4:
The rebuilding progress is displayed on the screen
and you can select Pause/Resume/Abort during
the rebuilding process.

Step 5:
When done, the array's status on the Information page
in the Logical Drive Information pane will display as
Functional.
epichero22
ASKER

Thanks for the comments. My only concern is that the hard drive with the bad sectors has had changes made to it, while the HD that's still intact has not. Remember that the bad HD is still being used as the primary, and this has continued for an unknown amount of time. Any suggestions with regard to this?
SOLUTION
nobus
This solution is only available to members.
Ahhh, OK. Your best bet is to take out the drive with the bad sectors and try to clone it to a good drive. You could first try to repair the bad sectors; hopefully the drive will be able to read the data off them, remap it to known good sectors, and just mark the originals as bad. You could then clone it to a good drive and recreate the mirror afterwards.

One word of advice, though: depending on how bad those sectors are, the repair may take a while. I had a drive that took all night to recover, but I was at least able to recover it and copy it over to another drive afterwards.
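To illustrate what cloning a drive with bad sectors involves, here is a minimal Python sketch of a block-by-block copy that zero-fills any block it cannot read and remembers where that happened. It is not what any particular cloning tool does internally; the function name, the 4096-byte block size, and the in-memory `FlakyDisk` stand-in are all hypothetical and exist only for the demonstration.

```python
import io


def clone_skipping_bad_blocks(src, dst, total_bytes, block_size=4096):
    """Copy src to dst block by block, zero-filling unreadable blocks.

    Returns the byte offsets of the blocks that could not be read.
    """
    bad_offsets = []
    offset = 0
    while offset < total_bytes:
        length = min(block_size, total_bytes - offset)
        try:
            src.seek(offset)
            data = src.read(length)
            if len(data) != length:
                raise OSError("short read")
        except OSError:
            # Unreadable block: keep the target aligned by writing zeros.
            data = b"\x00" * length
            bad_offsets.append(offset)
        dst.seek(offset)
        dst.write(data)
        offset += length
    return bad_offsets


class FlakyDisk(io.BytesIO):
    """Hypothetical in-memory stand-in for a drive with bad sectors."""

    def __init__(self, data, bad_offsets):
        super().__init__(data)
        self.bad_offsets = set(bad_offsets)

    def read(self, size=-1):
        # Simulate a read error when a read starts at a "bad" offset.
        if self.tell() in self.bad_offsets:
            raise OSError("simulated unreadable sector")
        return super().read(size)


source = FlakyDisk(b"A" * 4096 + b"B" * 4096 + b"C" * 4096, bad_offsets=[4096])
target = io.BytesIO()
bad = clone_skipping_bad_blocks(source, target, total_bytes=3 * 4096)
print("unreadable block offsets:", bad)
```

The list of bad offsets matters afterwards: files that overlap those blocks are the ones to check (for example with SFC for system files) once the clone is in use.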
ASKER CERTIFIED SOLUTION
This solution is only available to members.
I'm trying to fix this problem but have been getting mixed results.  It seems that LifeGuard says different things when changing the SATA controller type (sometimes it's OK, sometimes bad sectors).  Should I set it to Legacy IDE, RAID or AHCI?  I plan on rebuilding the RAID 1 in Windows.
Set it to AHCI, not RAID. If Lifeguard shows problems with the disk, replace it, unless the utility offers to repair the problems and a further scan then shows no issues.
If set to AHCI, the computer won't boot: I get a blue screen during the Windows starburst.  I tried installing the AHCI driver from Gigabyte but to no avail.  Your thoughts?
That's why I earlier suggested doing a fresh OS installation. During that you can load the controller's drivers.

Alternatively, you could use a non-free utility from Paragon to back up, then restore to different hardware, where you can also include the drivers:

http://www.paragon-software.com/home/brh/
I actually was able to load AHCI drivers without a reformat on my other computer when I upgraded to an SSD.  There's an article somewhere about editing the msahci registry entry, which allows Windows to use the drivers.
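As a rough sketch of the registry edit referred to here: the commonly cited Windows 7 change is to set the `Start` value of the `msahci` service to 0 before switching the controller to AHCI in the BIOS. The key path and the value below are assumptions taken from that general guidance, not from this thread, and a vendor-specific AHCI driver may use a different service name. The Python helper (a hypothetical name) only builds the text of a `.reg` file so it can be reviewed before anything is imported.

```python
def build_ahci_reg_file(service="msahci", start_value=0):
    """Return .reg file text that sets a storage driver's Start value.

    Assumption: the driver is registered under
    HKLM\\SYSTEM\\CurrentControlSet\\services\\<service>, and Start=0
    means "load at boot".
    """
    key = "HKEY_LOCAL_MACHINE\\SYSTEM\\CurrentControlSet\\services\\" + service
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        "[" + key + "]",
        '"Start"=dword:{:08x}'.format(start_value),
        "",
    ]
    return "\r\n".join(lines)


reg_text = build_ahci_reg_file()
print(reg_text)
```

Back up the registry (or at least export the key) before importing a file like this, since a wrong boot-driver setting can itself cause a blue screen at startup.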

Anyway, I guess my question at this point is: is it worth formatting to get the GPT file system?  Or should I call it a day?
GPT? Only for a disk/Array that is larger than 2TB and will only hold data, not the OS.
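The 2TB figure comes from simple arithmetic: a classic MBR partition table stores sector addresses as 32-bit numbers, so with 512-byte sectors it can address 2^32 x 512 bytes. A short Python sketch of that calculation follows; the 512-byte sector size and the nominal 1 TB capacity used for the mirrored drives are assumptions for illustration.

```python
MBR_LBA_BITS = 32  # MBR stores sector addresses in 32 bits


def mbr_max_bytes(sector_bytes=512):
    """Largest capacity an MBR partition table can address."""
    return (2 ** MBR_LBA_BITS) * sector_bytes


def exceeds_mbr_limit(disk_bytes, sector_bytes=512):
    """True if the disk is too large to be fully addressed with MBR."""
    return disk_bytes > mbr_max_bytes(sector_bytes)


limit = mbr_max_bytes()
mirror_bytes = 1_000_000_000_000  # assumed nominal 1 TB RAID 1 mirror

print("MBR limit in bytes:", limit)
print("MBR limit in TiB:", limit / 2 ** 40)
print("1 TB mirror exceeds MBR limit:", exceeds_mbr_limit(mirror_bytes))
```

Under these assumptions a RAID 1 pair of 1 TB drives stays well under the limit, so converting to GPT would not gain any usable space.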
The reason you are being advised to avoid motherboard-embedded RAID is its limitations with rebuilds.  If you are trying to recover from a second drive failure, I believe subsequent actions, such as toggling off the motherboard RAID (which had the configuration stored), are compounding the problems.
Check your business continuity insurance.
Document the 'recovery' steps already tried.
Send the lot to a disaster recovery specialist company like Ontrack.  It will be time-consuming and expensive.
It's not just limitations with rebuild. Mainly it is because fake-RAID is notoriously unreliable, performs badly, etc. etc.

Actually, the only reason it exists is for marketing purposes. At one stage a manufacturer included such a chip on its board, and so all the others had to follow. Besides, the chip is very cheap, so it hardly adds any extra investment for the manufacturer. The customers are then just lured into thinking it actually is worth having.

Of course there are some benefits provided you don't use it in RAID mode. Many boards include more than one such chip and that adds up to more SATA ports, so you can attach more than 4 internal disks without adding a separate controller.
OK, I disabled the RAID on the motherboard, but left the Controller type as RAID.  It's either that mode or Legacy as AHCI doesn't work.


I'll log into Windows and set up the RAID 1 using the Disk Manager.

Since we're on the topic, can you recommend a good RAID card?
I don't really think it is worth buying hardware RAID controllers, particularly not for desktop PCs, as the price is usually pretty high, and OS built-in RAID already is very good. For servers it's a different issue, as there, with hardware RAID controllers, you can easily swap a failed disk without needing to shut the server down first (hot-swap). But as good server hardware should already come with a hardware RAID controller, it shouldn't be necessary to buy anything else.

But generally Adaptec builds good controllers. One thing to also remember, if a RAID controller is used you will also need enterprise class disks. Consumer grade disks don't work properly or reliably with RAID controllers.
Consumer grade disks don't work properly or reliably with RAID controllers.


Why is that?
Because they are optimised to work by themselves rather than as a group behind a RAID controller. The most obvious difference is in error recovery. On detecting an error, a consumer drive goes into what is known as deep recovery mode, which can take tens of seconds to complete, whereas an enterprise disk doesn't do this and signals very quickly (to the RAID controller) that it has an error, so that the RAID controller can then go through its own error recovery process.
Consumer grade disks have a long timeout when they encounter bad sectors, during which they retry again and again until the sector is finally marked as bad. Enterprise class disks have a much shorter timeout. A RAID controller that has to wait that long for a disk will assume the whole disk is bad and will report it as failed even if it is still OK. This doesn't happen with enterprise class disks.
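The timeout mismatch described in the two answers above can be modelled in a few lines of Python. This is only a toy model: the 8-second controller timeout, the 7-second enterprise recovery limit, and the 30-second consumer recovery time are illustrative assumptions, not figures from this thread or from any specific drive or controller.

```python
def controller_verdict(drive_recovery_s, controller_timeout_s):
    """Toy model of how a RAID controller reacts to a slow sector recovery."""
    if drive_recovery_s > controller_timeout_s:
        # The drive stayed silent too long, so the controller gives up on it.
        return "drive dropped from array"
    # The drive reported the error in time; the controller can repair the
    # sector from the mirror and keep the drive in the array.
    return "error handled, drive stays in array"


CONTROLLER_TIMEOUT_S = 8    # assumed controller patience
ENTERPRISE_RECOVERY_S = 7   # assumed time-limited error recovery
CONSUMER_RECOVERY_S = 30    # assumed "deep recovery" duration

enterprise = controller_verdict(ENTERPRISE_RECOVERY_S, CONTROLLER_TIMEOUT_S)
consumer = controller_verdict(CONSUMER_RECOVERY_S, CONTROLLER_TIMEOUT_S)
print("enterprise disk:", enterprise)
print("consumer disk:", consumer)
```

In this model the same single bad sector costs the consumer disk its place in the array, even though the rest of the disk may be healthy.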
In my experience, LSI RAID controllers are easier to recover from in the event of disk failures.  It's best if the original configuration and sequence of failures is known, so the ability to pull the RAID controller's logs is key.  Consumer desktop RAID on the motherboard doesn't offer this.

Again, I advise getting specialist intervention.  Aside from the disaster recovery companies, what channel did you use to purchase the system originally?  Is there a company that sells your server product and has the necessary expertise?
I ended up breaking the RAID, updating the Windows registry to AHCI, and rebuilding the array in Windows.  Now everything works fine - no more complaints about bad sectors, and SFC turned out OK.  Not the exact solution I was looking for, but thanks for the help and advice.
Was the rebuild done using Windows native or Gigabyte RAID utilities?
I ended up using Windows.  Didn't want to chance it with the on-board per the bad reviews.
Glad you're back up and running