• Status: Solved
  • Priority: Medium
  • Security: Public

Raid 60 failure

I have a 30-drive array that is broken up into a RAID 6+0 (RAID 60) configuration. The RAID controller software reported that a drive had failed, so I put in a new drive, and during the rebuild two more drives in the same array "failed". The controller reports them as failed, but my guess is that they are only returning ECC errors and the controller has dropped them from the array.

Does anyone have any experience with forcing a failed drive back online? Alternatively, would it be possible to take a drive out and dd/sector-clone it to another drive so the array will rebuild correctly? I only ask because restoring 40+ TB of data from tape is not something I am looking forward to.

I am using an LSI MegaRAID adapter on an Ubuntu server with the MegaRAID Storage Manager software.
Asked by: AggieTex

David commented:
If you don't want 100% data loss, then you really need to hire a pro. Whoever does it is going to have to do some parity/XOR testing and take structures apart to look for timestamps on the data. The disks themselves also have to be assessed for health. The MegaRAID event log would be of use, so do NOT blow it away by clearing anything out.
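
To give a rough idea of what the parity/XOR testing mentioned above involves, here is a minimal Python sketch. It only covers the simple XOR ("P") parity of a single stripe. Real RAID 6 also carries a second ("Q") parity, and the controller's actual stripe size and parity rotation are not modeled here. The function names are hypothetical and are not part of any MegaRAID tool.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    if not blocks:
        raise ValueError("need at least one block")
    length = len(blocks[0])
    if any(len(b) != length for b in blocks):
        raise ValueError("all blocks must have the same length")
    # zip(*blocks) walks the blocks column by column (one byte from each).
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def p_parity_ok(data_blocks, p_block):
    """True if the stored P parity matches the XOR of the data blocks."""
    return xor_blocks(data_blocks) == p_block


def rebuild_missing(surviving_data_blocks, p_block):
    """Recover ONE missing data block from the survivors plus P parity."""
    return xor_blocks(list(surviving_data_blocks) + [p_block])
```

A parity mismatch on a stripe means at least one member holds stale or damaged data for that stripe, which is exactly the situation where forcing a member online silently produces bad data.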

If you don't want 100% data loss (which is what you get with RAID 60 once one of its RAID 6 spans loses a third drive), then don't even think of doing this yourself.
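
Why total loss is the default outcome here can be sketched in a few lines of Python. RAID 60 stripes data across several RAID 6 spans, and each span tolerates at most two failed drives, so a third failure in any one span takes the whole volume with it. The three-spans-of-ten layout in the usage note is an assumption for illustration only, since the question does not say how the 30 drives are divided, and the function name is hypothetical.

```python
def raid60_survives(failed_per_span):
    """Return True if every RAID 6 span has at most two failed drives.

    failed_per_span: one integer per span, the number of failed drives in it.
    """
    if any(failed < 0 for failed in failed_per_span):
        raise ValueError("failure counts cannot be negative")
    # The volume is striped (RAID 0) across spans, so ALL spans must survive.
    return all(failed <= 2 for failed in failed_per_span)
```

With an assumed layout of three spans, `raid60_survives([2, 2, 2])` is true even with six drives down, while `raid60_survives([3, 0, 0])` is false with only three down, because they are all in the same span.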

If you force the disk(s) online, it guarantees data corruption across all 40 TB, because of the errors you already have. The amount of damage can be assessed, but that takes a lot of experience that certainly can't be transferred in this forum.

No, you cannot just dd it. The metadata won't be right. You also want to preserve the internal log pages of each disk before you dd anything, figure out which blocks are unreadable, and run diagnostics.
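
As an illustration of what "figure out which blocks are unreadable" means, here is a minimal Python sketch that reads a disk or disk image block by block and records the offsets that fail. It is a sketch under assumptions, not a recovery tool: it goes through the normal page cache, it does not retry or back off, and reading a failing disk stresses it further. The function name and the `read_at` parameter are hypothetical, and `os.pread` requires a Unix-like system.

```python
import os


def find_unreadable_blocks(path, block_size=4096, read_at=os.pread):
    """Return the byte offsets of blocks that could not be read from path.

    read_at(fd, length, offset) defaults to os.pread; it is a parameter
    only so that read failures can be simulated without a bad disk.
    """
    bad_offsets = []
    fd = os.open(path, os.O_RDONLY)  # read-only: never write to the source
    try:
        size = os.lseek(fd, 0, os.SEEK_END)
        offset = 0
        while offset < size:
            want = min(block_size, size - offset)
            try:
                data = read_at(fd, want, offset)
                if len(data) < want:  # short read: treat the block as bad
                    bad_offsets.append(offset)
            except OSError:  # e.g. EIO from an unrecoverable read error
                bad_offsets.append(offset)
            offset += want
    finally:
        os.close(fd)
    return bad_offsets
```

The resulting list of offsets is what lets someone map the damage to specific stripes instead of guessing, which is the assessment step described above.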

It may not be that bad to PROPERLY reconstruct just that one stripe, and hope that the rest of the 40TB is OK.  

The rebuild failed due to multiple error scenarios: a combination of unrecoverable read errors on the surviving disks and/or possible parity mismatches. There is no way I could walk somebody through this, as you wouldn't even have the software you need. You would also certainly need a JBOD controller and scratch drives.

Hire a pro to help you, if you don't want to restore all 40 TB.
David commented:
The question was basically, "Does anyone have any experience with forcing a drive that failed online, or would it be possible to take a drive out and dd/sector clone it to another drive so it will rebuild correctly."

Answered in #38389031, along with many reasons not to do it. Points should be awarded, as this is also valuable information for others. Forcing a drive online in a failed array is not a viable choice.