Using mdadm to recover from a multi-drive RAID5 failure

I have a CentOS 4.2 server that has a software RAID-5 volume consisting of 4 disks, /dev/sd{a-d}7.  The other day it reported a failure on /dev/sdc7.  I was able to start rebuilding it, but during the rebuild it reported another failure.

With the older raidtools I knew how to edit /etc/raidtab and set the known bad disk to failed-disk and then force a rebuild of the array.  I'd like to do that with this machine to see if I can recover the array, but I've never done it using mdadm.  /etc/mdadm.conf doesn't have much useful information in it, only:

ARRAY /dev/md0 super-minor=0
ARRAY /dev/md1 super-minor=1
...

How can I tell mdadm that /dev/sdc7 is the truly failed disk and have it try to rebuild using the other 3 disks?

-Bruce
brucepennypacker Asked:
 
ravenpl Commented:
If the array is out of sync, you can't. If it is in sync, just plug in the disk; the kernel will find the new disk and use it in the array.

Or try assembling the array from scratch:
mdadm -A /dev/mdX -YourOptions -l5 -n4 /dev/sda7 /dev/sdb7 missing /dev/sdd7
but it will probably fail if the disks are out of sync.
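
One way to judge whether the members are out of sync is to compare the Events counter that `mdadm --examine` prints for each member. A minimal sketch, assuming the output contains a line of the form `Events : <value>`; the helper name `events_of` is made up for illustration:

```shell
# Print the "Events" counter found in `mdadm --examine` output read on stdin.
# Assumption: the counter appears on a line shaped like "Events : <value>".
events_of() {
    awk -F' : ' '/^ *Events/ { print $2; exit }'
}

# Usage (as root), for the four members named in the question:
#   for d in /dev/sda7 /dev/sdb7 /dev/sdc7 /dev/sdd7; do
#       printf '%s ' "$d"; mdadm --examine "$d" | events_of
#   done
```

Members whose counters match were generally last written together; a member whose counter lags well behind the others is the stale one.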
 
ravenpl Commented:
If you have two failed disks in a RAID5 volume, you can't recover. That's the design of RAID5: it tolerates only one failed disk.

If only sdc7 has failed, then:
mdadm /dev/mdX -f /dev/sdc7 # hot-fail
mdadm /dev/mdX -r /dev/sdc7 # hot-remove sdc7 from mdX
mdadm /dev/mdX -a /dev/sdc7 # hot-add and start rebuilding
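
The three steps above can be wrapped in a small helper that only prints the commands unless told otherwise. This is a sketch, not a tested procedure: the names `run`, `replace_member` and `DRY_RUN` are hypothetical, and it uses the long option forms (`--fail`, `--remove`, `--add`) of the same commands.

```shell
# DRY_RUN=1 (the default here) prints each command instead of executing it.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then echo "$@"; else "$@"; fi
}

# usage: replace_member /dev/mdX /dev/sdc7
replace_member() {
    md=$1
    part=$2
    run mdadm "$md" --fail "$part" &&     # mark the member as faulty
    run mdadm "$md" --remove "$part" &&   # detach it from the array
    run mdadm "$md" --add "$part"         # add it back; the rebuild starts
}
```

Running `replace_member /dev/mdX /dev/sdc7` shows what would be executed; setting `DRY_RUN=0` as root runs the commands for real.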
 
noci (Software Engineer) Commented:
Be aware that if /dev/sdc7 failed, other partitions on /dev/sdc might also fail. It depends somewhat on the error; if it's just a bad block, you might get away with it for now.

Also have a look at smartmontools, which can help diagnose the health of disks before they fail.
It works on whole disks, not partitions: /dev/sd?, /dev/hd?, etc.

Better prepare for other partitions on /dev/sdc failing.
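
Since smartmontools wants the whole disk rather than a partition, a small helper can derive one from the other. A minimal sketch, assuming classic sdX/hdX device naming; `whole_disk` is a made-up name:

```shell
# Strip the trailing partition number: /dev/sdc7 -> /dev/sdc.
# Assumption: classic sdX/hdX naming, where the partition is a numeric suffix.
whole_disk() {
    printf '%s\n' "$1" | sed 's/[0-9]*$//'
}

# Usage (as root, with smartmontools installed):
#   smartctl -H "$(whole_disk /dev/sdc7)"   # overall health self-assessment
#   smartctl -a "$(whole_disk /dev/sdc7)"   # full SMART report, including the error log
```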
 
brucepennypacker (Author) Commented:
ravenpl - As I said in my original post I have successfully recovered from multiple-disk RAID5 failures using raidtools.  It's possible to have multiple disks fail simultaneously if a drive controller fails, if a cable that multiple drives are on is loose, if an external disk array loses power, etc.  Here's a web page that describes how to do this using raidtools:

http://software.cfht.hawaii.edu/linuxpc/RAID_recovery.html

What I would like to know is how to do it using mdadm since that's replaced raidtools.
 
brucepennypacker (Author) Commented:
You were close.  I just had to do the following:

mdadm --assemble --force /dev/md5 /dev/sda7 /dev/sdb7 /dev/sdc7 /dev/sdd7

This reassembled the array successfully, still with one failed disk.  I was able to mount it, and after it recovered its journal I was able to copy all the data off before replacing the drive and rebuilding the array.
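
The recovery above can be sketched as a small dry-run script. It assumes the array may first need to be stopped if it is sitting half-assembled and inactive (that step is not in the original post), and the names `run`, `force_assemble` and `DRY_RUN` are illustrative only.

```shell
# DRY_RUN=1 (the default here) prints each command instead of executing it.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then echo "$@"; else "$@"; fi
}

# usage: force_assemble /dev/md5 /dev/sda7 /dev/sdb7 /dev/sdc7 /dev/sdd7
force_assemble() {
    md=$1
    shift
    # Assumption: release a half-assembled array first; a failure here is ignored.
    run mdadm --stop "$md"
    # Force assembly even though the members' superblocks disagree.
    run mdadm --assemble --force "$md" "$@" &&
        run mdadm --detail "$md"          # show which members are active
}
```

With `DRY_RUN=1` the function only lists the commands; with `DRY_RUN=0`, run as root, it executes them. After that, `cat /proc/mdstat` shows the array state, and the data can be copied off before the bad drive is replaced.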
Question has a verified solution.
