Solved

Using mdadm to recover from a multi-drive RAID5 failure

Posted on 2006-07-20
Last Modified: 2013-12-16
I have a CentOS 4.2 server with a software RAID-5 volume consisting of 4 disks, /dev/sd{a-d}7. The other day it reported a failure on /dev/sdc7. I was able to start rebuilding it, but during the rebuild it reported another failure.

With the older raidtools I knew how to edit /etc/raidtab and set the known bad disk to failed-disk and then force a rebuild of the array.  I'd like to do that with this machine to see if I can recover the array, but I've never done it using mdadm.  /etc/mdadm.conf doesn't have much useful information in it, only:

ARRAY /dev/md0 super-minor=0
ARRAY /dev/md1 super-minor=1
...

How can I tell mdadm that /dev/sdc7 is the truly failed disk and have it try to rebuild using the other 3 disks?

-Bruce
Question by:brucepennypacker
LVL 43

Expert Comment

by:ravenpl
ID: 17152425
If two disks have failed in a RAID5 volume, you can't recover. That's the design of RAID5.

If only sdc7 has failed, then:
mdadm /dev/mdX -f /dev/sdc7 # hot-fail
mdadm /dev/mdX -r /dev/sdc7 # hot-remove sdc7 from mdX
mdadm /dev/mdX -a /dev/sdc7 # hot-add and start rebuilding
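Before hot-failing or removing a member, it can help to confirm which arrays the kernel currently reports as degraded. The following is only a minimal sketch: the helper name `degraded_arrays` and the sample text are made up for illustration, and it assumes the usual /proc/mdstat layout in which the status map (for example `[UU_U]`) marks a missing member with `_`.

```shell
# degraded_arrays: read /proc/mdstat-style text on stdin and print the name
# of every array whose status map (e.g. [UU_U]) shows a missing member.
# (Hypothetical helper, not part of mdadm.)
degraded_arrays() {
    awk '
        /^md[0-9]+ :/ { name = $1 }    # remember the array this block describes
        /\[[U_]*_[U_]*\]/ && name != "" { print name; name = "" }
    '
}

# Illustrative sample input; on a live system use:
#   degraded_arrays < /proc/mdstat
degraded_arrays <<'EOF'
Personalities : [raid5]
md0 : active raid5 sdd7[3] sdb7[1] sda7[0]
      2930279808 blocks level 5, 64k chunk, algorithm 2 [4/3] [UU_U]
md1 : active raid1 sdb1[1] sda1[0]
      104320 blocks [2/2] [UU]
unused devices: <none>
EOF
```

For the array it names, `mdadm --detail /dev/mdX` shows which member slot is marked faulty or removed.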
LVL 40

Assisted Solution

by:noci
noci earned 20 total points
ID: 17153252
Be aware that if /dev/sdc7 failed, other partitions on /dev/sdc might also fail. It depends somewhat on the error; if it's just a bad block, you might get away with it for now.

Also have a look at smartmontools, which can help diagnose the health of disks before they fail (whole disks, that is, not partitions: /dev/sd?, /dev/hd?, etc.).

Better prepare for other partitions on /dev/sdc failing.
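Since SMART works on whole disks rather than partitions, a small sketch like the one below can map each array member to its disk and list the checks to run. The helper name `disk_of` is hypothetical, and it assumes classic sd?/hd? naming where the partition number is a trailing digit run. The script only prints the `smartctl` commands; running them needs root and the smartmontools package.

```shell
# disk_of: strip the trailing partition number from an sd?/hd? style name,
# e.g. /dev/sdc7 -> /dev/sdc. (Hypothetical helper; assumes classic naming
# and would not be correct for names like nvme0n1p1.)
disk_of() {
    printf '%s\n' "${1%%[0-9]*}"
}

# Print the SMART checks for the whole disk behind each array member.
for part in /dev/sda7 /dev/sdb7 /dev/sdc7 /dev/sdd7; do
    disk=$(disk_of "$part")
    echo "smartctl -H $disk    # overall health verdict"
    echo "smartctl -a $disk    # full attribute table and error log"
done
```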

Author Comment

by:brucepennypacker
ID: 17153830
ravenpl - As I said in my original post I have successfully recovered from multiple-disk RAID5 failures using raidtools.  It's possible to have multiple disks fail simultaneously if a drive controller fails, if a cable that multiple drives are on is loose, if an external disk array loses power, etc.  Here's a web page that describes how to do this using raidtools:

http://software.cfht.hawaii.edu/linuxpc/RAID_recovery.html

What I would like to know is how to do it using mdadm since that's replaced raidtools.
LVL 43

Accepted Solution

by:ravenpl
ravenpl earned 30 total points
ID: 17153905
If the array is out of sync, you can't. If it is in sync, just plug the disk in; the kernel will find the new disk and use it in the array.

Or try assembling the array from scratch:
mdadm -A /dev/mdX -YourOptions -l5 -n4 /dev/sda7 /dev/sdb7 missing /dev/sdd7
but it will probably fail if the disks are out of sync.

Author Comment

by:brucepennypacker
ID: 17187474
You were close.  I just had to do the following:

mdadm --assemble --force /dev/md5 /dev/sda7 /dev/sdb7 /dev/sdc7 /dev/sdd7

This reassembled the array successfully, still with one failed disk. I was able to mount it, and after it recovered its journal I was able to copy all the data off before replacing the drive and rebuilding the array.
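For anyone repeating this, the forced assemble could be wrapped in a dry-run script along the following lines. This is a sketch under assumptions, not the exact procedure used above: the `run` and `force_assemble` helpers and the `DRY_RUN` variable are made up for illustration, and the `--examine` and `--stop` steps are additions (to compare the members' Events counters and to make sure the array is inactive before reassembly). By default it only prints the commands.

```shell
#!/bin/sh
# Dry-run by default: commands are printed, not executed.
# Set DRY_RUN=0 and run as root to actually execute them.
DRY_RUN="${DRY_RUN:-1}"

# run: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# force_assemble MD_DEVICE MEMBER...   (hypothetical helper)
force_assemble() {
    md="$1"; shift
    run mdadm --examine "$@"                  # compare each member's Events counter
    run mdadm --stop "$md"                    # array must be inactive before reassembly
    run mdadm --assemble --force "$md" "$@"   # the command from the post above
    run cat /proc/mdstat                      # check how the array came up
}

force_assemble /dev/md5 /dev/sda7 /dev/sdb7 /dev/sdc7 /dev/sdd7
```

Printing the plan first is a deliberate choice here: a forced assemble updates the superblocks of stale members, so it is worth reviewing the exact device list before running it for real.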

Question has a verified solution.
