Solved

Using mdadm to recover from a multi-drive RAID5 failure

Posted on 2006-07-20
Last Modified: 2013-12-16
I have a CentOS 4.2 server with a software RAID-5 volume consisting of 4 disks, /dev/sd{a-d}7. The other day it reported a failure on /dev/sdc7. I was able to start rebuilding it, but during the rebuild it reported another failure.

With the older raidtools I knew how to edit /etc/raidtab and set the known bad disk to failed-disk and then force a rebuild of the array.  I'd like to do that with this machine to see if I can recover the array, but I've never done it using mdadm.  /etc/mdadm.conf doesn't have much useful information in it, only:

ARRAY /dev/md0 super-minor=0
ARRAY /dev/md1 super-minor=1
...

How can I tell mdadm that /dev/sdc7 is the truly failed disk and have it try to rebuild using the other 3 disks?

-Bruce
Question by:brucepennypacker
5 Comments
 
LVL 43

Expert Comment

by:ravenpl
ID: 17152425
If two disks have failed in a RAID5 volume, you can't recover it. That's the design of RAID5.

If only sdc7 has failed, then:
mdadm /dev/mdX -f /dev/sdc7 # hot-fail
mdadm /dev/mdX -r /dev/sdc7 # hot-remove sdc7 from mdX
mdadm /dev/mdX -a /dev/sdc7 # hot-add and start rebuilding
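
The three steps above can be wrapped in a small dry-run script, so the exact commands can be reviewed before anything touches the array. This is only a sketch: the `run` and `replace_member` helpers and the `DRY_RUN`, `MD`, and `PART` variables are hypothetical names introduced here, and `/dev/mdX` remains a placeholder for the real array device.

```shell
#!/bin/sh
# Sketch only: prints each command unless DRY_RUN=0 is set.
# MD and PART are placeholders; substitute the real array and member partition.
DRY_RUN="${DRY_RUN:-1}"
MD="${MD:-/dev/mdX}"
PART="${PART:-/dev/sdc7}"

# Hypothetical helper: echo the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: fail, remove, and re-add one member of the array.
replace_member() {
    run mdadm --detail "$MD"           # show which members are active or faulty
    run mdadm "$MD" --fail "$PART"     # long form of -f
    run mdadm "$MD" --remove "$PART"   # long form of -r
    run mdadm "$MD" --add "$PART"      # long form of -a; starts the rebuild
    run cat /proc/mdstat               # rebuild progress is reported here
}
```

Calling `replace_member` with the default `DRY_RUN=1` only prints the commands. Running it as root with `DRY_RUN=0` would execute them.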
 
LVL 40

Assisted Solution

by:noci
noci earned 60 total points
ID: 17153252
Be aware that if /dev/sdc7 has failed, other partitions on /dev/sdc might fail as well. It depends on the error: if it's just a bad block, you might get away with it for now.

Also have a look at smartmontools, which can help diagnose the health of disks before they fail. (That is whole disks, not partitions: /dev/sd?, /dev/hd?, etc.)

Better to prepare for other partitions on /dev/sdc failing.
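
A minimal sketch of the smartmontools check suggested above, written as a dry run. The helper names (`run`, `whole_disk`, `check_disk`) and the `DRY_RUN` variable are hypothetical. The partition-to-disk conversion assumes simple names such as /dev/sdc7 and would not be right for device names that end in a digit.

```shell
#!/bin/sh
# Sketch only: prints each command unless DRY_RUN=0 is set.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: echo the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Strip the trailing partition number: /dev/sdc7 -> /dev/sdc.
# Assumes sdX/hdX style names, where the disk name itself has no trailing digits.
whole_disk() {
    printf '%s\n' "$1" | sed 's/[0-9]*$//'
}

# Query SMART data for the disk that holds the given partition.
check_disk() {
    disk="$(whole_disk "$1")"
    run smartctl -H "$disk"         # overall health self-assessment
    run smartctl -A "$disk"         # vendor attributes (reallocated sectors, etc.)
    run smartctl -l error "$disk"   # the drive's own error log
}
```

For example, `check_disk /dev/sdc7` prints the three `smartctl` commands for /dev/sdc. With `DRY_RUN=0` and root privileges it would run them.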
 

Author Comment

by:brucepennypacker
ID: 17153830
ravenpl - As I said in my original post I have successfully recovered from multiple-disk RAID5 failures using raidtools.  It's possible to have multiple disks fail simultaneously if a drive controller fails, if a cable that multiple drives are on is loose, if an external disk array loses power, etc.  Here's a web page that describes how to do this using raidtools:

http://software.cfht.hawaii.edu/linuxpc/RAID_recovery.html

What I would like to know is how to do it using mdadm since that's replaced raidtools.
 
LVL 43

Accepted Solution

by:ravenpl
ravenpl earned 90 total points
ID: 17153905
If the array is out of sync, you can't. If it is in sync, just plug the disk back in; the kernel will find the disk and use it in the array.

Or try assembling the array from scratch:
mdadm -A /dev/mdX -YourOptions -l5 -n4 /dev/sda7 /dev/sdb7 missing /dev/sdd7
but it will probably fail if the disks are out of sync.
 

Author Comment

by:brucepennypacker
ID: 17187474
You were close.  I just had to do the following:

mdadm --assemble --force /dev/md5 /dev/sda7 /dev/sdb7 /dev/sdc7 /dev/sdd7

This has recreated the array successfully, still with one failed disk.  I was able to mount it, and after it recovered its journal I was able to copy all the data off before replacing the drive & rebuilding the array.
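
Before forcing an assembly like the one above, it can help to compare the event counters that `mdadm --examine` reports for each member, since the member with the lowest count is the one that fell furthest behind. The following is a sketch, not part of the original fix. `events_by_device` is a hypothetical helper, and it assumes the `--examine` output starts each member with a line like `/dev/sda7:` and contains an `Events :` line, which may differ between mdadm versions.

```shell
#!/bin/sh
# Sketch only: reads `mdadm --examine` output on stdin and prints
# "<device> <events>" for each member found.
# Assumes a "/dev/...:" header line per member and an "Events : N" line.
events_by_device() {
    awk '
        /^\/dev\// { dev = $1; sub(/:$/, "", dev) }   # remember the current member
        $1 == "Events" { print dev, $NF }              # last field is the counter
    '
}
```

A possible use, run as root, is `mdadm --examine /dev/sd[abcd]7 | events_by_device`. Members whose counters match are in step with each other, and a member with a clearly lower counter is the likely stale one.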