Solved

Fix a Linux RAID Configuration Problem

Posted on 2011-09-12
Last Modified: 2016-06-26
I am having an issue with a Linux RAID array I use as a NAS server.  Everything seems to work with no issue.  No files seem to be corrupted and everything is accessible, but I have found a problem with the configuration.

Hardware Configuration:
Ubuntu Server 10.10 (No GUI)
(1) 250 GB Western Digital HDD (Ubuntu)
      /dev/sda
(11) 1TB Western Digital HDDs (RAID5 Array)
      /dev/sd[bcdefghijkl]1

I used Webmin for the basic setup of the RAID and I believe that is where my problem began.

mdadm.conf:
DEVICE partitions /dev/sde1 /dev/sdf1 /dev/sdg1 /dev/sdh1 /dev/sdi1 /dev/sdj1 /dev/sdk1 /dev/sdl1
CREATE owner=root group=disk mode=0660 auto=yes
HOMEHOST <system>
DEVICE /dev/sdb1 /dev/sdc1 /dev/sdd1
ARRAY /dev/md0 level=raid5 devices=/dev/sdb1,/dev/sdc1,/dev/sdd1,/dev/sde1,/dev/sdf1,/dev/sdg1,/dev/sdh1,/dev/sdi1,/dev/sdj1,/dev/sdk1,/dev/sdl1

sudo mdadm --examine --scan:
ARRAY /dev/md0 level=raid5 num-devices=3 UUID=e6f6ec4e:fefe1ee2:e40a34ba:0ca07f7d
ARRAY /dev/md0 level=raid5 num-devices=11 UUID=e7d757df:a4dd8c94:e40a34ba:0ca07f7d

cat /proc/mdstat:
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] 
md0 : active raid5 sdj1[7] sdh1[5] sdg1[6] sdi1[8] sdl1[10] sdk1[9] sdd1[2] sdb1[0] sde1[3] sdc1[1] sdf1[4]
      9767599360 blocks level 5, 64k chunk, algorithm 2 [11/11] [UUUUUUUUUUU]

I'm concerned because the mdadm.conf file has two DEVICE lines (not necessarily a problem), but only one ARRAY line that uses partitions from both DEVICE lines.  The mdadm scan showed two different arrays.  I would like to fix the mdadm.conf file, but am concerned about losing data.

Can I simply combine the two DEVICE lists?  Normally I would just guess and check, but I really don't want to lose this data.

Thanks for the help.
Question by:WickedShamrock
11 Comments
 
LVL 76

Expert Comment

by:arnold
Run either of the following (the second is the short form of the first):
mdadm --examine --brief --scan --config=partitions
mdadm -Ebsc partitions
http://linux.die.net/man/8/mdadm

Post the output of the above.
The man page has other examples of using mdadm to get information.

It should tell you which array is degraded and which is the failed disk.
 

Author Comment

by:WickedShamrock
Thanks for the response.  The following are the results of those commands.  Hope that helps.  As far as I understand it nothing is degraded, but if I really knew I wouldn't be posting on this forum!  Thanks again.

sudo mdadm --examine --brief --scan --config=partitions
ARRAY /dev/md0 level=raid5 num-devices=3 UUID=e6f6ec4e:fefe1ee2:e40a34ba:0ca07f7d
ARRAY /dev/md0 level=raid5 num-devices=11 UUID=e7d757df:a4dd8c94:e40a34ba:0ca07f7d

sudo mdadm -Ebsc partitions
ARRAY /dev/md0 level=raid5 num-devices=3 UUID=e6f6ec4e:fefe1ee2:e40a34ba:0ca07f7d
ARRAY /dev/md0 level=raid5 num-devices=11 UUID=e7d757df:a4dd8c94:e40a34ba:0ca07f7d

 
LVL 76

Expert Comment

by:arnold
You have two arrays with the same device name, /dev/md0.

You should make sure that each array has a unique device name, e.g. /dev/md0 and /dev/md1.
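
One way to spot this kind of name collision is to list the device names that appear on more than one ARRAY line of the scan output. This is only a minimal sketch: it assumes the output format shown earlier in the thread (device name in the second field), and duplicate_md_names is a hypothetical helper name, not part of mdadm.

```shell
#!/bin/sh
# Print every array device name that occurs on more than one ARRAY line.
# Reads "mdadm --examine --scan"-style output on stdin.
# (duplicate_md_names is a hypothetical helper, not an mdadm command.)
duplicate_md_names() {
    awk '$1 == "ARRAY" { print $2 }' | sort | uniq -d
}

# Sample input copied from the scan output posted earlier in this thread.
scan_output='ARRAY /dev/md0 level=raid5 num-devices=3 UUID=e6f6ec4e:fefe1ee2:e40a34ba:0ca07f7d
ARRAY /dev/md0 level=raid5 num-devices=11 UUID=e7d757df:a4dd8c94:e40a34ba:0ca07f7d'

printf '%s\n' "$scan_output" | duplicate_md_names
```

On a live system the same helper could be fed directly, e.g. sudo mdadm --examine --scan | duplicate_md_names. A name that shows up here only means two sets of superblocks claim it; it does not say which one the running array was assembled from.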

 

Author Comment

by:WickedShamrock
That's what I was originally thinking, but the output of df makes me think that it's not two separate devices.  The filesystem size is correct for a RAID5 array of eleven 1TB disks.

(11 - 1) * 1024 = 10240

vs.

(3 - 1) * 1024 = 2048
(8 - 1) * 1024 = 7168
2048 + 7168 = 9216


df /mnt/raid/:
Filesystem     1K-blocks    Used         Available    Use%   Mounted on
/dev/md0       9614330456   8049836092   1076114404   89%   /mnt/raid
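
The same arithmetic can be written as a small shell sketch. It assumes the usual RAID5 rule that one member's worth of space goes to parity, and it reuses the nominal figure of 1024 GB per disk from the calculation above; raid5_usable and per_member_blocks are hypothetical helper names.

```shell
#!/bin/sh
# RAID5 usable capacity = (number of devices - 1) * per-device size.
# The result is in whatever unit the per-device size is given in.
raid5_usable() {
    devices=$1
    per_device=$2
    echo $(( (devices - 1) * per_device ))
}

# Inverse check: per-member size implied by an array's total size.
per_member_blocks() {
    total=$1
    devices=$2
    echo $(( total / (devices - 1) ))
}

raid5_usable 11 1024    # one 11-disk array, in GB
raid5_usable 3 1024     # a 3-disk array, in GB
raid5_usable 8 1024     # an 8-disk array, in GB

# Total 1K blocks reported for md0 in /proc/mdstat, spread over 11 members.
per_member_blocks 9767599360 11
```

The last line divides the block count from /proc/mdstat by ten data members, which gives a per-member size just under 1 TB and fits a single 11-device array rather than a 3-device one.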

 
LVL 76

Expert Comment

by:arnold
10 * 1024 * 1024 * 1024 = 10737418240 kB = 10 TB

/var/log/messages should show an error from the attempt to start /dev/md0 as a RAID5 of 3 devices, which is not possible because /dev/md0 is a RAID5 made up of 11 devices.
In other words, you created a 3-device RAID5 first and then an 11-device RAID5. The last one is the one that is in effect.


 

Author Comment

by:WickedShamrock
That makes a lot of sense.  I looked at the log file (attached).  I don't see anything glaringly wrong (nothing states failed), but I would like to go ahead and try combining the two DEVICE lines and restarting the array.  If we are wrong, will that damage anything automatically?  That is, will it trigger some kind of repair that could cause me to lose data?

I'm not sure if I am blowing this whole thing out of proportion.  I consider myself computer savvy, but this is my first experience with RAID and I don't fully understand how mdadm functions.  Thanks again.
messages
 
LVL 76

Expert Comment

by:arnold
Comment out the /dev/md0 entry that assembles /dev/md0 as a RAID5 of three devices.
If you try it the other way, assembling /dev/md0 as a RAID5 of three devices, it will fail. The data will be lost if you reinitialize or recreate /dev/md0 as a RAID5 of three devices.

The error will show up during bootup, when assembly of /dev/md0 is attempted with three devices.

The dmesg command will also show an error when assembly of /dev/md0 is attempted as a three-device RAID5.

In any case, good practice is to have a good backup.
Do not erase, delete, or initialize the array.
 

Author Comment

by:WickedShamrock
Thanks again for the help.  I am in the process of backing up my data before I try anything.  I will keep you posted.
 

Accepted Solution

by:WickedShamrock
I finished backing up my data and have fixed the problem.  I somehow had two sets of superblocks.  It was probably the remnants of my initial "testing" period when I first started playing around with mdadm.  I had the following superblocks assigned:

1. /dev/sd[bcd]
2. /dev/sd[bcdefghijkl]1

I had a superblock assigned to the raw drives of group 1.  All I had to do was run --zero-superblock on group 1 and I was good.

I also removed the DEVICE lines from my mdadm.conf file and changed the ARRAY line to use the UUID, but it was working fine prior to changing it.
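
The cleanup described above could be sketched roughly as follows. The thread does not record the exact commands that were run, so treat this as an assumption-laden outline: the device list is taken from group 1, the run wrapper and the APPLY variable are hypothetical, and the script only prints the commands unless APPLY=1 is set.

```shell
#!/bin/sh
# Dry run by default: each command is printed instead of executed.
# Set APPLY=1 only after a backup and after checking the device names.
# (run and APPLY are hypothetical conveniences, not part of mdadm.)
run() {
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    else
        echo "+ $*"
    fi
}

# Group 1 from the post: raw disks carrying the stale 3-device superblock.
# The partitions (/dev/sdb1 and so on) belong to the live array and are
# deliberately not listed here.
STALE_DEVICES="/dev/sdb /dev/sdc /dev/sdd"

zero_stale_superblocks() {
    for dev in $STALE_DEVICES; do
        run mdadm --zero-superblock "$dev"
    done
    # A rescan should afterwards list only the 11-device array.
    run mdadm --examine --scan
}

zero_stale_superblocks
```

Before applying anything, running mdadm --examine on a raw disk and on its partition shows which UUID each superblock carries, which confirms that the raw disks hold the 3-device set. A UUID-based ARRAY line, assuming the UUID reported above for the 11-device array, would look something like: ARRAY /dev/md0 UUID=e7d757df:a4dd8c94:e40a34ba:0ca07f7d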
 

Author Closing Comment

by:WickedShamrock
I appreciate the ideas provided, but the actual problem was not fixed by anything proposed.
 
LVL 76

Expert Comment

by:arnold
I pointed out that you had one device, /dev/md0, referenced by two RAID setup groups.
I suggested backups before trying anything, which is a requirement: if I had told you to try something and the RAID had become corrupt, the result would have been loss of data.