Fix a Linux RAID Configuration Problem

I am having an issue with a Linux RAID array I use as a NAS server. Everything seems to work fine: no files appear to be corrupted and everything is accessible, but I have found a problem with the configuration.

Hardware Configuration:
Ubuntu Server 10.10 (No GUI)
(1) 250 GB Western Digital HDD (Ubuntu)
      /dev/sda
(11) 1TB Western Digital HDDs (RAID5 Array)
      /dev/sd[bcdefghijkl]1

I used Webmin for the basic setup of the RAID and I believe that is where my problem began.

mdadm.conf:
DEVICE partitions /dev/sde1 /dev/sdf1 /dev/sdg1 /dev/sdh1 /dev/sdi1 /dev/sdj1 /dev/sdk1 /dev/sdl1
CREATE owner=root group=disk mode=0660 auto=yes
HOMEHOST <system>
DEVICE /dev/sdb1 /dev/sdc1 /dev/sdd1
ARRAY /dev/md0 level=raid5 devices=/dev/sdb1,/dev/sdc1,/dev/sdd1,/dev/sde1,/dev/sdf1,/dev/sdg1,/dev/sdh1,/dev/sdi1,/dev/sdj1,/dev/sdk1,/dev/sdl1



sudo mdadm --examine --scan:
ARRAY /dev/md0 level=raid5 num-devices=3 UUID=e6f6ec4e:fefe1ee2:e40a34ba:0ca07f7d
ARRAY /dev/md0 level=raid5 num-devices=11 UUID=e7d757df:a4dd8c94:e40a34ba:0ca07f7d



cat /proc/mdstat:
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] 
md0 : active raid5 sdj1[7] sdh1[5] sdg1[6] sdi1[8] sdl1[10] sdk1[9] sdd1[2] sdb1[0] sde1[3] sdc1[1] sdf1[4]
      9767599360 blocks level 5, 64k chunk, algorithm 2 [11/11] [UUUUUUUUUUU]



I'm concerned because the mdadm.conf file has two DEVICE lines (not necessarily a problem), but only one ARRAY line that uses partitions from both DEVICE lists.  The mdadm scan shows two different arrays.  I would like to fix the mdadm.conf file, but am worried about losing data.

Can I simply combine the two DEVICE lists?  Normally I would just guess and check, but I really don't want to lose this data.
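
For reference, combining them would presumably look something like this single line (I have not applied this), with the existing ARRAY line left unchanged:

DEVICE partitions /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1 /dev/sdg1 /dev/sdh1 /dev/sdi1 /dev/sdj1 /dev/sdk1 /dev/sdl1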

Thanks for the help.
Asked by WickedShamrock
 
WickedShamrock (Author, accepted solution) commented:
I finished backing up my data and have fixed the problem.  I somehow had two sets of superblocks, probably remnants of my initial "testing" period when I first started playing around with mdadm.  I had the following superblocks assigned:

1. /dev/sd[bcd]
2. /dev/sd[bcdefghijkl]1

I had a superblock assigned to the raw drives of group 1.  All I had to do was run --zero-superblock on group 1 and I was good.

I also removed the DEVICE lines from my mdadm.conf file and changed the ARRAY line to use the UUID, but it was working fine prior to changing it.
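
For anyone hitting the same thing, the rough sequence I used was along these lines (a sketch from memory; check each superblock with --examine first and only zero the stale ones):

sudo mdadm --examine /dev/sdb      # shows the stale 3-device superblock on the raw drive
sudo mdadm --examine /dev/sdb1     # shows the real 11-device superblock on the partition
sudo mdadm --zero-superblock /dev/sdb /dev/sdc /dev/sdd

and the ARRAY line in mdadm.conf now identifies the array by UUID rather than by device list:

ARRAY /dev/md0 UUID=e7d757df:a4dd8c94:e40a34ba:0ca07f7d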
 
arnold commented:
Run:
mdadm --examine --brief --scan --config=partitions
mdadm -Ebsc partitions
http://linux.die.net/man/8/mdadm

Post the output of the above.  The man page above has other examples of getting information out of mdadm.

It should tell you which array is degraded and which is the failed disk.
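
Another quick check, if it helps, is to query the running array directly, e.g.:

mdadm --detail /dev/md0

which prints the array state, its UUID, and the status of each member disk.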
 
WickedShamrock (Author) commented:
Thanks for the response.  The results of those commands are below; hope that helps.  As far as I understand it, nothing is degraded, but if I really knew, I wouldn't be posting on this forum!  Thanks again.

sudo mdadm --examine --brief --scan --config=partitions
ARRAY /dev/md0 level=raid5 num-devices=3 UUID=e6f6ec4e:fefe1ee2:e40a34ba:0ca07f7d
ARRAY /dev/md0 level=raid5 num-devices=11 UUID=e7d757df:a4dd8c94:e40a34ba:0ca07f7d



sudo mdadm -Ebsc partitions
ARRAY /dev/md0 level=raid5 num-devices=3 UUID=e6f6ec4e:fefe1ee2:e40a34ba:0ca07f7d
ARRAY /dev/md0 level=raid5 num-devices=11 UUID=e7d757df:a4dd8c94:e40a34ba:0ca07f7d


 
arnold commented:
You have two arrays referencing the same device name, /dev/md0.

You should make sure that each array gets a unique device name, e.g. /dev/md0 and /dev/md1.
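
For illustration only (using the UUIDs from your scan output above), keeping them apart would look something like:

ARRAY /dev/md0 level=raid5 num-devices=3 UUID=e6f6ec4e:fefe1ee2:e40a34ba:0ca07f7d
ARRAY /dev/md1 level=raid5 num-devices=11 UUID=e7d757df:a4dd8c94:e40a34ba:0ca07f7d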

 
WickedShamrock (Author) commented:
That's what I was originally thinking, but the output of df makes me think it's not two separate arrays.  The filesystem size is correct for a RAID5 array of eleven 1 TB disks.

One 11-disk RAID5:  (11 - 1) * 1024 GB = 10240 GB

vs. two separate arrays:

3-disk RAID5:  (3 - 1) * 1024 GB = 2048 GB
8-disk RAID5:  (8 - 1) * 1024 GB = 7168 GB
Combined:      2048 GB + 7168 GB = 9216 GB


df /mnt/raid/:
Filesystem     1K-blocks    Used         Available    Use%   Mounted on
/dev/md0       9614330456   8049836092   1076114404   89%   /mnt/raid
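
For what it's worth, the size in /proc/mdstat works out the same way:

9767599360 KiB * 1024 = ~10.0 x 10^12 bytes, i.e. roughly ten 1 TB drives' worth of usable space

which matches an 11-disk RAID5 of 1 TB drives (one disk's worth of space goes to parity); the df figure is slightly lower because of filesystem overhead.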




 
arnold commented:
10 * 1024 * 1024 * 1024 = 10,737,418,240 kB = 10 TB

/var/log/messages should report an error when the system tries to start /dev/md0 as a RAID5 consisting of 3 devices, since that is not possible: /dev/md0 is a RAID5 made up of 11 devices.
In other words, the order of your RAID creation was the 3-device RAID5 first and then the 11-device RAID5, and the last one is the one that is enforced.
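
A couple of example commands for checking this (adjust the patterns as needed):

grep -i 'md0\|raid' /var/log/messages
dmesg | grep -i 'md\|raid'

If assembly of the stale 3-device definition is ever attempted, a complaint should show up in one of those.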

 
WickedShamrock (Author) commented:
That makes a lot of sense.  I looked at the log file (attached) and I don't see anything glaringly wrong (nothing says "failed"), but I would like to go ahead and try combining the two DEVICE lines and restarting the array.  If we are wrong, will that damage anything automatically? i.e., will it trigger some kind of repair that could potentially cause me to lose data?

I'm not sure if I am blowing this whole thing out of proportion.  I consider myself computer savvy, but this is my first experience with RAID and I don't fully understand how mdadm functions.  Thanks again.
[Attachment: messages]
 
arnold commented:
Comment out the entry that assembles /dev/md0 as a RAID5 of three devices.
If you try it the other way, assembling /dev/md0 as a RAID5 of three devices will fail.  The data will be lost if you reinitialize/recreate /dev/md0 as a three-device RAID5.

The error will show up during bootup when reassembly of /dev/md0 with three devices is attempted.

The dmesg command will also show an error when assembly of /dev/md0 as a three-device RAID5 is attempted.

In any case, good practice is to have a good backup.
Do not erase/delete/initialize the array.
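
As an illustration only (using the UUIDs from your earlier scan): if those scan lines ever end up in mdadm.conf, the stale entry can stay commented out while the real definition remains active, since mdadm.conf treats lines starting with # as comments:

# stale 3-device superblocks -- do not assemble
# ARRAY /dev/md0 level=raid5 num-devices=3 UUID=e6f6ec4e:fefe1ee2:e40a34ba:0ca07f7d
ARRAY /dev/md0 level=raid5 num-devices=11 UUID=e7d757df:a4dd8c94:e40a34ba:0ca07f7d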
 
WickedShamrock (Author) commented:
Thanks again for the help.  I am in the process of backing up my data before I try anything.  I will keep you posted.
 
WickedShamrock (Author) commented:
I appreciate the ideas provided, but the actual problem was not fixed by anything proposed.
 
arnold commented:
I pointed out that you had one device name, /dev/md0, referenced by two RAID definitions.
I also suggested backups before trying anything, which is a requirement: if I had told you to try something and the RAID became corrupt, that would mean loss of data.