Solved

Fix a Linux RAID Configuration Problem

Posted on 2011-09-12 · 452 Views · Last Modified: 2016-12-08
I am having an issue with a Linux RAID array I use as a NAS server.  Everything seems to work: no files appear to be corrupted and everything is accessible, but I have found a problem with the configuration.

Hardware Configuration:
Ubuntu Server 10.10 (No GUI)
(1) 250 GB Western Digital HDD (Ubuntu)
      /dev/sda
(11) 1TB Western Digital HDDs (RAID5 Array)
      /dev/sd[bcdefghijkl]1

I used Webmin for the basic setup of the RAID and I believe that is where my problem began.

mdadm.conf:
DEVICE partitions /dev/sde1 /dev/sdf1 /dev/sdg1 /dev/sdh1 /dev/sdi1 /dev/sdj1 /dev/sdk1 /dev/sdl1
CREATE owner=root group=disk mode=0660 auto=yes
HOMEHOST <system>
DEVICE /dev/sdb1 /dev/sdc1 /dev/sdd1
ARRAY /dev/md0 level=raid5 devices=/dev/sdb1,/dev/sdc1,/dev/sdd1,/dev/sde1,/dev/sdf1,/dev/sdg1,/dev/sdh1,/dev/sdi1,/dev/sdj1,/dev/sdk1,/dev/sdl1



sudo mdadm --examine --scan:
ARRAY /dev/md0 level=raid5 num-devices=3 UUID=e6f6ec4e:fefe1ee2:e40a34ba:0ca07f7d
ARRAY /dev/md0 level=raid5 num-devices=11 UUID=e7d757df:a4dd8c94:e40a34ba:0ca07f7d



cat /proc/mdstat:
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] 
md0 : active raid5 sdj1[7] sdh1[5] sdg1[6] sdi1[8] sdl1[10] sdk1[9] sdd1[2] sdb1[0] sde1[3] sdc1[1] sdf1[4]
      9767599360 blocks level 5, 64k chunk, algorithm 2 [11/11] [UUUUUUUUUUU]



I'm concerned because the mdadm.conf file has two DEVICE lines (not necessarily a problem in itself), but only one ARRAY line that uses partitions from both DEVICE lists.  The mdadm scan also shows two different arrays.  I would like to fix the mdadm.conf file, but I am worried about losing data.

Can I simply combine the two DEVICE lists into one?  Normally I would just guess and check, but I really don't want to lose this data.
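
For reference, what I have in mind (just a sketch, nothing applied yet) is a single DEVICE line covering all eleven partitions, with everything else left as it is:

DEVICE partitions /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1 /dev/sdg1 /dev/sdh1 /dev/sdi1 /dev/sdj1 /dev/sdk1 /dev/sdl1
CREATE owner=root group=disk mode=0660 auto=yes
HOMEHOST <system>
ARRAY /dev/md0 level=raid5 devices=/dev/sdb1,/dev/sdc1,/dev/sdd1,/dev/sde1,/dev/sdf1,/dev/sdg1,/dev/sdh1,/dev/sdi1,/dev/sdj1,/dev/sdk1,/dev/sdl1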

Thanks for the help.
Question by:WickedShamrock
11 Comments
 
LVL 78

Expert Comment

by:arnold
ID: 36525218
run
mdadm --examine --brief --scan --config=partitions
mdadm -Ebsc partitions
http://linux.die.net/man/8/mdadm

post the output of the above.
There are other examples for mdadm to get info.

It should tell you which array is degraded and which is the failed disk.
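
Something like the following (assuming the array is currently assembled as /dev/md0) will also show exactly which member devices and UUID the running array is using:

sudo mdadm --detail /dev/md0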
 

Author Comment

by:WickedShamrock
ID: 36525296
Thanks for the response.  The following are the results of those commands; hope that helps.  As far as I understand it, nothing is degraded, but if I really knew, I wouldn't be posting on this forum!  Thanks again.

sudo mdadm --examine --brief --scan --config=partitions
ARRAY /dev/md0 level=raid5 num-devices=3 UUID=e6f6ec4e:fefe1ee2:e40a34ba:0ca07f7d
ARRAY /dev/md0 level=raid5 num-devices=11 UUID=e7d757df:a4dd8c94:e40a34ba:0ca07f7d



sudo mdadm -Ebsc partitions
ARRAY /dev/md0 level=raid5 num-devices=3 UUID=e6f6ec4e:fefe1ee2:e40a34ba:0ca07f7d
ARRAY /dev/md0 level=raid5 num-devices=11 UUID=e7d757df:a4dd8c94:e40a34ba:0ca07f7d


 
LVL 78

Expert Comment

by:arnold
ID: 36526141
You have two array definitions with the same device name, /dev/md0.

You should make sure that each array gets a unique device name, e.g. /dev/md0 and /dev/md1.
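
For example (sketch only, using the two UUIDs from your scan output), the definitions would look something like this with unique names:

ARRAY /dev/md0 level=raid5 num-devices=3 UUID=e6f6ec4e:fefe1ee2:e40a34ba:0ca07f7d
ARRAY /dev/md1 level=raid5 num-devices=11 UUID=e7d757df:a4dd8c94:e40a34ba:0ca07f7d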

 

Author Comment

by:WickedShamrock
ID: 36526419
That's what I was originally thinking, but the output of df makes me think that it's not two separate devices.  The filesystem size is correct for a RAID5 array of eleven 1 TB disks:

(11 - 1) * 1024 = 10240 GB

vs.

(3 - 1) * 1024 = 2048 GB
(8 - 1) * 1024 = 7168 GB
2048 + 7168 = 9216 GB


df /mnt/raid/:
Filesystem     1K-blocks    Used         Available    Use%   Mounted on
/dev/md0       9614330456   8049836092   1076114404   89%   /mnt/raid
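
As a rough sanity check (assuming each "1 TB" drive is really 10^12 bytes, i.e. about 976,562,500 1K-blocks):

10 data disks * 976,562,500 = 9,765,625,000 1K-blocks

That is very close to the 9,767,599,360 blocks /proc/mdstat reports for md0, and the df figure of 9,614,330,456 1K-blocks is just that same device minus filesystem overhead.  A three-disk RAID5 of these drives would only be around 2 TB, so the numbers only make sense for a single 11-disk array.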




 
LVL 78

Expert Comment

by:arnold
ID: 36526482
10 * 1024 * 1024 * 1024 = 10,737,418,240 kB = 10 TB

/var/log/messages should show an error when the system tries to start /dev/md0 as a RAID5 consisting of 3 devices, since that is not possible: /dev/md0 is a RAID5 made up of 11 devices.  In other words, the order of your RAID creation was a 3-device RAID5 first and then an 11-device RAID5, and the last one is the one that is enforced.
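
For example (assuming the usual log locations on your system), something like this should surface any assembly errors:

dmesg | grep -i raid
grep -i md0 /var/log/messages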

 

Author Comment

by:WickedShamrock
ID: 36526661
That makes a lot of sense.  I looked at the log file (attached) and I don't see anything glaringly wrong (nothing states failed), but I would like to go ahead and try combining the two DEVICE lines and restarting the array.  If we are wrong, will that damage anything automatically?  I.e. will it trigger some kind of repair that could potentially cause me to lose data?

I'm not sure if I am blowing this whole thing out of proportion.  I consider myself computer savvy, but this is my first experience with RAID and I don't fully understand how mdadm functions.  Thanks again.
messages
 
LVL 78

Expert Comment

by:arnold
ID: 36526788
Comment out the definition that assembles /dev/md0 as a RAID5 of three devices.  If you instead try to bring up /dev/md0 as a RAID5 of three devices it will fail, and the data will be lost if you re-initiate/recreate /dev/md0 as a three-device RAID5.

The error will show up during bootup when assembly of /dev/md0 with three devices is attempted.

The dmesg command will also show an error when assembly of /dev/md0 with the three-device RAID5 is attempted.

In any case, good practice is to have a good backup.  Do not erase/delete/initialize the array.
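
As a sketch (assuming both scanned definitions end up in mdadm.conf), the three-device line would be commented out and only the eleven-device one kept:

#ARRAY /dev/md0 level=raid5 num-devices=3 UUID=e6f6ec4e:fefe1ee2:e40a34ba:0ca07f7d
ARRAY /dev/md0 level=raid5 num-devices=11 UUID=e7d757df:a4dd8c94:e40a34ba:0ca07f7d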
 

Author Comment

by:WickedShamrock
ID: 36545014
Thanks again for the help.  I am in the process of backing up my data before I try anything.  I will keep you posted.
 

Accepted Solution

by:
WickedShamrock earned 0 total points
ID: 36594146
I finished backing up my data and have fixed the problem.  I somehow had two sets of superblocks, probably remnants of my initial "testing" period when I first started playing around with mdadm.  I had the following superblocks assigned:

1. /dev/sd[bcd]
2. /dev/sd[bcdefghijkl]1

I had a superblock assigned to the raw drives of group 1.  All I had to do was run --zero-superblock on group 1 and I was good.

I also removed the DEVICE lines from my mdadm.conf file and changed the ARRAY line to use the UUID, but it was working fine prior to changing it.
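
For anyone finding this later, the fix amounted to roughly the following (a sketch from memory, with a full backup in place first; the stale superblocks were on the whole-disk devices, not the partitions):

sudo mdadm --zero-superblock /dev/sdb
sudo mdadm --zero-superblock /dev/sdc
sudo mdadm --zero-superblock /dev/sdd

and an mdadm.conf ARRAY line keyed on the UUID of the real eleven-device array:

ARRAY /dev/md0 UUID=e7d757df:a4dd8c94:e40a34ba:0ca07f7d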
 

Author Closing Comment

by:WickedShamrock
ID: 36813448
I appreciate the ideas provided, but the actual problem was not fixed by anything proposed.
 
LVL 78

Expert Comment

by:arnold
ID: 36594187
I pointed out that you had one device, /dev/md0, referenced by two RAID definitions.

I also suggested backups before trying anything, which is a requirement: if I had told you to try something and the RAID became corrupt, that would mean loss of data.