• Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 953

how to start/stop /dev/md0 RAID devices

I have set up a RAID-6 on my Linux host using four 2TB drives. I've also copied about 1.8TB of data to the RAID, and it has been a live production NAS device on the office LAN for the past 32 hours.

I followed the well-written and extensive instructions at https://raid.wiki.kernel.org/index.php/RAID_setup. However, while this wiki drills down into mind-numbing benchmark stats and numerous esoteric configuration options, it doesn't actually give the fundamental, straightforward HOWTO on starting and stopping the RAID at boot and shutdown time.

Here's what I've done so far:

mdadm --create /dev/md0 --metadata 1.2 --verbose --level=6 --raid-devices=4 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1

mke2fs -v -m .1 -b 4096 -E stride=128,stripe-width=384 /dev/md0

mdadm --assemble --scan --uuid=39edeb69:297e340f:0e3f4469:81f51a6c

In /etc/fstab I have the following entry, and I have mounted the RAID via `mount /mnt/RAID`:
/dev/md0        /mnt/RAID        ext2        defaults,uid=99,gid=99,umask=0660,dmask=0771  1   1
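For reference, the array itself looks healthy, and I persisted its definition the way the wiki suggests; roughly like this (from memory, so the exact invocation may have differed slightly):

cat /proc/mdstat                            # md0 active, all four members showing [UUUU]
mdadm --detail --scan >> /etc/mdadm.conf    # append the ARRAY line so it can be assembled later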


WHAT NEXT?

How do I start this at boot time? As configured, the fstab entry will automatically mount the /dev/md0 device, but I've gleaned from the docs that the RAID has to be "assembled" each time it is started (and perhaps "assemble" is a synonym for "start"?). I've created a boot script as:

/sbin/mdadm --assemble --scan --uuid=39edeb69:297e340f:0e3f4469:81f51a6c

which will run at boot time. Is this correct? If so, is it sufficient? Should I do this step before or after the device is mounted? I don't remember whether I ran mdadm --assemble or mount first when I did it all by hand (that was 3 days ago). If I should assemble first, then I'll change the fstab entry to 'noauto' and have the startup script mount after assembling.

What about shutdown? I can put the command `/sbin/mdadm --stop /dev/md0` into a shutdown script, but the same question applies to the mount: should I unmount /dev/md0 before doing the stop, or just do the stop and let the system handle the unmount? It seems to me the unmounts happen after all shutdown scripts have been run, so does stopping the RAID device before the unmount mess up the flush? The ordering I'm imagining is sketched below.
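In other words, something like this is what I have in mind; the ordering is only my guess at this point (untested):

# boot: assemble first, then mount (assumes a noauto entry in fstab)
/sbin/mdadm --assemble --scan --uuid=39edeb69:297e340f:0e3f4469:81f51a6c
/sbin/mount /dev/md0

# shutdown: unmount first, then stop the array
/sbin/umount /dev/md0
/sbin/mdadm --stop /dev/md0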

This is relatively urgent because I am very reluctant to shut down this computer right now. I am worried that 3 days' worth of data copying might be corrupted or lost.

PLEASE ADVISE! THX.
0
2 Solutions
 
arnoldCommented:
You should have the entry in /etc/mdadm.conf.
Use chkconfig to make sure mdadm starts at boot.


The start/stop scripts for mdmonitor/mdadm should deal with this.

The issue you may face is adding the md device to /etc/fstab, which deals with mounting the resource.

Which Linux distro are you using?
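If it is a Red Hat style distro, something along these lines would do it (assuming the monitoring service there is named mdmonitor; adjust to whatever your distro calls it):

chkconfig --list mdmonitor    # see whether it is registered and for which runlevels
chkconfig mdmonitor on        # enable it for the default runlevels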
0
 
jmarkfoleyAuthor Commented:
I'm using Slackware, so no chkconfig.

The only entry in /etc/mdadm.conf is:

ARRAY /dev/md0 metadata=1.2 name=OHPRSstorage:0 UUID=39edeb69:297e340f:0e3f4469:81f51a6c

per the instructions referenced in the link in my original post: https://raid.wiki.kernel.org/index.php/RAID_setup#Saving_your_RAID_configuration

> The start/stop scripts for mdmonitor/mdadm should deal with this.

Yes, I know how to stop/start, the question is how to start/stop at boot/shutdown and what needs to be done with the mount/umount of the /dev/md0 (if anything) and in what order.

Red Hat also has boot/shutdown scripts, albeit in /etc/init.d, I believe. Do you have Red Hat examples for the mdadm RAID system you could post?
0
 
jmarkfoleyAuthor Commented:
The example shown at http://www.tcpdump.com/kb/os/linux/starting-and-stopping-raid-arrays.html does say to unmount the filesystem first. So perhaps my shutdown script should be:

/sbin/umount -f /dev/md0
/sbin/mdadm --stop /dev/md0

The example doesn't mention mounting at startup, but if the shutdown example is correct (which it might not be!), it seems that the reverse should be true at startup:

/sbin/mdadm --assemble --scan --uuid=39edeb69:297e340f:0e3f4469:81f51a6c
mount /dev/md0

I believe all auto mounts in /etc/fstab are mounted before startup scripts are run. Therefore, if the above startup script is correct, I need to make the /dev/md0 entry in /etc/fstab noauto and let the startup script do the mounting.

I'm soooooo confused!!!!

Somebody out there has done this, right?
0
 
arnoldCommented:
There are start/stop scripts that deal with mounting/unmounting file systems using fstab, as well as with assembling the RAID devices.

http://www.slackware.com/~mrgoblin/raid1-slackware-12.php

Includes references to startup scripts.

There should not be any manual action on your part.
A shutdown should do all the necessary things.
Likewise, startup should assemble the md0 device and then mount it based on the fstab settings.
0
 
jmarkfoleyAuthor Commented:
arnold: thanks for the post. Your link gives a start/stop script for RAID monitoring, but not for starting/stopping the RAID itself. Odd that no one talks about that. The link's scripts are a bit dated and reference an older Slackware distro and an older mdadm version, but the monitoring script looks very usable, so I will save that. Thanks!

I came across another link, http://ubuntuforums.org/showthread.php?t=872092, where a person was reporting the same issue as me, but he actually tried rebooting and the RAID drive did in fact fail to auto mount, supporting my fear. The thread gets unclear toward the end, but I think he concluded that he needed to assemble the RAID *before* mounting the filesystem.

If that's the case, I think the safe thing would be to NOT automount the RAID from fstab at boot time and do it in the init script. So, I would end up with:

fstab entry:
/dev/md0        /mnt/RAID       ext2        noauto,exec,async,uid=99,gid=99,umask=0660,dmask=0771  1   1

/etc/rc.d/rc.RAID
#!/bin/sh
case "$1" in
"start" )
    echo Starting RAID
    # Extract the UUID from the ARRAY line in mdadm.conf, assemble the array, then mount it
    /sbin/mdadm --assemble --scan --uuid=`/usr/bin/grep "^ARRAY /dev/md0" /etc/mdadm.conf | \
        /usr/bin/awk 'BEGIN{RS=" "}{print $0}' | /usr/bin/grep UUID= | /usr/bin/cut -d= -f2`
    /sbin/mount /dev/md0
    ;;

"stop" )
    # Only stop if /mnt/RAID is currently mounted
    x=`/usr/bin/df | grep /mnt/RAID`

    if [ -n "$x" ]
    then
        echo Stopping RAID
        /sbin/umount -f /dev/md0
        /sbin/mdadm --stop /dev/md0
    else
        echo RAID not started
    fi
    ;;

* )
    echo "Syntax: $0 [ start | stop ]"
    ;;
esac



Another bit that might be a factor is setting the individual drive partitions (/dev/sd[a-d]1) to type fd (Linux RAID autodetect), which I read about after creating and loading the array. I haven't come across anything that explains what this does or why it is needed. My system seems to work just fine having used the usual partition type 83. Could this be a factor in startup? I fear to change these at this point (would that generate a different UUID?)

I'm going to leave this question open until at least the weekend which is when I will try to shut down and restart the computer (and RAID). Meanwhile, I look forward to more comments. Surely someone (arnold?) in the EE community has actually implemented a mdadm RAID system and has actual init scripts?
0
 
arnoldCommented:
83 is not a valid software RAID partition type.
Could you post the output of fdisk -l /dev/sda, and similarly for /dev/sdb?
0
 
jmarkfoleyAuthor Commented:
Thanks for the link. The author doesn't say you *can't* use 83 as the partition type, just that he prefers using FD and the auto-detect method. My worry now is that if I change the partition type after the fact, after the RAID has been in extensive use for a week, I might mess up the UUID or otherwise screw something up. I have not stopped/started the RAID yet; I plan on doing so tomorrow. At this point I plan on using the init scripts in my previous post.

I'm really unclear about the start/stop procedure ... whether to mount before or after starting the RAID, and what "autodetect" actually does (starting? mounting?). While there is a lot of excellent md RAID documentation out there, this fundamental procedure is left out of every set of instructions I have found. Apparently it's supposed to be dead obvious and not worth explaining. Do you have a RAID setup? If so, what's in your fstab? init scripts?

Do you think I should change the partition type? Do you think it will cause a problem or not?

Here is my fdisk -l for sda and sdb:

Disk /dev/sda: 2000.4 GB, 2000398934016 bytes
81 heads, 63 sectors/track, 765633 cylinders, total 3907029168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x3127f1d2

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1            2048  3907029167  1953513560   83  Linux

Disk /dev/sdb: 2000.4 GB, 2000398934016 bytes
81 heads, 63 sectors/track, 765633 cylinders, total 3907029168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x0e360d03

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1            2048  3907029167  1953513560   83  Linux
0
 
arnoldCommented:
In the absence of the FD type, you have to make sure that you have the device defined in /etc/mdadm.conf
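A minimal /etc/mdadm.conf along these lines should be enough (the UUID below is the one from your mdadm --detail output; if no DEVICE line is present, mdadm falls back to scanning the partitions listed in /proc/partitions):

DEVICE /dev/sd[a-d]1
ARRAY /dev/md0 UUID=39edeb69:297e340f:0e3f4469:81f51a6c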


You could do the following.
Break the md device by removing one /dev/sda.
Then repartition it and alter its type to FD, then add it back in.
Once the rebuild is complete, break the device again by pulling /dev/sdb out, making the changes, and then adding it back in (see the sketch below).
Make sure the rebuild is complete each time, and make sure to write the master boot record.
Neither disk in your example is flagged as a boot device.
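One drive at a time, that would look roughly like this (only a sketch; do not touch the next drive until /proc/mdstat shows the rebuild has finished):

mdadm /dev/md0 --fail /dev/sda1 --remove /dev/sda1   # drop the first member from the array
fdisk /dev/sda                                       # 't' to change the partition type to fd, 'w' to write
mdadm /dev/md0 --add /dev/sda1                       # re-add it and let the array rebuild
cat /proc/mdstat                                     # wait for the resync to complete before the next drive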

You are using the entire disk which I tend not to do.
I create multiple RAID devices rather than a single one spanning the whole disk, as you have.
The multiple-RAID-device approach provides more flexibility, i.e. you can add a third, fourth, fifth disk, etc. and then distribute the various RAID volumes across multiple disks. In your case, you are bound by the two disks for this.

/boot primary raid volume
swap is its own partition no raid needed
/ /usr/ var/ /var/log can be on a single RAID volume with LVM overlay
/home

etc.
0
 
jmarkfoleyAuthor Commented:
The RAID is not on the boot drive.

Actually, I *am* using multiple raid devices:

$ cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] [multipath]
md0 : active raid6 sdd1[3] sdc1[2] sdb1[1] sda1[0]
      3907023872 blocks super 1.2 level 6, 512k chunk, algorithm 2 [4/4] [UUUU]

unused devices: <none>

> You are using the entire disk which I tend not to do.

I did create one partition per drive, but I don't think I used the *entire* disk. Using the instructions in https://raid.wiki.kernel.org/index.php/Linux_Raid I believe I left 0.1% reserved (-m .1):

mke2fs -v -m .1 -b 4096 -E stride=128,stripe-width=384 /dev/md0

or maybe we're not talking about the same thing.

I don't have an actual DEVICE configured in /etc/mdadm.conf, but I do have the array configured. Is this a problem?

ARRAY /dev/md0 metadata=1.2 name=OHPRSstorage:0 UUID=39edeb69:297e340f:0e3f4469:81f51a6c

Also from mdadm --detail:

$ mdadm --detail /dev/md0
/dev/md0:
        Version : 1.2
  Creation Time : Wed May 29 20:34:53 2013
     Raid Level : raid6
     Array Size : 3907023872 (3726.03 GiB 4000.79 GB)
  Used Dev Size : 1953511936 (1863.01 GiB 2000.40 GB)
   Raid Devices : 4
  Total Devices : 4
    Persistence : Superblock is persistent

    Update Time : Fri Jun  7 13:39:38 2013
          State : clean
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 512K

           Name : OHPRSstorage:0  (local to host OHPRSstorage)
           UUID : 39edeb69:297e340f:0e3f4469:81f51a6c
         Events : 19

    Number   Major   Minor   RaidDevice State
       0       8        1        0      active sync   /dev/sda1
       1       8       17        1      active sync   /dev/sdb1
       2       8       33        2      active sync   /dev/sdc1
       3       8       49        3      active sync   /dev/sdd1

> You could do the following. Break the md device by removing one /dev/sda

Yeah, well, that sounds like a heart attack inducer! I have a spare lab-rat computer. I'll try this there and see what happens before I potentially lose 3TB of files!

You wrote, "You are using the entire disk which I tend not to do." This statement implies that you are actually using a RAID. What do you do to start/stop it at boot/shutdown time? Could you share your init script[s] and /etc/fstab? I have to take this down for a hardware upgrade tomorrow!
0
 
arnoldCommented:
If the type is fd, it is automatically detected and assembled by, I believe, the mdmonitor script.

I also have the entry in /etc/mdadm.conf that deals with assembling.
It should create the md0 device.
http://www.linuxmanpages.com/man5/mdadm.conf.5.php
Your 1 1 in fstab might be what delays the mounting.

Use dmesg.
Your UUID reference might be incorrect, in which case you may need to list the devices in order via devices= in mdadm.conf,

as simple as /dev/sd[a-d]1:
ARRAY /dev/md0 level=6 devices=/dev/sd[a-d]1
or
ARRAY /dev/md0 level=6 num-devices=4 devices=/dev/sd[a-d]1
0
 
jmarkfoleyAuthor Commented:
I'll check out your link

> Your 1 1 in fstab might be what delays the mounting.

Actually, this is a new setup. I haven't rebooted yet, so the fstab entry hasn't done anything; I'm still trying to tweak it. The 1 1 is for fsck'ing. Since I've built an ext2 filesystem on top of the RAID, don't I still need to fsck every so often?
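(I assume I could also tune how often that happens, with something like:)

tune2fs -l /dev/md0 | grep -i 'mount count'   # show the current mount-count settings
tune2fs -c 30 -i 90d /dev/md0                 # e.g. check every 30 mounts or 90 days, whichever comes first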

My mdadm.conf ARRAY entry was created by `mdadm --detail --scan >> /etc/mdadm.conf` per the instructions I've referenced. According to that website, using the UUID means I don't have to explicitly list the devices. All theory at this point.

> If the type is fd, it is automatically detected and assembled by, I believe, the mdmonitor script.

So that's it? You have no fstab entry and no init scripts other than one that runs mdmonitor?
0
 
arnoldCommented:
The fstab entry for where you want the partition mounted has to be there.  My understanding of your question dealt with the assembly of the RAID device (md0) at bootup.

So long as there is an entry in mdadm.conf, the mdmonitor script should start mdadm --monitor --scan.
In the absence of fd partition types, /etc/mdadm.conf is used for assembly.

Check with ps -ef | grep mdadm: do you have an mdadm --monitor --scan running?
0
 
jmarkfoleyAuthor Commented:
After much experimentation, I've finally got a resolution that works. It turns out that what you've stated is correct: I do not need to do anything special at boot except have an entry in fstab. I do have the partition types now set to FD, and I do have an ARRAY entry in /etc/mdadm.conf, so the system is apparently using one or the other to automatically start the RAID at boot time. The fstab entry gets it mounted. My settings are:

/etc/mdadm.conf:
ARRAY /dev/md0 metadata=1.2 name=OHPRSstorage:0 UUID=39edeb69:297e340f:0e3f4469:81f51a6c

/etc/fstab:
/dev/md0        /mnt/RAID       ext2        defaults         1   1

I am keeping the 1 1 fsck parameters at the end because this is an ext2 filesystem after all, and it should probably be routinely checked.

Shutting down, however, is a different story. Since Samba and NFS are using this share, I get a "cannot stop md0" message when rebooting (and maybe a cannot-umount message as well, I can't recall). I mildly panicked when I saw that, but when it rebooted everything seemed fine, so perhaps it doesn't matter if things aren't stopped and umounted properly. Nevertheless, to be safe, I decided to shut things down cleanly. I have the following script in /etc/rc.d/rc.RAID, which is invoked by /etc/rc.d/rc.local_shutdown (Slackware):

#!/bin/sh
case "$1" in
"start" )
    echo Starting RAID
    # Extract the UUID from the ARRAY line in mdadm.conf, assemble the array, then mount it
    /sbin/mdadm --assemble --scan --uuid=`/usr/bin/grep "^ARRAY /dev/md0" /etc/mdadm.conf | \
        /usr/bin/awk 'BEGIN{RS=" "}{print $0}' | /usr/bin/grep UUID= | /usr/bin/cut -d= -f2`
    /sbin/mount /dev/md0
    ;;

"stop" )
    # Only stop if /mnt/RAID is currently mounted
    x=`/usr/bin/df | grep /mnt/RAID`

    if [ -n "$x" ]
    then
        echo Stopping RAID

        # Check for RAID in use by samba
        x=`/usr/bin/lsof /mnt/RAID | grep -i smbd`

        if [ -n "$x" ]
        then
            echo $0 RAID in use by samba, stopping
            /etc/rc.d/rc.samba stop
        fi

        # Stop NFS (it exports the share), then unmount and stop the array
        echo Stopping nfsd from $0
        /etc/rc.d/rc.nfsd stop
        /sbin/umount -f /dev/md0
        /sbin/mdadm --stop /dev/md0
    else
        echo RAID not started
    fi
    ;;

* )
    echo "Syntax: $0 [ start | stop ]"
    ;;
esac



The 'start' option is not used by rc.local at boot time, as I said. It is there in case I want to start up the md device after manually shutting it down.
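That is, by hand it would just be:

/etc/rc.d/rc.RAID stop     # unmount /mnt/RAID and stop md0
/etc/rc.d/rc.RAID start    # reassemble md0 and mount it again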

That does the trick! Mission accomplished. I have a 3.6T RAID-6 that everyone in the office seems to be using w/o problem. Eventually I will add the following to /etc/rc.d/rc.local to start the monitoring, but I haven't gotten to that yet:

mdadm --monitor --scan --mail=user@somehost
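(Presumably with --daemonise, or pushed into the background, so it doesn't hold up the rest of rc.local; something like:)

/sbin/mdadm --monitor --scan --daemonise --mail=user@somehost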

Thanks for your help!
0
 
arnoldCommented:
The notice about the unmount is the one you overcome with the umount -f in your script, which is run anyway.
You can convert the filesystem from ext2 to ext3 without having to reformat,
i.e. add a journal and then mount it as ext3 (mount -t ext3 /dev/md0 /mnt/RAID).
That adds journaling, i.e. ext3 is ext2 + journaling.
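Roughly (a sketch only; with the filesystem unmounted, and double-checked against the tune2fs man page first):

umount /dev/md0                      # take the filesystem offline
tune2fs -j /dev/md0                  # add an ext3 journal to the existing ext2 filesystem
mount -t ext3 /dev/md0 /mnt/RAID     # remount as ext3; change the fstab type as well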
0
 
jmarkfoleyAuthor Commented:
I don't remember, but I think I tried umount -f and it didn't work with Samba and NFS clients still active. The man page mentions an unreachable NFS host ... anyway, I'd have to put the umount -f into a shutdown script, so I might as well kill off Samba and NFS for cleanliness.

Not really sure what journaling buys me -- just seems like more overhead. What's the benefit?
0
 
arnoldCommented:
Double check that your service start/stop scripts start (S) and kill (K) in reverse sequence, i.e. a service that starts last must be terminated first:
S90 K10
S10 K90

Have not dealt with Slackware for a long time.
/etc/rc2.d/
should have a start and a stop script for each service, as symbolic links to /etc/init.d/<service>.
0
 
jmarkfoleyAuthor Commented:
As I mentioned, I don't need a start script for this setup. It appears that NFS starts before Samba. I'll check out the start-order thing you mentioned, but I don't think Samba and NFS inter-depend.

I've used Slackware since the mid-90's, after using 386BSD and FreeBSD and before all the other distros were invented. I've liked it because its init and config setup were more along the traditional System V and BSD lines (which I used in the 80's!) than the others, but lately it's getting hard to find recent packages already built for Slackware. I guess Patrick Volkerding is getting too busy and the Slackware community is apparently shrinking. Often I don't even see it listed on web distro lists. I had to use a lot of elbow grease to get SpamAssassin installed recently. I've found that Red Hat packages work with little problem except for the location of the init scripts.

Slackware keeps most of its init scripts in /etc/rc.d and, as a matter of fact, /etc/init.d is symbolically linked to /etc/rc.d, not vice-versa. There is no 'service' file in that folder, but there is one in /usr/lib/pm-utils/bin/service, which has the comment: "Handle service invocation on distros that do not have a "service" command. It handles LSB by default, and other distros that the maintainer is aware of." This is the first time I've ever noticed the existence of that file (thanks to your prompting). I'll check it out and see what it does, though I've pretty much gotten used to starting/stopping the rc.d scripts directly.

Thanks for the feedback.
0
 
arnoldCommented:
NFS is a service that usually starts near the end of the process in rc3.d / rc2.d;
samba is similar.
You have to make sure that, within the same shutdown sequence, samba/nfs are stopped first;
the same for clients.
rc.d is top level. Do you have rc1.d, rc2.d, rc3.d?
The others are runlevels, and usually each runlevel has its own services that run within it,
i.e. certain services do not start until they get into multi-user mode.
If it works for you, that is great.
0
 
jmarkfoleyAuthor Commented:
> rc.d is top level. Do you have rc1.d, rc2.d, rc3.d?

Yes, the rc.d folder has all of those. rc.local is only run at the multi-user level (rc.M), and rc.local_shutdown is run by rc.0, rc.6 and rc.K.

I think I've got things in the right order and it does appear to be working OK. I've checked shutdown messages as it's going down and it looks good.

Again, thanks for your help and feedback.
0
 
arnoldCommented:
Good. rc.local is run last and is often used for specific commands to be added within, i.e. items that are not services.
0
 
jmarkfoleyAuthor Commented:
Final solution.
0
