Mr-sark

MDADM doesn't give error when RAID FAILS

After getting some great pointers about managing my RAID 5 array with "mdadm", I ran into a problem.

When I pull a random disk out of the /dev/md0 array, I should get an error on my display. When I'm at the terminal and pull out a disk, Red Hat keeps reporting that a device is missing, e.g. /dev/sdd1.

When I run "mdadm --detail /dev/md0", nothing seems to be wrong even though I pulled out one disk. How is that possible? Is it a bug?
I can't seem to get this fixed, but I really want to get it working so I can do monitoring from home.

Here is my "mdadm --detail /dev/md0" output:

 Version : 00.90.01
  Creation Time : Mon Feb 14 15:23:01 2005
     Raid Level : raid5
     Array Size : 142238976 (135.65 GiB 145.65 GB)
    Device Size : 35559744 (33.91 GiB 36.41 GB)
   Raid Devices : 5
  Total Devices : 5
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Thu May 26 15:29:59 2005
          State : clean
 Active Devices : 5
Working Devices : 5
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-asymmetric
     Chunk Size : 64K

    Number   Major   Minor   RaidDevice State
       0       8       17        0      active sync   /dev/sdb1
       1       8       33        1      active sync   /dev/sdc1
       2       8       49        2      active sync   /dev/sdd1
       3       8       65        3      active sync   /dev/sde1
       4       8       81        4      active sync   /dev/sdf1
           UUID : 6820c415:e457679b:ff59e0ba:0e72128a
         Events : 0.4993
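
For monitoring from a script, the two fields that matter most in output like the above are State and Failed Devices. Here is a minimal sketch, assuming the field labels look like the ones shown here. The function name detail_is_healthy is made up for illustration. It reads "mdadm --detail" output on stdin and succeeds only when no failure is reported. Note that it can only see what md itself has noticed, so it would still report the output above as healthy.

```shell
# detail_is_healthy: read `mdadm --detail` output on stdin.
# Succeeds (exit 0) only if a State line exists, it does not mention
# "degraded" or "FAILED", and Failed Devices is 0.
# The field labels are assumed to match the output shown in this thread.
detail_is_healthy() {
    awk -F' : ' '
        /^ *State :/          { state = $2 }
        /^ *Failed Devices :/ { failed = $2 + 0 }
        END {
            if (state == "" || state ~ /degraded|FAILED/ || failed > 0) exit 1
            exit 0
        }'
}
```

It could be used as: mdadm --detail /dev/md0 | detail_is_healthy || echo "md0 needs attention". The mdadm manual also documents a --test option for --detail that sets the exit status from the array state, which may be simpler where it is available.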
idmisk

> When I pull out a random disk from the /dev/md0 array
How? Physically, while the system is running? As far as I can see, you are using SCSI, but are the drives/cases also hot-swappable?
Mr-sark

ASKER

That's right, I pulled out the hard disk physically. I use an old Dell PowerEdge 2400 and the hard disks are hot-swappable. Btw, I use a software RAID.
> Btw i use a software RAID.
Yep, I see. Can you please run
cat /proc/mdstat
and post the output?

It sounds to me like SCSI bus/driver/controller trouble; otherwise it should scream about a failed drive.
Any RAID/SCSI messages in the output of 'dmesg' or in /var/log/{syslog,messages}?
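
A quick way to pull such messages out of a long log is a small filter. This is only a sketch: the function name storage_errors and the search patterns are assumptions, and the exact wording of kernel messages depends on the kernel version and the SCSI driver.

```shell
# storage_errors: print only the lines on stdin that look like
# storage-related kernel messages (SCSI errors, I/O errors, raid/md lines).
# The patterns are guesses at typical wording, not an exhaustive list.
storage_errors() {
    grep -i -E 'scsi error|i/o error|raid[0-9]|md[0-9]'
}
```

Typical use would be: dmesg | storage_errors, or storage_errors < /var/log/messages.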
Mr-sark

ASKER

I can find this error in my log files; it is also the error the screen was filling up with:

May 26 13:57:48 ferrari  -- root[2506]: ROOT LOGIN ON tty2
May 26 13:58:14 ferrari kernel: SCSI error : <0 0 1 0> return code = 0x10000
May 26 13:58:14 ferrari kernel: end_request: I/O error, dev sdb, sector 71119551

But I find it kind of strange that mdadm doesn't show the missing disk. Here is the output of cat /proc/mdstat:

Personalities : [raid5]
md0 : active raid5 sdc1[1] sdf1[4] sde1[3] sdd1[2] sdb1[0]
      142238976 blocks level 5, 64k chunk, algorithm 0 [5/5] [UUUUU]

unused devices: <none>


thnx
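
For the monitoring goal mentioned in the question, the status map at the end of the mdstat line ([UUUUU] here) is the easiest thing to check from a script: each "U" is a working member, and "_" marks a missing one. Below is a minimal sketch, assuming the /proc/mdstat layout shown above. The function name degraded_arrays is made up for illustration.

```shell
# degraded_arrays: read /proc/mdstat-style text on stdin and print the name
# of every array whose status map (e.g. [UUUUU]) contains "_".
# Assumes the array name line ("md0 : ...") precedes its status line.
degraded_arrays() {
    awk '
        /^md[0-9]+ :/ { name = $1 }
        /\[[U_]+\]/ {
            if (match($0, /\[[U_]+\]/) &&
                index(substr($0, RSTART, RLENGTH), "_") > 0)
                print name
        }'
}
```

It could be run as: degraded_arrays < /proc/mdstat. Like any check based on md's own status, it prints nothing as long as md still believes all members are present, as in the output above. mdadm also has a --monitor mode that can report array events by mail.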
ASKER CERTIFIED SOLUTION
idmisk

Mr-sark

ASKER

Would it help if I stopped the array /dev/md0 and restarted it?
If it is not mounted ... wait.
Is your root filesystem located on it, or is it just data storage?
Mr-sark

ASKER

The whole array is running at the moment. I have a separate disk for my OS.
If possible, umount it and mount it again, but I don't think this will change anything without stopping and starting the array.
Mr-sark

ASKER

What I can try (correct me if I'm wrong) is the following:

umount /dev/md0
mdadm --stop /dev/md0
mdadm --run /dev/md0
mount /dev/md0 /data
Right. The only important point is that no application should be accessing it when you umount, otherwise the umount will fail. To check:

lsof | grep -- "/data"
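
The whole sequence, including the open-files check, could be wrapped in one function. This is a sketch under assumptions, not a tested procedure for this machine. The names run, restart_array, MEMBERS and the DRY_RUN switch are made up for illustration. It also swaps "mdadm --run" for "mdadm --assemble" with the member partitions listed earlier in the thread, because the mdadm manual describes --run as starting a partially assembled array, while a stopped array normally has to be assembled again.

```shell
# Values taken from this thread; adjust for another system.
MD_DEV="${MD_DEV:-/dev/md0}"
MOUNT_POINT="${MOUNT_POINT:-/data}"
MEMBERS="${MEMBERS:-/dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1}"

# run: hypothetical helper. With DRY_RUN=1 it only prints the command,
# otherwise it executes it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# restart_array: unmount, stop, reassemble and remount the data array.
restart_array() {
    # Abort if any process still has files open under the mount point.
    # The check is skipped in dry-run mode.
    if [ "${DRY_RUN:-0}" != "1" ] && lsof | grep -q -- "$MOUNT_POINT"; then
        echo "$MOUNT_POINT is still in use; not unmounting" >&2
        return 1
    fi
    # MEMBERS is left unquoted on purpose so it splits into one
    # argument per device. Each step runs only if the previous one succeeded.
    run umount "$MOUNT_POINT" &&
    run mdadm --stop "$MD_DEV" &&
    run mdadm --assemble "$MD_DEV" $MEMBERS &&
    run mount "$MD_DEV" "$MOUNT_POINT"
}
```

Running DRY_RUN=1 restart_array first prints the four commands without touching the array, which makes it easy to review them before running the function for real as root.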