Mr-sark
asked on
MDADM doesn't give error when RAID FAILS
After getting some great pointers about managing my RAID 5 array with "mdadm", I ran into a problem.
When I pull a random disk out of the /dev/md0 array, I should get an error on my display. When I'm at the terminal and pull a disk, Red Hat keeps printing that a device is missing, e.g. /dev/sdd1.
But when I run "mdadm --detail /dev/md0", nothing seems to be wrong, even though I pulled one disk. How is that possible? Is it a bug?
I can't seem to get this fixed, but I really want to get it working so I can do monitoring from home.
Here is my "mdadm --detail /dev/md0" output:
Version : 00.90.01
Creation Time : Mon Feb 14 15:23:01 2005
Raid Level : raid5
Array Size : 142238976 (135.65 GiB 145.65 GB)
Device Size : 35559744 (33.91 GiB 36.41 GB)
Raid Devices : 5
Total Devices : 5
Preferred Minor : 0
Persistence : Superblock is persistent
Update Time : Thu May 26 15:29:59 2005
State : clean
Active Devices : 5
Working Devices : 5
Failed Devices : 0
Spare Devices : 0
Layout : left-asymmetric
Chunk Size : 64K
Number Major Minor RaidDevice State
0 8 17 0 active sync /dev/sdb1
1 8 33 1 active sync /dev/sdc1
2 8 49 2 active sync /dev/sdd1
3 8 65 3 active sync /dev/sde1
4 8 81 4 active sync /dev/sdf1
UUID : 6820c415:e457679b:ff59e0ba:0e72128a
Events : 0.4993
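For the monitoring-from-home goal, note that mdadm itself has a monitor mode (roughly `mdadm --monitor --scan --mail=root --daemonise`) that mails on Fail/DegradedArray events. As a rough sketch of what such a check looks at, the script below parses an mdstat-style status line; the sample string is a stand-in for reading /proc/mdstat, so no real array is touched:

```shell
# Sketch of a degraded-array check based on /proc/mdstat output.
# Each healthy member shows as 'U' and each failed or missing one as '_',
# so a clean 5-disk RAID 5 reads [5/5] [UUUUU] and a degraded one
# e.g. [5/4] [UU_UU].
# The sample line stands in for: grep -A1 '^md0' /proc/mdstat
mdstat_line='142238976 blocks level 5, 64k chunk, algorithm 0 [5/4] [UU_UU]'

case "$mdstat_line" in
  *_*) state=DEGRADED ;;
  *)   state=OK ;;
esac
echo "md0: $state"
```

With a real /proc/mdstat this is the kind of test you could run from cron at home and mail yourself the result.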
ASKER
That's right, I pulled the hard disk out physically. I use an old Dell PowerEdge 2400 and the hard disks are hot-swappable. Btw, I use software RAID.
> Btw i use a software RAID.
Yep, I see. Can you please run
cat /proc/mdstat
and post the output?
It sounds to me like some SCSI bus/driver/controller trouble; otherwise md should scream about the failed drive.
Any RAID/SCSI messages in the output of 'dmesg' or /var/log/{syslog,messages}?
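A quick way to do that last check is to filter the kernel log for the usual md/SCSI failure strings. The printf lines below are sample dmesg output standing in for the real thing, so the sketch runs without the failing hardware:

```shell
# Sketch: filter kernel messages for RAID/SCSI trouble, as in
#   dmesg | grep -iE 'scsi error|i/o error|md:'
# The printf lines are sample dmesg output, not produced by this machine.
printf '%s\n' \
  'SCSI error : <0 0 1 0> return code = 0x10000' \
  'end_request: I/O error, dev sdb, sector 71119551' \
  'eth0: link up' \
  | grep -iE 'scsi error|i/o error|md:'
```

Only the two error lines survive the filter; the unrelated network message is dropped.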
ASKER
I can find this error in my log files; it's also the error the screen was filling up with:
May 26 13:57:48 ferrari -- root[2506]: ROOT LOGIN ON tty2
May 26 13:58:14 ferrari kernel: SCSI error : <0 0 1 0> return code = 0x10000
May 26 13:58:14 ferrari kernel: end_request: I/O error, dev sdb, sector 71119551
But I find it kind of strange that mdadm doesn't show the missing disk. Here is my /proc/mdstat:
Personalities : [raid5]
md0 : active raid5 sdc1[1] sdf1[4] sde1[3] sdd1[2] sdb1[0]
142238976 blocks level 5, 64k chunk, algorithm 0 [5/5] [UUUUU]
unused devices: <none>
thnx
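One likely explanation for the clean [5/5] [UUUUU] output: md only marks a member faulty when an I/O request to it actually fails, so a hot-pulled disk can keep showing "active sync" until something touches the array. A hedged sketch of how to surface the failure; the commands are only printed here, since they need root and the live array (the dd sizes are an arbitrary choice):

```shell
# md flags a member faulty only when an I/O to it fails, so forcing reads
# across the array should make the pulled disk show up as failed.
# Printed rather than executed: these need root and the real /dev/md0.
cmds='dd if=/dev/md0 of=/dev/null bs=1M count=256   # force reads across the array
mdadm --detail /dev/md0                             # pulled disk should now show faulty
cat /proc/mdstat                                    # expect [5/4] [UU_UU]-style status'
printf '%s\n' "$cmds"
```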
ASKER CERTIFIED SOLUTION
ASKER
Would it help if I stopped the array /dev/md0 and restarted it?
If it is not mounted... wait.
Is your root located on it, or is it just data storage?
ASKER
The whole array is running at the moment. I have a separate disk for my OS.
If it is possible, unmount and mount it again. But I don't think this will change anything without stopping and starting it.
ASKER
What I can try (correct me if I'm wrong) is the following:
umount /dev/md0
mdadm --stop /dev/md0
mdadm --run /dev/md0
mount /dev/md0 /data
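One caution on the sequence above: after `mdadm --stop` the array is disassembled, so it usually has to be brought back with `mdadm --assemble` rather than `--run` (which only starts an already-assembled, inactive array). A dry-run sketch, with the device names and /data mount point taken from this thread; DRY_RUN=1 (the default here) just prints the commands instead of executing them:

```shell
# Dry-run sketch of the stop/reassemble cycle; set DRY_RUN=0 to execute.
DRY_RUN=${DRY_RUN:-1}
run() { if [ "$DRY_RUN" = 1 ]; then echo "+ $*"; else "$@"; fi; }

run umount /dev/md0
run mdadm --stop /dev/md0
# After --stop the array no longer exists, so reassemble it from its members.
run mdadm --assemble /dev/md0 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1
run mount /dev/md0 /data
```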
Right. The only important point is that no application should be accessing it when you unmount, otherwise the umount will fail. To check:
lsof | grep -- "/data"
How? Physically, while the system is running? As I can see you are using SCSI, but are the drives/cages also hot-swappable?