Mr-sark
asked on
MDADM doesn't give error when RAID FAILS
After getting some great pointers about managing my RAID 5 array with "mdadm", I ran into a problem.
When I pull a random disk out of the /dev/md0 array, I should get an error on my display. When I'm at the terminal and pull a disk, Red Hat keeps printing that a device is missing, e.g. /dev/sdd1.
But when I run "mdadm --detail /dev/md0", nothing seems to be wrong, even though I pulled one disk. How is that possible? Is it a bug?
I can't seem to get this fixed, but I really want to get it working so I can do monitoring from home.
Here is my "mdadm --detail /dev/md0" output:
Version : 00.90.01
Creation Time : Mon Feb 14 15:23:01 2005
Raid Level : raid5
Array Size : 142238976 (135.65 GiB 145.65 GB)
Device Size : 35559744 (33.91 GiB 36.41 GB)
Raid Devices : 5
Total Devices : 5
Preferred Minor : 0
Persistence : Superblock is persistent
Update Time : Thu May 26 15:29:59 2005
State : clean
Active Devices : 5
Working Devices : 5
Failed Devices : 0
Spare Devices : 0
Layout : left-asymmetric
Chunk Size : 64K
Number Major Minor RaidDevice State
0 8 17 0 active sync /dev/sdb1
1 8 33 1 active sync /dev/sdc1
2 8 49 2 active sync /dev/sdd1
3 8 65 3 active sync /dev/sde1
4 8 81 4 active sync /dev/sdf1
UUID : 6820c415:e457679b:ff59e0ba:0e72128a
Events : 0.4993
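For the monitoring-from-home goal, note that mdadm itself has a monitor mode (roughly `mdadm --monitor --scan --mail=root --daemonise`) that mails on Fail/DegradedArray events. As a rough sketch of what such a check looks at, the script below parses an mdstat-style status line; the sample string is a stand-in for reading /proc/mdstat, so no real array is touched:

```shell
# Sketch of a degraded-array check based on /proc/mdstat output.
# Each healthy member shows as 'U' and each failed or missing one as '_',
# so a clean 5-disk RAID 5 reads [5/5] [UUUUU] and a degraded one
# e.g. [5/4] [UU_UU].
# The sample line stands in for: grep -A1 '^md0' /proc/mdstat
mdstat_line='142238976 blocks level 5, 64k chunk, algorithm 0 [5/4] [UU_UU]'

case "$mdstat_line" in
  *_*) state=DEGRADED ;;
  *)   state=OK ;;
esac
echo "md0: $state"
```

With a real /proc/mdstat this is the kind of test you could run from cron at home and mail yourself the result.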
ASKER
That's right, I pulled the hard disk out physically. I use an old Dell PowerEdge 2400 and the hard disks are hot-swappable. Btw, I use software RAID.
> Btw i use a software RAID.
Yep, I see. Can you please run
cat /proc/mdstat
and post the output?
It sounds to me like some SCSI bus/driver/controller trouble; otherwise md should scream about the failed drive.
Any RAID/SCSI messages in the output of 'dmesg' or /var/log/{syslog,messages}?
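A quick way to do that last check is to filter the kernel log for the usual md/SCSI failure strings. The printf lines below are sample dmesg output standing in for the real thing, so the sketch runs without the failing hardware:

```shell
# Sketch: filter kernel messages for RAID/SCSI trouble, as in
#   dmesg | grep -iE 'scsi error|i/o error|md:'
# The printf lines are sample dmesg output, not produced by this machine.
printf '%s\n' \
  'SCSI error : <0 0 1 0> return code = 0x10000' \
  'end_request: I/O error, dev sdb, sector 71119551' \
  'eth0: link up' \
  | grep -iE 'scsi error|i/o error|md:'
```

Only the two error lines survive the filter; the unrelated network message is dropped.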
ASKER
I can find this error in my log files; it's also the error the screen was filling up with:
May 26 13:57:48 ferrari -- root[2506]: ROOT LOGIN ON tty2
May 26 13:58:14 ferrari kernel: SCSI error : <0 0 1 0> return code = 0x10000
May 26 13:58:14 ferrari kernel: end_request: I/O error, dev sdb, sector 71119551
But I find it kind of strange that mdadm doesn't show the missing disk. Here is my /proc/mdstat:
Personalities : [raid5]
md0 : active raid5 sdc1[1] sdf1[4] sde1[3] sdd1[2] sdb1[0]
142238976 blocks level 5, 64k chunk, algorithm 0 [5/5] [UUUUU]
unused devices: <none>
thnx
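One likely explanation for the clean [5/5] [UUUUU] output: md only marks a member faulty when an I/O request to it actually fails, so a hot-pulled disk can keep showing "active sync" until something touches the array. A hedged sketch of how to surface the failure; the commands are only printed here, since they need root and the live array (the dd sizes are an arbitrary choice):

```shell
# md flags a member faulty only when an I/O to it fails, so forcing reads
# across the array should make the pulled disk show up as failed.
# Printed rather than executed: these need root and the real /dev/md0.
cmds='dd if=/dev/md0 of=/dev/null bs=1M count=256   # force reads across the array
mdadm --detail /dev/md0                             # pulled disk should now show faulty
cat /proc/mdstat                                    # expect [5/4] [UU_UU]-style status'
printf '%s\n' "$cmds"
```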
ASKER CERTIFIED SOLUTION
ASKER
Would it help if I stopped the array /dev/md0 and restarted it?
If it is not mounted... wait.
Is your root located on it, or is it just data storage?
ASKER
The whole array is running at the moment. I have a separate disk for my OS.
If it is possible, unmount and mount it again. But I don't think this will change anything without stopping and starting it.
ASKER
What I can try (correct me if I'm wrong) is the following:
umount /dev/md0
mdadm --stop /dev/md0
mdadm --run /dev/md0
mount /dev/md0 /data
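One caution on the sequence above: after `mdadm --stop` the array is disassembled, so it usually has to be brought back with `mdadm --assemble` rather than `--run` (which only starts an already-assembled, inactive array). A dry-run sketch, with the device names and /data mount point taken from this thread; DRY_RUN=1 (the default here) just prints the commands instead of executing them:

```shell
# Dry-run sketch of the stop/reassemble cycle; set DRY_RUN=0 to execute.
DRY_RUN=${DRY_RUN:-1}
run() { if [ "$DRY_RUN" = 1 ]; then echo "+ $*"; else "$@"; fi; }

run umount /dev/md0
run mdadm --stop /dev/md0
# After --stop the array no longer exists, so reassemble it from its members.
run mdadm --assemble /dev/md0 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1
run mount /dev/md0 /data
```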
Right. The only important point is that no application should be accessing it when you unmount, otherwise the umount will fail. To check:
lsof | grep -- "/data"
How? Physically, while the system is running? As I can see you are using SCSI, but are the drives/cages also hot-swappable?