Linux software RAID 1 locking to read-only mode
The setup:
CentOS 5.2, 2x 320 GB SATA drives in RAID 1.
- /dev/md0 (/dev/sda1 + /dev/sdb1) is /boot
- /dev/md1 (/dev/sda2 + /dev/sdb2) is an LVM partition which contains /, /data and swap partitions
All filesystems other than swap are ext3.
We've had a problem on several systems where a fault on one drive has locked the root filesystem read-only, which obviously causes problems.
[root@myserver /]# mount | grep Root
/dev/mapper/VolGroup00-LogVolRoot on / type ext3 (rw)
[root@myserver /]# touch /foo
touch: cannot touch `/foo': Read-only file system
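A likely reason mount still reports (rw) here: on a system of this vintage, mount prints what is recorded in /etc/mtab, and that file cannot be updated once / has gone read-only. The kernel's own view is in /proc/mounts. The following is a minimal sketch for reading it; the function name kernel_mount_opts is made up for illustration, and the last matching entry is used because the root filesystem can appear more than once in that table.

```shell
#!/bin/sh
# Print the mount options the kernel reports for a mount point.
#   $1 = mount point (e.g. /)
#   $2 = mounts table to read (optional, defaults to /proc/mounts)
# If the mount point is listed several times, the last entry wins.
kernel_mount_opts() {
    awk -v mp="$1" '
        $2 == mp { opts = $4 }
        END      { if (opts != "") print opts }
    ' "${2:-/proc/mounts}"
}

# Usage:
#   kernel_mount_opts /
# An option list starting with "ro" means the kernel has the
# filesystem mounted read-only, whatever /etc/mtab says.
```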
I can see that one of the partitions in the array is faulted:
[root@myserver /]# mdadm --detail /dev/md1
/dev/md1:
[...]
State : clean, degraded
Active Devices : 1
Working Devices : 1
Failed Devices : 1
Spare Devices : 0
[...]
Number Major Minor RaidDevice State
0 0 0 0 removed
1 8 18 1 active sync /dev/sdb2
2 8 2 - faulty spare /dev/sda2
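For checking several machines, the faulty members can be pulled out of that output with a small filter. This is only a sketch: the function name faulty_members is hypothetical, and it assumes the device path is the last field on each "faulty" line, as in the listing above.

```shell
#!/bin/sh
# Read `mdadm --detail` output on stdin and print the device path
# of every member whose state line contains "faulty".
faulty_members() {
    awk '/faulty/ { print $NF }'
}

# Usage (as root):
#   mdadm --detail /dev/md1 | faulty_members
```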
Remounting as rw fails:
[root@myserver /]# mount -n -o remount /
mount: block device /dev/VolGroup00/LogVolRoot is write-protected, mounting read-only
The LVM tools give an error unless --ignorelockingfailure is used (because they can't write to /var) but show the volume group as rw:
[root@myserver /]# lvm vgdisplay
Locking type 1 initialisation failed.
[root@myserver /]# lvm pvdisplay --ignorelockingfailure
--- Physical volume ---
PV Name /dev/md1
VG Name VolGroup00
PV Size 279.36 GB / not usable 15.56 MB
Allocatable yes (but full)
[...]
[root@myserver /]# lvm vgdisplay --ignorelockingfailure
--- Volume group ---
VG Name VolGroup00
System ID
Format lvm2
Metadata Areas 1
Metadata Sequence No 4
VG Access read/write
VG Status resizable
[...]
[root@myserver /]# lvm lvdisplay /dev/VolGroup00/LogVolRoot --ignorelockingfailure
--- Logical volume ---
LV Name /dev/VolGroup00/LogVolRoot
VG Name VolGroup00
LV UUID PGoY0f-rXqj-xH4v-WMbw-jy6I-nE04-yZD3Gx
LV Write Access read/write
[...]
In this case /boot (a separate RAID meta-device) and /data (a different logical volume in the same volume group) are still writable. From the previous occurrences I know that a restart will bring the system back up with a read/write root filesystem and a properly degraded RAID array.
So, I have two questions:
1) When this occurs, how can I get the root filesystem back to read/write without a system restart?
2) What needs to be changed to stop the filesystem locking like this? When a single disk in a RAID 1 array fails, we don't want the filesystems to lock up; we want the system to keep running until we can replace the bad disk.