Kong
asked on
"Device sdd not ready" in /var/log/messages
Hi all,
We restarted the 2-node cluster (both nodes and the SAN) and are now getting a lot of "Device not ready" messages in /var/log/messages:
Jul 2 04:02:30 lu3cduddb1 syslogd 1.4.1: restart.
Jul 2 04:02:31 lu3cduddb1 kernel: Device sdb not ready.
Jul 2 04:02:31 lu3cduddb1 kernel: Device sdd not ready.
Jul 2 04:02:31 lu3cduddb1 kernel: Device sdf not ready.
Jul 2 04:02:31 lu3cduddb1 kernel: Device sdh not ready.
Jul 2 04:02:31 lu3cduddb1 kernel: Device sdj not ready.
...
How do I track down and resolve this problem? Thanks in advance.
ASKER
Wow, thanks for the very quick response!
Here's the output. I'm not a Linux admin by any stretch of the imagination, so I can't tell what's wrong from it:
fstab:
[root@lu3cduddb1 ~]# cat /etc/fstab
# This file is edited by fstab-sync - see 'man fstab-sync' for details
LABEL=/ / ext3 defaults 1 1
LABEL=/boot /boot ext3 defaults 1 2
none /dev/pts devpts gid=5,mode=620 0 0
none /dev/shm tmpfs defaults 0 0
LABEL=/home /home ext3 defaults 1 2
LABEL=/opt /opt ext3 defaults 1 2
none /proc proc defaults 0 0
none /sys sysfs defaults 0 0
LABEL=/tmp /tmp ext3 defaults 1 2
LABEL=/usr /usr ext3 defaults 1 2
LABEL=/var /var ext3 defaults 1 2
LABEL=SW-cciss/c0d0p2 swap swap defaults 0 0
#
# OCFS RAC File Systems
#
/dev/mpath/mpath0p1 /oracle/RACConfig ocfs2 _netdev,datavolume 0 0
/dev/mpath/mpath1p1 /oracle/oradata/TEST_DBs ocfs2 _netdev,datavolume 0 0
/dev/mpath/mpath2p1 /oracle/oradata/prod ocfs2 _netdev,datavolume 0 0
#
/dev/hda /media/cdrom auto pamconsole,exec,noauto,managed 0 0
/dev/fd0 /media/floppy auto pamconsole,exec,noauto,managed 0 0
------
[root@lu3cduddb1 ~]# cat /proc/scsi/scsi
Attached devices:
Host: scsi0 Channel: 00 Id: 00 Lun: 00
Vendor: HP Model: HSV100 Rev: 3028
Type: RAID ANSI SCSI revision: 02
Host: scsi0 Channel: 00 Id: 00 Lun: 01
Vendor: HP Model: HSV100 Rev: 3028
Type: Direct-Access ANSI SCSI revision: 02
Host: scsi0 Channel: 00 Id: 00 Lun: 02
Vendor: HP Model: HSV100 Rev: 3028
Type: Direct-Access ANSI SCSI revision: 02
Host: scsi0 Channel: 00 Id: 00 Lun: 03
Vendor: HP Model: HSV100 Rev: 3028
Type: Direct-Access ANSI SCSI revision: 02
Host: scsi0 Channel: 00 Id: 01 Lun: 00
Vendor: HP Model: HSV100 Rev: 3028
Type: RAID ANSI SCSI revision: 02
Host: scsi0 Channel: 00 Id: 01 Lun: 01
Vendor: HP Model: HSV100 Rev: 3028
Type: Direct-Access ANSI SCSI revision: 02
Host: scsi0 Channel: 00 Id: 01 Lun: 02
Vendor: HP Model: HSV100 Rev: 3028
Type: Direct-Access ANSI SCSI revision: 02
Host: scsi0 Channel: 00 Id: 01 Lun: 03
Vendor: HP Model: HSV100 Rev: 3028
Type: Direct-Access ANSI SCSI revision: 02
Host: scsi1 Channel: 00 Id: 00 Lun: 00
Vendor: HP Model: HSV100 Rev: 3028
Type: RAID ANSI SCSI revision: 02
Host: scsi1 Channel: 00 Id: 00 Lun: 01
Vendor: HP Model: HSV100 Rev: 3028
Type: Direct-Access ANSI SCSI revision: 02
Host: scsi1 Channel: 00 Id: 00 Lun: 02
Vendor: HP Model: HSV100 Rev: 3028
Type: Direct-Access ANSI SCSI revision: 02
Host: scsi1 Channel: 00 Id: 00 Lun: 03
Vendor: HP Model: HSV100 Rev: 3028
Type: Direct-Access ANSI SCSI revision: 02
Host: scsi1 Channel: 00 Id: 01 Lun: 00
Vendor: HP Model: HSV100 Rev: 3028
Type: RAID ANSI SCSI revision: 02
Host: scsi1 Channel: 00 Id: 01 Lun: 01
Vendor: HP Model: HSV100 Rev: 3028
Type: Direct-Access ANSI SCSI revision: 02
Host: scsi1 Channel: 00 Id: 01 Lun: 02
Vendor: HP Model: HSV100 Rev: 3028
Type: Direct-Access ANSI SCSI revision: 02
Host: scsi1 Channel: 00 Id: 01 Lun: 03
Vendor: HP Model: HSV100 Rev: 3028
Type: Direct-Access ANSI SCSI revision: 02
-----
[root@lu3cduddb1 ~]# cdrecord -scanbus
Cdrecord-Clone 2.01-dvd (i686-pc-linux-gnu) Copyright (C) 1995-2004 Jörg Schilling
Note: This version is an unofficial (modified) version with DVD support
Note: and therefore may have bugs that are not present in the original.
Note: Please send bug reports or support requests to http://bugzilla.redhat.com/bugzilla
Note: The author of cdrecord should not be bothered with problems in this version.
scsidev: 'ATA'
devname: 'ATA'
scsibus: -2 target: -2 lun: -2
Linux sg driver version: 3.5.27
Using libscg version 'schily-0.8'.
cdrecord: Warning: using inofficial libscg transport code version (schily - Red Hat-scsi-linux-sg.c-1.83-RH '@(#)scsi-linux-sg.c 1.83 04/05/20 Copyright 1997 J. Schilling').
scsibus0:
0,0,0 0) 'TEAC ' 'DV-28E-C ' 'B.4F' Removable CD-ROM
0,1,0 1) *
0,2,0 2) *
0,3,0 3) *
0,4,0 4) *
0,5,0 5) *
0,6,0 6) *
0,7,0 7) *
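For anyone reading a long /proc/scsi/scsi listing like the one above, a small script can summarise it. This is only a sketch: the function names `parse_proc_scsi` and `count_by_type` are made up for illustration, and it assumes the text has the same "Host: ... / Vendor: ... / Type: ..." layout as the paste above.

```python
import re
from collections import Counter

# Matches header lines such as "Host: scsi0 Channel: 00 Id: 00 Lun: 01".
HOST_RE = re.compile(
    r"Host:\s*scsi(\d+)\s+Channel:\s*(\d+)\s+Id:\s*(\d+)\s+Lun:\s*(\d+)"
)
# Captures the device type from "Type: Direct-Access ANSI SCSI revision: 02".
TYPE_RE = re.compile(r"Type:\s*(.+?)\s+ANSI")


def parse_proc_scsi(text):
    """Return a list of (host, channel, target_id, lun, dev_type) tuples."""
    devices = []
    current = None  # address from the most recent "Host:" line
    for line in text.splitlines():
        m = HOST_RE.search(line)
        if m:
            current = tuple(int(g) for g in m.groups())
            continue
        m = TYPE_RE.search(line)
        if m and current is not None:
            devices.append(current + (m.group(1),))
            current = None
    return devices


def count_by_type(devices):
    """Count entries per device type, e.g. 'RAID' or 'Direct-Access'."""
    return Counter(dev_type for *_addr, dev_type in devices)
```

The text can come from reading /proc/scsi/scsi or from a saved copy. On the listing posted above this gives 4 RAID entries (LUN 0 on each host/target pair) and 12 Direct-Access entries, which matches the twelve sd devices (sda to sdl) that show up in the multipath listing later in this thread.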
ASKER
Not sure if this output helps, but there shouldn't be any [failed] lines:
[root@lu3cduddb1 ~]# multipath -l
mpath2 (3600508b4001026000000600000420000)
[size=150 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [active]
\_ 0:0:0:3 sdc 8:32 [active][ready]
\_ round-robin 0 [enabled]
\_ 0:0:1:3 sdf 8:80 [failed][faulty]
\_ round-robin 0 [enabled]
\_ 1:0:0:3 sdi 8:128 [active][ready]
\_ round-robin 0 [enabled]
\_ 1:0:1:3 sdl 8:176 [failed][faulty]
mpath1 (3600508b40010260000006000003b0000)
[size=20 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [enabled]
\_ 0:0:0:2 sdb 8:16 [failed][faulty]
\_ round-robin 0 [active]
\_ 0:0:1:2 sde 8:64 [active][ready]
\_ round-robin 0 [enabled]
\_ 1:0:0:2 sdh 8:112 [failed][faulty]
\_ round-robin 0 [enabled]
\_ 1:0:1:2 sdk 8:160 [active][ready]
mpath0 (3600508b4001026000000600000340000)
[size=1 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [active]
\_ 0:0:0:1 sda 8:0 [active][ready]
\_ round-robin 0 [enabled]
\_ 0:0:1:1 sdd 8:48 [failed][faulty]
\_ round-robin 0 [enabled]
\_ 1:0:0:1 sdg 8:96 [active][ready]
\_ round-robin 0 [enabled]
\_ 1:0:1:1 sdj 8:144 [failed][faulty]
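One thing worth checking in output like this is whether the failed paths of each map all sit on the same SCSI target id. Below is a rough sketch for doing that; `parse_multipath` and `failed_targets` are hypothetical helper names, and the regexes assume the old-style `multipath -l` layout shown above (H:C:T:L address, sd device, major:minor, then two bracketed states).

```python
import re
from collections import defaultdict

# Map header, e.g. "mpath2 (3600508b4...)".
MAP_RE = re.compile(r"^(\w+)\s+\(")
# Path line, e.g. "\_ 0:0:1:3 sdf 8:80 [failed][faulty]".
PATH_RE = re.compile(
    r"(\d+):(\d+):(\d+):(\d+)\s+(sd[a-z]+)\s+\d+:\d+\s+\[(\w+)\]\[(\w+)\]"
)


def parse_multipath(text):
    """Return {map_name: [(host, target, lun, device, state), ...]}."""
    maps = defaultdict(list)
    current = None  # name of the map whose paths are being read
    for raw in text.splitlines():
        line = raw.strip()
        m = MAP_RE.match(line)
        if m:
            current = m.group(1)
            continue
        p = PATH_RE.search(line)
        if p and current is not None:
            host, _channel, target, lun = (int(g) for g in p.groups()[:4])
            maps[current].append((host, target, lun, p.group(5), p.group(6)))
    return dict(maps)


def failed_targets(maps):
    """Return {map_name: sorted target ids that have at least one failed path}."""
    result = {}
    for name, paths in maps.items():
        result[name] = sorted(
            {target for _h, target, _l, _d, state in paths if state == "failed"}
        )
    return result
```

Run against the listing above, this reports target 1 for mpath2 and mpath0 and target 0 for mpath1, so every failed path of a given LUN goes through the same target on both HBAs. That pattern would at least be consistent with each LUN being served by only one controller at a time, though the output alone doesn't prove it.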
ASKER
No, an engineer came around and apparently the Fibre Channel was configured incorrectly. The HP EVA (3000?) SAN doesn't support dual active channels, so it's now set to Active-Passive. However, the display still shows:
[root@lu3cduddb2 ~]# multipath -l
mpath2 (3600508b4001026000000600000420000)
[size=150 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [active]
\_ 0:0:0:3 sdc 8:32 [active][ready]
\_ round-robin 0 [enabled]
\_ 0:0:1:3 sdf 8:80 [failed][faulty]
\_ round-robin 0 [enabled]
\_ 1:0:0:3 sdi 8:128 [active][ready]
\_ round-robin 0 [enabled]
\_ 1:0:1:3 sdl 8:176 [failed][faulty]
mpath1 (3600508b40010260000006000003b0000)
[size=20 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [enabled]
\_ 0:0:0:2 sdb 8:16 [failed][faulty]
\_ round-robin 0 [active]
\_ 0:0:1:2 sde 8:64 [active][ready]
\_ round-robin 0 [enabled]
\_ 1:0:0:2 sdh 8:112 [failed][faulty]
\_ round-robin 0 [enabled]
\_ 1:0:1:2 sdk 8:160 [active][ready]
mpath0 (3600508b4001026000000600000340000)
[size=1 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [active]
\_ 0:0:0:1 sda 8:0 [active][ready]
\_ round-robin 0 [enabled]
\_ 0:0:1:1 sdd 8:48 [failed][faulty]
\_ round-robin 0 [enabled]
\_ 1:0:0:1 sdg 8:96 [active][ready]
\_ round-robin 0 [enabled]
\_ 1:0:1:1 sdj 8:144 [failed][faulty]
The engineer thinks the driver is faulty because it shouldn't show the [failed][faulty] paths, but says it's otherwise working as expected...
We're still getting spammed with "Device not ready" messages in /var/log/messages, but at least the two nodes can see the SAN now and I can create my database - for how long, I don't know... It looks a bit suss to me, but I'm not a sys admin...
A "Device not ready" message can mean:
1. That the device is not connected.
2. That the device takes time to spin up to speed.
3. That the device is not accessible for some other reason, e.g. rebuilding a RAID array.
4. That the device is a removable device such as a CD.
Fault-finding:
1. Look at /etc/fstab # Can you identify the device?
2. cat /proc/scsi/scsi # May give you something useful
3. cdrecord -scanbus # Checks what's on the SCSI bus
The reason for the third one is that I was wondering if you had a CD tower of some sort connected. Any drive without a CD in it could be declared 'not ready'.
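To narrow down which devices are actually being reported, the log lines can be tallied per device and compared with a list of devices you already suspect. A minimal sketch, assuming the syslog format quoted in the question ("kernel: Device sdX not ready."); the function names `not_ready_counts` and `unexplained_devices` are made up for this example:

```python
import re
from collections import Counter

# Matches e.g. "Jul 2 04:02:31 lu3cduddb1 kernel: Device sdd not ready."
NOT_READY_RE = re.compile(r"kernel: Device (\w+) not ready")


def not_ready_counts(lines):
    """Count 'Device X not ready' messages per device name."""
    counts = Counter()
    for line in lines:
        m = NOT_READY_RE.search(line)
        if m:
            counts[m.group(1)] += 1
    return counts


def unexplained_devices(counts, known_devices):
    """Return devices seen in the log that are not in known_devices."""
    return sorted(set(counts) - set(known_devices))
```

The lines could come from something like `open("/var/log/messages")`. In this thread the logged devices (sdb, sdd, sdf, sdh, sdj) are all among the sd devices that `multipath -l` marks [failed][faulty], so `unexplained_devices` would come back empty for the excerpt shown, which suggests the messages and the failed paths are the same issue.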