Solved

"Device sdd not ready" in /var/log/messages

Posted on 2006-07-03
Last Modified: 2013-12-16
Hi all,

We restarted the 2-node cluster (both nodes and the SAN) and are now getting a lot of "Device not ready" messages in /var/log/messages:

Jul  2 04:02:30 lu3cduddb1 syslogd 1.4.1: restart.
Jul  2 04:02:31 lu3cduddb1 kernel: Device sdb not ready.
Jul  2 04:02:31 lu3cduddb1 kernel: Device sdd not ready.
Jul  2 04:02:31 lu3cduddb1 kernel: Device sdf not ready.
Jul  2 04:02:31 lu3cduddb1 kernel: Device sdh not ready.
Jul  2 04:02:31 lu3cduddb1 kernel: Device sdj not ready.
...

How do I track down and resolve this problem? Thanks in advance.
 
Question by:Kong
 
LVL 22

Expert Comment

by:pjedmond
ID: 17030377
There are a number of possible reasons behind this:

1.   The device is not connected.
2.   The device takes time to 'spin up to speed'.
3.   The device is not accessible for some other reason, e.g. a RAID array is rebuilding.
4.   The device is a removable device such as a CD.

Fault-finding:

1.    Look at /etc/fstab                      # Can you identify the device?
2.    cat /proc/scsi/scsi                     # May give you something useful
3.    cdrecord -scanbus                       # Checks what's on the SCSI bus

The reason for the third one is that I was wondering whether you have a CD tower of some sort connected. Any drive without a CD in it could be reported as 'not ready'.
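
For step 2, a small sketch like the following may make the listing easier to scan by condensing it to one line per device. It assumes the usual three-line Host/Vendor/Type layout of /proc/scsi/scsi, and the name summarize_scsi is made up for this example.

```shell
#!/bin/sh
# summarize_scsi: hypothetical helper. Reads /proc/scsi/scsi-style text on
# stdin and prints "host id=N lun=N type" for each attached device.
summarize_scsi() {
    awk '
        /^Host:/   { host = $2; id = $6; lun = $8 }           # remember the address
        /^ *Type:/ { print host, "id=" id, "lun=" lun, $2 }   # $2 is the device type
    '
}

# Usage on a live system:
#   summarize_scsi < /proc/scsi/scsi
```

Entries of type Direct-Access are disks or LUNs, so a CD tower would stand out straight away.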

(   (()
(`-' _\
 ''  ''
 
LVL 2

Author Comment

by:Kong
ID: 17030473
Wow, thanks for the very quick response!

Here's the output. I'm not a Linux admin by any stretch of the imagination, so I can't tell what's wrong from it:

fstab:

[root@lu3cduddb1 ~]# cat /etc/fstab
# This file is edited by fstab-sync - see 'man fstab-sync' for details
LABEL=/                 /                        ext3    defaults        1 1
LABEL=/boot             /boot                    ext3    defaults        1 2
none                    /dev/pts                 devpts  gid=5,mode=620  0 0
none                    /dev/shm                 tmpfs   defaults        0 0
LABEL=/home             /home                    ext3    defaults        1 2
LABEL=/opt              /opt                     ext3    defaults        1 2
none                    /proc                    proc    defaults        0 0
none                    /sys                     sysfs   defaults        0 0
LABEL=/tmp              /tmp                     ext3    defaults        1 2
LABEL=/usr              /usr                     ext3    defaults        1 2
LABEL=/var              /var                     ext3    defaults        1 2
LABEL=SW-cciss/c0d0p2   swap                     swap    defaults        0 0
#
# OCFS RAC File Systems
#
/dev/mpath/mpath0p1     /oracle/RACConfig        ocfs2   _netdev,datavolume 0 0
/dev/mpath/mpath1p1     /oracle/oradata/TEST_DBs ocfs2   _netdev,datavolume 0 0
/dev/mpath/mpath2p1     /oracle/oradata/prod     ocfs2   _netdev,datavolume 0 0
#

/dev/hda                /media/cdrom            auto    pamconsole,exec,noauto,managed 0 0
/dev/fd0                /media/floppy           auto    pamconsole,exec,noauto,managed 0 0

------

[root@lu3cduddb1 ~]# cat /proc/scsi/scsi
Attached devices:
Host: scsi0 Channel: 00 Id: 00 Lun: 00
  Vendor: HP       Model: HSV100           Rev: 3028
  Type:   RAID                             ANSI SCSI revision: 02
Host: scsi0 Channel: 00 Id: 00 Lun: 01
  Vendor: HP       Model: HSV100           Rev: 3028
  Type:   Direct-Access                    ANSI SCSI revision: 02
Host: scsi0 Channel: 00 Id: 00 Lun: 02
  Vendor: HP       Model: HSV100           Rev: 3028
  Type:   Direct-Access                    ANSI SCSI revision: 02
Host: scsi0 Channel: 00 Id: 00 Lun: 03
  Vendor: HP       Model: HSV100           Rev: 3028
  Type:   Direct-Access                    ANSI SCSI revision: 02
Host: scsi0 Channel: 00 Id: 01 Lun: 00
  Vendor: HP       Model: HSV100           Rev: 3028
  Type:   RAID                             ANSI SCSI revision: 02
Host: scsi0 Channel: 00 Id: 01 Lun: 01
  Vendor: HP       Model: HSV100           Rev: 3028
  Type:   Direct-Access                    ANSI SCSI revision: 02
Host: scsi0 Channel: 00 Id: 01 Lun: 02
  Vendor: HP       Model: HSV100           Rev: 3028
  Type:   Direct-Access                    ANSI SCSI revision: 02
Host: scsi0 Channel: 00 Id: 01 Lun: 03
  Vendor: HP       Model: HSV100           Rev: 3028
  Type:   Direct-Access                    ANSI SCSI revision: 02
Host: scsi1 Channel: 00 Id: 00 Lun: 00
  Vendor: HP       Model: HSV100           Rev: 3028
  Type:   RAID                             ANSI SCSI revision: 02
Host: scsi1 Channel: 00 Id: 00 Lun: 01
  Vendor: HP       Model: HSV100           Rev: 3028
  Type:   Direct-Access                    ANSI SCSI revision: 02
Host: scsi1 Channel: 00 Id: 00 Lun: 02
  Vendor: HP       Model: HSV100           Rev: 3028
  Type:   Direct-Access                    ANSI SCSI revision: 02
Host: scsi1 Channel: 00 Id: 00 Lun: 03
  Vendor: HP       Model: HSV100           Rev: 3028
  Type:   Direct-Access                    ANSI SCSI revision: 02
Host: scsi1 Channel: 00 Id: 01 Lun: 00
  Vendor: HP       Model: HSV100           Rev: 3028
  Type:   RAID                             ANSI SCSI revision: 02
Host: scsi1 Channel: 00 Id: 01 Lun: 01
  Vendor: HP       Model: HSV100           Rev: 3028
  Type:   Direct-Access                    ANSI SCSI revision: 02
Host: scsi1 Channel: 00 Id: 01 Lun: 02
  Vendor: HP       Model: HSV100           Rev: 3028
  Type:   Direct-Access                    ANSI SCSI revision: 02
Host: scsi1 Channel: 00 Id: 01 Lun: 03
  Vendor: HP       Model: HSV100           Rev: 3028
  Type:   Direct-Access                    ANSI SCSI revision: 02

-----

[root@lu3cduddb1 ~]# cdrecord -scanbus
Cdrecord-Clone 2.01-dvd (i686-pc-linux-gnu) Copyright (C) 1995-2004 Jörg Schilling
Note: This version is an unofficial (modified) version with DVD support
Note: and therefore may have bugs that are not present in the original.
Note: Please send bug reports or support requests to http://bugzilla.redhat.com/bugzilla
Note: The author of cdrecord should not be bothered with problems in this version.
scsidev: 'ATA'
devname: 'ATA'
scsibus: -2 target: -2 lun: -2
Linux sg driver version: 3.5.27
Using libscg version 'schily-0.8'.
cdrecord: Warning: using inofficial libscg transport code version (schily - Red Hat-scsi-linux-sg.c-1.83-RH '@(#)scsi-linux-sg.c      1.83 04/05/20 Copyright 1997 J. Schilling').
scsibus0:
        0,0,0     0) 'TEAC    ' 'DV-28E-C        ' 'B.4F' Removable CD-ROM
        0,1,0     1) *
        0,2,0     2) *
        0,3,0     3) *
        0,4,0     4) *
        0,5,0     5) *
        0,6,0     6) *
        0,7,0     7) *
 
LVL 2

Author Comment

by:Kong
ID: 17030492
Not sure if this output helps, but there shouldn't be any [failed] lines:

[root@lu3cduddb1 ~]# multipath -l
mpath2 (3600508b4001026000000600000420000)
[size=150 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [active]
 \_ 0:0:0:3 sdc 8:32  [active][ready]
\_ round-robin 0 [enabled]
 \_ 0:0:1:3 sdf 8:80  [failed][faulty]
\_ round-robin 0 [enabled]
 \_ 1:0:0:3 sdi 8:128 [active][ready]
\_ round-robin 0 [enabled]
 \_ 1:0:1:3 sdl 8:176 [failed][faulty]

mpath1 (3600508b40010260000006000003b0000)
[size=20 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [enabled]
 \_ 0:0:0:2 sdb 8:16  [failed][faulty]
\_ round-robin 0 [active]
 \_ 0:0:1:2 sde 8:64  [active][ready]
\_ round-robin 0 [enabled]
 \_ 1:0:0:2 sdh 8:112 [failed][faulty]
\_ round-robin 0 [enabled]
 \_ 1:0:1:2 sdk 8:160 [active][ready]

mpath0 (3600508b4001026000000600000340000)
[size=1 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [active]
 \_ 0:0:0:1 sda 8:0   [active][ready]
\_ round-robin 0 [enabled]
 \_ 0:0:1:1 sdd 8:48  [failed][faulty]
\_ round-robin 0 [enabled]
 \_ 1:0:0:1 sdg 8:96  [active][ready]
\_ round-robin 0 [enabled]
 \_ 1:0:1:1 sdj 8:144 [failed][faulty]
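
To pull just the failed paths out of a listing like the one above, something along these lines may help. It is only a sketch: it assumes the `multipath -l` text layout shown here (map names starting with mpath, path lines of the form `\_ H:C:T:L sdX major:minor [state]`), and failed_paths is a made-up name.

```shell
#!/bin/sh
# failed_paths: hypothetical helper. Reads `multipath -l` output on stdin and
# prints "map H:C:T:L device" for every path marked [failed].
failed_paths() {
    awk '
        /^mpath/     { map = $1 }            # assumes map names begin with "mpath"
        /\[failed\]/ { print map, $2, $3 }   # $2 is H:C:T:L, $3 is the sd device
    '
}

# Usage on a live system:
#   multipath -l | failed_paths
```

Run against the listing above, this reports sdf and sdl for mpath2, sdb and sdh for mpath1, and sdd and sdj for mpath0. Within each map the failed paths share one target ID (1 for mpath2 and mpath0, 0 for mpath1), and the failed devices include every device named in the "not ready" kernel messages.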
 
LVL 22

Accepted Solution

by:pjedmond
ID: 17030692
Woooooooooooooohooooooo! It *may* be time to panic here!!!!

This looks like a number of RAID arrays in which you have a lot of dead drives! I recommend stopping everything and making a complete backup. It looks as if you may be in quite a serious situation with a chance of losing data! I don't know how your SCSI is configured, but you've got 2 failed drives in each group of 4. Normally for high performance you'd aim for RAID 5:

http://en.wikipedia.org/wiki/Redundant_array_of_independent_disks

In its smallest form this uses 3 drives to store the data and parity. If 1 drive fails, the data can still be recovered. In your situation, I would suspect that there were 4 drives (3 of which made up the array, and one was a 'hot spare'). I'd also suspect that the first drive has failed and been replaced by the hot spare, and now a second drive has failed, meaning that the system is still... just... functioning with reduced performance.

I don't know exactly how your setup is configured, though, and the fact that you have 2 failed/faulty entries in each mpath suggests that I might be wrong: the setup may in fact be configured as RAID 1 or something else, and perhaps the [failed][faulty] entries are misreported when they are in fact hot-swaps or something else.

Either way, I'd be checking the hardware specs for my SCSI RAID very carefully, and wherever it states [failed][faulty] I'd want to check what the other half of the cluster is doing to it. Perhaps try multipath -l from the other 'half' of the cluster and check that it agrees. If it does, start changing hardware. If the other half doesn't agree, then it is probably a configuration issue.
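
A rough way to do that comparison is to save the `multipath -l` output from each node and compare the per-path states. This is a sketch under assumptions: the output layout is the one posted earlier in this thread, both nodes see the paths at the same H:C:T:L addresses, and the function names are invented for the example.

```shell
#!/bin/sh
# path_states: hypothetical helper. Reduces `multipath -l` output on stdin to
# sorted "WWID H:C:T:L [state]" lines, ignoring sd names (they may differ per node).
path_states() {
    awk '
        /^mpath/ { wwid = $2; gsub(/[()]/, "", wwid) }   # strip the parentheses
        $1 == "\\_" && $2 ~ /^[0-9]+:[0-9]+:[0-9]+:[0-9]+$/ { print wwid, $2, $NF }
    ' | sort
}

# compare_nodes FILE1 FILE2: each file holds saved `multipath -l` output.
compare_nodes() {
    a=$(path_states < "$1")
    b=$(path_states < "$2")
    if [ "$a" = "$b" ]; then echo "nodes agree"; else echo "nodes differ"; fi
}

# Usage (assumes you can run multipath on both nodes, e.g. over ssh):
#   multipath -l > node1.txt
#   ssh lu3cduddb2 multipath -l > node2.txt
#   compare_nodes node1.txt node2.txt
```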

What you need to ask yourself is: "How important is your Oracle database?" If it's important, get someone in to assist!

(   (()
(`-' _\
 ''  ''

 
LVL 2

Author Comment

by:Kong
ID: 17049259
No, an engineer came around and apparently the Fibre Channel was configured incorrectly. The HP EVA (3000?) SAN doesn't support dual Active channels, so it's now set to Active-Passive. However, the display still shows:

[root@lu3cduddb2 ~]# multipath -l
mpath2 (3600508b4001026000000600000420000)
[size=150 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [active]
 \_ 0:0:0:3 sdc 8:32  [active][ready]
\_ round-robin 0 [enabled]
 \_ 0:0:1:3 sdf 8:80  [failed][faulty]
\_ round-robin 0 [enabled]
 \_ 1:0:0:3 sdi 8:128 [active][ready]
\_ round-robin 0 [enabled]
 \_ 1:0:1:3 sdl 8:176 [failed][faulty]

mpath1 (3600508b40010260000006000003b0000)
[size=20 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [enabled]
 \_ 0:0:0:2 sdb 8:16  [failed][faulty]
\_ round-robin 0 [active]
 \_ 0:0:1:2 sde 8:64  [active][ready]
\_ round-robin 0 [enabled]
 \_ 1:0:0:2 sdh 8:112 [failed][faulty]
\_ round-robin 0 [enabled]
 \_ 1:0:1:2 sdk 8:160 [active][ready]

mpath0 (3600508b4001026000000600000340000)
[size=1 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [active]
 \_ 0:0:0:1 sda 8:0   [active][ready]
\_ round-robin 0 [enabled]
 \_ 0:0:1:1 sdd 8:48  [failed][faulty]
\_ round-robin 0 [enabled]
 \_ 1:0:0:1 sdg 8:96  [active][ready]
\_ round-robin 0 [enabled]
 \_ 1:0:1:1 sdj 8:144 [failed][faulty]

The engineer thinks the driver is faulty because it shouldn't show the [failed][faulty] paths, but says everything is working as expected...

We're still getting spammed with "Device not ready" messages in /var/log/messages, but at least the two nodes can see the SAN now and I can create my database - for how long, I don't know... It looks a bit suss to me, but I'm not a sys admin...
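
To see which devices the messages are about, and how often, a quick tally of the log can help. This is just a sketch that assumes the syslog line format quoted at the top of this thread; count_not_ready is a made-up name.

```shell
#!/bin/sh
# count_not_ready: hypothetical helper. Reads syslog text on stdin and prints
# "device count" for every "kernel: Device X not ready" message.
count_not_ready() {
    awk '
        /kernel: Device [a-z0-9]+ not ready/ {
            for (i = 1; i < NF; i++)
                if ($i == "Device") count[$(i + 1)]++   # the word after "Device"
        }
        END { for (d in count) print d, count[d] }
    ' | sort
}

# Usage on a live system:
#   count_not_ready < /var/log/messages
```

If the devices it lists are the same ones multipath -l shows as [failed][faulty], the messages and the failed paths are probably two views of the same thing.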