Solved

"Device sdd not ready" in /var/log/messages

Posted on 2006-07-03
Last Modified: 2013-12-16
Hi all,

We restarted the 2-node cluster (both nodes and SAN) and got a lot of Device not ready messages in /var/log/messages:

Jul  2 04:02:30 lu3cduddb1 syslogd 1.4.1: restart.
Jul  2 04:02:31 lu3cduddb1 kernel: Device sdb not ready.
Jul  2 04:02:31 lu3cduddb1 kernel: Device sdd not ready.
Jul  2 04:02:31 lu3cduddb1 kernel: Device sdf not ready.
Jul  2 04:02:31 lu3cduddb1 kernel: Device sdh not ready.
Jul  2 04:02:31 lu3cduddb1 kernel: Device sdj not ready.
...

How do I track down and resolve this problem? Thanks in advance.
 
Question by:Kong
LVL 22

Expert Comment

by:pjedmond
ID: 17030377
There are a number of possible reasons behind this:

1.   The device is not connected.
2.   The device takes time to 'spin up to speed'.
3.   The device is not accessible for some other reason, e.g. a RAID array is rebuilding.
4.   The device is a removable device such as a CD.

Fault-finding:

1.    Look at /etc/fstab                      # Can you identify the device?
2.    cat /proc/scsi/scsi                     # May give you something useful
3.    cdrecord -scanbus                       # Checks what's on the SCSI bus

The reason for the 3rd one is that I was wondering if you had a CD tower of some sort connected. Any drive without a CD in it could be declared 'not ready'.
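
Before working through the checks above, it can help to know exactly which devices are being reported. The following is only a minimal Python sketch, assuming the kernel messages have the same form as the lines quoted in the question; the function name `not_ready_devices` and the sample list are made up for illustration.

```python
import re

# Matches kernel messages of the form quoted in the question, e.g.
# "Jul  2 04:02:31 lu3cduddb1 kernel: Device sdb not ready."
NOT_READY = re.compile(r"kernel: Device (\w+) not ready")

def not_ready_devices(lines):
    """Return the sorted, de-duplicated device names reported as not ready."""
    found = set()
    for line in lines:
        match = NOT_READY.search(line)
        if match:
            found.add(match.group(1))
    return sorted(found)

# Sample lines copied from the question (one repeated on purpose).
sample = [
    "Jul  2 04:02:30 lu3cduddb1 syslogd 1.4.1: restart.",
    "Jul  2 04:02:31 lu3cduddb1 kernel: Device sdb not ready.",
    "Jul  2 04:02:31 lu3cduddb1 kernel: Device sdd not ready.",
    "Jul  2 04:02:31 lu3cduddb1 kernel: Device sdd not ready.",
]
print(not_ready_devices(sample))
```

On a live system the same function could be fed the lines of /var/log/messages (read as root) instead of the sample list.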

(   (()
(`-' _\
 ''  ''
 
LVL 2

Author Comment

by:Kong
ID: 17030473
Wow, thanks for the very quick response!

Here's the output. I'm not a Linux admin by any stretch of the imagination, so I can't tell what's wrong from it:

fstab:

[root@lu3cduddb1 ~]# cat /etc/fstab
# This file is edited by fstab-sync - see 'man fstab-sync' for details
LABEL=/                 /                        ext3    defaults        1 1
LABEL=/boot             /boot                    ext3    defaults        1 2
none                    /dev/pts                 devpts  gid=5,mode=620  0 0
none                    /dev/shm                 tmpfs   defaults        0 0
LABEL=/home             /home                    ext3    defaults        1 2
LABEL=/opt              /opt                     ext3    defaults        1 2
none                    /proc                    proc    defaults        0 0
none                    /sys                     sysfs   defaults        0 0
LABEL=/tmp              /tmp                     ext3    defaults        1 2
LABEL=/usr              /usr                     ext3    defaults        1 2
LABEL=/var              /var                     ext3    defaults        1 2
LABEL=SW-cciss/c0d0p2   swap                     swap    defaults        0 0
#
# OCFS RAC File Systems
#
/dev/mpath/mpath0p1     /oracle/RACConfig        ocfs2   _netdev,datavolume 0 0
/dev/mpath/mpath1p1     /oracle/oradata/TEST_DBs ocfs2   _netdev,datavolume 0 0
/dev/mpath/mpath2p1     /oracle/oradata/prod     ocfs2   _netdev,datavolume 0 0
#

/dev/hda                /media/cdrom            auto    pamconsole,exec,noauto,managed 0 0
/dev/fd0                /media/floppy           auto    pamconsole,exec,noauto,managed 0 0

------

[root@lu3cduddb1 ~]# cat /proc/scsi/scsi
Attached devices:
Host: scsi0 Channel: 00 Id: 00 Lun: 00
  Vendor: HP       Model: HSV100           Rev: 3028
  Type:   RAID                             ANSI SCSI revision: 02
Host: scsi0 Channel: 00 Id: 00 Lun: 01
  Vendor: HP       Model: HSV100           Rev: 3028
  Type:   Direct-Access                    ANSI SCSI revision: 02
Host: scsi0 Channel: 00 Id: 00 Lun: 02
  Vendor: HP       Model: HSV100           Rev: 3028
  Type:   Direct-Access                    ANSI SCSI revision: 02
Host: scsi0 Channel: 00 Id: 00 Lun: 03
  Vendor: HP       Model: HSV100           Rev: 3028
  Type:   Direct-Access                    ANSI SCSI revision: 02
Host: scsi0 Channel: 00 Id: 01 Lun: 00
  Vendor: HP       Model: HSV100           Rev: 3028
  Type:   RAID                             ANSI SCSI revision: 02
Host: scsi0 Channel: 00 Id: 01 Lun: 01
  Vendor: HP       Model: HSV100           Rev: 3028
  Type:   Direct-Access                    ANSI SCSI revision: 02
Host: scsi0 Channel: 00 Id: 01 Lun: 02
  Vendor: HP       Model: HSV100           Rev: 3028
  Type:   Direct-Access                    ANSI SCSI revision: 02
Host: scsi0 Channel: 00 Id: 01 Lun: 03
  Vendor: HP       Model: HSV100           Rev: 3028
  Type:   Direct-Access                    ANSI SCSI revision: 02
Host: scsi1 Channel: 00 Id: 00 Lun: 00
  Vendor: HP       Model: HSV100           Rev: 3028
  Type:   RAID                             ANSI SCSI revision: 02
Host: scsi1 Channel: 00 Id: 00 Lun: 01
  Vendor: HP       Model: HSV100           Rev: 3028
  Type:   Direct-Access                    ANSI SCSI revision: 02
Host: scsi1 Channel: 00 Id: 00 Lun: 02
  Vendor: HP       Model: HSV100           Rev: 3028
  Type:   Direct-Access                    ANSI SCSI revision: 02
Host: scsi1 Channel: 00 Id: 00 Lun: 03
  Vendor: HP       Model: HSV100           Rev: 3028
  Type:   Direct-Access                    ANSI SCSI revision: 02
Host: scsi1 Channel: 00 Id: 01 Lun: 00
  Vendor: HP       Model: HSV100           Rev: 3028
  Type:   RAID                             ANSI SCSI revision: 02
Host: scsi1 Channel: 00 Id: 01 Lun: 01
  Vendor: HP       Model: HSV100           Rev: 3028
  Type:   Direct-Access                    ANSI SCSI revision: 02
Host: scsi1 Channel: 00 Id: 01 Lun: 02
  Vendor: HP       Model: HSV100           Rev: 3028
  Type:   Direct-Access                    ANSI SCSI revision: 02
Host: scsi1 Channel: 00 Id: 01 Lun: 03
  Vendor: HP       Model: HSV100           Rev: 3028
  Type:   Direct-Access                    ANSI SCSI revision: 02

-----

[root@lu3cduddb1 ~]# cdrecord -scanbus
Cdrecord-Clone 2.01-dvd (i686-pc-linux-gnu) Copyright (C) 1995-2004 Jörg Schilling
Note: This version is an unofficial (modified) version with DVD support
Note: and therefore may have bugs that are not present in the original.
Note: Please send bug reports or support requests to http://bugzilla.redhat.com/bugzilla
Note: The author of cdrecord should not be bothered with problems in this version.
scsidev: 'ATA'
devname: 'ATA'
scsibus: -2 target: -2 lun: -2
Linux sg driver version: 3.5.27
Using libscg version 'schily-0.8'.
cdrecord: Warning: using inofficial libscg transport code version (schily - Red Hat-scsi-linux-sg.c-1.83-RH '@(#)scsi-linux-sg.c      1.83 04/05/20 Copyright 1997 J. Schilling').
scsibus0:
        0,0,0     0) 'TEAC    ' 'DV-28E-C        ' 'B.4F' Removable CD-ROM
        0,1,0     1) *
        0,2,0     2) *
        0,3,0     3) *
        0,4,0     4) *
        0,5,0     5) *
        0,6,0     6) *
        0,7,0     7) *
 
LVL 2

Author Comment

by:Kong
ID: 17030492
Not sure if this output helps, but there shouldn't be any [failed] lines:

[root@lu3cduddb1 ~]# multipath -l
mpath2 (3600508b4001026000000600000420000)
[size=150 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [active]
 \_ 0:0:0:3 sdc 8:32  [active][ready]
\_ round-robin 0 [enabled]
 \_ 0:0:1:3 sdf 8:80  [failed][faulty]
\_ round-robin 0 [enabled]
 \_ 1:0:0:3 sdi 8:128 [active][ready]
\_ round-robin 0 [enabled]
 \_ 1:0:1:3 sdl 8:176 [failed][faulty]

mpath1 (3600508b40010260000006000003b0000)
[size=20 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [enabled]
 \_ 0:0:0:2 sdb 8:16  [failed][faulty]
\_ round-robin 0 [active]
 \_ 0:0:1:2 sde 8:64  [active][ready]
\_ round-robin 0 [enabled]
 \_ 1:0:0:2 sdh 8:112 [failed][faulty]
\_ round-robin 0 [enabled]
 \_ 1:0:1:2 sdk 8:160 [active][ready]

mpath0 (3600508b4001026000000600000340000)
[size=1 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [active]
 \_ 0:0:0:1 sda 8:0   [active][ready]
\_ round-robin 0 [enabled]
 \_ 0:0:1:1 sdd 8:48  [failed][faulty]
\_ round-robin 0 [enabled]
 \_ 1:0:0:1 sdg 8:96  [active][ready]
\_ round-robin 0 [enabled]
 \_ 1:0:1:1 sdj 8:144 [failed][faulty]
 
LVL 22

Accepted Solution

by:
pjedmond earned 500 total points
ID: 17030692
Woooooooooooooohooooooo! It *may* be time to panic here!!!!

What this looks like is a number of RAID arrays in which you have a lot of dead drives! I recommend stopping everything and making a complete backup. It looks as if you may be in quite a serious situation, with a chance of losing data! I don't know how your SCSI is configured, but you've got 2 failed drives in each group of 4. Normally for high performance you'd aim for RAID 5:

http://en.wikipedia.org/wiki/Redundant_array_of_independent_disks

RAID 5 uses at least 3 drives to store the data. If 1 drive fails, the data can still be recovered. In your situation, I would suspect that there were 4 drives (3 of which made up the array, with one as a 'hot spare'). I'd also suspect that the first drive failed and was replaced by the hot spare, and that a second drive has now failed, meaning that the system is still... just... functioning, with reduced performance.

However, I don't know exactly how your setup is configured, and the fact that you have 2 failed/faulty drives in each mpath suggests that I might be wrong: the setup may in fact be configured as RAID 1 or something else, and perhaps the [failed][faulty] entries are misreported when they are in fact hot-swaps or something else.

Either way, I'd be checking the hardware specs for my SCSI RAID very carefully, and wherever it states [failed][faulty] I'd want to check what the other half of the cluster is doing to it. Perhaps try multipath -l from the other 'half' of the cluster and check whether it agrees. If it does, start changing hardware. If the other half doesn't agree, then it is probably a configuration issue.
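
Comparing the two nodes by eye is error-prone, so here is a rough Python sketch of that comparison. It assumes the `multipath -l` output has the same layout as the listing posted above; the names `path_states` and `differences` are invented for this example, and reading the two bracketed fields as device-mapper state and path-checker state is my interpretation rather than something stated in the thread.

```python
import re

# A path line in the `multipath -l` output posted in this thread looks like:
#  \_ 0:0:1:3 sdf 8:80  [failed][faulty]
PATH_LINE = re.compile(
    r"\\_\s+(\d+:\d+:\d+:\d+)\s+(\w+)\s+\d+:\d+\s+\[(\w+)\]\[(\w+)\]"
)

def path_states(output):
    """Map each SCSI address (host:channel:target:lun) to (device, state, checker)."""
    states = {}
    for line in output.splitlines():
        m = PATH_LINE.search(line)
        if m:
            hctl, dev, dm_state, chk_state = m.groups()
            states[hctl] = (dev, dm_state, chk_state)
    return states

def differences(node_a, node_b):
    """Return the paths whose reported state differs between two nodes."""
    a, b = path_states(node_a), path_states(node_b)
    return {
        hctl: (a.get(hctl), b.get(hctl))
        for hctl in sorted(set(a) | set(b))
        if a.get(hctl) != b.get(hctl)
    }

# Excerpt of the output posted above for the first node.
node_a = r"""
mpath0 (3600508b4001026000000600000340000)
\_ round-robin 0 [active]
 \_ 0:0:0:1 sda 8:0   [active][ready]
\_ round-robin 0 [enabled]
 \_ 0:0:1:1 sdd 8:48  [failed][faulty]
"""
# Hypothetical second node where the same path is healthy (illustration only).
node_b = node_a.replace("[failed][faulty]", "[active][ready]")

print(differences(node_a, node_a))
print(differences(node_a, node_b))
```

An empty result means both nodes report the same state for every path; any entry is a path worth looking at on the node that disagrees.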

What you need to ask yourself is - "How important is your oracle database?"...and if it's important, get someone in to assist!

(   (()
(`-' _\
 ''  ''

 
LVL 2

Author Comment

by:Kong
ID: 17049259
No, an engineer came around, and apparently the Fibre Channel was configured incorrectly: the HP EVA (3000?) SAN doesn't support dual active channels. It's now set to active-passive; however, the display still shows:

[root@lu3cduddb2 ~]# multipath -l
mpath2 (3600508b4001026000000600000420000)
[size=150 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [active]
 \_ 0:0:0:3 sdc 8:32  [active][ready]
\_ round-robin 0 [enabled]
 \_ 0:0:1:3 sdf 8:80  [failed][faulty]
\_ round-robin 0 [enabled]
 \_ 1:0:0:3 sdi 8:128 [active][ready]
\_ round-robin 0 [enabled]
 \_ 1:0:1:3 sdl 8:176 [failed][faulty]

mpath1 (3600508b40010260000006000003b0000)
[size=20 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [enabled]
 \_ 0:0:0:2 sdb 8:16  [failed][faulty]
\_ round-robin 0 [active]
 \_ 0:0:1:2 sde 8:64  [active][ready]
\_ round-robin 0 [enabled]
 \_ 1:0:0:2 sdh 8:112 [failed][faulty]
\_ round-robin 0 [enabled]
 \_ 1:0:1:2 sdk 8:160 [active][ready]

mpath0 (3600508b4001026000000600000340000)
[size=1 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [active]
 \_ 0:0:0:1 sda 8:0   [active][ready]
\_ round-robin 0 [enabled]
 \_ 0:0:1:1 sdd 8:48  [failed][faulty]
\_ round-robin 0 [enabled]
 \_ 1:0:0:1 sdg 8:96  [active][ready]
\_ round-robin 0 [enabled]
 \_ 1:0:1:1 sdj 8:144 [failed][faulty]

The engineer thinks the driver is faulty because it shouldn't show the [failed][faulty] paths, but says it's working as expected...

We're still getting spammed with "Device not ready" messages in /var/log/messages, but at least the two nodes can see the SAN now and I can create my database - for how long, I don't know... It looks a bit suss to me, but I'm not a sys admin...
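
One way to look at the listing above is to group the paths of each map by SCSI target id and checker state. The Python sketch below does that for an excerpt of the posted output. It is only an illustration: the function name `targets_by_state` is made up, and the idea that a target id corresponds to one SAN controller is an assumption, not something confirmed in this thread.

```python
import re
from collections import defaultdict

MAP_LINE = re.compile(r"^(mpath\d+)\s+\(")
# Groups: host, channel, target, lun, device, state, checker state.
PATH_LINE = re.compile(
    r"\\_\s+(\d+):(\d+):(\d+):(\d+)\s+(\w+)\s+\d+:\d+\s+\[(\w+)\]\[(\w+)\]"
)

def targets_by_state(output):
    """For each multipath map, collect the SCSI target ids seen per checker state."""
    result = defaultdict(lambda: defaultdict(set))
    current = None
    for line in output.splitlines():
        m = MAP_LINE.match(line)
        if m:
            current = m.group(1)
            continue
        p = PATH_LINE.search(line)
        if p and current is not None:
            # Assumption: the target id (third field) identifies a controller.
            result[current][p.group(7)].add(int(p.group(3)))
    return {
        name: {state: sorted(ids) for state, ids in states.items()}
        for name, states in result.items()
    }

# Excerpt of the `multipath -l` output posted above.
sample = r"""
mpath2 (3600508b4001026000000600000420000)
\_ round-robin 0 [active]
 \_ 0:0:0:3 sdc 8:32  [active][ready]
\_ round-robin 0 [enabled]
 \_ 0:0:1:3 sdf 8:80  [failed][faulty]
\_ round-robin 0 [enabled]
 \_ 1:0:0:3 sdi 8:128 [active][ready]
\_ round-robin 0 [enabled]
 \_ 1:0:1:3 sdl 8:176 [failed][faulty]

mpath1 (3600508b40010260000006000003b0000)
\_ round-robin 0 [enabled]
 \_ 0:0:0:2 sdb 8:16  [failed][faulty]
\_ round-robin 0 [active]
 \_ 0:0:1:2 sde 8:64  [active][ready]
\_ round-robin 0 [enabled]
 \_ 1:0:0:2 sdh 8:112 [failed][faulty]
\_ round-robin 0 [enabled]
 \_ 1:0:1:2 sdk 8:160 [active][ready]
"""
print(targets_by_state(sample))
```

In this excerpt every map has its ready paths on one target id and its faulty paths on the other, on both hosts. If the target ids do map to controllers, that pattern would be consistent with an active-passive array where each LUN is served by one controller at a time, and the faulty devices (sdb, sdf, sdh, sdl) overlap with the "not ready" devices in the log. It does not by itself prove the setup is healthy.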