
"Device sdd not ready" in /var/log/messages

Posted on 2006-07-03
Last Modified: 2013-12-16
Hi all,

We restarted the 2-node cluster (both nodes and the SAN) and got a lot of "Device not ready" messages in /var/log/messages:

Jul  2 04:02:30 lu3cduddb1 syslogd 1.4.1: restart.
Jul  2 04:02:31 lu3cduddb1 kernel: Device sdb not ready.
Jul  2 04:02:31 lu3cduddb1 kernel: Device sdd not ready.
Jul  2 04:02:31 lu3cduddb1 kernel: Device sdf not ready.
Jul  2 04:02:31 lu3cduddb1 kernel: Device sdh not ready.
Jul  2 04:02:31 lu3cduddb1 kernel: Device sdj not ready.
...

How do I track down and resolve this problem? Thanks in advance.
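To see which devices are complaining, and how often, the kernel messages can be tallied per device. This is a minimal Python sketch that assumes the message text matches the lines quoted above (`kernel: Device sdX not ready`); the sample lines are copied from that excerpt, and `count_not_ready` is a hypothetical helper name. On the server you would pass it the lines of /var/log/messages instead, which normally requires root to read.

```python
import re
from collections import Counter

# Matches kernel lines such as
# "Jul  2 04:02:31 lu3cduddb1 kernel: Device sdb not ready."
NOT_READY = re.compile(r"kernel: Device (\w+) not ready")


def count_not_ready(lines):
    """Return a Counter of device name -> number of 'not ready' messages."""
    counts = Counter()
    for line in lines:
        match = NOT_READY.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    # Sample lines copied from the log excerpt above.
    sample = [
        "Jul  2 04:02:30 lu3cduddb1 syslogd 1.4.1: restart.",
        "Jul  2 04:02:31 lu3cduddb1 kernel: Device sdb not ready.",
        "Jul  2 04:02:31 lu3cduddb1 kernel: Device sdd not ready.",
        "Jul  2 04:02:31 lu3cduddb1 kernel: Device sdf not ready.",
    ]
    # On the server: with open("/var/log/messages") as f: count_not_ready(f)
    for device, n in sorted(count_not_ready(sample).items()):
        print(device, n)
```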
 
Question by:Kong
5 Comments
 
LVL 22

Expert Comment

by:pjedmond
ID: 17030377
There are a number of possible reasons for this:

1.   The device is not connected.
2.   The device takes time to 'spin up to speed'.
3.   The device is not accessible for some other reason, e.g. a RAID array is rebuilding.
4.   The device is a removable device, such as a CD drive.

Fault-finding:

1.    Look at /etc/fstab              # Can you identify the device?
2.    cat /proc/scsi/scsi             # May show something useful.
3.    cdrecord -scanbus               # Checks what's on the SCSI bus.

The reason for the 3rd one is that I was wondering whether you had a CD tower of some sort connected. Any drive without a CD in it could be declared 'not ready'.
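For step 2, a short parser can make /proc/scsi/scsi easier to read by counting how many host/target combinations expose each LUN. This is a sketch that assumes the 2.6-era text layout (a `Host: ... Channel: ... Id: ... Lun: ...` line followed by `Vendor:` and `Type:` lines); `parse_proc_scsi` and `paths_per_lun` are hypothetical helper names, not part of any tool, and the sample is an excerpt of the listing posted later in this thread.

```python
import re
from collections import defaultdict

HOST_RE = re.compile(r"Host: (\S+) Channel: (\d+) Id: (\d+) Lun: (\d+)")
VENDOR_RE = re.compile(r"Vendor: (.*?)\s+Model: (.*?)\s+Rev: (\S+)")
TYPE_RE = re.compile(r"Type:\s+(.*?)\s+ANSI")

SAMPLE = """Attached devices:
Host: scsi0 Channel: 00 Id: 00 Lun: 00
  Vendor: HP       Model: HSV100           Rev: 3028
  Type:   RAID                             ANSI SCSI revision: 02
Host: scsi0 Channel: 00 Id: 00 Lun: 01
  Vendor: HP       Model: HSV100           Rev: 3028
  Type:   Direct-Access                    ANSI SCSI revision: 02
Host: scsi1 Channel: 00 Id: 00 Lun: 01
  Vendor: HP       Model: HSV100           Rev: 3028
  Type:   Direct-Access                    ANSI SCSI revision: 02
"""


def parse_proc_scsi(text):
    """Turn /proc/scsi/scsi text into a list of dicts, one per entry."""
    devices = []
    current = None
    for line in text.splitlines():
        m = HOST_RE.search(line)
        if m:
            current = {"host": m.group(1), "channel": int(m.group(2)),
                       "id": int(m.group(3)), "lun": int(m.group(4))}
            devices.append(current)
            continue
        if current is None:
            continue  # skips the "Attached devices:" header
        m = VENDOR_RE.search(line)
        if m:
            current["vendor"], current["model"], current["rev"] = m.groups()
            continue
        m = TYPE_RE.search(line)
        if m:
            current["type"] = m.group(1)
    return devices


def paths_per_lun(devices):
    """Count how many host/target combinations expose each (LUN, type)."""
    counts = defaultdict(int)
    for dev in devices:
        counts[(dev["lun"], dev.get("type"))] += 1
    return dict(counts)


if __name__ == "__main__":
    # On the server: parse_proc_scsi(open("/proc/scsi/scsi").read())
    for (lun, dev_type), n in sorted(paths_per_lun(parse_proc_scsi(SAMPLE)).items()):
        print(f"LUN {lun} ({dev_type}): {n} path(s)")
```

Fed the full /proc/scsi/scsi listing from this thread, `paths_per_lun` would report four entries for each LUN (two hosts, scsi0 and scsi1, presumably two HBAs, times two target IDs), which matches the four paths per map that `multipath -l` shows. LUN 0 on each target is reported with type RAID, presumably the array controller itself rather than a data disk.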

(   (()
(`-' _\
 ''  ''
LVL 2

Author Comment

by:Kong
ID: 17030473
Wow, thanks for the very quick response!

Here's the output. I'm not a Linux admin by any stretch of the imagination, so I can't tell what's wrong from it:

fstab:

[root@lu3cduddb1 ~]# cat /etc/fstab
# This file is edited by fstab-sync - see 'man fstab-sync' for details
LABEL=/                 /                        ext3    defaults        1 1
LABEL=/boot             /boot                    ext3    defaults        1 2
none                    /dev/pts                 devpts  gid=5,mode=620  0 0
none                    /dev/shm                 tmpfs   defaults        0 0
LABEL=/home             /home                    ext3    defaults        1 2
LABEL=/opt              /opt                     ext3    defaults        1 2
none                    /proc                    proc    defaults        0 0
none                    /sys                     sysfs   defaults        0 0
LABEL=/tmp              /tmp                     ext3    defaults        1 2
LABEL=/usr              /usr                     ext3    defaults        1 2
LABEL=/var              /var                     ext3    defaults        1 2
LABEL=SW-cciss/c0d0p2   swap                     swap    defaults        0 0
#
# OCFS RAC File Systems
#
/dev/mpath/mpath0p1     /oracle/RACConfig        ocfs2   _netdev,datavolume 0 0
/dev/mpath/mpath1p1     /oracle/oradata/TEST_DBs ocfs2   _netdev,datavolume 0 0
/dev/mpath/mpath2p1     /oracle/oradata/prod     ocfs2   _netdev,datavolume 0 0
#

/dev/hda                /media/cdrom            auto    pamconsole,exec,noauto,managed 0 0
/dev/fd0                /media/floppy           auto    pamconsole,exec,noauto,managed 0 0

------

[root@lu3cduddb1 ~]# cat /proc/scsi/scsi
Attached devices:
Host: scsi0 Channel: 00 Id: 00 Lun: 00
  Vendor: HP       Model: HSV100           Rev: 3028
  Type:   RAID                             ANSI SCSI revision: 02
Host: scsi0 Channel: 00 Id: 00 Lun: 01
  Vendor: HP       Model: HSV100           Rev: 3028
  Type:   Direct-Access                    ANSI SCSI revision: 02
Host: scsi0 Channel: 00 Id: 00 Lun: 02
  Vendor: HP       Model: HSV100           Rev: 3028
  Type:   Direct-Access                    ANSI SCSI revision: 02
Host: scsi0 Channel: 00 Id: 00 Lun: 03
  Vendor: HP       Model: HSV100           Rev: 3028
  Type:   Direct-Access                    ANSI SCSI revision: 02
Host: scsi0 Channel: 00 Id: 01 Lun: 00
  Vendor: HP       Model: HSV100           Rev: 3028
  Type:   RAID                             ANSI SCSI revision: 02
Host: scsi0 Channel: 00 Id: 01 Lun: 01
  Vendor: HP       Model: HSV100           Rev: 3028
  Type:   Direct-Access                    ANSI SCSI revision: 02
Host: scsi0 Channel: 00 Id: 01 Lun: 02
  Vendor: HP       Model: HSV100           Rev: 3028
  Type:   Direct-Access                    ANSI SCSI revision: 02
Host: scsi0 Channel: 00 Id: 01 Lun: 03
  Vendor: HP       Model: HSV100           Rev: 3028
  Type:   Direct-Access                    ANSI SCSI revision: 02
Host: scsi1 Channel: 00 Id: 00 Lun: 00
  Vendor: HP       Model: HSV100           Rev: 3028
  Type:   RAID                             ANSI SCSI revision: 02
Host: scsi1 Channel: 00 Id: 00 Lun: 01
  Vendor: HP       Model: HSV100           Rev: 3028
  Type:   Direct-Access                    ANSI SCSI revision: 02
Host: scsi1 Channel: 00 Id: 00 Lun: 02
  Vendor: HP       Model: HSV100           Rev: 3028
  Type:   Direct-Access                    ANSI SCSI revision: 02
Host: scsi1 Channel: 00 Id: 00 Lun: 03
  Vendor: HP       Model: HSV100           Rev: 3028
  Type:   Direct-Access                    ANSI SCSI revision: 02
Host: scsi1 Channel: 00 Id: 01 Lun: 00
  Vendor: HP       Model: HSV100           Rev: 3028
  Type:   RAID                             ANSI SCSI revision: 02
Host: scsi1 Channel: 00 Id: 01 Lun: 01
  Vendor: HP       Model: HSV100           Rev: 3028
  Type:   Direct-Access                    ANSI SCSI revision: 02
Host: scsi1 Channel: 00 Id: 01 Lun: 02
  Vendor: HP       Model: HSV100           Rev: 3028
  Type:   Direct-Access                    ANSI SCSI revision: 02
Host: scsi1 Channel: 00 Id: 01 Lun: 03
  Vendor: HP       Model: HSV100           Rev: 3028
  Type:   Direct-Access                    ANSI SCSI revision: 02

-----

[root@lu3cduddb1 ~]# cdrecord -scanbus
Cdrecord-Clone 2.01-dvd (i686-pc-linux-gnu) Copyright (C) 1995-2004 Jörg Schilling
Note: This version is an unofficial (modified) version with DVD support
Note: and therefore may have bugs that are not present in the original.
Note: Please send bug reports or support requests to http://bugzilla.redhat.com/bugzilla
Note: The author of cdrecord should not be bothered with problems in this version.
scsidev: 'ATA'
devname: 'ATA'
scsibus: -2 target: -2 lun: -2
Linux sg driver version: 3.5.27
Using libscg version 'schily-0.8'.
cdrecord: Warning: using inofficial libscg transport code version (schily - Red Hat-scsi-linux-sg.c-1.83-RH '@(#)scsi-linux-sg.c      1.83 04/05/20 Copyright 1997 J. Schilling').
scsibus0:
        0,0,0     0) 'TEAC    ' 'DV-28E-C        ' 'B.4F' Removable CD-ROM
        0,1,0     1) *
        0,2,0     2) *
        0,3,0     3) *
        0,4,0     4) *
        0,5,0     5) *
        0,6,0     6) *
        0,7,0     7) *
LVL 2

Author Comment

by:Kong
ID: 17030492
Not sure if this output helps, but there shouldn't be any [failed] lines:

[root@lu3cduddb1 ~]# multipath -l
mpath2 (3600508b4001026000000600000420000)
[size=150 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [active]
 \_ 0:0:0:3 sdc 8:32  [active][ready]
\_ round-robin 0 [enabled]
 \_ 0:0:1:3 sdf 8:80  [failed][faulty]
\_ round-robin 0 [enabled]
 \_ 1:0:0:3 sdi 8:128 [active][ready]
\_ round-robin 0 [enabled]
 \_ 1:0:1:3 sdl 8:176 [failed][faulty]

mpath1 (3600508b40010260000006000003b0000)
[size=20 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [enabled]
 \_ 0:0:0:2 sdb 8:16  [failed][faulty]
\_ round-robin 0 [active]
 \_ 0:0:1:2 sde 8:64  [active][ready]
\_ round-robin 0 [enabled]
 \_ 1:0:0:2 sdh 8:112 [failed][faulty]
\_ round-robin 0 [enabled]
 \_ 1:0:1:2 sdk 8:160 [active][ready]

mpath0 (3600508b4001026000000600000340000)
[size=1 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [active]
 \_ 0:0:0:1 sda 8:0   [active][ready]
\_ round-robin 0 [enabled]
 \_ 0:0:1:1 sdd 8:48  [failed][faulty]
\_ round-robin 0 [enabled]
 \_ 1:0:0:1 sdg 8:96  [active][ready]
\_ round-robin 0 [enabled]
 \_ 1:0:1:1 sdj 8:144 [failed][faulty]
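To pull the failed paths out of `multipath -l` and check them against the kernel log, something like the following Python sketch could be used. It assumes the output layout shown above (a `name (wwid)` line per map, then ` \_ H:C:T:L sdX major:minor [dm-state][path-state]` lines); other multipath-tools versions format this differently. `parse_multipath`, `failed_paths` and `unexplained` are hypothetical helpers, and the sample is the mpath0 section copied from the listing above.

```python
import re

MAP_RE = re.compile(r"^(\S+) \((\w+)\)")
PATH_RE = re.compile(
    r"\\_ (\d+):(\d+):(\d+):(\d+) (\w+) +\d+:\d+ +\[(\w+)\]\[(\w+)\]")

SAMPLE = r"""mpath0 (3600508b4001026000000600000340000)
[size=1 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [active]
 \_ 0:0:0:1 sda 8:0   [active][ready]
\_ round-robin 0 [enabled]
 \_ 0:0:1:1 sdd 8:48  [failed][faulty]
\_ round-robin 0 [enabled]
 \_ 1:0:0:1 sdg 8:96  [active][ready]
\_ round-robin 0 [enabled]
 \_ 1:0:1:1 sdj 8:144 [failed][faulty]
"""


def parse_multipath(text):
    """Return one dict per path line, tagged with the map it belongs to."""
    paths = []
    current_map = None
    for line in text.splitlines():
        m = MAP_RE.match(line)
        if m:
            current_map = m.group(1)
            continue
        m = PATH_RE.search(line)
        if m and current_map is not None:
            hctl = tuple(int(g) for g in m.group(1, 2, 3, 4))
            paths.append({"map": current_map, "hctl": hctl, "dev": m.group(5),
                          "dm_state": m.group(6), "path_state": m.group(7)})
    return paths


def failed_paths(paths):
    """Paths that device-mapper currently marks as failed."""
    return [p for p in paths if p["dm_state"] == "failed"]


def unexplained(not_ready_devices, paths):
    """Logged 'not ready' devices that are NOT among the failed paths."""
    failed = {p["dev"] for p in failed_paths(paths)}
    return sorted(set(not_ready_devices) - failed)


if __name__ == "__main__":
    # On the server, pass the text captured from "multipath -l" instead.
    for p in failed_paths(parse_multipath(SAMPLE)):
        print(p["map"], ":".join(map(str, p["hctl"])), p["dev"], p["path_state"])
```

Checked against the full listing above, every device named in the log excerpt at the top of the thread (sdb, sdd, sdf, sdh, sdj) is one of the [failed][faulty] paths, which suggests the "not ready" messages are coming from those paths rather than from unrelated disks.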
LVL 22

Accepted Solution

by:pjedmond (earned 500 total points)
ID: 17030692
Woooooooooooooohooooooo! It *may* be time to panic here!!!!

What this looks like is a number of RAID arrays in which you have a lot of dead drives! I recommend stopping everything and making a complete backup. It looks as if you may be in quite a serious situation, with the chance of losing data! I don't know how your SCSI is configured, but you've got 2 failed drives in each group of 4. Normally, for high performance, you'd aim for RAID 5:

http://en.wikipedia.org/wiki/Redundant_array_of_independent_disks

This uses 3 (or more) drives to store the data; if 1 drive fails, the data can still be recovered. In your situation, I would suspect that there were 4 drives (3 of which made up the array, and one was a 'hot spare'). I'd also suspect that the first drive failed and was replaced by the hot spare, and that a second drive has now failed, meaning that the system is still just about functioning, with reduced performance.
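The parity idea behind RAID 5 can be shown with XOR: the parity block is the XOR of the data blocks in a stripe, so any one missing block can be rebuilt by XOR-ing the survivors. This Python sketch models a single stripe on three drives (two data blocks plus parity) with made-up block contents. Real RAID 5 also rotates the parity block across drives, which is omitted here, and the sketch says nothing about how this particular EVA lays out its disks.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe on three drives: two data blocks plus their parity.
data = [b"ABCD", b"EFGH"]          # made-up contents of drives 0 and 1
parity = xor_blocks(data)          # stored on drive 2

# Drive 1 fails: XOR the surviving data block with the parity block.
rebuilt = xor_blocks([data[0], parity])
assert rebuilt == data[1]
print(rebuilt)                     # prints b'EFGH'
```

With two of the three blocks in a stripe gone, there is nothing left to XOR against, which is why a second failure before the rebuild finishes is so dangerous.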

However, I don't know exactly how your setup is configured, and the fact that you have 2 failed/faulty drives in each mpath suggests that I might be wrong: the setup may in fact be configured as RAID 1 or something else, and perhaps the [failed][faulty] entries are misreported when they are in fact hot-swaps or something else.

Either way, I'd be checking the hardware specs for my SCSI RAID very carefully, and where it stated [failed][faulty] I'd want to check what the other half of the cluster is doing to it. Perhaps try multipath -l from the other 'half' of the cluster and check that it agrees. If it does, start changing hardware. If the other half doesn't agree, then it is probably a configuration issue.
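Comparing `multipath -l` from both nodes can be scripted. The Python sketch below counts the path states per map, keyed by the WWID in parentheses after the map name rather than by sdX names, since device names are assigned per host and need not match across nodes. It assumes the output layout shown in this thread; `state_counts`, `disagreements` and the file names in the usage comment are hypothetical.

```python
import re
import sys
from collections import Counter

MAP_RE = re.compile(r"^\S+ \((\w+)\)")
STATE_RE = re.compile(r"\\_ \d+:\d+:\d+:\d+ \w+ +\d+:\d+ +\[(\w+)\]")


def state_counts(text):
    """Map each WWID in `multipath -l` output to a Counter of path dm states."""
    counts = {}
    wwid = None
    for line in text.splitlines():
        m = MAP_RE.match(line)
        if m:
            wwid = m.group(1)
            counts[wwid] = Counter()
            continue
        m = STATE_RE.search(line)
        if m and wwid is not None:
            counts[wwid][m.group(1)] += 1
    return counts


def disagreements(node_a, node_b):
    """Return {wwid: (counts_a, counts_b)} for every WWID where the nodes differ."""
    a, b = state_counts(node_a), state_counts(node_b)
    return {w: (a.get(w), b.get(w))
            for w in sorted(set(a) | set(b))
            if a.get(w) != b.get(w)}


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: run "multipath -l > node1.txt" on one node,
    # "multipath -l > node2.txt" on the other, then:
    #   python compare_mpath.py node1.txt node2.txt
    with open(sys.argv[1]) as f1, open(sys.argv[2]) as f2:
        diff = disagreements(f1.read(), f2.read())
    for wwid, (a, b) in diff.items():
        print(wwid, dict(a or {}), dict(b or {}))
    if not diff:
        print("Both nodes report the same path states.")
```

Run against the two listings posted in this thread (one from lu3cduddb1, one from lu3cduddb2), it would report no disagreements: both show two active and two failed paths for each WWID.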

What you need to ask yourself is: "How important is your Oracle database?" ...and if it's important, get someone in to assist!


LVL 2

Author Comment

by:Kong
ID: 17049259
No. An engineer came around, and apparently the Fibre Channel was configured incorrectly: the HP EVA (3000?) SAN doesn't support dual active channels. It's now set to active-passive, but the display still shows:

[root@lu3cduddb2 ~]# multipath -l
mpath2 (3600508b4001026000000600000420000)
[size=150 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [active]
 \_ 0:0:0:3 sdc 8:32  [active][ready]
\_ round-robin 0 [enabled]
 \_ 0:0:1:3 sdf 8:80  [failed][faulty]
\_ round-robin 0 [enabled]
 \_ 1:0:0:3 sdi 8:128 [active][ready]
\_ round-robin 0 [enabled]
 \_ 1:0:1:3 sdl 8:176 [failed][faulty]

mpath1 (3600508b40010260000006000003b0000)
[size=20 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [enabled]
 \_ 0:0:0:2 sdb 8:16  [failed][faulty]
\_ round-robin 0 [active]
 \_ 0:0:1:2 sde 8:64  [active][ready]
\_ round-robin 0 [enabled]
 \_ 1:0:0:2 sdh 8:112 [failed][faulty]
\_ round-robin 0 [enabled]
 \_ 1:0:1:2 sdk 8:160 [active][ready]

mpath0 (3600508b4001026000000600000340000)
[size=1 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [active]
 \_ 0:0:0:1 sda 8:0   [active][ready]
\_ round-robin 0 [enabled]
 \_ 0:0:1:1 sdd 8:48  [failed][faulty]
\_ round-robin 0 [enabled]
 \_ 1:0:0:1 sdg 8:96  [active][ready]
\_ round-robin 0 [enabled]
 \_ 1:0:1:1 sdj 8:144 [failed][faulty]

The engineer thinks the driver is faulty because it shouldn't show the [failed][faulty] paths, but says everything is working as expected...
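One detail in these listings fits the engineer's active-passive explanation: within each map, the [failed][faulty] paths all go through the same SCSI target ID on both hosts (target 1 for mpath0 and mpath2, target 0 for mpath1), while the other target ID carries only active paths. The Python sketch below checks that pattern on path data transcribed from the listing above. It assumes, without confirmation from the thread, that each target ID corresponds to one EVA controller; `states_by_target` and `one_target_per_state` are hypothetical helpers.

```python
from collections import defaultdict

# (map, H:C:T:L, dm state) transcribed from the multipath -l listing above.
PATHS = [
    ("mpath0", "0:0:0:1", "active"), ("mpath0", "0:0:1:1", "failed"),
    ("mpath0", "1:0:0:1", "active"), ("mpath0", "1:0:1:1", "failed"),
    ("mpath1", "0:0:0:2", "failed"), ("mpath1", "0:0:1:2", "active"),
    ("mpath1", "1:0:0:2", "failed"), ("mpath1", "1:0:1:2", "active"),
    ("mpath2", "0:0:0:3", "active"), ("mpath2", "0:0:1:3", "failed"),
    ("mpath2", "1:0:0:3", "active"), ("mpath2", "1:0:1:3", "failed"),
]


def states_by_target(paths):
    """Group dm states by (map, SCSI target ID), ignoring host and LUN."""
    grouped = defaultdict(set)
    for map_name, hctl, state in paths:
        _host, _channel, target, _lun = hctl.split(":")
        grouped[(map_name, int(target))].add(state)
    return dict(grouped)


def one_target_per_state(grouped):
    """True if, in every map, each target ID carries a single dm state."""
    return all(len(states) == 1 for states in grouped.values())


if __name__ == "__main__":
    grouped = states_by_target(PATHS)
    for (map_name, target), states in sorted(grouped.items()):
        print(map_name, "target", target, sorted(states))
    print("single state per target:", one_target_per_state(grouped))
```

`one_target_per_state` returning True is consistent with, but does not prove, an array where each LUN is served by its owning controller and the other controller's paths report not ready.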

We're still getting spammed with "Device not ready" messages in /var/log/messages, but at least the two nodes can see the SAN now and I can create my database. For how long, I don't know... It looks a bit suss to me, but I'm not a sysadmin...