Solved

"Device sdd not ready" in /var/log/messages

Posted on 2006-07-03
Last Modified: 2013-12-16
Hi all,

We restarted the 2-node cluster (both nodes and SAN) and got a lot of Device not ready messages in /var/log/messages:

Jul  2 04:02:30 lu3cduddb1 syslogd 1.4.1: restart.
Jul  2 04:02:31 lu3cduddb1 kernel: Device sdb not ready.
Jul  2 04:02:31 lu3cduddb1 kernel: Device sdd not ready.
Jul  2 04:02:31 lu3cduddb1 kernel: Device sdf not ready.
Jul  2 04:02:31 lu3cduddb1 kernel: Device sdh not ready.
Jul  2 04:02:31 lu3cduddb1 kernel: Device sdj not ready.
...

How do I track down and resolve this problem? Thanks in advance.
 
Question by:Kong
5 Comments
 
LVL 22

Expert Comment

by:pjedmond
ID: 17030377
There are a number of possible reasons behind this:

1.   The device is not connected.
2.   The device takes time to 'spin up to speed'.
3.   The device is not accessible for some other reason, e.g. a RAID array is rebuilding.
4.   The device is a removable device such as a CD drive.

Fault-finding:

1.    Look at /etc/fstab                      # Can you identify the device?
2.    cat /proc/scsi/scsi                     # May give you something useful
3.    cdrecord -scanbus                       # Checks what's on the SCSI bus

The reason for the 3rd one is that I was wondering if you had a CD tower of some sort connected. Any drive without a CD in it could be declared 'not ready'
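
Before working through those checks, it can help to see exactly which devices the kernel is complaining about and how often. The following is only a minimal sketch: the function name `count_not_ready` is made up for illustration, and it assumes the log lines have the same "kernel: Device sdX not ready." shape as the excerpt in the question.

```python
import re
from collections import Counter

# Matches kernel lines such as "... kernel: Device sdd not ready."
NOT_READY = re.compile(r"kernel: Device (\w+) not ready")


def count_not_ready(lines):
    """Return a Counter mapping device name -> number of 'not ready' lines."""
    counts = Counter()
    for line in lines:
        match = NOT_READY.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Sample lines copied from the question; a real run would pass the open log file.
sample = """\
Jul  2 04:02:30 lu3cduddb1 syslogd 1.4.1: restart.
Jul  2 04:02:31 lu3cduddb1 kernel: Device sdb not ready.
Jul  2 04:02:31 lu3cduddb1 kernel: Device sdd not ready.
"""
for device, n in sorted(count_not_ready(sample.splitlines()).items()):
    print(device, n)
```

To run it against the real log, pass `open("/var/log/messages")` instead of the sample lines (reading that file normally needs root).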

(   (()
(`-' _\
 ''  ''
 
LVL 2

Author Comment

by:Kong
ID: 17030473
Wow, thanks for the very quick response!

Here's the output. I'm not a Linux admin by any stretch of the imagination, so I can't tell what's wrong from it:

fstab:

[root@lu3cduddb1 ~]# cat /etc/fstab
# This file is edited by fstab-sync - see 'man fstab-sync' for details
LABEL=/                 /                        ext3    defaults        1 1
LABEL=/boot             /boot                    ext3    defaults        1 2
none                    /dev/pts                 devpts  gid=5,mode=620  0 0
none                    /dev/shm                 tmpfs   defaults        0 0
LABEL=/home             /home                    ext3    defaults        1 2
LABEL=/opt              /opt                     ext3    defaults        1 2
none                    /proc                    proc    defaults        0 0
none                    /sys                     sysfs   defaults        0 0
LABEL=/tmp              /tmp                     ext3    defaults        1 2
LABEL=/usr              /usr                     ext3    defaults        1 2
LABEL=/var              /var                     ext3    defaults        1 2
LABEL=SW-cciss/c0d0p2   swap                     swap    defaults        0 0
#
# OCFS RAC File Systems
#
/dev/mpath/mpath0p1     /oracle/RACConfig        ocfs2   _netdev,datavolume 0 0
/dev/mpath/mpath1p1     /oracle/oradata/TEST_DBs ocfs2   _netdev,datavolume 0 0
/dev/mpath/mpath2p1     /oracle/oradata/prod     ocfs2   _netdev,datavolume 0 0
#

/dev/hda                /media/cdrom            auto    pamconsole,exec,noauto,managed 0 0
/dev/fd0                /media/floppy           auto    pamconsole,exec,noauto,managed 0 0

------

[root@lu3cduddb1 ~]# cat /proc/scsi/scsi
Attached devices:
Host: scsi0 Channel: 00 Id: 00 Lun: 00
  Vendor: HP       Model: HSV100           Rev: 3028
  Type:   RAID                             ANSI SCSI revision: 02
Host: scsi0 Channel: 00 Id: 00 Lun: 01
  Vendor: HP       Model: HSV100           Rev: 3028
  Type:   Direct-Access                    ANSI SCSI revision: 02
Host: scsi0 Channel: 00 Id: 00 Lun: 02
  Vendor: HP       Model: HSV100           Rev: 3028
  Type:   Direct-Access                    ANSI SCSI revision: 02
Host: scsi0 Channel: 00 Id: 00 Lun: 03
  Vendor: HP       Model: HSV100           Rev: 3028
  Type:   Direct-Access                    ANSI SCSI revision: 02
Host: scsi0 Channel: 00 Id: 01 Lun: 00
  Vendor: HP       Model: HSV100           Rev: 3028
  Type:   RAID                             ANSI SCSI revision: 02
Host: scsi0 Channel: 00 Id: 01 Lun: 01
  Vendor: HP       Model: HSV100           Rev: 3028
  Type:   Direct-Access                    ANSI SCSI revision: 02
Host: scsi0 Channel: 00 Id: 01 Lun: 02
  Vendor: HP       Model: HSV100           Rev: 3028
  Type:   Direct-Access                    ANSI SCSI revision: 02
Host: scsi0 Channel: 00 Id: 01 Lun: 03
  Vendor: HP       Model: HSV100           Rev: 3028
  Type:   Direct-Access                    ANSI SCSI revision: 02
Host: scsi1 Channel: 00 Id: 00 Lun: 00
  Vendor: HP       Model: HSV100           Rev: 3028
  Type:   RAID                             ANSI SCSI revision: 02
Host: scsi1 Channel: 00 Id: 00 Lun: 01
  Vendor: HP       Model: HSV100           Rev: 3028
  Type:   Direct-Access                    ANSI SCSI revision: 02
Host: scsi1 Channel: 00 Id: 00 Lun: 02
  Vendor: HP       Model: HSV100           Rev: 3028
  Type:   Direct-Access                    ANSI SCSI revision: 02
Host: scsi1 Channel: 00 Id: 00 Lun: 03
  Vendor: HP       Model: HSV100           Rev: 3028
  Type:   Direct-Access                    ANSI SCSI revision: 02
Host: scsi1 Channel: 00 Id: 01 Lun: 00
  Vendor: HP       Model: HSV100           Rev: 3028
  Type:   RAID                             ANSI SCSI revision: 02
Host: scsi1 Channel: 00 Id: 01 Lun: 01
  Vendor: HP       Model: HSV100           Rev: 3028
  Type:   Direct-Access                    ANSI SCSI revision: 02
Host: scsi1 Channel: 00 Id: 01 Lun: 02
  Vendor: HP       Model: HSV100           Rev: 3028
  Type:   Direct-Access                    ANSI SCSI revision: 02
Host: scsi1 Channel: 00 Id: 01 Lun: 03
  Vendor: HP       Model: HSV100           Rev: 3028
  Type:   Direct-Access                    ANSI SCSI revision: 02
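
A listing this long is easier to read once it is summarized. The sketch below is a rough illustration, not a standard tool: `parse_proc_scsi` and `paths_per_lun` are hypothetical helper names, and the parsing assumes the three-line "Host / Vendor / Type" layout shown above.

```python
import re
from collections import Counter

HOST = re.compile(r"Host: scsi(\d+) Channel: (\d+) Id: (\d+) Lun: (\d+)")
TYPE = re.compile(r"Type:\s+(.+?)\s+ANSI")


def parse_proc_scsi(text):
    """Return a list of (host, channel, target_id, lun, type) tuples."""
    devices = []
    address = None
    for line in text.splitlines():
        host = HOST.search(line)
        if host:
            address = tuple(int(n) for n in host.groups())
            continue
        kind = TYPE.search(line)
        if kind and address is not None:
            devices.append(address + (kind.group(1),))
            address = None
    return devices


def paths_per_lun(devices):
    """Count how many Direct-Access entries exist for each LUN number."""
    return Counter(
        lun for (_host, _chan, _target, lun, kind) in devices
        if kind == "Direct-Access"
    )


# Three entries copied from the listing above, for illustration only.
sample = """\
Host: scsi0 Channel: 00 Id: 00 Lun: 00
  Vendor: HP       Model: HSV100           Rev: 3028
  Type:   RAID                             ANSI SCSI revision: 02
Host: scsi0 Channel: 00 Id: 00 Lun: 01
  Vendor: HP       Model: HSV100           Rev: 3028
  Type:   Direct-Access                    ANSI SCSI revision: 02
Host: scsi1 Channel: 00 Id: 01 Lun: 01
  Vendor: HP       Model: HSV100           Rev: 3028
  Type:   Direct-Access                    ANSI SCSI revision: 02
"""
print(paths_per_lun(parse_proc_scsi(sample)))
```

Applied to the full listing, the Direct-Access entries come out as LUNs 1, 2 and 3, each seen four times (two hosts times two target IDs), which suggests the same three disks are visible over four paths rather than twelve separate disks.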

-----

[root@lu3cduddb1 ~]# cdrecord -scanbus
Cdrecord-Clone 2.01-dvd (i686-pc-linux-gnu) Copyright (C) 1995-2004 Jörg Schilling
Note: This version is an unofficial (modified) version with DVD support
Note: and therefore may have bugs that are not present in the original.
Note: Please send bug reports or support requests to http://bugzilla.redhat.com/bugzilla
Note: The author of cdrecord should not be bothered with problems in this version.
scsidev: 'ATA'
devname: 'ATA'
scsibus: -2 target: -2 lun: -2
Linux sg driver version: 3.5.27
Using libscg version 'schily-0.8'.
cdrecord: Warning: using inofficial libscg transport code version (schily - Red Hat-scsi-linux-sg.c-1.83-RH '@(#)scsi-linux-sg.c      1.83 04/05/20 Copyright 1997 J. Schilling').
scsibus0:
        0,0,0     0) 'TEAC    ' 'DV-28E-C        ' 'B.4F' Removable CD-ROM
        0,1,0     1) *
        0,2,0     2) *
        0,3,0     3) *
        0,4,0     4) *
        0,5,0     5) *
        0,6,0     6) *
        0,7,0     7) *
 
LVL 2

Author Comment

by:Kong
ID: 17030492
Not sure if this output helps, but there shouldn't be any [failed] lines:

[root@lu3cduddb1 ~]# multipath -l
mpath2 (3600508b4001026000000600000420000)
[size=150 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [active]
 \_ 0:0:0:3 sdc 8:32  [active][ready]
\_ round-robin 0 [enabled]
 \_ 0:0:1:3 sdf 8:80  [failed][faulty]
\_ round-robin 0 [enabled]
 \_ 1:0:0:3 sdi 8:128 [active][ready]
\_ round-robin 0 [enabled]
 \_ 1:0:1:3 sdl 8:176 [failed][faulty]

mpath1 (3600508b40010260000006000003b0000)
[size=20 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [enabled]
 \_ 0:0:0:2 sdb 8:16  [failed][faulty]
\_ round-robin 0 [active]
 \_ 0:0:1:2 sde 8:64  [active][ready]
\_ round-robin 0 [enabled]
 \_ 1:0:0:2 sdh 8:112 [failed][faulty]
\_ round-robin 0 [enabled]
 \_ 1:0:1:2 sdk 8:160 [active][ready]

mpath0 (3600508b4001026000000600000340000)
[size=1 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [active]
 \_ 0:0:0:1 sda 8:0   [active][ready]
\_ round-robin 0 [enabled]
 \_ 0:0:1:1 sdd 8:48  [failed][faulty]
\_ round-robin 0 [enabled]
 \_ 1:0:0:1 sdg 8:96  [active][ready]
\_ round-robin 0 [enabled]
 \_ 1:0:1:1 sdj 8:144 [failed][faulty]
 
LVL 22

Accepted Solution

by:
pjedmond earned 2000 total points
ID: 17030692
Woooooooooooooohooooooo! It *may* be time to panic here!!!!

What this looks like is a number of RAID arrays in which you have a lot of dead drives! I recommend stopping everything and making a complete backup. It looks as if you may be in quite a serious situation, with a chance of losing data! I don't know how your SCSI is configured, but you've got 2 failed drives in each group of 4. Normally for high performance you'd aim for RAID 5:

http://en.wikipedia.org/wiki/Redundant_array_of_independent_disks

RAID 5 needs at least 3 drives to store the data. If 1 drive fails, the data can still be recovered. In your situation, I would suspect that there were 4 drives (3 of which made up the array, and one was a 'hot spare'). I'd also suspect that the first drive failed and was replaced by the hot spare, and now a second drive has failed, meaning that the system is still, just about, functioning with reduced performance.

That said, I don't know exactly how your setup is configured, and the fact that you have 2 failed/faulty drives in each mpath suggests that I might be wrong: the setup may in fact be configured as RAID 1 or something else, and perhaps the [failed][faulty] entries are mis-reported when they are really hot-swaps or something else.

Either way, I'd be checking the hardware specs for my SCSI RAID very carefully, and where it states [failed][faulty] I'd want to check what the other half of the cluster is doing to it. Perhaps try multipath -l from the other 'half' of the cluster and check whether it agrees. If it does, start changing hardware. If the other half doesn't agree, then it is probably a configuration issue.

What you need to ask yourself is: "How important is your Oracle database?" If it's important, get someone in to assist!

(   (()
(`-' _\
 ''  ''

 
LVL 2

Author Comment

by:Kong
ID: 17049259
No. An engineer came around and apparently the Fibre Channel was configured incorrectly: the HP EVA (3000?) SAN doesn't support dual active channels. It's now set to Active-Passive, but the display still shows:

[root@lu3cduddb2 ~]# multipath -l
mpath2 (3600508b4001026000000600000420000)
[size=150 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [active]
 \_ 0:0:0:3 sdc 8:32  [active][ready]
\_ round-robin 0 [enabled]
 \_ 0:0:1:3 sdf 8:80  [failed][faulty]
\_ round-robin 0 [enabled]
 \_ 1:0:0:3 sdi 8:128 [active][ready]
\_ round-robin 0 [enabled]
 \_ 1:0:1:3 sdl 8:176 [failed][faulty]

mpath1 (3600508b40010260000006000003b0000)
[size=20 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [enabled]
 \_ 0:0:0:2 sdb 8:16  [failed][faulty]
\_ round-robin 0 [active]
 \_ 0:0:1:2 sde 8:64  [active][ready]
\_ round-robin 0 [enabled]
 \_ 1:0:0:2 sdh 8:112 [failed][faulty]
\_ round-robin 0 [enabled]
 \_ 1:0:1:2 sdk 8:160 [active][ready]

mpath0 (3600508b4001026000000600000340000)
[size=1 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [active]
 \_ 0:0:0:1 sda 8:0   [active][ready]
\_ round-robin 0 [enabled]
 \_ 0:0:1:1 sdd 8:48  [failed][faulty]
\_ round-robin 0 [enabled]
 \_ 1:0:0:1 sdg 8:96  [active][ready]
\_ round-robin 0 [enabled]
 \_ 1:0:1:1 sdj 8:144 [failed][faulty]

The engineer thinks the driver is faulty, because it shouldn't show the [failed][faulty] paths, and says it's otherwise working as expected...

We're still getting spammed with "Device not ready" messages in /var/log/messages, but at least the two nodes can see the SAN now and I can create my database - for how long, I don't know... It looks a bit suss to me, but I'm not a sys admin...
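
One way to sanity-check the Active-Passive explanation against the output above is to group the path states by SCSI target ID (the third number in the host:channel:id:lun address). This is only a sketch under assumptions: `states_by_target` and `failures_follow_target` are made-up helper names, and the idea that each target ID corresponds to one SAN controller is a guess about this setup, not something the output confirms.

```python
import re
from collections import defaultdict

PATH_LINE = re.compile(
    r"\\_ (\d+):(\d+):(\d+):(\d+) (\w+)\s+\d+:\d+\s+\[(\w+)\]\[(\w+)\]"
)


def states_by_target(text):
    """Map (lun, target_id) -> set of path states ('active', 'failed', ...)."""
    states = defaultdict(set)
    for line in text.splitlines():
        path = PATH_LINE.search(line)
        if path:
            _host, _channel, target, lun = (int(n) for n in path.groups()[:4])
            states[(lun, target)].add(path.group(6))
    return dict(states)


def failures_follow_target(text):
    """True if no (LUN, target ID) pair mixes active and failed paths."""
    by_target = states_by_target(text)
    return bool(by_target) and all(len(s) == 1 for s in by_target.values())


# The mpath1 section copied from the output above (raw string keeps the backslashes).
sample = r"""mpath1 (3600508b40010260000006000003b0000)
[size=20 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [enabled]
 \_ 0:0:0:2 sdb 8:16  [failed][faulty]
\_ round-robin 0 [active]
 \_ 0:0:1:2 sde 8:64  [active][ready]
\_ round-robin 0 [enabled]
 \_ 1:0:0:2 sdh 8:112 [failed][faulty]
\_ round-robin 0 [enabled]
 \_ 1:0:1:2 sdk 8:160 [active][ready]
"""
print(states_by_target(sample))
print(failures_follow_target(sample))
```

In the full output the pattern holds for all three maps: for LUNs 1 and 3 both paths to target ID 0 are active and both paths to target ID 1 are failed, and for LUN 2 it is the other way round. Failures that follow the target rather than the host adapter would fit paths to a standby controller reporting "not ready", though that reading rests on the assumption above.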