Solved

Enterprise 450: How to Find the Failed Disk?

Posted on 2003-10-21
Last Modified: 2013-12-27
Hi Everyone, I have a failed disk in an Enterprise 450 (it has 20 SCSI HDD slots).  I presently have 12 disks set up as mirrored concatenations:

Device        Role               Mirror group
c0t0d0s7      System Disk i      Mirror *
c0t1d0s0      Concatenated 3     Mirror @
c0t2d0s0      Concatenated 2     Mirror =
c0t3d0s0      Concatenated 1     Mirror =
c2t0d0s0      Concatenated 1     Mirror =
c2t1d0s0      Concatenated 4     Mirror @
c2t2d0s0      Concatenated 3     Mirror @
c2t3d0s0      Concatenated 2     Mirror =
c3t0d0s7      System Disk ii     Mirror *
c3t1d0s0      Concatenated 2     Mirror =
c3t2d0s0      Concatenated 4     Mirror @
c3t3d0s0      Concatenated 1     Mirror =

The disk chassis has a labeled number for each disk slot.  The disks are using slots 0 to 11.  12 to 19 are empty.

The disk in position c0t2d0 is reported as failed; metastat says "Invoke: metareplace d9 c0t2d0s7 <new device>".  This is a production server, so I cannot do much experimentation to find the disk (besides, all offline work has to be done on the weekend :-( ).

My problem is determining which is the failed disk in the HDD chassis.  I can't find a slot numbering diagram for the Enterprise 450 Part Number 600-6363-01 that correlates the SCSI bus and SCSI drop to the HDD chassis slot number.

Anyone have any ideas?

Here is the applicable output from metastat -i:

d9: Mirror
    Submirror 0: d19
      State: Okay
    Submirror 1: d29
      State: Needs maintenance
    Pass: 1
    Read option: roundrobin (default)
    Write option: parallel (default)
    Size: 214496640 blocks (102 GB)

d19: Submirror of d9
    State: Okay
    Size: 214496640 blocks (102 GB)
    Stripe 0:
        Device     Start Block  Dbase        State Reloc Hot Spare
        c2t0d0s7          0     No            Okay   Yes
    Stripe 1:
        Device     Start Block  Dbase        State Reloc Hot Spare
        c3t3d0s7          0     No            Okay   Yes
    Stripe 2:
        Device     Start Block  Dbase        State Reloc Hot Spare
        c0t3d0s7          0     No            Okay   Yes


d29: Submirror of d9
    State: Needs maintenance
    Invoke: metareplace d9 c0t2d0s7 <new device>
    Size: 214496640 blocks (102 GB)
    Stripe 0:
        Device     Start Block  Dbase        State Reloc Hot Spare
        c2t3d0s7          0     No            Okay   Yes
    Stripe 1:
        Device     Start Block  Dbase        State Reloc Hot Spare
        c3t1d0s7          0     No            Okay   Yes
    Stripe 2:
        Device     Start Block  Dbase        State Reloc Hot Spare
        c0t2d0s7          0     No     Maintenance   Yes
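
For anyone scanning longer metastat output, the failed component can be pulled out mechanically. This is only a sketch: it assumes the component rows look like the ones above (device name in the first column, state in the fourth), and the function name failed_components is made up for illustration.

```shell
# Print the device name of every component whose state is "Maintenance".
# Assumes metastat component rows shaped like the ones in this post:
#   c0t2d0s7   0   No   Maintenance   Yes
failed_components() {
    awk '$1 ~ /^c[0-9]+t[0-9]+d[0-9]+s[0-9]+$/ && $4 == "Maintenance" { print $1 }'
}

# Usage (reads metastat output on stdin):
#   metastat d9 | failed_components
```

Lines such as "Invoke: metareplace ..." are skipped because their first field is not a cXtYdZsN device name.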
0
Comment
Question by:huffmana
18 Comments
 
LVL 18

Expert Comment

by:liddler
ID: 9590592
http://sunsolve.sun.com/pub-cgi/retrieve.pl?doc=finfodoc%2F18931&zone_110=18931
should give you the mapping from slot no to pci location and you can marry that with an
ls -l /dev/rdsk/c0t0d0s7
0
 
LVL 38

Expert Comment

by:yuzh
ID: 9590701
Have a look at the following doc, pay attention to the diagram, to see if you can
locate the HD:

http://www.sun.com/products-n-solutions/hardware/docs/html/806-3992-10/disk_hotplug.html
0
 
LVL 18

Assisted Solution

by:liddler
liddler earned 50 total points
ID: 9590730
Found a method here:
http://sunsolve.sun.com/data/805/805-1391/pdf/008.mapping.pdf
ls -l /dev/rdsk/c0t2d0s7
then eg
lrwxrwxrwx   1 root     root          45 Aug 28  2002 /dev/rdsk/c0t2d0s7 -> ../../devices/pci@1f,4000/scsi@3/sd@0,0:a,raw

then type
prtconf -vp |grep pci@1f,4000/scsi@3/disk@0
Note we change sd@0 to disk@0
and look for the slot number
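
The first step above (turning the ls -l symlink into a physical device path) can be sketched as a small filter. It assumes the link target has the ../../devices/...:<slice letter>,raw shape shown in the example; the function name phys_path is hypothetical.

```shell
# Extract the physical device path from one "ls -l /dev/rdsk/..." line.
# Input:  ... /dev/rdsk/c0t2d0s0 -> ../../devices/pci@1f,4000/scsi@3/sd@2,0:a,raw
# Output: /pci@1f,4000/scsi@3/sd@2,0
phys_path() {
    sed -n 's|.*->[ ]*\.\./\.\./devices\(/.*\):[a-z],raw$|\1|p'
}

# Usage on a live system:
#   ls -l /dev/rdsk/c0t2d0s0 | phys_path
```

The result still has to be matched by hand against the prtconf -vp output, with sd@ changed to disk@ as noted above.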
0
 
LVL 7

Assisted Solution

by:glassd
glassd earned 50 total points
ID: 9590795
As a double check, run something disk-intensive like "find" on each file system and note which disks are being used. You can often work out the pattern of disk allocation from that.
0
 

Author Comment

by:huffmana
ID: 9591516
Hi Everyone,

Liddler,
The link you sent gave an "Internal Server Error
The server encountered an internal error or misconfiguration and was unable to complete your request."  But I used the ls -l to get the following:

Question: I thought that the c in c0t0d0s7 was the SCSI bus number...  That is why I planned the mirror to always be across different SCSI buses.  But the listing seems to say that I've used scsi@3 and scsi@2.  What is scsi@2,1?

# ls -l /dev/rdsk/c0t0d0s7

/dev/rdsk/c0t0d0s7->../../devices/pci@1f,4000/scsi@3/sd@0,0:h,raw
/dev/rdsk/c0t1d0s0->../../devices/pci@1f,4000/scsi@3/sd@1,0:a,raw
/dev/rdsk/c0t2d0s0->../../devices/pci@1f,4000/scsi@3/sd@2,0:a,raw
/dev/rdsk/c0t3d0s0->../../devices/pci@1f,4000/scsi@3/sd@3,0:a,raw
/dev/rdsk/c2t0d0s0->../../devices/pci@6,4000/scsi@2/sd@0,0:a,raw
/dev/rdsk/c2t1d0s0->../../devices/pci@6,4000/scsi@2/sd@1,0:a,raw
/dev/rdsk/c2t2d0s0->../../devices/pci@6,4000/scsi@2/sd@2,0:a,raw
/dev/rdsk/c2t3d0s0->../../devices/pci@6,4000/scsi@2/sd@3,0:a,raw
/dev/rdsk/c3t0d0s7->../../devices/pci@6,4000/scsi@2,1/sd@0,0:h,raw
/dev/rdsk/c3t1d0s0->../../devices/pci@6,4000/scsi@2,1/sd@1,0:a,raw
/dev/rdsk/c3t2d0s0->../../devices/pci@6,4000/scsi@2,1/sd@2,0:a,raw
/dev/rdsk/c3t3d0s0->../../devices/pci@6,4000/scsi@2,1/sd@3,0:a,raw

yuzh,
Then I tried the Sun reference that yuzh sent me and Sun says to look at the output of prtconf -vp.  But, unfortunately, prtconf -vp gives only the following for slot assignments (nothing more - I did a vi on the output):
 pci-slot-skip-list:  'none'

Question: So is this a dead end unless there is a way to fill in the pci-slot-skip-list....?

Question: Why is there a reference to PCI ?  Are the disk scsi busses connected to the PCI bus?

glassd,
I will be trying the find command on the respective partitions but the failed disk is in a group of 6 disks (3 concatenated disks per submirror).  And the partition is only 49% full.  It may not reach the 3rd disk.

Question: will I still see activity on a concatenated disk if there is no data stored on it?

Thanks everyone for your help.  I really appreciate it!  Allan :-)

0
 

Author Comment

by:huffmana
ID: 9591549
Find Command Results:
Slot 6 and 10 are the system disks (they are always blinking).  Slot 4 blinks like crazy when I execute a find command on the vol1 partition.  So the answer is: no not all the disks are being accessed during the find command.  

What can I do next without taking the system offline?
0
 
LVL 18

Expert Comment

by:liddler
ID: 9591603
prtconf -vp |grep pci@1f,4000/scsi@3/disk@0 |grep slot
will give u the slot number of c0t0d0s7
0
 

Author Comment

by:huffmana
ID: 9591646
Slot 6or10   /dev/rdsk/c0t0d0s7->/pci@1f,4000/scsi@3/sd@0,0:h,raw
             /dev/rdsk/c0t1d0s0->/pci@1f,4000/scsi@3/sd@1,0:a,raw
vol1         /dev/rdsk/c0t2d0s0->/pci@1f,4000/scsi@3/sd@2,0:a,raw
vol1         /dev/rdsk/c0t3d0s0->/pci@1f,4000/scsi@3/sd@3,0:a,raw
Slot 4 or    /dev/rdsk/c2t0d0s0->/pci@6,4000/scsi@2/sd@0,0:a,raw
             /dev/rdsk/c2t1d0s0->/pci@6,4000/scsi@2/sd@1,0:a,raw
             /dev/rdsk/c2t2d0s0->/pci@6,4000/scsi@2/sd@2,0:a,raw
Slot 4       /dev/rdsk/c2t3d0s0->/pci@6,4000/scsi@2/sd@3,0:a,raw
Slot 6or10   /dev/rdsk/c3t0d0s7->/pci@6,4000/scsi@2,1/sd@0,0:h,raw
vol1         /dev/rdsk/c3t1d0s0->/pci@6,4000/scsi@2,1/sd@1,0:a,raw
             /dev/rdsk/c3t2d0s0->/pci@6,4000/scsi@2,1/sd@2,0:a,raw
vol1         /dev/rdsk/c3t3d0s0->/pci@6,4000/scsi@2,1/sd@3,0:a,raw

I think that this is a summary up to now.  Based on the metastat data:

d19: Submirror of d9
    Stripe 0:         c2t0d0s7          0     No            Okay   Yes
    Stripe 1:        c3t3d0s7          0     No            Okay   Yes
    Stripe 2:        c0t3d0s7          0     No            Okay   Yes
d29: Submirror of d9
    State: Needs maintenance
    Invoke: metareplace d9 c0t2d0s7 <new device>
    Stripe 0:        c2t3d0s7          0     No            Okay   Yes
    Stripe 1:        c3t1d0s7          0     No            Okay   Yes
    Stripe 2:        c0t2d0s7          0     No     Maintenance   Yes
0
 
LVL 1

Accepted Solution

by:
stewbeast earned 200 total points
ID: 9593816
Using the dd command you have some options:

1) cmd> dd if=/dev/rdsk/c0t2d0s2 of=/dev/null
        break out of the command at any time

        If the disk is semi-functional, the HDD LED will light up.

2) cmd> dd if=/dev/rdsk/c0t1d0s2 of=/dev/null
        break out of the command at any time
    cmd> dd if=/dev/rdsk/c0t3d0s2 of=/dev/null
        break out of the command at any time

        If the failed HDD is "TOO BROKEN" to light the LED, then the above commands will light the HDD LEDs around it. You can of course do this on all drives for the ultimate process of elimination.
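
To walk through several drives without breaking out of each dd by hand, something like the following could be used. It is a sketch under assumptions: the helper name blink_cmds is hypothetical, the bs and count values are arbitrary, and slice 2 is taken to be the whole-disk slice as in the commands above. The function only prints the commands, so they can be reviewed before anything touches a production disk.

```shell
# Print (do not run) one read-only dd command per disk, so each drive's
# activity LED can be watched in turn. bs/count are arbitrary values
# chosen only to make each read finite.
blink_cmds() {
    for disk in "$@"; do
        echo "dd if=/dev/rdsk/${disk}s2 of=/dev/null bs=1024k count=200"
    done
}

# Usage: review the output first, then pipe it to sh to actually run it.
#   blink_cmds c0t1d0 c0t2d0 c0t3d0
#   blink_cmds c0t1d0 c0t2d0 c0t3d0 | sh
```

Because the reads go from the raw device to /dev/null, nothing is written to the disks.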
0
 
LVL 24

Assisted Solution

by:shivsa
shivsa earned 100 total points
ID: 9596143
Q: What is scsi@2,1?

If you have a dual PCI SCSI card in one of the slots, you will see both scsi@2 and scsi@2,1, each representing one of the controllers.

Question: Why is there a reference to PCI ?  Are the disk scsi busses connected to the PCI bus?

PCI in these device names refers to the PCI type of I/O boards, where all these slots are placed. This is the way the full board is connected to the system; within this board your scsi/isp/hme/qfe/glm kind of cards are placed, and then those cards are connected to your disks.

Now if you go one by one.

Slot 10 refers to the card placed within the board, not the actual slot number for the disk. So in slot 10 you will see a wire connected to some disks.

The device path /pci@1f,4000/scsi@3 may be reported. This device path references the disk controller built onto the system board that controls the first four internal disk slots in a Sun Enterprise 450 - the bottom four slots.

so these are internal disk slots.
              /dev/rdsk/c0t0d0s7->/pci@1f,4000/scsi@3/sd@0,0:h,raw
              /dev/rdsk/c0t1d0s0->/pci@1f,4000/scsi@3/sd@1,0:a,raw
vol1          /dev/rdsk/c0t2d0s0->/pci@1f,4000/scsi@3/sd@2,0:a,raw
vol1          /dev/rdsk/c0t3d0s0->/pci@1f,4000/scsi@3/sd@3,0:a,raw

these next 4 disks are connected to card in slot 3.
-------
              /dev/rdsk/c2t0d0s0->/pci@6,4000/scsi@2/sd@0,0:a,raw
              /dev/rdsk/c2t1d0s0->/pci@6,4000/scsi@2/sd@1,0:a,raw
              /dev/rdsk/c2t2d0s0->/pci@6,4000/scsi@2/sd@2,0:a,raw
              /dev/rdsk/c2t3d0s0->/pci@6,4000/scsi@2/sd@3,0:a,raw
              /dev/rdsk/c3t0d0s7->/pci@6,4000/scsi@2,1/sd@0,0:h,raw
              /dev/rdsk/c3t1d0s0->/pci@6,4000/scsi@2,1/sd@1,0:a,raw
              /dev/rdsk/c3t2d0s0->/pci@6,4000/scsi@2,1/sd@2,0:a,raw
              /dev/rdsk/c3t3d0s0->/pci@6,4000/scsi@2,1/sd@3,0:a,raw






0
 
LVL 24

Expert Comment

by:shivsa
ID: 9596154
Sorry for the unfinished comment above, network problem.
Anyway, so you will see some disk storage connected to the slot 3 card.

sd@0,0 tells you the target; check your disk storage to get the target numbers and map all the disks to individual devices.
The same thing applies to the first 4 internal disks.
Hope this helps.
0
 
LVL 38

Assisted Solution

by:yuzh
yuzh earned 100 total points
ID: 9596790
Here's a better doc for identifying the Faulty Disk Drive:

Identifying the Faulty Disk Drive
Disk errors may be reported in a number of different ways. Often you can find messages about failing or failed disks in your system console. This information is also logged in the /usr/adm/messages file(s). These error messages typically refer to a failed disk drive by its UNIX physical device name (such as /devices/pci@6,4000/scsi@4,1/sd@3,0) and its UNIX device instance name (such as sd14). In some cases, a faulty disk may be identified by its UNIX logical device name, such as c2t3d0. In addition, some applications may report a disk slot number (0 through 19) or activate an LED located next to the disk drive itself (see Figure 3–3).

details:

http://docs.sun.com/db/doc/806-3992-10/6jd3qmd5q?a=view

0
 
LVL 38

Assisted Solution

by:yuzh
yuzh earned 100 total points
ID: 9596798
Also:

Mapping Between Logical and Physical Device Names :
http://docs.sun.com/db/doc/806-3992-10/6jd3qmd5r?a=view
0
 
LVL 18

Expert Comment

by:liddler
ID: 9597213
What is the output of:
prtconf -vp |grep pci@1f,4000/scsi@3/disk@2 |grep slot
0
 

Author Comment

by:huffmana
ID: 9597382
dd command did it!! :-)  The mapping looks like this:

Slot 0   c0t0d0s7->/pci@1f,4000/scsi@3/sd@0,0:h,raw
Slot 1   c0t1d0s0->/pci@1f,4000/scsi@3/sd@1,0:a,raw
Slot 2   c0t2d0s0->/pci@1f,4000/scsi@3/sd@2,0:a,raw
Slot 3   c0t3d0s0->/pci@1f,4000/scsi@3/sd@3,0:a,raw
Slot 4   c2t0d0s0->/pci@6,4000/scsi@2/sd@0,0:a,raw
Slot 5   c2t1d0s0->/pci@6,4000/scsi@2/sd@1,0:a,raw
Slot 6   c2t2d0s0->/pci@6,4000/scsi@2/sd@2,0:a,raw
Slot 7   c2t3d0s0->/pci@6,4000/scsi@2/sd@3,0:a,raw
Slot 8   c3t0d0s7->/pci@6,4000/scsi@2,1/sd@0,0:h,raw
Slot 9   c3t1d0s0->/pci@6,4000/scsi@2,1/sd@1,0:a,raw
Slot 10  c3t2d0s0->/pci@6,4000/scsi@2,1/sd@2,0:a,raw
Slot 11  c3t3d0s0->/pci@6,4000/scsi@2,1/sd@3,0:a,raw

Note 1: When the dd command runs the light turns OFF.  Also, slot 0 turned off for a short time and then turned right back on.  The terminal shows that the process stopped as follows:
# dd if=/dev/rdsk/c0t0d0s7 of=/dev/null
40320+0 records in
40320+0 records out

Since it is not the failed disk (c0t2d0) I assume that it is the active system disk and did not like being scanned....

Note 2: Since the "failed disk" acted the same as all the other disks I'm wondering if it is really failed or just needs to be reseated or something.  Perhaps tomorrow evening I can stay late and remove it and put it back in....
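
For future reference, the slot mapping found above can be written down as a small lookup. This is a sketch that encodes only what was observed on this one machine (c0 in slots 0-3, c2 in slots 4-7, c3 in slots 8-11); another E450 may be cabled differently, and the function name slot_of is made up for illustration.

```shell
# Map a logical disk name to the chassis slot number, using ONLY the
# mapping observed on this particular E450:
#   c0 -> slots 0-3, c2 -> slots 4-7, c3 -> slots 8-11
slot_of() {
    ctrl=$(echo "$1" | sed 's/^c\([0-9]*\)t.*/\1/')
    tgt=$(echo "$1" | sed 's/^c[0-9]*t\([0-9]*\)d.*/\1/')
    case "$ctrl" in
        0) base=0 ;;
        2) base=4 ;;
        3) base=8 ;;
        *) echo "unknown controller: $1" >&2; return 1 ;;
    esac
    echo $((base + tgt))
}

# Usage:
#   slot_of c0t2d0s7
```

With this mapping the failed component c0t2d0s7 corresponds to slot 2, matching the table above.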
0
 
LVL 1

Expert Comment

by:stewbeast
ID: 9597998
sweeeeeet...  one of my 450's did this about 4 months ago... kinda a pain

because of issues like this, SUN added some handiness to openboot

you can use "disk-led-assoc" with setenv at the openboot prompt to map disks and controllers to "led flags" in the prtdiag display.  It is something you have to take the time to setup BEFORE a problem arises, but will save you time when this happens again.

for info on this: search for "Enterprise 450 diag" on docs.sun.com
0
 

Author Comment

by:huffmana
ID: 9598999
Another way that I found is #format>analyze>test
The light blinks very quickly on the selected disk :-)

The dd command was the one that solved my problem but everyone was very helpful!  Is it OK if I split the points with
40% stewbeast
20% yuzh
20% shivsa
10% liddler
10% glassd

I don't know this experts-exchange GUI very well and may make a mistake allocating points.  I also don't know how to close a question.  I'll have to do some reading... it would be good to accept multiple answers (several of the methods could have worked).

I should like to give everyone who helped a million points....
0
 
