WARNING: md: d1: read error on /dev/dsk/c1t0d0s0

Our Sun Fire V480 server is reporting errors in the message log.  The server is running Solaris 9 with 4 x 900 MHz CPUs, 16 GB of memory and 2 x 36 GB hard drives (mirrored).

The message log reports the following:  

Aug  8 08:03:47 cells scsi: [ID 243001 kern.warning] WARNING: /pci@9,600000/SUNW,qlc@2/fp@0,0 (fcp0):
Aug  8 08:03:47 cells   FCP: WWN 0x21000004cf9666db reset successfully
Aug  8 08:03:54 cells md_stripe: [ID 641072 kern.warning] WARNING: md: d1: read error on /dev/dsk/c1t0d0s0
Aug  8 08:03:54 cells md_mirror: [ID 104909 kern.warning] WARNING: md: d1: /dev/dsk/c1t0d0s0 needs maintenance
Aug  8 08:19:57 cells scsi: [ID 243001 kern.warning] WARNING: /pci@9,600000/SUNW,qlc@2/fp@0,0 (fcp0):
Aug  8 08:19:57 cells   FCP: WWN 0x21000004cf9666db reset successfully
Aug  8 08:20:01 cells scsi: [ID 243001 kern.warning] WARNING: /pci@9,600000/SUNW,qlc@2/fp@0,0 (fcp0):
Aug  8 08:20:01 cells   FCP: WWN 0x21000004cf9666db reset successfully
Aug  8 08:20:01 cells md_stripe: [ID 641072 kern.warning] WARNING: md: d3: read error on /dev/dsk/c1t0d0s5
Aug  8 08:20:08 cells last message repeated 1 time
Aug  8 08:20:08 cells md_mirror: [ID 104909 kern.warning] WARNING: md: d3: /dev/dsk/c1t0d0s5 needs maintenance

AND ERRORS REPORTED ON 8/15
-----------------------------------------
Aug 15 08:36:05 cells scsi: [ID 243001 kern.warning] WARNING: /pci@9,600000/SUNW,qlc@2/fp@0,0 (fcp0):
Aug 15 08:36:05 cells   FCP: WWN 0x21000004cf9666db reset successfully
Aug 15 08:36:05 cells scsi: [ID 107833 kern.warning] WARNING: /pci@9,600000/SUNW,qlc@2/fp@0,0/ssd@w21000004cf9666db,0 (ssd1):
Aug 15 08:36:05 cells   SCSI transport failed: reason 'reset': retrying command
Aug 15 08:36:21 cells scsi: [ID 243001 kern.warning] WARNING: /pci@9,600000/SUNW,qlc@2/fp@0,0 (fcp0):
Aug 15 08:36:21 cells   FCP: WWN 0x21000004cf9666db reset successfully
Aug 15 08:36:24 cells scsi: [ID 243001 kern.warning] WARNING: /pci@9,600000/SUNW,qlc@2/fp@0,0 (fcp0):
Aug 15 08:36:24 cells   FCP: WWN 0x21000004cf9666db reset successfully
Aug 15 08:36:25 cells md_stripe: [ID 641072 kern.warning] WARNING: md: d6: read error on /dev/dsk/c1t0d0s3

Also, the Solaris Management Console does not work any more.  When you start the program, it says "Starting the server for the first time. May take a few minutes.  Please allow configuration to continue until you see 'Welcome to the Solaris Management Console.'"

Can someone tell me what these errors mean?  Can this be fixed without losing data on the file systems?

Help!
dee43 Asked:

rugdog Commented:
It seems that one of the hard drives in your mirror set is failing. You shouldn't lose data because the drives are mirrored, but you certainly need to check the failing disk. Is it hot-swappable?
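To see quickly whether all the read errors point at the same physical disk, something like the following sketch could help. It assumes the log is read with one message per line, as it appears in /var/adm/messages (not wrapped as in the paste above). md_read_errors is just a made-up helper name, not a Solaris command.

```shell
#!/bin/sh
# Sketch: list each (metadevice, slice) pair that logged an md read error.
# Reads syslog text on stdin; assumes one message per line (not wrapped).
# md_read_errors is a hypothetical helper name, not a Solaris command.
md_read_errors() {
    sed -n 's|.*WARNING: md: \(d[0-9][0-9]*\): read error on \(/dev/dsk/[^ ]*\).*|\1 \2|p' |
        LC_ALL=C sort -u
}
```

Usage would be along the lines of: md_read_errors < /var/adm/messages. For the messages quoted in the question, the slices involved are c1t0d0s0, c1t0d0s5 and c1t0d0s3, which are all on disk c1t0d0.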
madan1278 Commented:
You can get these types of errors even if your disk has not actually failed.

d0 -m d1 d2
d1 1 1 c0t0d0s0
d2 1 1 c0t1d0s0

When you run metastat d0, you will actually find a metasync command suggested for d2 and a metareplace command for d1. In this case, clear the mirrors and recreate them.

In /etc/lvm/md.cf, comment out the three lines above, then run metaroot /dev/dsk/c0t0d0s0 and issue reboot -- -s

Then run metaclear -r d0, followed by the commands below:

metainit -f d1 1 1 c0t0d0s0
metainit d2 1 1 c0t1d0s0
metainit d0 -m d2
metaroot d0
lockfs -fa
reboot
metattach d0 d1 (Note: this makes d2 the primary submirror and d1 the secondary.)
metastat d2 (It should be okay now.)

If you want d1 as the primary and d2 as the secondary, follow the same procedure again.

Thanks
Mada

dee43 (Author) Commented:
Why can't I open the Solaris Management Console?

Hanno P.S. (IT Consultant and Infrastructure Architect) Commented:
a) What does this show?
     metastat -p      (shows your setup)

I assume something like this:
d0 -m d1 d2
d1 1 1 c1t0d0s0
d2 1 1 c1t1d0s0
d5 -m d3 d4
d3 1 1 c1t0d0s5
d4 1 1 c1t1d0s5

b) Check the status of mirrored metadevice d0 (and d5, respectively):
metastat d0

If you get a message like
" ... maintenance ... "
and something like
" ... run metareplace c1t1d0s0 <device2>"

First try to resync with
metareplace -e d0 c1t1d0s0

Could you post the output from "metastat -p", please?
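Step b) can also be scripted if there are many metadevices to check. This is only a rough sketch: needs_maintenance is a hypothetical helper name, and it assumes the metastat output has the usual layout where each stanza starts with "dN:" in the first column. On Solaris 9 you would probably need nawk or /usr/xpg4/bin/awk in place of awk.

```shell
#!/bin/sh
# Sketch: print every metadevice whose metastat stanza reports
# "State: Needs maintenance". Reads metastat output on stdin.
# needs_maintenance is a hypothetical helper name, not a Solaris command.
needs_maintenance() {
    awk '
        # A stanza header such as "d0: Mirror" starts in column one.
        /^d[0-9]+:/ { dev = $1; sub(/:$/, "", dev) }
        # Report each stanza once, even if several states match.
        /State: Needs maintenance/ && !(dev in seen) { seen[dev] = 1; print dev }
    '
}
```

Usage would be something like: metastat | needs_maintenance. A mirror is listed together with its failed submirror, because the mirror stanza repeats the state of each submirror.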
dee43 (Author) Commented:
JustUNIX,

Here's the output from metastat -p and metastat d0.

metastat -p
d20 -m d16 d18 1
d16 1 2 c4t3d0s7 c4t4d0s7 -i 32b
d18 1 1 c4t6d0s6
d19 -m d15 d17 1
d15 1 2 c4t1d0s5 c4t2d0s5 -i 32b
d17 1 1 c4t5d0s6
d14 -m d12 d13 1
d12 1 1 c1t0d0s1
d13 1 1 c1t1d0s1
d11 -m d9 d10 1
d9 1 1 c1t0d0s4
d10 1 1 c1t1d0s4
d8 -m d6 d7 1
d6 1 1 c1t0d0s3
d7 1 1 c1t1d0s3
d5 -m d3 d4 1
d3 1 1 c1t0d0s5
d4 1 1 c1t1d0s5
d0 -m d1 d2 1
d1 1 1 c1t0d0s0
d2 1 1 c1t1d0s0
hsp001

---------------------------

 metastat d0
d0: Mirror
    Submirror 0: d1
      State: Needs maintenance
    Submirror 1: d2
      State: Okay        
    Pass: 1
    Read option: roundrobin (default)
    Write option: parallel (default)
    Size: 6292242 blocks

d1: Submirror of d0
    State: Needs maintenance
    Invoke: metareplace d0 c1t0d0s0 <new device>
    Size: 6292242 blocks
    Stripe 0:
        Device     Start Block  Dbase        State Reloc Hot Spare
        c1t0d0s0          0     No     Maintenance   Yes


d2: Submirror of d0
    State: Okay        
    Size: 6292242 blocks
    Stripe 0:
        Device     Start Block  Dbase        State Reloc Hot Spare
        c1t1d0s0          0     No            Okay   Yes


Device Relocation Information:
Device   Reloc  Device ID
c1t0d0   Yes    id1,ssd@w20000004cf9666db
c1t1d0   Yes    id1,ssd@w20000004cf966034

---------------------------------
Hanno P.S. (IT Consultant and Infrastructure Architect) Commented:
OK, that's not too bad. Now try
 metareplace -e d0 c1t0d0s0
and see if it resynchronizes the mirror. (It should copy the data from d2 to d1.)
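The metastat -p output posted earlier shows five mirrors with a submirror on c1t0d0 (d0, d5, d8, d11 and d14), and the log reports read errors on d1, d3 and d6, so more than one mirror may need the same treatment. The sketch below only prints the matching commands and does not run them. resync_cmds is a hypothetical helper name, and on Solaris 9 you would probably need nawk or /usr/xpg4/bin/awk in place of awk.

```shell
#!/bin/sh
# Dry run: print (do not execute) one "metareplace -e" command for each
# mirrored slice on the given disk. Reads `metastat -p` output on stdin.
# resync_cmds is a hypothetical helper name, not a Solaris command.
resync_cmds() {
    awk -v disk="$1" '
        # Mirror line, e.g. "d0 -m d1 d2 1": remember the parent of each submirror.
        $2 == "-m" {
            for (i = 3; i <= NF; i++)
                if ($i ~ /^d[0-9]+$/) parent[$i] = $1
            next
        }
        # Stripe or concat line, e.g. "d1 1 1 c1t0d0s0": remember a slice on the disk.
        $1 ~ /^d[0-9]+$/ {
            for (i = 2; i <= NF; i++)
                if ($i ~ ("^" disk "s[0-9]+$")) slice[$1] = $i
        }
        END {
            for (sm in slice)
                if (sm in parent)
                    print "metareplace -e " parent[sm] " " slice[sm]
        }
    ' | LC_ALL=C sort
}
```

Usage would be something like: metastat -p | resync_cmds c1t0d0. Review the printed commands before running any of them. If the read errors come back after the resync, the disk or its FC path is most likely the real problem.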