dee43
WARNING: md: d1: read error on /dev/dsk/c1t0d0s0
Our Sun Fire V480 server is reporting errors in the message log. The server is running Solaris 9, with 4x900 MHz CPUs, 16 GB of memory, and 2x36 GB hard drives (mirrored).
The message log reports the following:
Aug 8 08:03:47 cells scsi: [ID 243001 kern.warning] WARNING: /pci@9,600000/SUNW,qlc@2/fp@0,0 (fcp0):
Aug 8 08:03:47 cells FCP: WWN 0x21000004cf9666db reset successfully
Aug 8 08:03:54 cells md_stripe: [ID 641072 kern.warning] WARNING: md: d1: read error on /dev/dsk/c1t0d0s0
Aug 8 08:03:54 cells md_mirror: [ID 104909 kern.warning] WARNING: md: d1: /dev/dsk/c1t0d0s0 needs maintenance
Aug 8 08:19:57 cells scsi: [ID 243001 kern.warning] WARNING: /pci@9,600000/SUNW,qlc@2/fp@0,0 (fcp0):
Aug 8 08:19:57 cells FCP: WWN 0x21000004cf9666db reset successfully
Aug 8 08:20:01 cells scsi: [ID 243001 kern.warning] WARNING: /pci@9,600000/SUNW,qlc@2/fp@0,0 (fcp0):
Aug 8 08:20:01 cells FCP: WWN 0x21000004cf9666db reset successfully
Aug 8 08:20:01 cells md_stripe: [ID 641072 kern.warning] WARNING: md: d3: read error on /dev/dsk/c1t0d0s5
Aug 8 08:20:08 cells last message repeated 1 time
Aug 8 08:20:08 cells md_mirror: [ID 104909 kern.warning] WARNING: md: d3: /dev/dsk/c1t0d0s5 needs maintenance
AND ERRORS REPORTED ON 8/15
--------------------------
Aug 15 08:36:05 cells scsi: [ID 243001 kern.warning] WARNING: /pci@9,600000/SUNW,qlc@2/fp@0,0 (fcp0):
Aug 15 08:36:05 cells FCP: WWN 0x21000004cf9666db reset successfully
Aug 15 08:36:05 cells scsi: [ID 107833 kern.warning] WARNING: /pci@9,600000/SUNW,qlc@2/fp@0,0/ssd@w21000004cf9666db,0 (ssd1):
Aug 15 08:36:05 cells SCSI transport failed: reason 'reset': retrying command
Aug 15 08:36:21 cells scsi: [ID 243001 kern.warning] WARNING: /pci@9,600000/SUNW,qlc@2/fp@0,0 (fcp0):
Aug 15 08:36:21 cells FCP: WWN 0x21000004cf9666db reset successfully
Aug 15 08:36:24 cells scsi: [ID 243001 kern.warning] WARNING: /pci@9,600000/SUNW,qlc@2/fp@0,0 (fcp0):
Aug 15 08:36:24 cells FCP: WWN 0x21000004cf9666db reset successfully
Aug 15 08:36:25 cells md_stripe: [ID 641072 kern.warning] WARNING: md: d6: read error on /dev/dsk/c1t0d0s3
Also, the Solaris Management Console does not work any more. When you start the program, it says: "Starting the server for the first time. May take a few minutes. Please allow configuration to continue until you see 'Welcome to the Solaris Management Console.'"
Can someone tell me what these errors mean? Can this be fixed without losing data on the file systems?
Help!
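For anyone reading along, a quick way to see which physical disk the md warnings point at is to count them per disk. This is only a minimal sketch: the sample text is copied from the log above (with the wrapped lines rejoined), and the function name md_disks is made up for illustration.

```shell
# Sample lines copied from the post above, with the wrapped lines rejoined.
sample_log='Aug 8 08:03:54 cells md_stripe: [ID 641072 kern.warning] WARNING: md: d1: read error on /dev/dsk/c1t0d0s0
Aug 8 08:03:54 cells md_mirror: [ID 104909 kern.warning] WARNING: md: d1: /dev/dsk/c1t0d0s0 needs maintenance
Aug 8 08:20:01 cells md_stripe: [ID 641072 kern.warning] WARNING: md: d3: read error on /dev/dsk/c1t0d0s5
Aug 8 08:20:08 cells md_mirror: [ID 104909 kern.warning] WARNING: md: d3: /dev/dsk/c1t0d0s5 needs maintenance
Aug 15 08:36:25 cells md_stripe: [ID 641072 kern.warning] WARNING: md: d6: read error on /dev/dsk/c1t0d0s3'

# md_disks (hypothetical helper): read syslog text on stdin and print each
# physical disk named in an "md:" warning (slice suffix stripped), followed
# by the number of warnings that mention it.
md_disks() {
    grep 'WARNING: md:' |
        sed -n 's|.*/dev/dsk/\(c[0-9]*t[0-9]*d[0-9]*\)s[0-9]*.*|\1|p' |
        sort | uniq -c | awk '{ print $2, $1 }'
}

printf '%s\n' "$sample_log" | md_disks
```

On the server itself the same function could be fed from the live log, for example md_disks < /var/adm/messages. In the sample every md warning lands on c1t0d0, which suggests the slices are failing together because they sit on one disk (or its path), not independently.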
It seems like one of the hard drives in your mirror set is failing. You shouldn't lose data because the drives are mirrored, but you certainly have to check that failing disk. Is it hot-swappable?
You can get these kinds of errors even if your disk has not actually failed. For example, with a mirror set up like this:
d0 -m d1 d2
d1 1 1 c0t0d0s0
d2 1 1 c0t1d0s0
When you run metastat d0, you will actually find a metasync command suggested for d2 and a metareplace command for d1. In this case, clear the mirrors and recreate them.
In /etc/lvm/md.cf, comment out the three lines above, run metaroot /dev/dsk/c0t0d0s0, and issue reboot -- -s
Then run metaclear -r d0, followed by the commands below:
metainit -f d1 1 1 c0t0d0s0
metainit d2 1 1 c0t1d0s0
metainit d0 -m d2
metaroot d0
lockfs -fa
reboot
metattach d0 d1 (Note: you are making d2 the primary submirror and d1 the secondary)
metastat d2 (it should be OK now)
If you want d1 as primary and d2 as secondary, follow the same procedure again.
Thanks
Mada
ASKER
Why can't I open the Solaris Management console?
a) What does the following show?
metastat -p (shows your setup)
I assume this:
d0 -m d1 d2
d1 1 1 c1t0d0s0
d2 1 1 c1t1d0s0
d5 -m d3 d4
d3 1 1 c1t0d0s0
d4 1 1 c1t1d0s0
b) Check the status of the mirrored metadevice d0 (and d5, respectively)
metastat d0
If you get a message like
" ... maintenance... "
and something like
" ... run metareplace c1t1d0s0 <device2>"
First try to resync with
metareplace -e d0 c1t1d0s0
Could you post output from "metastat -p" please?
ASKER
JustUNIX,
Here's the output from metastat -p and metastat d0.
metastat -p
d20 -m d16 d18 1
d16 1 2 c4t3d0s7 c4t4d0s7 -i 32b
d18 1 1 c4t6d0s6
d19 -m d15 d17 1
d15 1 2 c4t1d0s5 c4t2d0s5 -i 32b
d17 1 1 c4t5d0s6
d14 -m d12 d13 1
d12 1 1 c1t0d0s1
d13 1 1 c1t1d0s1
d11 -m d9 d10 1
d9 1 1 c1t0d0s4
d10 1 1 c1t1d0s4
d8 -m d6 d7 1
d6 1 1 c1t0d0s3
d7 1 1 c1t1d0s3
d5 -m d3 d4 1
d3 1 1 c1t0d0s5
d4 1 1 c1t1d0s5
d0 -m d1 d2 1
d1 1 1 c1t0d0s0
d2 1 1 c1t1d0s0
hsp001
--------------------------
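With the metastat -p listing in hand, it can help to see every mirror that has a submirror on the suspect disk. The following is a rough sketch only: it parses the listing above as plain text, and the function name mirrors_on_disk is invented for this example.

```shell
# "metastat -p" output copied from the post above.
metastat_p='d20 -m d16 d18 1
d16 1 2 c4t3d0s7 c4t4d0s7 -i 32b
d18 1 1 c4t6d0s6
d19 -m d15 d17 1
d15 1 2 c4t1d0s5 c4t2d0s5 -i 32b
d17 1 1 c4t5d0s6
d14 -m d12 d13 1
d12 1 1 c1t0d0s1
d13 1 1 c1t1d0s1
d11 -m d9 d10 1
d9 1 1 c1t0d0s4
d10 1 1 c1t1d0s4
d8 -m d6 d7 1
d6 1 1 c1t0d0s3
d7 1 1 c1t1d0s3
d5 -m d3 d4 1
d3 1 1 c1t0d0s5
d4 1 1 c1t1d0s5
d0 -m d1 d2 1
d1 1 1 c1t0d0s0
d2 1 1 c1t1d0s0
hsp001'

# mirrors_on_disk DISK (hypothetical helper): read "metastat -p" text on
# stdin and print "mirror submirror slice" for each submirror built on DISK.
mirrors_on_disk() {
    awk -v disk="$1" '
        $2 == "-m" {                 # mirror line: dN -m sub1 sub2 ... pass
            for (i = 3; i <= NF; i++)
                if ($i ~ /^d[0-9]+$/) parent[$i] = $1
            next
        }
        {                            # stripe/concat line: note slices on DISK
            for (i = 2; i <= NF; i++)
                if (index($i, disk "s") == 1) slice[$1] = $i
        }
        END {
            for (sm in slice) print parent[sm], sm, slice[sm]
        }
    ' | LC_ALL=C sort
}

printf '%s\n' "$metastat_p" | mirrors_on_disk c1t0d0
```

For this listing it reports five mirrors (d0, d5, d8, d11 and d14) with one side on c1t0d0, including the d1, d3 and d6 submirrors named in the log warnings. One assumption to be aware of: the awk -v option may not be accepted by the stock /usr/bin/awk on Solaris 9, in which case nawk or /usr/xpg4/bin/awk would be the thing to try.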
metastat d0
d0: Mirror
    Submirror 0: d1
      State: Needs maintenance
    Submirror 1: d2
      State: Okay
    Pass: 1
    Read option: roundrobin (default)
    Write option: parallel (default)
    Size: 6292242 blocks

d1: Submirror of d0
    State: Needs maintenance
    Invoke: metareplace d0 c1t0d0s0 <new device>
    Size: 6292242 blocks
    Stripe 0:
        Device     Start Block  Dbase  State        Reloc  Hot Spare
        c1t0d0s0   0            No     Maintenance  Yes

d2: Submirror of d0
    State: Okay
    Size: 6292242 blocks
    Stripe 0:
        Device     Start Block  Dbase  State        Reloc  Hot Spare
        c1t1d0s0   0            No     Okay         Yes

Device Relocation Information:
Device   Reloc  Device ID
c1t0d0   Yes    id1,ssd@w20000004cf9666db
c1t1d0   Yes    id1,ssd@w20000004cf966034
--------------------------
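Since the suggestion earlier in the thread was to try a resync before swapping hardware, the "Invoke:" hint in the metastat output can be turned into the matching re-enable command. This is a sketch under assumptions: it only prints the command and does not run it, the helper name suggest_reenable is made up, and it assumes the documented form metareplace -e mirror component.

```shell
# Lines copied from the "metastat d0" output above.
metastat_d0='d1: Submirror of d0
State: Needs maintenance
Invoke: metareplace d0 c1t0d0s0 <new device>
Size: 6292242 blocks
d2: Submirror of d0
State: Okay'

# suggest_reenable (hypothetical helper): for every
# "Invoke: metareplace MIRROR COMPONENT <new device>" hint on stdin,
# print the in-place re-enable command for that component.
# Nothing is executed; the output is only a suggestion to review.
suggest_reenable() {
    awk '$1 == "Invoke:" && $2 == "metareplace" {
        print "metareplace -e", $3, $4
    }'
}

printf '%s\n' "$metastat_d0" | suggest_reenable
```

If the component drops back to "Needs maintenance" after the resync, that would point toward the disk (or its fibre channel path, given the FCP resets in the log) really being faulty, in which case replacing the drive is the safer route.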