Every so often I would see something like this in /var/adm/messages on a Sun Blade 2000 with Solaris 10:
====================================================
May 15 08:20:20 moose scsi: [ID 107833 kern.warning] WARNING: /pci@8,600000/SUNW,qlc@4/fp@0,0/ssd@w2100002037eb570a,0 (ssd1):
May 15 08:20:20 moose Error for Command: write(10) Error Level: Retryable
May 15 08:20:20 moose scsi: [ID 107833 kern.notice] Requested Block: 7338112 Error Block: 7338112
May 15 08:20:20 moose scsi: [ID 107833 kern.notice] Vendor: SEAGATE Serial Number: 0308B0V3M4
May 15 08:20:20 moose scsi: [ID 107833 kern.notice] Sense Key: Hardware Error
May 15 08:20:20 moose scsi: [ID 107833 kern.notice] ASC: 0x32 (no defect spare location available), ASCQ: 0x0, FRU: 0x4
====================================================
So I have been assuming that this disk drive is in the early stages of failure, and I've been mentally preparing myself to deal with it one of these days. Then today I saw the following in /var/adm/messages each time I booted the computer:
====================================================
May 26 17:54:25 moose fmd: [ID 441519 daemon.error] SUNW-MSG-ID: PCIEX-8000-5Y, TYPE: Fault, VER: 1, SEVERITY: Critical
May 26 17:54:25 moose EVENT-TIME: Mon May 26 08:05:03 PDT 2008
May 26 17:54:25 moose PLATFORM: SUNW,Sun-Blade-1000, CSN: -, HOSTNAME: moose
May 26 17:54:25 moose SOURCE: eft, REV: 1.16
May 26 17:54:25 moose EVENT-ID: bc6ebca9-ebb7-e000-a4f1-e429b499f944
May 26 17:54:25 moose DESC: The transmitting device sent an invalid request.
May 26 17:54:25 moose Refer to http://sun.com/msg/PCIEX-8000-5Y for more information.
May 26 17:54:25 moose AUTO-RESPONSE: One or more device instances may be disabled
May 26 17:54:25 moose IMPACT: Loss of services provided by the device instances associated with this fault
May 26 17:54:25 moose REC-ACTION: Ensure that the latest drivers and patches are installed. Otherwise schedule a repair procedure to replace the affected device(s). Use fmdump -v -u <EVENT_ID> to identify the devices or contact Sun for support.
====================================================
So I ran fmdump and got the following:
====================================================
May 26 17:54:25.3813 bc6ebca9-ebb7-e000-a4f1-e429b499f944 PCIEX-8000-5Y
  100%  fault.io.pci.device-invreq
        Problem in: hc://:product-id=SUNW,Sun-Blade-1000:server-id=moose/motherboard=0/hostbridge=0/pcibus=0/pcidev=4/pcifn=0
           Affects: dev:////pci@8,600000/SUNW,qlc@4
               FRU: hc:///component=MB
          Location: MB
====================================================
So what is this telling me? Could both sets of messages relate to the same issue? Is it likely the disk drive? Or might it be something more serious? Or is it benign?