arthurk123


Solaris Express pci-ide errors

Hello.

I put together an OpenSolaris box, tested it for about a week, found it stable, and loaded it up with data: a six-drive raid-z2 pool plus one OS drive. It ran for a month, stable, with no issues. I shut it down to upgrade the memory, but it wouldn't boot properly with the new RAM, so I took the new RAM out and left just the old RAM in. I booted it up, and half an hour later it threw errors and I was forced to restart. Half a day later I SSHed in to check on it, and the whole system crashed right after I entered the password. I changed the motherboard, and three days later the errors came back.

Here is a capture:

Jun 13 15:57:42 media   timeout: early timeout, target=1 lun=0
Jun 13 15:57:42 media scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci-ide@1f,2/ide@1 (ata3):
Jun 13 15:57:42 media   timeout: early timeout, target=0 lun=0
Jun 13 15:57:42 media scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci-ide@1f,5/ide@1 (ata5):
Jun 13 15:57:42 media   timeout: abort request, target=0 lun=0
Jun 13 15:57:42 media scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci-ide@1f,5/ide@1 (ata5):
Jun 13 15:57:42 media   timeout: abort device, target=0 lun=0
Jun 13 15:57:42 media scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci-ide@1f,5/ide@1 (ata5):
Jun 13 15:57:42 media   timeout: reset target, target=0 lun=0
Jun 13 15:57:42 media scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci-ide@1f,5/ide@1 (ata5):
Jun 13 15:57:42 media   timeout: reset bus, target=0 lun=0
Jun 13 15:57:43 media scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci-ide@1f,2/ide@0 (ata2):
Jun 13 15:57:43 media   timeout: abort request, target=0 lun=0
Jun 13 15:57:43 media scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci-ide@1f,2/ide@0 (ata2):
Jun 13 15:57:43 media   timeout: abort device, target=0 lun=0
Jun 13 15:57:43 media scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci-ide@1f,2/ide@0 (ata2):
Jun 13 15:57:43 media   timeout: reset target, target=0 lun=0
Jun 13 15:57:43 media scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci-ide@1f,2/ide@0 (ata2):
Jun 13 15:57:43 media   timeout: reset bus, target=0 lun=0
Jun 13 15:57:43 media scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci-ide@1f,2/ide@0 (ata2):
Jun 13 15:57:43 media   timeout: early timeout, target=1 lun=0
Jun 13 15:57:43 media scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci-ide@1f,5/ide@0 (ata4):
Jun 13 15:57:43 media   timeout: abort request, target=0 lun=0
Jun 13 15:57:43 media scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci-ide@1f,5/ide@0 (ata4):
Jun 13 15:57:43 media   timeout: abort device, target=0 lun=0
Jun 13 15:57:43 media scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci-ide@1f,5/ide@0 (ata4):
Jun 13 15:57:43 media   timeout: reset target, target=0 lun=0
Jun 13 15:57:43 media scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci-ide@1f,5/ide@0 (ata4):
Jun 13 15:57:43 media   timeout: reset bus, target=0 lun=0
Jun 13 15:57:43 media gda: [ID 107833 kern.warning] WARNING: /pci@0,0/pci-ide@1f,2/ide@1/cmdk@0,0 (Disk5):
Jun 13 15:57:43 media   Error for command 'read sector' Error Level: Informational
Jun 13 15:57:43 media gda: [ID 107833 kern.notice]      Sense Key: aborted command
Jun 13 15:57:43 media gda: [ID 107833 kern.notice]      Vendor 'Gen-ATA ' error code: 0x3
Jun 13 15:57:43 media gda: [ID 107833 kern.warning] WARNING: /pci@0,0/pci-ide@1f,2/ide@1/cmdk@1,0 (Disk4):
Jun 13 15:57:43 media   Error for command 'read sector' Error Level: Informational
Jun 13 15:57:43 media gda: [ID 107833 kern.notice]      Sense Key: aborted command
Jun 13 15:57:43 media gda: [ID 107833 kern.notice]      Vendor 'Gen-ATA ' error code: 0x3
Jun 13 15:57:43 media gda: [ID 107833 kern.warning] WARNING: /pci@0,0/pci-ide@1f,2/ide@1/cmdk@0,0 (Disk5):
Jun 13 15:57:43 media   Error for command 'read sector' Error Level: Informational
Jun 13 15:57:43 media gda: [ID 107833 kern.notice]      Sense Key: aborted command
Jun 13 15:57:43 media gda: [ID 107833 kern.notice]      Vendor 'Gen-ATA ' error code: 0x3
Jun 13 15:57:43 media gda: [ID 107833 kern.warning] WARNING: /pci@0,0/pci-ide@1f,5/ide@1/cmdk@0,0 (Disk1):
Jun 13 15:57:43 media   Error for command 'read sector' Error Level: Informational
Jun 13 15:57:43 media gda: [ID 107833 kern.notice]      Sense Key: aborted command
Jun 13 15:57:43 media gda: [ID 107833 kern.notice]      Vendor 'Gen-ATA ' error code: 0x3
Jun 13 15:57:43 media gda: [ID 107833 kern.warning] WARNING: /pci@0,0/pci-ide@1f,2/ide@0/cmdk@0,0 (Disk2):
Jun 13 15:57:43 media   Error for command 'read sector' Error Level: Informational
Jun 13 15:57:43 media gda: [ID 107833 kern.notice]      Sense Key: aborted command
Jun 13 15:57:43 media gda: [ID 107833 kern.notice]      Vendor 'Gen-ATA ' error code: 0x3
Jun 13 15:57:43 media gda: [ID 107833 kern.warning] WARNING: /pci@0,0/pci-ide@1f,2/ide@0/cmdk@1,0 (Disk3):
Jun 13 15:57:43 media   Error for command 'read sector' Error Level: Informational
Jun 13 15:57:43 media gda: [ID 107833 kern.notice]      Sense Key: aborted command
Jun 13 15:57:43 media gda: [ID 107833 kern.notice]      Vendor 'Gen-ATA ' error code: 0x3
Jun 13 15:57:43 media gda: [ID 107833 kern.warning] WARNING: /pci@0,0/pci-ide@1f,5/ide@0/cmdk@0,0 (Disk0):
Jun 13 15:57:43 media   Error for command 'read sector' Error Level: Informational
Jun 13 15:57:43 media gda: [ID 107833 kern.notice]      Sense Key: aborted command
Jun 13 15:57:43 media gda: [ID 107833 kern.notice]      Vendor 'Gen-ATA ' error code: 0x3
Jun 13 15:57:58 media fmd: [ID 377184 daemon.error] SUNW-MSG-ID: ZFS-8000-FD, TYPE: Fault, VER: 1, SEVERITY: Major
Jun 13 15:57:58 media EVENT-TIME: Mon Jun 13 15:57:58 PDT 2011
Jun 13 15:57:58 media PLATFORM: i86pc, CSN: -, HOSTNAME: media
Jun 13 15:57:58 media SOURCE: zfs-diagnosis, REV: 1.0
Jun 13 15:57:58 media EVENT-ID: c6ff929e-f0a8-490d-e1c4-9afe1e0f7ffa
Jun 13 15:57:58 media DESC: The number of I/O errors associated with a ZFS device exceeded
Jun 13 15:57:58 media        acceptable levels.  Refer to http://sun.com/msg/ZFS-8000-FD for more information.
Jun 13 15:57:58 media AUTO-RESPONSE: The device has been offlined and marked as faulted.  An attempt
Jun 13 15:57:58 media        will be made to activate a hot spare if available.
Jun 13 15:57:58 media IMPACT: Fault tolerance of the pool may be compromised.
Jun 13 15:57:58 media REC-ACTION: Run 'zpool status -x' and replace the bad device.
Jun 13 15:57:59 media fmd: [ID 377184 daemon.error] SUNW-MSG-ID: ZFS-8000-FD, TYPE: Fault, VER: 1, SEVERITY: Major
Jun 13 15:57:59 media EVENT-TIME: Mon Jun 13 15:57:58 PDT 2011
Jun 13 15:57:59 media PLATFORM: i86pc, CSN: -, HOSTNAME: media
Jun 13 15:57:59 media SOURCE: zfs-diagnosis, REV: 1.0
Jun 13 15:57:59 media EVENT-ID: 1caa3b1c-ba3a-e147-f3a8-cc52cda6fd41
Jun 13 15:57:59 media DESC: The number of I/O errors associated with a ZFS device exceeded
Jun 13 15:57:59 media        acceptable levels.  Refer to http://sun.com/msg/ZFS-8000-FD for more information.
Jun 13 15:57:59 media AUTO-RESPONSE: The device has been offlined and marked as faulted.  An attempt
Jun 13 15:57:59 media        will be made to activate a hot spare if available.
Jun 13 15:57:59 media IMPACT: Fault tolerance of the pool may be compromised.
Jun 13 15:57:59 media REC-ACTION: Run 'zpool status -x' and replace the bad device.
Jun 13 15:57:59 media fmd: [ID 377184 daemon.error] SUNW-MSG-ID: ZFS-8000-FD, TYPE: Fault, VER: 1, SEVERITY: Major
Jun 13 15:57:59 media EVENT-TIME: Mon Jun 13 15:57:59 PDT 2011
Jun 13 15:57:59 media PLATFORM: i86pc, CSN: -, HOSTNAME: media
Jun 13 15:57:59 media SOURCE: zfs-diagnosis, REV: 1.0
Jun 13 15:57:59 media EVENT-ID: b560954e-1f9e-c000-8423-aa71382ee369
Jun 13 15:57:59 media DESC: The number of I/O errors associated with a ZFS device exceeded
Jun 13 15:57:59 media        acceptable levels.  Refer to http://sun.com/msg/ZFS-8000-FD for more information.
Jun 13 15:57:59 media AUTO-RESPONSE: The device has been offlined and marked as faulted.  An attempt
Jun 13 15:57:59 media        will be made to activate a hot spare if available.
Jun 13 15:57:59 media IMPACT: Fault tolerance of the pool may be compromised.
Jun 13 15:57:59 media REC-ACTION: Run 'zpool status -x' and replace the bad device.
Jun 13 15:57:59 media fmd: [ID 377184 daemon.error] SUNW-MSG-ID: ZFS-8000-FD, TYPE: Fault, VER: 1, SEVERITY: Major
Jun 13 15:57:59 media EVENT-TIME: Mon Jun 13 15:57:59 PDT 2011
Jun 13 15:57:59 media PLATFORM: i86pc, CSN: -, HOSTNAME: media
Jun 13 15:57:59 media SOURCE: zfs-diagnosis, REV: 1.0
Jun 13 15:57:59 media EVENT-ID: dc447f4c-f5af-6dd5-ae8e-8036a1c1cc53
Jun 13 15:57:59 media DESC: The number of I/O errors associated with a ZFS device exceeded
Jun 13 15:57:59 media        acceptable levels.  Refer to http://sun.com/msg/ZFS-8000-FD for more information.
Jun 13 15:57:59 media AUTO-RESPONSE: The device has been offlined and marked as faulted.  An attempt
Jun 13 15:57:59 media        will be made to activate a hot spare if available.
Jun 13 15:57:59 media IMPACT: Fault tolerance of the pool may be compromised.
Jun 13 15:57:59 media REC-ACTION: Run 'zpool status -x' and replace the bad device.
Jun 13 15:57:59 media fmd: [ID 377184 daemon.error] SUNW-MSG-ID: ZFS-8000-FD, TYPE: Fault, VER: 1, SEVERITY: Major
Jun 13 15:57:59 media EVENT-TIME: Mon Jun 13 15:57:59 PDT 2011
Jun 13 15:57:59 media PLATFORM: i86pc, CSN: -, HOSTNAME: media
Jun 13 15:57:59 media SOURCE: zfs-diagnosis, REV: 1.0
Jun 13 15:57:59 media EVENT-ID: af15a1b5-b2b4-64a2-a2fd-af9c2cb71a10
Jun 13 15:57:59 media DESC: The number of I/O errors associated with a ZFS device exceeded
Jun 13 15:57:59 media        acceptable levels.  Refer to http://sun.com/msg/ZFS-8000-FD for more information.
Jun 13 15:57:59 media AUTO-RESPONSE: The device has been offlined and marked as faulted.  An attempt
Jun 13 15:57:59 media        will be made to activate a hot spare if available.
Jun 13 15:57:59 media IMPACT: Fault tolerance of the pool may be compromised.
Jun 13 15:57:59 media REC-ACTION: Run 'zpool status -x' and replace the bad device.
Jun 13 15:57:59 media fmd: [ID 377184 daemon.error] SUNW-MSG-ID: ZFS-8000-FD, TYPE: Fault, VER: 1, SEVERITY: Major
Jun 13 15:57:59 media EVENT-TIME: Mon Jun 13 15:57:59 PDT 2011
Jun 13 15:57:59 media PLATFORM: i86pc, CSN: -, HOSTNAME: media
Jun 13 15:57:59 media SOURCE: zfs-diagnosis, REV: 1.0
Jun 13 15:57:59 media EVENT-ID: de155125-4334-4106-e848-f7f86a96e03e
Jun 13 15:57:59 media DESC: The number of I/O errors associated with a ZFS device exceeded
Jun 13 15:57:59 media        acceptable levels.  Refer to http://sun.com/msg/ZFS-8000-FD for more information.
Jun 13 15:57:59 media AUTO-RESPONSE: The device has been offlined and marked as faulted.  An attempt
Jun 13 15:57:59 media        will be made to activate a hot spare if available.
Jun 13 15:57:59 media IMPACT: Fault tolerance of the pool may be compromised.
Jun 13 15:57:59 media REC-ACTION: Run 'zpool status -x' and replace the bad device.
Jun 13 15:58:18 media scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci-ide@1f,2/ide@1 (ata3):
Jun 13 15:58:18 media   timeout: abort request, target=0 lun=0
Jun 13 15:58:18 media scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci-ide@1f,2/ide@1 (ata3):
Jun 13 15:58:18 media   timeout: abort device, target=0 lun=0
Jun 13 15:58:18 media scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci-ide@1f,2/ide@1 (ata3):
Jun 13 15:58:18 media   timeout: reset target, target=0 lun=0
Jun 13 15:58:18 media scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci-ide@1f,2/ide@1 (ata3):
Jun 13 15:58:18 media   timeout: reset bus, target=0 lun=0
Jun 13 15:58:19 media gda: [ID 107833 kern.warning] WARNING: /pci@0,0/pci-ide@1f,2/ide@1/cmdk@0,0 (Disk5):
Jun 13 15:58:19 media   Error for command 'read sector' Error Level: Informational
Jun 13 15:58:19 media gda: [ID 107833 kern.notice]      Sense Key: aborted command
Jun 13 15:58:19 media gda: [ID 107833 kern.notice]      Vendor 'Gen-ATA ' error code: 0x3




What the heck is going on with this? Hardware specs: DH55TC with an i3 and 4 GB RAM. The six-drive raid-z2 is on the motherboard SATA controllers; the operating system is on a single 80 GB SATA drive on a Solaris-compatible SATA controller.

I can't say it's due to hardware compatibility, because the box had been up for over a month, rock stable. I thought maybe it's the operating system drive, but the errors are pointing to the other hard drives?

*If I restart the system, it comes back 100%: no errors, raid-z2 functioning with no issues, so I don't want to say it's bad hard drives.*
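
One way to check the "errors are pointing to the other hard drives" observation is to tally the kernel warnings by the `pci-ide` controller in each device path. The sketch below is only an illustration: the sample lines are copied from the capture above, the function name `tally_warnings` is made up for this example, and on a live box you would feed it the full system log (typically `/var/adm/messages` on Solaris, which is an assumption about this setup).

```python
import re
from collections import defaultdict

# A few warning lines copied verbatim from the capture above.
SAMPLE = """\
Jun 13 15:57:42 media scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci-ide@1f,5/ide@1 (ata5):
Jun 13 15:57:43 media scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci-ide@1f,2/ide@0 (ata2):
Jun 13 15:57:43 media gda: [ID 107833 kern.warning] WARNING: /pci@0,0/pci-ide@1f,2/ide@1/cmdk@0,0 (Disk5):
Jun 13 15:57:43 media gda: [ID 107833 kern.warning] WARNING: /pci@0,0/pci-ide@1f,5/ide@0/cmdk@0,0 (Disk0):
"""

# Matches "<timestamp> <host> <driver>: ... WARNING: <device path> (<name>):"
WARNING_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+[\d:]{8})\s+\S+\s+(?P<driver>\w+):.*WARNING:\s+"
    r"(?P<path>/\S+)\s+\((?P<name>[^)]+)\):"
)
# Picks the controller component, e.g. "pci-ide@1f,2", out of a device path.
CONTROLLER_RE = re.compile(r"pci-ide@[0-9a-f]+,[0-9a-f]+")


def tally_warnings(text):
    """Return {controller: sorted list of channel/disk names that logged a warning}."""
    by_controller = defaultdict(set)
    for line in text.splitlines():
        match = WARNING_RE.match(line)
        if not match:
            continue  # continuation lines ("timeout: ...") carry no device path
        controller = CONTROLLER_RE.search(match.group("path"))
        if controller:
            by_controller[controller.group(0)].add(match.group("name"))
    return {ctrl: sorted(names) for ctrl, names in by_controller.items()}


if __name__ == "__main__":
    for controller, names in sorted(tally_warnings(SAMPLE).items()):
        print(controller, names)
```

Run against the full capture, this kind of tally shows warnings for Disk0 through Disk5 across both `pci-ide@1f,2` and `pci-ide@1f,5` within the same second (15:57:43). That pattern looks more like something the drives share than six disks failing independently, though the log alone does not prove it.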

Please help out. Thank you.
dfke

Errors pointing to the other drives probably indicate a software problem.

A similar issue is described here: http://download.oracle.com/docs/cd/E19082-01/820-0543/fqnkp/index.html

Maybe this gets you on the right track.
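
The capture gives some support for a common cause: it logs six `ZFS-8000-FD` faults, each with its own EVENT-ID, between 15:57:58 and 15:57:59. A rough sketch for pulling that out of a log is below. The sample lines are trimmed from the capture above, and `summarize_faults` is a hypothetical helper name, not part of any Solaris tool.

```python
import re
from collections import Counter

# Fault headers and EVENT-ID lines trimmed from the capture above.
SAMPLE = """\
Jun 13 15:57:58 media fmd: [ID 377184 daemon.error] SUNW-MSG-ID: ZFS-8000-FD, TYPE: Fault, VER: 1, SEVERITY: Major
Jun 13 15:57:58 media EVENT-ID: c6ff929e-f0a8-490d-e1c4-9afe1e0f7ffa
Jun 13 15:57:59 media fmd: [ID 377184 daemon.error] SUNW-MSG-ID: ZFS-8000-FD, TYPE: Fault, VER: 1, SEVERITY: Major
Jun 13 15:57:59 media EVENT-ID: 1caa3b1c-ba3a-e147-f3a8-cc52cda6fd41
Jun 13 15:57:59 media fmd: [ID 377184 daemon.error] SUNW-MSG-ID: ZFS-8000-FD, TYPE: Fault, VER: 1, SEVERITY: Major
Jun 13 15:57:59 media EVENT-ID: b560954e-1f9e-c000-8423-aa71382ee369
"""

# Header line written by fmd for each diagnosed fault.
FAULT_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+[\d:]{8})\s+\S+\s+fmd:.*SUNW-MSG-ID:\s+(?P<msg>[\w-]+),"
)
# Follow-up line carrying the unique event identifier.
EVENT_RE = re.compile(r"EVENT-ID:\s+(?P<id>[0-9a-f-]+)")


def summarize_faults(text):
    """Count fault headers per (timestamp, message ID) and collect distinct EVENT-IDs."""
    per_second = Counter()
    event_ids = set()
    for line in text.splitlines():
        fault = FAULT_RE.match(line)
        if fault:
            per_second[(fault.group("ts"), fault.group("msg"))] += 1
            continue
        event = EVENT_RE.search(line)
        if event:
            event_ids.add(event.group("id"))
    return per_second, event_ids


if __name__ == "__main__":
    counts, ids = summarize_faults(SAMPLE)
    for (timestamp, msg_id), n in sorted(counts.items()):
        print(timestamp, msg_id, n)
    print("distinct events:", len(ids))
```

Several faults landing in the same one or two seconds is what you would expect if the drives share a cause (driver, controller, cabling, or power), but that is an inference from timing, not a diagnosis.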
ASKER CERTIFIED SOLUTION
arthurk123


ASKER

Found answer by searching