Solaris "Critical" Hardware Errors

Asked by w6hr in Sun Microsystems Desktop Computers, Sun Solaris

Tags: Sun Microsystems, Sun Blade 2000, 2000, Running Solaris 10 (05/08) and patched up to date

Every so often I would see something like this in /var/adm/messages on a Sun Blade 2000 with Solaris 10:
====================================================
May 15 08:20:20 moose scsi: [ID 107833 kern.warning] WARNING: /pci@8,600000/SUNW,qlc@4/fp@0,0/ssd@w2100002037eb570a,0 (ssd1):
May 15 08:20:20 moose   Error for Command: write(10)               Error Level: Retryable
May 15 08:20:20 moose scsi: [ID 107833 kern.notice]     Requested Block: 7338112                   Error Block: 7338112
May 15 08:20:20 moose scsi: [ID 107833 kern.notice]     Vendor: SEAGATE                            Serial Number: 0308B0V3M4
May 15 08:20:20 moose scsi: [ID 107833 kern.notice]     Sense Key: Hardware Error
May 15 08:20:20 moose scsi: [ID 107833 kern.notice]     ASC: 0x32 (no defect spare location available), ASCQ: 0x0, FRU: 0x4
====================================================

So I have been assuming that I have a disk drive in the early stages of failure, and I've been mentally preparing myself to deal with it one of these days.  Then today I saw the following in /var/adm/messages each time I booted the computer:

====================================================
May 26 17:54:25 moose fmd: [ID 441519 daemon.error] SUNW-MSG-ID: PCIEX-8000-5Y, TYPE: Fault, VER: 1, SEVERITY: Critical
May 26 17:54:25 moose EVENT-TIME: Mon May 26 08:05:03 PDT 2008
May 26 17:54:25 moose PLATFORM: SUNW,Sun-Blade-1000, CSN: -, HOSTNAME: moose
May 26 17:54:25 moose SOURCE: eft, REV: 1.16
May 26 17:54:25 moose EVENT-ID: bc6ebca9-ebb7-e000-a4f1-e429b499f944
May 26 17:54:25 moose DESC: The transmitting device sent an invalid request.
May 26 17:54:25 moose   Refer to http://sun.com/msg/PCIEX-8000-5Y for more information.
May 26 17:54:25 moose AUTO-RESPONSE: One or more device instances may be disabled
May 26 17:54:25 moose IMPACT: Loss of services provided by the device instances associated with this fault
May 26 17:54:25 moose REC-ACTION: Ensure that the latest drivers and patches are installed. Otherwise schedule a repair procedure to replace the affected device(s).  Use fmdump -v -u <EVENT_ID> to identify the devices or contact Sun for support.
====================================================

So I ran fmdump and got the following:

====================================================
May 26 17:54:25.3813 bc6ebca9-ebb7-e000-a4f1-e429b499f944 PCIEX-8000-5Y
 100%  fault.io.pci.device-invreq

       Problem in: hc://:product-id=SUNW,Sun-Blade-1000:server-id=moose/motherboard=0/hostbridge=0/pcibus=0/pcidev=4/pcifn=0
          Affects: dev:////pci@8,600000/SUNW,qlc@4
              FRU: hc:///component=MB
         Location: MB
====================================================

So what's this telling me?  Could both sets of messages relate to the same issue?  Is it likely the disk drive?  Or might it be something more serious?  Or is it benign?
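
One quick way to see whether the two sets of messages could involve the same hardware is to compare the device paths they report. The sketch below is only an illustration: the function name `same_device_tree` is made up for this example, and it assumes the `Affects:` value from fmdump is a plain device path behind a `dev:///` prefix. The two path strings are copied from the logs above.

```python
def same_device_tree(disk_path: str, fma_affects: str) -> bool:
    """Return True if disk_path is the flagged device or sits beneath it."""
    # Assumption: fmdump prints the device path with a "dev:///" scheme
    # prefix; strip it to get the same form used in the scsi warning.
    prefix = "dev:///"
    if fma_affects.startswith(prefix):
        flagged = fma_affects[len(prefix):]
    else:
        flagged = fma_affects
    flagged = flagged.rstrip("/")
    # Match on a whole path component so qlc@4 does not match qlc@40.
    return disk_path == flagged or disk_path.startswith(flagged + "/")


# Path from the scsi WARNING line in /var/adm/messages
DISK = "/pci@8,600000/SUNW,qlc@4/fp@0,0/ssd@w2100002037eb570a,0"
# "Affects:" value from the fmdump output
AFFECTS = "dev:////pci@8,600000/SUNW,qlc@4"

print(same_device_tree(DISK, AFFECTS))
```

For these two strings the check succeeds: the disk that logged the write errors (ssd1) is reached through the same qlc controller that the fault report names. That shows the two reports share a path through the hardware. It does not by itself say whether the disk, the controller, or something else is at fault.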
05/27/08 12:35 AM, ID: 21650029 Expert Comment

06/02/08 07:52 PM, ID: 21697273 Author Comment

06/03/08 12:18 AM, ID: 21698181 Accepted Solution


About this solution

Solution Provided By: robocat
Participating Experts: 1
Solution Grade: A
 
 