Solved

How do I solve storage disk problems on Solaris 9?

Posted on 2008-06-13
Last Modified: 2013-12-21
Hi guys.
I have two Sun 490 servers running Solaris 9, connected to a storage system through a Brocade 48K switch.
When I run the 'dmesg' command I get this error several times on both machines:
Jun 11 08:04:00 powrsol       scsi: [ID 107833 kern.warning] WARNING: /scsi_vhci/ssd@g6000b5d0006a0000006a026500280000 (ssd47):
Jun 11 08:04:00 powrsol       Error for Command: read(10)                Error Level: Retryable
Jun 11 08:04:00 powrsol       scsi: [ID 107833 kern.notice]       Requested Block: 2633824                   Error Block: 2633824
Jun 11 08:04:00 powrsol       scsi: [ID 107833 kern.notice]       Vendor: FUJITSU                            Serial Number:   6A02650028
Jun 11 08:04:00 powrsol       scsi: [ID 107833 kern.notice]       Sense Key: Unit Attention
Jun 11 08:04:00 powrsol       scsi: [ID 107833 kern.notice]       ASC: 0xf9 (<vendor unique code 0xf9>), ASCQ: 0xe0, FRU: 0x10
The disks are OK and the application is still up, but I'm worried about these errors.
The HBA cards are Emulex and we're using Solaris multipathing. I think the patches are fine and so is the cards' firmware. Does anybody have any idea, or has anyone already been through a problem like this?
Thanks in advance for any help!
Question by:ricardorossi
7 Comments
 
LVL 40

Accepted Solution

by:
omarfarid earned 200 total points
ID: 21784273
Are you sure that multipathing is configured and working correctly? Have you tried disconnecting one path to see whether it keeps working? What SAN are you using?
0
 
LVL 62

Assisted Solution

by:gheist
gheist earned 300 total points
ID: 21787643
Retryable errors do no harm. The sector read was retried.

ASC and ASCQ values with the highest bit set are vendor extensions. Masking off the highest bit of the ASC/ASCQ does not give a meaningful code either.
The faulty unit (FRU) 0x10 is not defined either; it could be the FC path.

If you can get a scsi.h from your vendor, shipped inside the driver (or a debug version of it), you can decode them.

I suggest using "smartctl" from "smartmontools" to check the physical drive error counters. But do not worry: the errors might be FC path discoveries and the like.
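
To illustrate the point about the highest bit, here is a minimal Python sketch that pulls the ASC, ASCQ and FRU values out of the sense line shown in the question and flags codes in the 0x80-0xFF range, which SCSI leaves to the vendor. The function name `parse_sense` and the regular expression are assumptions made for this example, not part of any Solaris tool; decoding what a vendor code actually means still needs the manufacturer's table.

```python
import re

# Matches the sense-data line printed in the question, e.g.
# "ASC: 0xf9 (<vendor unique code 0xf9>), ASCQ: 0xe0, FRU: 0x10"
SENSE_RE = re.compile(
    r"ASC:\s*0x([0-9a-fA-F]+).*?ASCQ:\s*0x([0-9a-fA-F]+),\s*FRU:\s*0x([0-9a-fA-F]+)"
)


def parse_sense(line):
    """Return the ASC/ASCQ/FRU values from one log line, or None if absent."""
    m = SENSE_RE.search(line)
    if m is None:
        return None
    asc, ascq, fru = (int(g, 16) for g in m.groups())
    return {
        "asc": asc,
        "ascq": ascq,
        "fru": fru,
        # Highest bit set (0x80-0xFF) means the code is vendor specific.
        "asc_vendor_specific": asc >= 0x80,
        "ascq_vendor_specific": ascq >= 0x80,
    }


if __name__ == "__main__":
    sample = ("Jun 11 08:04:00 powrsol       scsi: [ID 107833 kern.notice]       "
              "ASC: 0xf9 (<vendor unique code 0xf9>), ASCQ: 0xe0, FRU: 0x10")
    print(parse_sense(sample))
```

For the line in the question, both the ASC and the ASCQ come out as vendor specific, which matches the "<vendor unique code 0xf9>" note the driver itself prints.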
0
 

Author Comment

by:ricardorossi
ID: 21796573
Thanks Omarfarid and Gheist for the suggestions.
Yes, I'm pretty sure the multipathing is configured correctly, and we've already tested that it works.
Although I know that this kind of error or warning is not critical, you guys know that we never want to see these messages on our systems.
Fortunately I was able to get the ASC/ASCQ table from the manufacturer, and it says the following:
ASC 'f9' ASCQ 'e0' -> Execution of command was interrupted due to heavy load inside the System.
These messages are bothering us because there are a lot of them every day, and only on the machines using the SUN card (Emulex OEM) and Solaris multipathing. The other two boxes, which use an Emulex card and Fujitsu multipathing, don't show the messages.
As I cannot use Fujitsu multipathing with the SUN card (Emulex OEM), I'll try to upgrade the HBA's driver/firmware to SFS 4.4.13 and see the results. We also received a document asking us to set ssd_max_throttle=20 in the kernel configuration file (/etc/system) and to check two parameters, 'topology' and 'link speed', by editing the configuration file '/kernel/drv/emlxs.conf'.
Well, I'll try to make those changes and check the results.
I really appreciate your comments, and I'll let you know about the results later.
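
Since the concern here is how many of these messages appear each day, a small Python sketch like the one below can count the scsi WARNING lines per day and per device, which makes it easier to compare before and after a change. The function name `count_warnings` and the regular expression are assumptions for illustration, written against the message format quoted in the question; the log path in the usage note is also an assumption about where syslog writes on this system.

```python
import re
from collections import Counter

# Matches the first line of each warning in the question, e.g.
# "Jun 11 08:04:00 powrsol  scsi: [ID 107833 kern.warning] WARNING: /scsi_vhci/ssd@g... (ssd47):"
WARNING_RE = re.compile(
    r"^(\w{3}\s+\d{1,2})\s+\d{2}:\d{2}:\d{2}\s+\S+\s+scsi:.*WARNING:\s+\S+\s+\((\w+)\):"
)


def count_warnings(lines):
    """Count scsi WARNING lines, keyed by (day, device instance)."""
    counts = Counter()
    for line in lines:
        m = WARNING_RE.search(line)
        if m:
            day = " ".join(m.group(1).split())  # normalise "Jun  1" to "Jun 1"
            counts[(day, m.group(2))] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "Jun 11 08:04:00 powrsol       scsi: [ID 107833 kern.warning] WARNING: "
        "/scsi_vhci/ssd@g6000b5d0006a0000006a026500280000 (ssd47):",
        "Jun 11 08:04:00 powrsol       Error for Command: read(10)"
        "                Error Level: Retryable",
    ]
    for (day, dev), n in sorted(count_warnings(sample).items()):
        print(day, dev, n)
```

To run it against a real log, pass the open file instead of the sample list, for example `count_warnings(open("/var/adm/messages"))`, assuming that is where the messages are logged.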
0

 
LVL 62

Assisted Solution

by:gheist
gheist earned 300 total points
ID: 21797551
The fault is outside the Solaris system.

The easiest approach is to keep all controller firmware at exactly the same level. The same goes for disk firmware.
0
 

Author Comment

by:ricardorossi
ID: 21814914
Hello Gheist. Your information was very relevant. Sometimes we have to follow certain procedures in order to reach the solution, but you're absolutely right: there is no connection between the fault and the HBA's firmware. Thanks again.
I gathered all the information I had, from you guys and from other sites, put it together, and while analysing the whole environment I realized that the switch was reporting a problem: one of its ports was marginal because of many communication errors. That port was not the one connected directly to the Solaris box; it belongs to the trunk connection between two core switches in this environment.
After I changed the cables, the switch stopped reporting the problem, and so did the Solaris box.
I'm glad I could share this problem with you, and I thank you so much for your help.
Now I'll close this case.
0
 

Author Closing Comment

by:ricardorossi
ID: 31467133
Thanks guys for helping me to find out what was going on with my system.
0
 
LVL 62

Expert Comment

by:gheist
ID: 21815287
Controller is unit 0x01.
0
