Solved

How do I solve storage disk problems on Solaris 9?

Posted on 2008-06-13
Last Modified: 2013-12-21
Hi guys.
I have two Sun 490 servers running Solaris 9, connected to a storage system through a Brocade 48K switch.
When I run the 'dmesg' command, I see this error several times on both machines:
Jun 11 08:04:00 powrsol       scsi: [ID 107833 kern.warning] WARNING: /scsi_vhci/ssd@g6000b5d0006a0000006a026500280000 (ssd47):
Jun 11 08:04:00 powrsol       Error for Command: read(10)                Error Level: Retryable
Jun 11 08:04:00 powrsol       scsi: [ID 107833 kern.notice]       Requested Block: 2633824                   Error Block: 2633824
Jun 11 08:04:00 powrsol       scsi: [ID 107833 kern.notice]       Vendor: FUJITSU                            Serial Number:   6A02650028
Jun 11 08:04:00 powrsol       scsi: [ID 107833 kern.notice]       Sense Key: Unit Attention
Jun 11 08:04:00 powrsol       scsi: [ID 107833 kern.notice]       ASC: 0xf9 (<vendor unique code 0xf9>), ASCQ: 0xe0, FRU: 0x10
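When messages like these repeat many times a day, it can help to tally them per device and sense code before deciding how worried to be. The following is a minimal sketch, assuming the log text has the same layout as the excerpt above; the function name `tally_sense_codes` and both regular expressions are illustrative, not part of any Solaris tool.

```python
import re
from collections import Counter

# Sample lines copied from the dmesg output above.
SAMPLE = """\
Jun 11 08:04:00 powrsol       scsi: [ID 107833 kern.warning] WARNING: /scsi_vhci/ssd@g6000b5d0006a0000006a026500280000 (ssd47):
Jun 11 08:04:00 powrsol       Error for Command: read(10)                Error Level: Retryable
Jun 11 08:04:00 powrsol       scsi: [ID 107833 kern.notice]       Sense Key: Unit Attention
Jun 11 08:04:00 powrsol       scsi: [ID 107833 kern.notice]       ASC: 0xf9 (<vendor unique code 0xf9>), ASCQ: 0xe0, FRU: 0x10
"""

# The WARNING line names the device instance in parentheses, e.g. "(ssd47):".
DEVICE_RE = re.compile(r"WARNING: \S+ \((\w+)\):")
# The last line of each report carries the ASC, ASCQ and FRU codes.
CODES_RE = re.compile(
    r"ASC: (0x[0-9a-fA-F]+).*ASCQ: (0x[0-9a-fA-F]+), FRU: (0x[0-9a-fA-F]+)"
)

def tally_sense_codes(text):
    """Count (device, ASC, ASCQ, FRU) tuples seen in dmesg-style text."""
    counts = Counter()
    device = None
    for line in text.splitlines():
        m = DEVICE_RE.search(line)
        if m:
            device = m.group(1)  # remember the device for the lines that follow
            continue
        m = CODES_RE.search(line)
        if m and device is not None:
            asc, ascq, fru = (int(g, 16) for g in m.groups())
            counts[(device, asc, ascq, fru)] += 1
    return counts

if __name__ == "__main__":
    for (dev, asc, ascq, fru), n in tally_sense_codes(SAMPLE).items():
        print(f"{dev}: ASC=0x{asc:02x} ASCQ=0x{ascq:02x} FRU=0x{fru:02x} seen {n} time(s)")
```

Feeding it the saved output of 'dmesg' from each machine would show whether the same code recurs on every device or only on some of them.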
The disks are OK and the application is still up, but I'm worried about these errors.
The HBA is an Emulex card and we're using Solaris multipathing. I think the patches and the card's firmware are OK. Does anybody have any idea, or has anyone already been through a problem like this?
Thanks in advance for any help!
Question by:ricardorossi
7 Comments
 
LVL 40

Accepted Solution

by:
omarfarid earned 200 total points
ID: 21784273
Are you sure that multipathing is configured and working fine? Have you tried disconnecting one path to see if it keeps working? What SAN are you using?
 
LVL 61

Assisted Solution

by:gheist
gheist earned 300 total points
ID: 21787643
Retryable errors do no harm. The sector read was retried.

ASC and ASCQ values with the highest bit set are vendor extensions. Omitting the highest bit of the ASC/ASCQ does not yield anything meaningful either.
The faulty unit (FRU) 0x10 is not defined either; it could be the FC path.
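The highest-bit rule mentioned above is easy to check mechanically. Here is a small sketch, assuming the values are single bytes as printed in the dmesg output; the helper name `is_vendor_specific` is made up for illustration.

```python
def is_vendor_specific(code):
    """Return True if an 8-bit ASC or ASCQ value has its highest bit set (0x80-0xFF)."""
    if not 0 <= code <= 0xFF:
        raise ValueError("ASC/ASCQ values are single bytes")
    return bool(code & 0x80)

# Values taken from the log in the question.
for name, value in (("ASC", 0xF9), ("ASCQ", 0xE0)):
    kind = "vendor-specific" if is_vendor_specific(value) else "standard"
    print(f"{name} 0x{value:02x}: {kind}")
```

Both values from the question fall in the vendor range, which is why only the manufacturer's own table can say what they mean.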

If you can get a scsi.h from your vendor (shipped inside a debug version of the driver), then you can decode them.

I suggest using "smartctl" from "smartmontools" to check the physical drive error counters. But do not worry: the errors might just be FC path discoveries and the like.
 

Author Comment

by:ricardorossi
ID: 21796573
Thanks, Omarfarid and Gheist, for the suggestions.
Yes, I'm pretty sure multipathing is configured correctly, and we've already tested that it works.
Although I know that this kind of error or warning is not critical, you know that we never want to see these messages on our systems.
Fortunately, I was able to get the ASC/ASCQ table from the manufacturer, and it says the following:
ASC 'f9'  ASCQ 'e0' -> Execution of command was interrupted due to heavy load inside the System.
These messages bother us because there are a lot of them every day, and only on the machines using the Sun card (Emulex OEM) with Solaris multipathing. The other two boxes, which use an Emulex card with Fujitsu multipathing, don't show the messages.
As I cannot use Fujitsu multipathing with the Sun card (Emulex OEM), I'll try upgrading the HBA driver/firmware to SFS 4.4.13 and see the results. We also received a document asking us to set ssd_max_throttle=20 in the kernel configuration file (/etc/system) and to check the 'topology' and 'link speed' parameters in the configuration file '/kernel/drv/emlxs.conf'.
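Before and after editing /etc/system, it may be worth confirming what the file actually sets. The sketch below parses 'set' lines from a copy of the file's text; the module-prefixed form `set ssd:ssd_max_throttle=20` used in the example is an assumption based on common /etc/system usage, and the vendor document mentioned above may word it differently. The function name `find_tunable` is hypothetical.

```python
import re

# Matches lines such as "set ssd:ssd_max_throttle=20" or "set maxusers = 40".
SET_RE = re.compile(r"^\s*set\s+(?:(\w+):)?(\w+)\s*=\s*(\S+)")

def find_tunable(system_text, name):
    """Return (module, value) for the last 'set' line naming the tunable, or None."""
    found = None
    for line in system_text.splitlines():
        if line.lstrip().startswith("*"):  # '*' starts a comment line in /etc/system
            continue
        m = SET_RE.match(line)
        if m and m.group(2) == name:
            found = (m.group(1), m.group(3))  # a later line overrides an earlier one
    return found

# Hypothetical excerpt, not copied from the poster's system.
EXAMPLE = """\
* throttle setting requested by the storage vendor (assumed syntax)
set ssd:ssd_max_throttle=20
"""

if __name__ == "__main__":
    print(find_tunable(EXAMPLE, "ssd_max_throttle"))
```

Changes to /etc/system only take effect after a reboot, so the check is mainly useful for confirming the edit went in as intended on both machines.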
I'll make those changes and check the results.
I really appreciate your comments, and I'll let you know how it goes.
 
LVL 61

Assisted Solution

by:gheist
gheist earned 300 total points
ID: 21797551
The fault is outside the Solaris system.

The easiest approach is to keep all controller firmware at exactly the same level. The same goes for disk firmware.
 

Author Comment

by:ricardorossi
ID: 21814914
Hello Gheist. Your information was very relevant. Sometimes we have to follow certain procedures in order to reach the solution, but you're absolutely right: there is no connection between the fault and the HBA's firmware. Thanks again.
I gathered all the information I had, from you guys and from other sites, put it all together, and while analysing the whole environment I noticed that the switch was reporting a problem: one of its ports was marginal due to many communication errors. That port was not the one connected directly to the Solaris box; it belongs to the trunk connection between the two core switches in this environment.
After I changed the cables, the switch stopped reporting the problem, and so did the Solaris box.
I appreciate being able to share this problem with you, and I thank you so much for your help.
Now I'll close this case.
 

Author Closing Comment

by:ricardorossi
ID: 31467133
Thanks guys for helping me to find out what was going on with my system.
 
LVL 61

Expert Comment

by:gheist
ID: 21815287
Controller is unit 0x01.