How do I solve storage disk problems on Solaris 9

Posted on 2008-06-13
Last Modified: 2013-12-21
Hi guys.
I have two Sun 490 servers running Solaris 9, connected to a storage array through a Brocade 48K switch.
When I run the 'dmesg' command, I see this error several times on both machines:
Jun 11 08:04:00 powrsol       scsi: [ID 107833 kern.warning] WARNING: /scsi_vhci/ssd@g6000b5d0006a0000006a026500280000 (ssd47):
Jun 11 08:04:00 powrsol       Error for Command: read(10)                Error Level: Retryable
Jun 11 08:04:00 powrsol       scsi: [ID 107833 kern.notice]       Requested Block: 2633824                   Error Block: 2633824
Jun 11 08:04:00 powrsol       scsi: [ID 107833 kern.notice]       Vendor: FUJITSU                            Serial Number:   6A02650028
Jun 11 08:04:00 powrsol       scsi: [ID 107833 kern.notice]       Sense Key: Unit Attention
Jun 11 08:04:00 powrsol       scsi: [ID 107833 kern.notice]       ASC: 0xf9 (<vendor unique code 0xf9>), ASCQ: 0xe0, FRU: 0x10
The disks are ok and the application is still up, but I'm worried about these errors.
The cards are Emulex and we're using Solaris multipathing. I think the patches are up to date, and so is the cards' firmware. Does anybody have an idea, or has anyone been through a problem like this?
Thanks in advance for any help!
Question by:ricardorossi
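To check whether every warning carries the same sense data, the multi-line messages can be grouped into one record per event. A minimal Python sketch, assuming the field labels appear exactly as in the excerpt above; `SenseEvent` and `parse_sense_events` are hypothetical helper names, not part of any Solaris tool:

```python
import re
from dataclasses import dataclass
from typing import Iterable, List, Optional

# Regexes for the Solaris "scsi:" warning block. Assumption: the labels
# ("WARNING:", "Error Level:", "Sense Key:", "ASC:", "ASCQ:", "FRU:")
# appear exactly as in the dmesg excerpt in the question.
_DEVICE_RE = re.compile(r"WARNING: (\S+) \((\w+)\):")
_LEVEL_RE = re.compile(r"Error for Command: (\S+)\s+Error Level: (\w+)")
_ASC_RE = re.compile(
    r"ASC: 0x([0-9a-fA-F]+).*ASCQ: 0x([0-9a-fA-F]+), FRU: 0x([0-9a-fA-F]+)")
_SENSE_RE = re.compile(r"Sense Key: (.+?)\s*$")


@dataclass
class SenseEvent:
    path: str                  # physical path, e.g. /scsi_vhci/ssd@g...
    device: str                # instance name, e.g. ssd47
    command: str = ""          # e.g. read(10)
    level: str = ""            # e.g. Retryable
    sense_key: str = ""        # e.g. Unit Attention
    asc: Optional[int] = None
    ascq: Optional[int] = None
    fru: Optional[int] = None


def parse_sense_events(lines: Iterable[str]) -> List[SenseEvent]:
    """Group each multi-line warning into one SenseEvent."""
    events: List[SenseEvent] = []
    current: Optional[SenseEvent] = None
    for line in lines:
        if m := _DEVICE_RE.search(line):
            # A new WARNING line starts a new event.
            current = SenseEvent(path=m.group(1), device=m.group(2))
            events.append(current)
        elif current is None:
            continue  # ignore anything before the first WARNING line
        elif m := _LEVEL_RE.search(line):
            current.command, current.level = m.groups()
        elif m := _ASC_RE.search(line):
            # ASC, ASCQ and FRU are logged as hex; store them as integers.
            current.asc, current.ascq, current.fru = (
                int(g, 16) for g in m.groups())
        elif m := _SENSE_RE.search(line):
            current.sense_key = m.group(1)
    return events
```

For example, `parse_sense_events(open("/var/adm/messages"))` would return one event per WARNING line, assuming syslog wrote the messages there.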
7 Comments

Accepted Solution

by:omarfarid (earned 600 total points)
ID: 21784273
Are you sure that multipathing is configured and working fine? Have you tried disconnecting one path to see whether it keeps working? What SAN are you using?

Assisted Solution

by:gheist (earned 900 total points)
ID: 21787643
Retryable errors do no harm; the sector read was retried.

ASC and ASCQ values with the highest bit set are vendor extensions, and stripping that bit to look them up in the standard table does not make sense either.
The FRU (field replaceable unit) code 0x10 is likewise undefined; it could be the FC path.

If you can get a scsi.h from your vendor (for example, inside a debug version of the driver), you can decode them.
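The point about the high bit can be checked mechanically. A minimal Python sketch: per the SCSI Primary Commands (SPC) standard, ASC values 80h-FFh are vendor specific, as are ASCQ values 80h-FFh. The one-entry `VENDOR_CODES` table holds only the Fujitsu meaning that the original poster quotes later in this thread; the helper names are hypothetical:

```python
# Vendor-specific lookup table keyed by (ASC, ASCQ). Assumption: the only
# entry is the Fujitsu code quoted later in this thread; a complete table
# has to come from the vendor.
VENDOR_CODES = {
    "FUJITSU": {
        (0xF9, 0xE0): "Execution of command was interrupted due to heavy "
                      "load inside the system",
    },
}


def is_vendor_specific(asc: int, ascq: int) -> bool:
    """SPC reserves ASC 0x80-0xFF and ASCQ 0x80-0xFF for vendor use,
    i.e. any code whose high bit is set."""
    return bool(asc & 0x80) or bool(ascq & 0x80)


def describe(vendor: str, asc: int, ascq: int) -> str:
    code = f"ASC=0x{asc:02X} ASCQ=0x{ascq:02X}"
    if not is_vendor_specific(asc, ascq):
        return f"standard code {code}: look it up in the SPC table"
    text = VENDOR_CODES.get(vendor.upper(), {}).get((asc, ascq))
    if text is None:
        return f"vendor-specific {code}: needs the vendor's table"
    return f"{vendor.upper()} {code}: {text}"
```

Keeping the standard/vendor split in `is_vendor_specific` makes it explicit that a code like 0xF9/0xE0 cannot be decoded from the generic SCSI tables at all.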

I suggest using "smartctl" from "smartmontools" to check the physical drive error counters. But don't worry: the errors might just be FC path discoveries and the like.

Author Comment

by:ricardorossi
ID: 21796573
Thanks Omarfarid and Gheist for the suggestions.
Yes, I'm pretty sure multipathing is configured correctly, and we've already run that test to confirm it works.
Although I know that this kind of error or warning is not critical, you guys know that we never want to see these messages on our systems.
Fortunately, I was able to get the ASC/ASCQ table from the manufacturer, and it says the following:
ASC 0xF9, ASCQ 0xE0 -> Execution of command was interrupted due to heavy load inside the system.
These messages bother us because there are a lot of them every day, and they appear only on the machines using the Sun card (Emulex OEM) with Solaris multipathing. The other two boxes, which use Emulex cards with Fujitsu multipathing, don't show the messages.
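Since the concern is the sheer number of messages per day, a per-device daily count makes it easier to see whether the warnings cluster, for example around load peaks or fabric events. A minimal Python sketch; the regex assumes the syslog line format of the excerpt in the question, and `count_warnings_per_day` is a hypothetical helper name:

```python
import re
from collections import Counter
from typing import Iterable

# Matches a syslog-prefixed ssd WARNING line, e.g.
# "Jun 11 08:04:00 powrsol  scsi: [ID 107833 kern.warning] WARNING: ... (ssd47):"
# Assumption: lines follow the format of the dmesg excerpt in the question.
_WARN_RE = re.compile(
    r"^(\w{3})\s+(\d{1,2})\s+[\d:]{8}\s+(\S+)\s+.*WARNING: \S+ \((\w+)\):")


def count_warnings_per_day(lines: Iterable[str]) -> Counter:
    """Return a Counter keyed by (month, day, host, device instance)."""
    counts: Counter = Counter()
    for line in lines:
        m = _WARN_RE.search(line)
        if m:
            month, day, host, device = m.groups()
            counts[(month, int(day), host, device)] += 1
    return counts
```

Calling `most_common()` on the returned `Counter` then lists the busiest device/day pairs first.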
Since I can't use Fujitsu multipathing with the Sun card (Emulex OEM), I'll try upgrading the HBA driver/firmware to SFS 4.4.13 and see the results. We also received a document asking us to set ssd_max_throttle=20 in the kernel configuration file (/etc/system) and to check the 'topology' and 'link speed' parameters in the driver configuration file '/kernel/drv/emlxs.conf'.
I'll make those changes and check the results.
I really appreciate your comments, and I'll let you know the results later.
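Before and after applying the recommended settings, it can help to confirm what the files actually contain on each box. A minimal Python sketch under stated assumptions: in /etc/system, comment lines start with `*` and tunables are written as `set <module>:<variable>=<value>`, so the throttle is assumed to appear as `set ssd:ssd_max_throttle=20`; emlxs.conf is assumed to use the usual driver.conf `name=value;` form with `#` comments, and the key names `topology` and `link-speed` are assumptions to verify against the comments in your own file. The helper names are hypothetical:

```python
import re
from typing import Dict, Iterable

# /etc/system tunable line: "set [module:]variable=value".
_SET_RE = re.compile(r"^\s*set\s+(?:(\w+):)?(\w+)\s*=\s*(\S+)")

# driver.conf entry: "name=value;" (hyphens allowed in the name).
_CONF_RE = re.compile(r"^\s*([\w-]+)\s*=\s*([^;]+?)\s*;")


def etc_system_tunables(text: str) -> Dict[str, str]:
    """Map 'module:variable' (or a bare 'variable') to its value string."""
    found: Dict[str, str] = {}
    for line in text.splitlines():
        if line.lstrip().startswith("*"):
            continue  # /etc/system comment line
        m = _SET_RE.match(line)
        if m:
            module, name, value = m.groups()
            found[f"{module}:{name}" if module else name] = value
    return found


def driver_conf_values(text: str,
                       keys: Iterable[str] = ("topology", "link-speed")
                       ) -> Dict[str, str]:
    """Return the requested name=value; entries from a driver.conf file."""
    wanted = set(keys)
    values: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0]  # drop comments
        m = _CONF_RE.match(line)
        if m and m.group(1) in wanted:
            values[m.group(1)] = m.group(2)
    return values
```

Running both helpers on the two Sun-card boxes and on the Fujitsu-multipath boxes gives a quick side-by-side comparison of the settings.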

Assisted Solution

by:gheist (earned 900 total points)
ID: 21797551
The fault is outside the Solaris system.

The easiest approach is to keep all controller firmware at exactly the same level, and the same goes for disk firmware.

Author Comment

by:ricardorossi
ID: 21814914
Hello Gheist. Your information was very relevant. Sometimes we have to follow certain procedures to reach the solution, but you're absolutely right: there is no connection between the fault and the HBA firmware. Thanks again.
I took all the information I had from you and from other sites, put it together, and while analysing the whole environment I noticed that the switch was reporting one of its ports as marginal due to many communication errors. That port was not the one connected directly to the Solaris box; however, it was part of the trunk connection between the two core switches in this environment.
After I replaced the cables, the switch stopped reporting the problem, and so did the Solaris box.
I appreciated being able to share this problem with you, and thank you so much for your help.
Now I'll close this case.

Author Closing Comment

by:ricardorossi
ID: 31467133
Thanks guys for helping me to find out what was going on with my system.

Expert Comment

by:gheist
ID: 21815287
The controller is unit 0x01.