Solved

How do I solve storage disk problems on Solaris 9

Posted on 2008-06-13
Last Modified: 2013-12-21
Hi guys.
I have two Sun 490 servers running Solaris 9, connected to a storage array through a Brocade 48K switch.
When I run the 'dmesg' command I get this error several times on both machines:
Jun 11 08:04:00 powrsol       scsi: [ID 107833 kern.warning] WARNING: /scsi_vhci/ssd@g6000b5d0006a0000006a026500280000 (ssd47):
Jun 11 08:04:00 powrsol       Error for Command: read(10)                Error Level: Retryable
Jun 11 08:04:00 powrsol       scsi: [ID 107833 kern.notice]       Requested Block: 2633824                   Error Block: 2633824
Jun 11 08:04:00 powrsol       scsi: [ID 107833 kern.notice]       Vendor: FUJITSU                            Serial Number:   6A02650028
Jun 11 08:04:00 powrsol       scsi: [ID 107833 kern.notice]       Sense Key: Unit Attention
Jun 11 08:04:00 powrsol       scsi: [ID 107833 kern.notice]       ASC: 0xf9 (<vendor unique code 0xf9>), ASCQ: 0xe0, FRU: 0x10
The disks are OK and the application is still up, but I'm worried about these errors.
The HBAs are Emulex cards and we're using Solaris multipathing. I think the patches are OK, and so is the cards' firmware. Does anybody have any idea, or has anyone already been through a problem like this?
Thanks in advance for any help!
Question by:ricardorossi
7 Comments
 
LVL 40

Accepted Solution

by:
omarfarid earned 200 total points
ID: 21784273
Are you sure that multipathing is configured and working fine? Have you tried disconnecting one path to see whether it keeps working? What SAN are you using?
 
LVL 62

Assisted Solution

by:gheist
gheist earned 300 total points
ID: 21787643
Retryable errors do no harm. The sector read was retried.

ASC and ASCQ values with the highest bit set are vendor extensions. Masking off the highest bit of the ASC/ASCQ does not give a meaningful code either.
The FRU code 0x10 is not defined either; it could be the FC path.

If you can get a scsi.h from your vendor, for example from inside a debug version of the driver, you can decode them.

I suggest using "smartctl" from "smartmontools" to check the physical drive error counters. But do not worry: the errors might just be FC path discoveries and the like.
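
The point about the highest bit can be checked mechanically. Below is a minimal Python sketch, not part of the original answer, that pulls ASC, ASCQ and FRU out of a sense line like the one in the question and flags values of 0x80 and above as vendor specific. The function name parse_sense and the returned dictionary layout are made up for illustration, and the 0x80 threshold reflects my reading of the SCSI convention mentioned above.

```python
import re

# Matches the "ASC: 0x.. (...), ASCQ: 0x.., FRU: 0x.." line from the Solaris scsi log.
SENSE_RE = re.compile(
    r"ASC:\s*0x([0-9a-fA-F]+).*?ASCQ:\s*0x([0-9a-fA-F]+),\s*FRU:\s*0x([0-9a-fA-F]+)"
)


def parse_sense(line):
    """Return ASC/ASCQ/FRU as integers, or None if the line has no sense data.

    Hypothetical helper: values with the highest bit set (>= 0x80) are
    flagged as vendor specific, so only the vendor's table can decode them.
    """
    m = SENSE_RE.search(line)
    if m is None:
        return None
    asc, ascq, fru = (int(g, 16) for g in m.groups())
    return {
        "asc": asc,
        "ascq": ascq,
        "fru": fru,
        "asc_vendor_specific": asc >= 0x80,
        "ascq_vendor_specific": ascq >= 0x80,
    }


# Sample line copied from the question.
sample = (
    "Jun 11 08:04:00 powrsol       scsi: [ID 107833 kern.notice]       "
    "ASC: 0xf9 (<vendor unique code 0xf9>), ASCQ: 0xe0, FRU: 0x10"
)
print(parse_sense(sample))
```

For the line in the question both flags come out true, which is why the generic SCSI tables are of no use here and the vendor's ASC/ASCQ table is needed.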
 

Author Comment

by:ricardorossi
ID: 21796573
Thanks Omarfarid and Gheist for the suggestions.
Yes, I'm pretty sure multipathing is configured correctly, and we have already run the test to confirm that it works.
Although I know that this kind of error or warning is not critical, you guys know that we never want to see these messages on our systems.
Fortunately I could get the ASC/ASCQ table from the manufacturer and it says the following:
ASC 'f9'  ASCQ 'e0'-> Execution of command was interrupted due to heavy load inside the System.
These messages are bothering us because there are a lot of them every day, and only on the machines using the Sun card (Emulex OEM) with Solaris multipathing. The other two boxes, which use an Emulex card with Fujitsu multipathing, don't show the messages.
As I cannot use Fujitsu multipathing with the Sun card (Emulex OEM), I'll try upgrading the HBA driver/firmware to SFS 4.4.13 and see the results. We also received a document asking us to set ssd_max_throttle=20 in the kernel configuration file (/etc/system) and to check the 'topology' and 'link speed' parameters in the configuration file '/kernel/drv/emlxs.conf'.
I'll make those changes and check the results.
I really appreciate your comments and later I'll let you know about the results.
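
To judge whether a change such as the driver upgrade or the throttle setting actually reduces the messages, it helps to count them per day and per device before and after. A rough Python sketch, assuming log lines in the same format as the dmesg output in the question; the name count_warnings is hypothetical and not from any tool mentioned in this thread.

```python
import re
from collections import Counter

# Matches the first line of each warning, e.g.
# "Jun 11 08:04:00 powrsol  scsi: [ID ... kern.warning] WARNING: /scsi_vhci/ssd@g... (ssd47):"
WARNING_RE = re.compile(
    r"^(\w{3}\s+\d+)\s+[\d:]+\s+\S+\s+scsi:.*WARNING:\s+(\S+)\s+\((\w+)\):"
)


def count_warnings(lines):
    """Count scsi WARNING lines per (day, device instance). Hypothetical helper."""
    counts = Counter()
    for line in lines:
        m = WARNING_RE.match(line)
        if m:
            day, _path, instance = m.groups()
            counts[(day, instance)] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python count_warnings.py < logfile
    for (day, instance), n in sorted(count_warnings(sys.stdin).items()):
        print(day, instance, n)
```

On Solaris the system log is normally /var/adm/messages, so feeding that file to the script before and after each change gives a simple comparison.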

 
LVL 62

Assisted Solution

by:gheist
gheist earned 300 total points
ID: 21797551
The fault is outside the Solaris system.

The easiest approach is to keep all controller firmware at exactly the same level. The same goes for disk firmware.
 

Author Comment

by:ricardorossi
ID: 21814914
Hello Gheist. Your information was very relevant. Sometimes we have to follow certain procedures in order to reach the solution, but you're absolutely right: there is no connection between the fault and the HBA's firmware. Thanks again.
I gathered all the information I had, from you guys and from other sites, and put it together. While analysing the whole environment, I noticed that the switch was reporting one of its ports as marginal due to many communication errors. That port was not the one connected directly to the Solaris box; it belongs to the trunk connection between the two core switches in this environment.
After I replaced the cables, the switch stopped reporting the problem, and so did the Solaris box.
I'm glad I could share this problem with you, and I thank you so much for your help.
Now I'll close this case.
 

Author Closing Comment

by:ricardorossi
ID: 31467133
Thanks guys for helping me to find out what was going on with my system.
 
LVL 62

Expert Comment

by:gheist
ID: 21815287
Controller is unit 0x01.
