Solved

How do I solve storage disk problems on Solaris 9

Posted on 2008-06-13
Last Modified: 2013-12-21
Hi guys.
I have two Sun 490 servers running Solaris 9, connected to a storage system through a Brocade 48K switch.
When I run the 'dmesg' command I get this error several times on both machines:
Jun 11 08:04:00 powrsol       scsi: [ID 107833 kern.warning] WARNING: /scsi_vhci/ssd@g6000b5d0006a0000006a026500280000 (ssd47):
Jun 11 08:04:00 powrsol       Error for Command: read(10)                Error Level: Retryable
Jun 11 08:04:00 powrsol       scsi: [ID 107833 kern.notice]       Requested Block: 2633824                   Error Block: 2633824
Jun 11 08:04:00 powrsol       scsi: [ID 107833 kern.notice]       Vendor: FUJITSU                            Serial Number:   6A02650028
Jun 11 08:04:00 powrsol       scsi: [ID 107833 kern.notice]       Sense Key: Unit Attention
Jun 11 08:04:00 powrsol       scsi: [ID 107833 kern.notice]       ASC: 0xf9 (<vendor unique code 0xf9>), ASCQ: 0xe0, FRU: 0x10
The disks are OK and the application is still up, but I'm worried about these errors.
The HBA cards are Emulex and we're using Solaris multipathing. I think the patches are OK, and so is the cards' firmware. Does anybody have any idea, or has anyone already been through a problem like this?
Thanks in advance for any help!
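
For anyone who wants to track these warnings over time, here is a minimal sketch that pulls the interesting fields out of a warning block like the one quoted in the question. The regular expressions and the `parse_warning` helper are assumptions written only against the sample lines above; other drivers or Solaris releases may format the message differently.

```python
import re

# Lines copied from the dmesg output quoted in the question.
SAMPLE = """\
Jun 11 08:04:00 powrsol       scsi: [ID 107833 kern.warning] WARNING: /scsi_vhci/ssd@g6000b5d0006a0000006a026500280000 (ssd47):
Jun 11 08:04:00 powrsol       Error for Command: read(10)                Error Level: Retryable
Jun 11 08:04:00 powrsol       scsi: [ID 107833 kern.notice]       Sense Key: Unit Attention
Jun 11 08:04:00 powrsol       scsi: [ID 107833 kern.notice]       ASC: 0xf9 (<vendor unique code 0xf9>), ASCQ: 0xe0, FRU: 0x10
"""

# Hypothetical patterns, one per field of interest.
PATTERNS = {
    "device": re.compile(r"WARNING: \S+ \((\w+)\):"),
    "level": re.compile(r"Error Level: (\w+)"),
    "sense_key": re.compile(r"Sense Key: (.+?)\s*$", re.MULTILINE),
    "asc": re.compile(r"ASC: (0x[0-9a-fA-F]+)"),
    "ascq": re.compile(r"ASCQ: (0x[0-9a-fA-F]+)"),
    "fru": re.compile(r"FRU: (0x[0-9a-fA-F]+)"),
}


def parse_warning(text):
    """Return a dict with whichever fields could be found in one warning block."""
    record = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(text)
        if match:
            record[name] = match.group(1)
    return record


record = parse_warning(SAMPLE)
print(record)
```

Feeding each warning block from the messages log through a helper like this makes it easier to count how often a given device, sense key, or ASC/ASCQ pair shows up.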
Question by:ricardorossi
7 Comments
 
LVL 40

Accepted Solution

by:
omarfarid earned 200 total points
ID: 21784273
Are you sure that multipathing is configured and working properly? Have you tried disconnecting one path to see whether it keeps working? What SAN are you using?
 
LVL 62

Assisted Solution

by:gheist
gheist earned 300 total points
ID: 21787643
Retryable errors do no harm. The sector read was retried.

ASC and ASCQ values with the highest bit set are vendor extensions. Masking off the highest bit of the ASC/ASCQ does not yield a meaningful standard code either.
FRU (faulty unit) 0x10 is not defined either; it could be the FC path.

If you can get a scsi.h from your vendor, for example inside a debug version of the driver, you can decode them.

I suggest using "smartctl" from "smartmontools" to check the physical drive error counters. But do not worry: the errors might just be FC path discoveries and the like.
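
The high-bit rule mentioned above can be sketched in a few lines. This is only an illustration: `is_vendor_specific` is a hypothetical helper name, and the values are the ASC and ASCQ from the dmesg output in the question.

```python
def is_vendor_specific(code):
    """Return True when the high bit (0x80) of a one-byte ASC or ASCQ is set."""
    if not 0 <= code <= 0xFF:
        raise ValueError("ASC/ASCQ values are single bytes")
    return bool(code & 0x80)


# Values taken from the warning in the question.
asc, ascq = 0xF9, 0xE0
print(is_vendor_specific(asc), is_vendor_specific(ascq))
```

Both values have the high bit set, so neither will appear in the standard SCSI tables; only the array vendor's documentation can give their meaning.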
 

Author Comment

by:ricardorossi
ID: 21796573
Thanks Omarfarid and Gheist for the suggestions.
Yes, I'm pretty sure multipathing is configured correctly, and we've already tested that it works.
Although I know this kind of error or warning is not critical, you know we never want to see these messages on our systems.
Fortunately I was able to get the ASC/ASCQ table from the manufacturer, and it says the following:
ASC 'f9'  ASCQ 'e0' -> Execution of command was interrupted due to heavy load inside the System.
These messages bother us because there are a lot of them every day, and only on the machines using the Sun card (Emulex OEM) and Solaris multipathing. The other two boxes, which use an Emulex card and Fujitsu multipathing, don't show the messages.
As I cannot use Fujitsu multipathing with the Sun card (Emulex OEM), I'll try upgrading the HBA driver/firmware to SFS 4.4.13 and see the results. We also received a document asking us to set ssd_max_throttle=20 in the kernel configuration file (/etc/system) and to check two parameters, 'topology' and 'link speed', in the configuration file '/kernel/drv/emlxs.conf'.
I'll make those changes and check the results.
I really appreciate your comments, and I'll let you know how it goes.
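
Once the vendor's table is in hand, decoding can be a simple lookup. The sketch below is an assumption about how one might store it: `VENDOR_SENSE_CODES` and `describe` are hypothetical names, and the table holds only the single entry quoted from the manufacturer in this post.

```python
# Vendor-unique ASC/ASCQ pairs, keyed as (asc, ascq).
# Only the entry quoted from the manufacturer's table is included here.
VENDOR_SENSE_CODES = {
    (0xF9, 0xE0): "Execution of command was interrupted due to heavy load inside the System.",
}


def describe(asc, ascq):
    """Return the vendor's description for an ASC/ASCQ pair, or None if unknown."""
    return VENDOR_SENSE_CODES.get((asc, ascq))


print(describe(0xF9, 0xE0))
print(describe(0xF9, 0x00))  # not in the table, so None
```

Returning None for unknown pairs, rather than guessing, keeps the lookup honest when a code is missing from the vendor's table.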
 
LVL 62

Assisted Solution

by:gheist
gheist earned 300 total points
ID: 21797551
The fault is outside the Solaris system.

The easiest approach is to keep all controller firmware at exactly the same level. The same goes for disk firmware.
 

Author Comment

by:ricardorossi
ID: 21814914
Hello Gheist. Your information is very relevant. Sometimes we have to follow certain procedures in order to reach a solution, but you're absolutely right: there is no connection between the fault and the HBA's firmware. Thanks again.
I gathered all the information I had, from you guys and from other sites, put it together, and while analysing the whole environment I realized that the switch was reporting a problem: one of its ports was marginal due to many communication errors. That port was not the one connected directly to the Solaris box; it belongs to the trunk connection between the two core switches in this environment.
After I changed the cables, the switch stopped reporting the problem, and so did the Solaris box.
I appreciate being able to share this problem with you, and I thank you so much for your help.
Now I'll close this case.
 

Author Closing Comment

by:ricardorossi
ID: 31467133
Thanks guys for helping me to find out what was going on with my system.
 
LVL 62

Expert Comment

by:gheist
ID: 21815287
Controller is unit 0x01.