SCSI Bus Reset - Disk or SCSI adapter issue

One of our Solaris servers stopped responding this morning.  The application that runs on the server is vendor supported.  The vendor found no errors reported by iostat or prtdiag, and no amber light was showing on the server either.  However, two disks were disconnected.  They shut the server down, powered it back on, and it booted normally.

The only errors reported on the server are the ones found in /var/adm/messages, a copy of which I have attached.

Although I understand that multiple hard disks may fail, I am not sure whether the issue is the hard disks, the SCSI controller or the motherboard.  I was hoping someone could tell by looking at the attached log.  Please let me know if there are other commands I can run that might give you a better idea of the problem.
messages.0.txt
Asked by: cartereverett
 
Joseph Gan (System Admin) commented:
The system had lots of errors:

Dec 18 02:38:42 VRCdata.braishfield.local scsi: [ID 107833 kern.notice]       Requested Block: 114103696                 Error Block: 114103696
Dec 18 02:38:42 VRCdata.braishfield.local scsi: [ID 107833 kern.notice]       Vendor: FUJITSU                            Serial Number: 0745B0PAJU  

I assume this is a Fujitsu internal disk (or disks) with the OS installed on it.

Could you post the output of "iostat -En" here?
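If the full messages file is long, a quick tally of which drive the kernel keeps complaining about can help. Below is a minimal Python sketch, assuming every relevant line has the same shape as the two quoted above (syslog timestamp, hostname, a "scsi:" tag, then "Name: value" pairs). The function names `parse_scsi_line` and `count_by_serial` are made up for illustration, and the patterns have only been checked against those two sample lines, so treat this as a starting point rather than a complete parser.

```python
import re
from collections import Counter

# Syslog prefix plus the "scsi:" kernel tag, as seen in the two lines quoted above.
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+[\d:]{8})\s+(?P<host>\S+)\s+scsi:\s+"
    r"\[ID \d+ kern\.\w+\]\s+(?P<body>.*)$"
)
# Inside the body, fields look like "Name: value" separated by runs of spaces.
FIELD_RE = re.compile(r"([A-Z][A-Za-z ]*?):\s+(\S+)")


def parse_scsi_line(line):
    """Return timestamp, host and the name/value fields, or None if no match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    fields = dict(FIELD_RE.findall(m.group("body")))
    return {"timestamp": m.group("ts"), "host": m.group("host"), "fields": fields}


def count_by_serial(lines):
    """Count how many log lines mention each drive serial number."""
    counts = Counter()
    for line in lines:
        rec = parse_scsi_line(line)
        if rec and "Serial Number" in rec["fields"]:
            counts[rec["fields"]["Serial Number"]] += 1
    return counts


# The two lines quoted in this comment, used as sample input.
SAMPLE = [
    "Dec 18 02:38:42 VRCdata.braishfield.local scsi: [ID 107833 kern.notice]"
    "       Requested Block: 114103696                 Error Block: 114103696",
    "Dec 18 02:38:42 VRCdata.braishfield.local scsi: [ID 107833 kern.notice]"
    "       Vendor: FUJITSU                            Serial Number: 0745B0PAJU  ",
]

if __name__ == "__main__":
    for serial, n in count_by_serial(SAMPLE).items():
        print(serial, n)
```

To run it against the real file, read /var/adm/messages line by line and pass the lines to `count_by_serial`. If one serial number accounts for nearly all of the entries, that drive is the first suspect.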
 
David (President) commented:
Check cabling & termination. There are no other entries in the log that reveal other issues.
Now you can spend some money and buy some diagnostic software that will get to the bottom of things, but it probably isn't worth the money.
 
Gerald Connolly commented:
As David said, check the cabling and termination.
NB: SCSI is a bus and requires termination at both ends of the bus.
Missing termination, or more than one terminator at an end, will cause problems.
 
gheist commented:
It would be nice if you provided some basic system information, e.g. at least whether the disks are built in and whether your server is a PC or SPARC...

ASC 02 -> no seek complete, i.e. the SCSI device did not do anything on command...
Given that the failing command is a "write", you are most likely losing 4KB every couple of minutes...

For system info, send in the output of prtconf (-v).

What do you mean by "vendor"? Was it Oracle saying that continuous disk errors involving data loss are OK for them to leave?
 
David (President) commented:
Gheist - You are misreading this.
It is ASC=29h, ASCQ=02h, not ASC=02h.  This is defined as a SCSI bus reset per the ANSI spec.

A no seek complete would be ASC=02h, ASCQ=00h (which can't happen on a WRITE10 CDB anyway).
P.S. I write SCSI diagnostic code professionally.
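
To make the ASC versus ASCQ distinction concrete, here is a small Python sketch of a lookup keyed on the (ASC, ASCQ) pair. The table holds only three entries, transcribed from the T10 ASC/ASCQ assignment list as best I recall it, so check it against the current list before relying on it. The names `SENSE_TEXT`, `parse_hex` and `describe_sense` are illustrative and not part of any Solaris tool.

```python
# Partial (ASC, ASCQ) -> description table. Only three entries are listed;
# the full assignment list is maintained by T10.
SENSE_TEXT = {
    (0x02, 0x00): "NO SEEK COMPLETE",
    (0x29, 0x00): "POWER ON, RESET, OR BUS DEVICE RESET OCCURRED",
    (0x29, 0x02): "SCSI BUS RESET OCCURRED",
}


def parse_hex(token):
    """Accept '29h', '0x29' or '29' and return the integer value."""
    t = token.strip().lower()
    if t.endswith("h"):
        t = t[:-1]
    return int(t, 16)


def describe_sense(asc, ascq):
    """Look up the pair; the order matters, ASC first and ASCQ second."""
    key = (parse_hex(asc), parse_hex(ascq))
    return SENSE_TEXT.get(
        key, "not in this partial table (ASC=%02Xh, ASCQ=%02Xh)" % key
    )


if __name__ == "__main__":
    print(describe_sense("29h", "02h"))  # the pair reported in this thread
    print(describe_sense("02h", "29h"))  # the two bytes swapped by mistake
```

The pair is looked up as a unit, so reading the qualifier byte as if it were the code byte lands on a different entry, or on none at all.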
 
cartereverett (Author) commented:
The issue was one of the hard disks in the data mirror.  Replaced the drive, resynced and everything is back to normal.
 
Joseph Gan (System Admin) commented:
Yes, that's it!