Troubleshooting SCSI Error EventID 9.
Posted on 2004-09-10
I just started looking into a backup server that was periodically failing. I found an Event ID 9, "The device, \Device\Scsi\adpu160m2, did not respond within the timeout period.", and then Backup Exec fails the job with a "TFLE_PROGRAMMER_ERROR1". When I look at the statistics, I notice that a hard write error has occurred (unrecoverable write error). This happens on just one tape drive, and only when the drive has been transferring near its theoretical maximum for some time. When the same job is run with the same tapes on another tape drive, it is fine. I am trying to troubleshoot it, and here are some ideas I have come up with so far. Please add any suggestions.
Using an IBM LTO autoloader with two Ultrium LTO-1 tape drives (3580)
There is an Adaptec 29160 SCSI controller controlling these tape drives and the changer.
There is a cable running about 12 feet total, and all three devices (changer, two tape drives) are chained on one SCSI channel
The SCSI card is separate from any other SCSI component in the system (hard drives)
It SHOULD be connecting at Ultra2 Wide SCSI (40 feet maximum run or so)
I have run local drive tests for some time now and cannot replicate or find an error (local tape drive fast read/write, wrap test, head resistance test, etc.)
I have noticed that the cable running to the drives is surrounded by AC power cables that run parallel to it.
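As a rough sanity check on the setup above, here is a minimal Python sketch comparing the combined drive throughput against the bus bandwidth, and the cable run against the usual length limits. The figures are nominal spec values as I understand them (15 MB/s native and 30 MB/s at 2:1 compression for LTO-1, 80 MB/s for Ultra2 Wide, 12 m for an LVD bus with more than two devices, 3 m for a single-ended Ultra bus with up to four devices), not measurements from this system, and the function names are made up for illustration.

```python
# Rough sanity check. All constants are nominal spec figures (assumptions),
# not values measured on this server.
FEET_TO_METERS = 0.3048

LTO1_NATIVE_MB_S = 15       # LTO-1 native transfer rate
LTO1_COMPRESSED_MB_S = 30   # LTO-1 at the nominal 2:1 compression ratio
ULTRA2_WIDE_MB_S = 80       # Ultra2 Wide SCSI bus bandwidth
LVD_MAX_MULTIDROP_M = 12.0  # LVD limit with more than two devices on the bus
SE_ULTRA_MAX_M = 3.0        # single-ended Ultra limit with up to four devices


def bus_utilization(drives, per_drive_mb_s, bus_mb_s=ULTRA2_WIDE_MB_S):
    """Fraction of the bus bandwidth used if every drive streams at once."""
    return drives * per_drive_mb_s / bus_mb_s


def cable_within_limit(length_feet, limit_m):
    """True if a cable of length_feet fits within a limit given in meters."""
    return length_feet * FEET_TO_METERS <= limit_m


if __name__ == "__main__":
    print("native load:    ", bus_utilization(2, LTO1_NATIVE_MB_S))
    print("compressed load:", bus_utilization(2, LTO1_COMPRESSED_MB_S))
    print("12 ft OK for LVD:", cable_within_limit(12, LVD_MAX_MULTIDROP_M))
    print("12 ft OK for SE: ", cable_within_limit(12, SE_ULTRA_MAX_M))
```

If these nominal numbers hold, both drives together should not saturate an Ultra2 Wide bus, and 12 feet is well inside the LVD limit. The same run would be too long if the bus had dropped back to single-ended mode, so it may be worth confirming the negotiated speed in the controller's BIOS utility.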
Ideas for a possible fix:
Reroute/shorten SCSI cable
Update firmware/drivers on the tape drives and SCSI controller
Swap SCSI cables
Change the terminator, and check whether the Ultrium drive is supplying power to it
Change the SCSI IDs so the changer has the lowest ID and the tape drives have the next two higher IDs
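On the last idea, a small sketch of how ID choice maps to bus priority may help. As far as I know, arbitration priority on a wide SCSI bus runs 7 (highest) down to 0, then 15 down to 8, with the host adapter usually sitting at ID 7. The device names and ID assignments below are only an example layout, not the current configuration.

```python
def arbitration_order(wide=True):
    """SCSI IDs from highest to lowest arbitration priority."""
    order = list(range(7, -1, -1))        # 7, 6, ..., 0
    if wide:
        order += list(range(15, 7, -1))   # then 15, 14, ..., 8
    return order


def by_priority(devices, wide=True):
    """Return device names sorted from highest to lowest bus priority.

    devices maps a name to its SCSI ID, e.g. {"changer": 1}.
    """
    rank = {scsi_id: pos for pos, scsi_id in enumerate(arbitration_order(wide))}
    return sorted(devices, key=lambda name: rank[devices[name]])


if __name__ == "__main__":
    # Example layout only: changer lowest, tape drives on the next two IDs.
    layout = {"changer": 1, "tape1": 2, "tape2": 3}
    print(by_priority(layout))
```

With a layout like this, the tape drives win arbitration over the changer, which is the intent of putting the changer on the lowest ID.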