  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 3570

Troubleshooting SCSI Error EventID 9.

I just started looking into a backup server that was periodically failing. I found an Event ID 9, "The device, \Device\Scsi\adpu160m2, did not respond within the timeout period.", and then Backup Exec fails the job with "TFLE_PROGRAMMER_ERROR1". When I look into the statistics, I notice that a hard write error has occurred (an unrecoverable write error). This is just on one tape drive, and occurs only when the drive has been transferring near its theoretical maximum for some time. When the same job is run with the same tapes on another tape drive, it is fine. I am trying to troubleshoot it, and here are some ideas I have come up with so far. Please add any suggestions.
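To see how often the timeout occurs and on which device, the event text can be counted from an exported log. The following is only a minimal sketch: it assumes the events have been exported to plain text lines and that the message wording matches the one quoted above, which may differ between Windows versions. The function name `count_timeouts` and the sample lines are made up for illustration.

```python
import re
from collections import Counter

# Pattern for the Event ID 9 text quoted in the post. The exact wording is
# an assumption and may vary between Windows versions.
TIMEOUT_RE = re.compile(
    r"The device, (?P<device>\\Device\\\S+?), did not respond within the timeout period"
)

def count_timeouts(lines):
    """Count timeout messages per device path in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            counts[match.group("device")] += 1
    return counts

# Hypothetical exported log lines.
sample = [
    r"The device, \Device\Scsi\adpu160m2, did not respond within the timeout period.",
    r"Some unrelated event text",
    r"The device, \Device\Scsi\adpu160m2, did not respond within the timeout period.",
]
print(count_timeouts(sample))
```

Comparing the timestamps of these events with the Backup Exec job failures would show whether every failed job lines up with a controller timeout.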

Details.

Using IBM LTO autoloader with 2 Ultrium LTO-1 Tape drives (3580)

There is an Adaptec 29160 SCSI controller controlling these tape drives and the changer.

There is a cable running about 12 feet total, and all three devices (changer, two tape drives) are chained on one SCSI channel.

The SCSI card is separate from any other SCSI component in the system (hard drives).
It SHOULD be connected at Ultra2/Wide SCSI (40 feet maximum run or so).
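As a quick sanity check on the cable length, the figures above can be converted and compared. This is a rough sketch that simply takes the 12 foot run and the approximate 40 foot limit from this post at face value; it assumes the bus really is running in Ultra2/Wide mode, which has not been confirmed here.

```python
FEET_TO_METERS = 0.3048

def cable_margin_m(run_feet, limit_feet):
    """Return the remaining cable headroom in meters."""
    return (limit_feet - run_feet) * FEET_TO_METERS

run_feet = 12    # total cable run, from the post
limit_feet = 40  # poster's rough maximum for Ultra2/Wide

margin = cable_margin_m(run_feet, limit_feet)
print(f"{run_feet * FEET_TO_METERS:.2f} m used, {margin:.2f} m of headroom")
```

If the bus has dropped to a slower or different signaling mode, the allowed length would be different, so the negotiated mode is worth checking before trusting this margin.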

I have run local drive tests for some time now and cannot replicate or find an error (local tape drive fast read/write, wrap test, head resistance test, etc.)

I have noticed that the cable running to the drives is surrounded by AC power cables that are running parallel to it....

Ideas to possibly fix...

Reroute/shorten the SCSI cable
Update firmware/drivers on the tape drives and SCSI controller
Swap SCSI cables
Change the terminator/check to see if the Ultrium drive is supplying power to it
Change the SCSI IDs, with the changer being the lowest and the tape drives being the next two higher
Replace the drive...
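
On the SCSI ID idea: on a wide bus, arbitration priority runs from ID 7 (highest) down to 0, then from 15 down to 8, so the numerically lowest ID is not always the lowest priority. A small sketch of that ordering follows; the device names and ID assignments are hypothetical, not the ones on this system.

```python
def arbitration_order(wide=True):
    """Return SCSI IDs from highest to lowest arbitration priority."""
    order = list(range(7, -1, -1))        # 7, 6, ..., 0
    if wide:
        order += list(range(15, 7, -1))   # then 15, 14, ..., 8
    return order

def sort_by_priority(devices, wide=True):
    """Sort a {name: scsi_id} mapping from highest to lowest priority."""
    rank = {scsi_id: i for i, scsi_id in enumerate(arbitration_order(wide))}
    return sorted(devices, key=lambda name: rank[devices[name]])

# Hypothetical assignment: changer lowest, tape drives on the next two IDs.
devices = {"host adapter": 7, "changer": 1, "tape drive A": 2, "tape drive B": 3}
print(sort_by_priority(devices))
```

With these example IDs the tape drives win arbitration over the changer, which matches the intent of the idea listed above.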

Thanks!
Asked by: cdesimone
1 Solution
 
Cyber-Dude commented:
Had that same issue, called IBM!!!
:)

If you have Veritas Backup Exec, try updating its driver version. Also, the autoloader has its own drivers that you can download from IBM. Did you check that you have the latest version?
The IBM 3580 tape page:
http://www-1.ibm.com/servers/storage/support/lto/3580.html

From what I can tell, there was a firmware update not long ago.

Cyber
 
Cyber-Dude commented:
PS
The links are not working, but you should check the following as well:
http://www.mail-archive.com/adsm-l@vm.marist.edu/msg34428.html

Cyber
 
dovidmichel commented:
Was a diagnostic dump done on the drive? That would provide the history of the drive, and the failures should show up.
If, after the drive is replaced, the new drive starts having the same problem, perhaps it is related to the drive position. Try swapping the position of the drive with one of the other drives and see if the problem follows the drive or the position.
By the way, IBM drives do work better and are more trouble-free when IBM tapes are being used.
 
cdesimone (Author) commented:
I did not dump the history of the drive, and that will be the next step. From what I remember, most if not all of the tapes are IBM. Would running AC power cords alongside the SCSI cable be a problem? I am guessing right now that it is a local problem with the drive, since the problem is not really seen on the other tape drive on the same SCSI channel. I will do this and post the results. Thanks
 
dovidmichel commented:
Power cords and other sources of RF interference can cause problems, but I have only seen them cause data corruption problems and not drive read/write or SCSI errors. For example, there is no sign of a problem, yet files on tape fail to compare back to what is on disk.
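One way to check for that kind of silent corruption is to restore a file from tape and compare its checksum with the original on disk. The following is a minimal sketch of that comparison, assuming both copies are reachable as ordinary files; the function names are made up for illustration, and this is not how Backup Exec performs its own verify pass.

```python
import hashlib

def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def files_match(original_path, restored_path):
    """True if the restored copy is byte-identical to the original."""
    return sha256_of(original_path) == sha256_of(restored_path)
```

Running `files_match` over a sample of restored files would show whether data is being corrupted even when the job reports no errors.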
 
Cyber-Dude commented:
Try disconnecting the SCSI cable the tape drive is attached to and verify whether you get the same result, which I doubt. The fault could be anywhere between the controller and the tape drive. dovidmichel is also right: try to clear the communication path of any objects that may increase data latency, though I know these cables and they are quite resistant to such obstacles.

Cyber
 
cdesimone (Author) commented:
I just ran a small number of tests on the drive and found a "sequential positioning error" that follows the Backup Exec failures closely. I will look into it further very shortly. Cyber, are you saying that I should just bypass all the cables that go from the host to the drive with new ones? I can do this, but I am not sure if the tape loader will work in this configuration (it has to send commands to the drive to control retension and loading, if I remember correctly). Thanks!
 
Cyber-Dude commented:
I just woke up from bed, so it may take me some time to analyze what you just did.

I meant disabling the tape drive to see whether you get the same error in the backup software (the device did not respond). I guess you will get something like 'The Device was not found', but I just wanted to make sure of that...
I know it won't work in that config; hehehe...

Cyber
 
cdesimone (Author) commented:
NP, right now I am slamming the tape drive with any test I can find. I am about to check the sequential transfer rate on the SCSI controller and, if there is no limit on the transfer speed, turn it down to limit the data rate to 30 MB/sec, hoping that it will be an easy fix. Also, I found a firmware update that says it fixes "Reports invalid data to host on sequential positioning request". I am curious to see what the problem would be in this case, as none of our other LTO drives have any such problem, and they have the same firmware revision.
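To put the 30 MB/sec cap in context, the combined demand of the drives can be compared with the nominal bus rate. This is only a back-of-the-envelope sketch: the 80 MB/s figure is the nominal Ultra2/Wide burst rate, the per-drive figure is the cap mentioned above, and neither has been measured on this system.

```python
def bus_utilization(drive_rates_mb_s, bus_rate_mb_s):
    """Fraction of the nominal bus rate consumed by the given drive rates."""
    return sum(drive_rates_mb_s) / bus_rate_mb_s

# Assumed figures, not measurements from this system.
BUS_RATE = 80.0     # nominal Ultra2/Wide burst rate in MB/s
DRIVE_PEAK = 30.0   # per-drive cap discussed in the post, in MB/s

for active in (1, 2):
    util = bus_utilization([DRIVE_PEAK] * active, BUS_RATE)
    print(f"{active} drive(s) at {DRIVE_PEAK:.0f} MB/s: {util:.0%} of the bus")
```

On these assumptions, both drives running at the cap would still fit within the nominal bus rate, which would point away from simple bus saturation as the cause.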
 
dovidmichel commented:
The problem with most tests is that they do not accurately reproduce the process that takes place when the application does the backup. The controller auto-negotiates the transfer speed, and from what I have seen it usually does a poor job. So manually setting the transfer speed is a good idea.

Did you try swapping the position of this drive with another to see if the problem follows the drive or the position?
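
The reasoning behind the swap test can be written down as a small decision function. This is just a sketch of the logic, with made-up function and parameter names; "position" here covers the cable segment, connector, SCSI ID, and termination that go with the bay.

```python
def diagnose_after_swap(suspect_drive_fails, original_position_fails):
    """Interpret the result of swapping the suspect drive with a good one.

    suspect_drive_fails: the originally failing drive still fails in its
        new position.
    original_position_fails: the other drive now fails in the suspect
        drive's old position.
    """
    if suspect_drive_fails and not original_position_fails:
        return "drive"          # the problem followed the drive
    if original_position_fails and not suspect_drive_fails:
        return "position"       # the problem stayed with the position
    if suspect_drive_fails and original_position_fails:
        return "shared"         # something common: bus, controller, cabling
    return "inconclusive"       # nothing fails now; possibly intermittent

print(diagnose_after_swap(suspect_drive_fails=True, original_position_fails=False))
```

The test only means something if the same job and tapes are used before and after the swap, since the failure shows up only under sustained high transfer rates.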

 
cdesimone (Author) commented:
I just tried setting the synchronous transfer rate down to 40 MB/sec on that one drive and also turning off enable disconnect/reconnect, and it still did the exact same thing. Right now, I am changing the SCSI IDs of the tape drives, hoping that this will fix something. While I am at it, I will switch the two drives around. Next, I will try to update the firmware (although I cannot see it doing anything, since the other drives do not have the same problem), and finally update the drivers.

Cyber,
    What did IBM say/do to fix your problem?

Thanks
 
Cyber-Dude commented:
IBM didn't ask much; they replaced the whole 3580 drive, the SCSI adapter, and the cable set. They updated the firmware (via their internal FTP service) and eventually solved the problem, aided by the development team responsible for the tape drive's development (the tape drive is not assembled and developed by IBM itself but by one of its outsourcing firms). And all under the 'Service' tag.

Now you know why I did call IBM...

Cyber
 
antonam commented:
I'd love to know what exactly was done to fix this issue. If anyone knows, please post the solution details!

-antonam