Solved

Troubleshooting SCSI Error EventID 9.

Posted on 2004-09-10
Last Modified: 2008-01-09
I just started looking into a backup server that was failing periodically.  I found an Event ID 9, "The device, \Device\Scsi\adpu160m2, did not respond within the timeout period.", after which Backup Exec fails the job with a "TFLE_PROGRAMMER_ERROR1".  When I look at the statistics, I see that a hard write error (an unrecoverable write error) has occurred.  This happens on just one tape drive, and only when the drive has been transferring near its theoretical maximum for some time.  When the same job is run with the same tapes on another tape drive, it is fine.  I am trying to troubleshoot it, and here are some ideas I have come up with so far.  Please add any suggestions.
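
One way to check whether the Event ID 9 timeouts really precede the failed jobs is to export the System event log and line up the timestamps. The following is only a sketch: the CSV column names (`time`, `event_id`, `source`) and the helper `timeouts_near_failures` are hypothetical, and the failure times would have to be copied by hand from the Backup Exec job history.

```python
import csv
import io
from datetime import datetime, timedelta


def timeouts_near_failures(event_csv, failure_times, window_minutes=10):
    """Map each job failure time to the Event ID 9 entries logged shortly before it.

    event_csv is the text of a CSV export with assumed columns
    'time' (ISO format) and 'event_id'.
    """
    window = timedelta(minutes=window_minutes)
    timeouts = [
        datetime.fromisoformat(row["time"])
        for row in csv.DictReader(io.StringIO(event_csv))
        if row["event_id"] == "9"
    ]
    return {
        failure: [t for t in timeouts if timedelta(0) <= failure - t <= window]
        for failure in failure_times
    }


if __name__ == "__main__":
    # Made-up sample data, only to show the expected input shape.
    sample = (
        "time,event_id,source\n"
        "2004-09-10 02:10:00,9,adpu160m2\n"
        "2004-09-10 05:00:00,9,adpu160m2\n"
    )
    failure = datetime(2004, 9, 10, 2, 15)
    print(timeouts_near_failures(sample, [failure]))
```

If every failed job has a timeout in the window before it and the successful jobs have none, the SCSI timeout is more likely the cause of the job failure than a side effect of it.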

Details.

Using IBM LTO autoloader with 2 Ultrium LTO-1 Tape drives (3580)

An Adaptec 29160 SCSI controller controls these tape drives and the changer.

There is a cable running about 12 feet total, and all these three devices (changer, two tape drives) are chained on one SCSI channel

The SCSI card is separate from any other SCSI component in the system (hard drives).
It SHOULD be connected at Ultra2/Wide SCSI (40 feet maximum run or so)
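
As a quick sanity check on the cable run, the 12-foot length can be compared against commonly cited SCSI bus length limits. This is a rough sketch: the limits in the table are assumed typical figures, not values from this thread, and should be verified against the controller and drive documentation. The single-ended rows are included because a single-ended device or terminator on the chain can force the whole bus out of LVD mode.

```python
FEET_TO_METERS = 0.3048

# Assumed, commonly cited maximum bus lengths in meters; verify for your hardware.
MAX_BUS_LENGTH_M = {
    "LVD (Ultra2/Ultra160), multiple devices": 12.0,
    "Single-ended Ultra Wide, up to 4 devices": 3.0,
    "Single-ended Ultra Wide, up to 8 devices": 1.5,
}


def check_cable(length_feet):
    """Return {bus mode: (limit in meters, True if the run fits)}."""
    length_m = length_feet * FEET_TO_METERS
    return {
        mode: (limit, length_m <= limit)
        for mode, limit in MAX_BUS_LENGTH_M.items()
    }


if __name__ == "__main__":
    for mode, (limit, ok) in check_cable(12).items():
        status = "OK" if ok else "TOO LONG"
        print(f"{mode}: limit {limit} m, 12 ft run is {status}")
```

Under these assumed limits, a 12-foot (about 3.7 m) run is comfortable for LVD but too long if the bus has dropped to single-ended mode, so it is worth confirming the negotiated mode in the controller BIOS.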

I have run local drive tests for some time now and cannot replicate or find an error (local tape drive fast read/write, wrap test, head resistance test, etc.)

I have noticed that the cable running to the drives is surrounded by AC power cables that are running parallel to it....

Ideas to possibly fix...

Reroute/shorten the SCSI cable
Update firmware/drivers on the tape drives and SCSI controller
Swap SCSI cables
Change the terminator/check whether the Ultrium drive is supplying power to it
Change the SCSI IDs, with the changer lowest and the tape drives on the next two higher IDs
Replace the drive...

Thanks!
Question by:cdesimone
13 Comments
 
LVL 15

Expert Comment

by:Cyber-Dude
ID: 12035287
Had that same issue, called IBM!!!
:)

If you have Veritas Backup Exec, try updating its software drivers to the latest version. The autoloader also has its own drivers, which you can download from IBM. Did you check that you have the latest version?
The IBM 3580 tape page:
http://www-1.ibm.com/servers/storage/support/lto/3580.html

From what I can learn there was a firmware update not long ago;

Cyber
0
 
LVL 15

Expert Comment

by:Cyber-Dude
ID: 12035290
PS
The links are not working but you should check the following as well;
http://www.mail-archive.com/adsm-l@vm.marist.edu/msg34428.html

Cyber
0
 
LVL 22

Expert Comment

by:dovidmichel
ID: 12036918
Was a diagnostic dump done on the drive? That would provide the history of the drive, and the failures should show up.
So after the drive was replaced if the new drive started having the same problem perhaps it is related to the drive position. Try swapping the position of the drive with one of the other drives and see if the problem follows the drive or the position.
By the way IBM drives do work better and more trouble free when IBM tapes are being used.
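
The swap test described above comes down to a small decision table. The sketch below spells it out; the function name `interpret_swap` and the wording of the conclusions are illustrative only, not an IBM or Veritas procedure.

```python
def interpret_swap(error_followed_drive, error_stayed_at_position):
    """Interpret the result of swapping two drives' positions on the chain.

    error_followed_drive: the originally failing drive still fails in its new slot.
    error_stayed_at_position: the drive now in the old slot fails there.
    """
    if error_followed_drive and error_stayed_at_position:
        return "shared component: controller, cable, or terminator"
    if error_followed_drive:
        return "the drive itself"
    if error_stayed_at_position:
        return "the position: cable segment, connector, SCSI ID, or termination"
    return "not reproduced: repeat the job under sustained load"


if __name__ == "__main__":
    print("Suspect", interpret_swap(True, False))
```

The test is only meaningful if the same job and tapes are used before and after the swap, since the failure reportedly appears only under sustained high throughput.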
0
 

Author Comment

by:cdesimone
ID: 12040660
I did not dump the history of the drive, and that would be the next step.  From what I remember, most if not all of the tapes are IBM.  Would running AC power cords with the SCSI cable be a problem?? I am guessing right now that it would be a local problem with the drive due to the problem not really being seen on another tape drive on the same scsi channel.  I will do this and post the results.  Thanks
0
 
LVL 22

Expert Comment

by:dovidmichel
ID: 12040847
Power cords and other sources of RF interference can cause problems, but I have only seen them cause data corruption, not drive read/write or SCSI errors. For example, there is no sign of a problem, yet files on tape fail to compare back to what is on disk.
0
 
LVL 15

Expert Comment

by:Cyber-Dude
ID: 12041637
Try disconnecting the SCSI cable the tape drive is attached to and verify whether you get the same result, which I doubt. The fault could be anywhere between the controller and the tape drive. dovidmichel is also right: try to clear the communication path of any objects that may increase data latency, though I know the cables and they are quite resistant to such obstacles.

Cyber
0

Author Comment

by:cdesimone
ID: 12061029
I just ran a small number of tests on the drive and found that a "sequential positioning error" follows the Backup Exec failures closely.  I will look into it further very shortly.  Cyber, are you saying that I should just replace all the cables that go from the host to the drive with new ones?  I can do this, but I am not sure if the tape loader will work in this configuration (it has to send commands to the drive to control retension and loading, if I remember correctly).  Thanks!
0
 
LVL 15

Accepted Solution

by:
Cyber-Dude earned 250 total points
ID: 12061163
I just woke up from bed, so it may take me some time to analyse what you just did.

I meant disabling the tape drive to see whether you get the same error in the backup software (the device did not respond). I guess you will get something like 'The Device was not found', but I just wanted to make sure of that...
I know it won't work in that config; hehehe...

Cyber
0
 

Author Comment

by:cdesimone
ID: 12061317
NP. Right now I am slamming the tape drive with every test I can find.  I am about to check the synchronous transfer rate on the SCSI controller and, if it is not already limited, turn it down to cap the data rate at 30 MB/Sec, hoping that it will be an easy fix.  Also, I found a firmware update that says it fixes "Reports invalid data to host on sequential positioning request".  I am curious to see what the problem would be in this case, as none of our other LTO drives have any such problem, and they are on the same firmware revision.
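
For context on the 30 MB/Sec figure, here is a rough bandwidth budget for the two drives sharing one channel. The native rate and compression ratio are assumptions (typical published numbers for an Ultrium 1 drive), and the bus limits are the nominal maximums for each SCSI generation; none of these values come from measurements in this thread.

```python
LTO1_NATIVE_MB_S = 15.0   # assumed native rate of one Ultrium 1 drive
COMPRESSION_RATIO = 2.0   # assumed best-case 2:1 compression

# Nominal maximum bus throughput in MB/s.
BUS_LIMIT_MB_S = {"Ultra2 Wide": 80.0, "Ultra160": 160.0}


def peak_demand_mb_s(drive_count):
    """Worst-case combined data rate if every drive streams compressed data."""
    return drive_count * LTO1_NATIVE_MB_S * COMPRESSION_RATIO


def bus_utilisation(drive_count, bus_name):
    """Fraction of the nominal bus bandwidth the drives could demand."""
    return peak_demand_mb_s(drive_count) / BUS_LIMIT_MB_S[bus_name]


if __name__ == "__main__":
    for bus in BUS_LIMIT_MB_S:
        share = bus_utilisation(2, bus)
        print(f"{bus}: 2 drives need up to {peak_demand_mb_s(2):.0f} MB/s "
              f"({share:.0%} of nominal bus bandwidth)")
```

Under these assumptions, two drives at full compressed speed fit within either bus limit on paper, which suggests that a failure appearing only near maximum throughput points at signal quality or the drive rather than at raw bandwidth.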
0
 
LVL 22

Expert Comment

by:dovidmichel
ID: 12069305
The problem with most tests is that they do not accurately reproduce what happens when the application does the backup. The controller auto-negotiates the transfer speed, and from what I have seen it usually does a poor job. So manually setting the transfer speed is a good idea.

Did you try swapping the position of this drive with another to see if the problem follows the drive or the position?

0
 

Author Comment

by:cdesimone
ID: 12071356
I just tried setting the synchronous transfer rate down to 40MB/Sec on that one drive and also turning off enable disconnect/reconnect, and it still did the exact same thing.  Right now, I am changing the SCSI IDs of the tape drives, hoping that this will fix something.  While I am at it, I will switch the two drives around.  Next, I will try to update the firmware (although I cannot see it doing anything, since the other drives do not have the same problem), and finally, update the drivers.

Cyber,
    What did IBM say/do to fix your problem?

Thanks
0
 
LVL 15

Expert Comment

by:Cyber-Dude
ID: 12092178
IBM didn't ask much; they replaced the whole 3580 drive, SCSI adapter, and cable set. They updated the firmware (via their internal FTP service) and eventually solved the problem (aided by the development team responsible for the tape's development; you see, the tape is not assembled and developed by IBM itself but by one of its outsourcing firms). And all under the 'Service' tag.

Now you know why I did call IBM...

Cyber
0
 

Expert Comment

by:antonam
ID: 14013510
I'd love to know what exactly was done to fix this issue. If anyone knows, please post the solution details!

-antonam
0
