Mazerender

SCSI Errors communicating with Exabyte Library

Greetings,

I'm having a problem resolving Event ID 11 errors (The driver detected a controller error on \Device\Scsi\adpu160m3) when trying to get my Brightstore ARCserve software to give me a good backup.  This is on an HP Proliant ML350 G3 using an Adaptec 29160 card talking to an Exabyte Magnum 224 LTO2 drive.  Server is running Windows 2000 SP4.  Arcserve will give E6300 NT SCSI Port errors and fail the job; if I look in the system log right before the Arcserve errors, I will have the above Event ID 11 error.

What I have tried (what haven't I tried?):
Replaced the 29160 card with an identical 29160 card (ensure BIOS on the Adaptec card is at 3.10.0)
Uninstalled the Adaptec driver a few different times/ways making sure it was at the current 6.4.630.100 (2/4/2004)

Replaced SCSI cable and terminator with recommended Adaptec quality cable/terminator

Installed this in three different systems (2 HP Proliant ML 350's and 1 Dell PowerEdge 2850).  The Dell was running W2k3, and there I did not get the error in Device Manager but still had the job fail with NT SCSI errors in Arcserve.

Originally this problem started with an Iomega REV Autoloader 1000.  We attributed it to this device, which was very special (that's all I will say).  Nevertheless, Iomega replaced the drive at least once due to this problem, and now Exabyte/Tandberg has replaced the 224 library due to this same problem.

Yesterday I ran the ltotool from Exabyte while all Arcserve services were stopped (cstop) and got the Event ID 11 just running the tool.

Iomega, Exabyte, and Adaptec thus far have not been able to resolve this, talking with them I have:

-Played with the BIOS settings of the 29160, trying different things, mostly using all defaults except turning Domain Validation Off; however, we did try configuring per Adaptec's KB article 15055 for a while
-turned off Removable storage
-uninstalled the default Driver for the drive so it shows up in unknown devices, arcserve likes it like this I'm told
-Assigned a different SCSI Id to the library/drive

At this point the boss is saying "told you so, should have gone with disk-based backup."  I'm kind of out of ideas; my power is clean and I don't store large magnets in my server room.  I don't get it.  Any assistance is appreciated.
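
The pattern described in the question (an Event ID 11 entry in the system log right before each Arcserve failure) can be checked mechanically once both logs are exported. The following is only a minimal sketch: the record layout, the timestamps, the five-minute window, and the function name `controller_errors_before` are all assumptions made up for illustration, not output from the system discussed here.

```python
from datetime import datetime, timedelta

# Hypothetical exported system-log records (made-up timestamps and layout).
system_events = [
    {"time": datetime(2000, 1, 1, 1, 14, 50), "event_id": 11, "source": "adpu160m3"},
    {"time": datetime(2000, 1, 1, 1, 14, 55), "event_id": 7036, "source": "Service Control Manager"},
    {"time": datetime(2000, 1, 1, 3, 0, 0), "event_id": 11, "source": "adpu160m3"},
]

# Hypothetical times at which backup jobs were reported as failed.
job_failures = [
    datetime(2000, 1, 1, 1, 15, 0),
    datetime(2000, 1, 1, 5, 0, 0),
]


def controller_errors_before(failures, events, window=timedelta(minutes=5), event_id=11):
    """Map each failure time to the matching events logged within `window` before it."""
    result = {}
    for failed_at in failures:
        result[failed_at] = [
            e for e in events
            if e["event_id"] == event_id
            and timedelta(0) <= failed_at - e["time"] <= window
        ]
    return result


matches = controller_errors_before(job_failures, system_events)
for failed_at, hits in matches.items():
    print(failed_at.isoformat(), "preceded by", len(hits), "controller error(s)")
```

A failure with no controller error in the window would suggest a different cause for that particular job, which is why the sketch reports empty matches rather than dropping them.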
ASKER CERTIFIED SOLUTION
mikelfritz
Just a thought - I see some people having the error 11 with CD/DVD Rippers - the problem is with the ASPI drivers.  The solution for some was to pull back to 4.60? version.  I know I used to have a heck of a time getting Arcserve to run on Novell with some versions of the ASPI, but that's Novell - not MS.

http://aspi.radified.com/
Hi

First off, Event ID 11 errors are related to hardware and have nothing to do with the OS or software. I see that you have already changed the SCSI card, cable and the drivers. This seems to be primarily a configuration issue: improper termination of the SCSI cable or an IRQ conflict. Did you ensure that the SCSI cables have been properly terminated?

Check if you are using scsiport miniport drivers or storport miniport drivers for the SCSI card. The device vendor would be able to update you with that information.
Please check this Microsoft article in case you have missed anything. It covers what Microsoft wants you to check to determine whether the issue is with hardware or software.

How to troubleshoot event ID 9, event ID 11, and event ID 15 error messages

http://support.microsoft.com/kb/154690/

Please check the cable connections and also try to use a different tape for backup.

If you are using native Windows backup then you need to have the RSM service running. I hope you ensured RSM was running when you tried to back up from Windows instead of Arcserve.

bhanu


Mazerender

ASKER

Thank you for your suggestions.  

mikelfritz: I set the speed in the Adaptec BIOS to 10 MB/sec for the Library SCSI ID; slow as this was, the error still presented as normal.  I feel kind of bad buying anything else on this project.  Too bad I didn't think to buy a non-Adaptec SCSI card the second time around, as it's possible it could be a conflict.

bhanukir7:  yes, I have checked the termination a bunch of times (not much to check, right?  Either the terminator is on or not, or am I missing something?).  Everything seems secure and seated well.  The light on the terminator lights up when the server is powered on.  I show no IRQ or other resource conflicts, though of course they don't always show up.

I will talk to Adaptec about scsiport/storport miniport drivers and look over the Microsoft Article.  I have tried using a number of different tapes.  Finally I am using Arcserve which shouldn't use RSM, but I have tried with it disabled anyhow as a test.  I was able to get this same error when using the ltotool from Exabyte so I don't feel it's an Arcserve problem.

Thank you all for your comments and I'm all ears for more suggestions.

Norbert
Hi,

What kind of tests did you try with the LTO tools? Did you try writing data without compression and then with 1:1, 1.5:1 and 2:1 ratio compression?

I hope you have already upgraded the firmware for the library, i.e. the medium changer and the tape drive.

I would certainly consider this an issue with the tape drive or the tapes that are being used. But you would not see the errors with all the tapes unless you are trying to use a different set of tapes (LTO-2 or LTO-3) while the drive is an LTO-1 or an LTO-2.

That is only a conclusion I wanted to draw. The tape drive might be an LTO-4 or an LTO-3 drive.

If possible try changing the scsi card from the current slot to another pci slot and see if that helps

bhanu
All right,

First off, I talked to Adaptec and they confirmed I am using scsiport miniport drivers, and I have carefully gone through the Microsoft article on this.  I am terminating both ends correctly according to Adaptec (I tried changing SCSI Controller Termination = Enabled instead of Automatic, but that didn't seem to help).

I tried using TDK LTO2 tapes vs the SONY ones that I normally use.  Both are LTO2 tapes as the drive is an LTO2 drive.  I have cycled through many different tapes in troubleshooting this so don't feel it's a problem with bad media.

I have attached the screen shot from when I was running the ltotool; I can say that it failed almost immediately after starting to run.  Sometimes my backup jobs will run a few hours and sometimes they will fail fairly quickly.

At this point I will see if Exabyte can gather any more information from the error given in trying to run their ltotool, perhaps run more tests using it, and also confirm that I have the latest firmware for the library and drive.

Again, thanks for the suggestions.  Norbert
ltotool.bmp
After talking to Exabyte Support and running more tests using their ltotool, Exabyte wants to replace the drive in the library as the next step.  Currently, I am waiting on their RMA department to get me a new drive which I'll swap with the one in the library.

As I told them, this is not the first time this has happened, so I'm afraid this may not do anything, but at this point I'm game to try.

I will post back with my results of working with the new drive, it will take a few days I imagine.
Just an aside - I did bring that up in the first post:
>When they replaced the 224 I assume that is the library and drive.

If they did not replace the drive...  

It certainly stinks of hardware with all you've done.  Any other tape drives you could stick on the bus to vindicate yourself?  

I really feel for you with the "told you so" headed your way.  Not that I disagree with a disk backup, but how many copies can you keep on a disk backup as opposed to a cycle of tapes.  I totally agree with your philosophy - disk backup would be a nice addition to the tape but not a substitute.
Mikelfritz,

No, they had replaced the whole library and drive before, which is why I'm surprised they would do it again.  Back then, this same problem had started happening and eventually I could not see the library in the Adaptec BIOS.  I hesitated to mention that at first, as replacing the library seemed to fix it, so I'm not sure if it was related or not.  If I experience the same problems with the new drive I guess I'll see if Exabyte can bump my call to a higher tier of support - perhaps they will have a little more knowledge on this.

No other tape drives unfortunately.

I feel the same as you and would argue for a tape based backup over disk to disk or remote storage for a primary solution.  The big reason being number of portable copies.  Also I have looked into remote storage, however we backup close to 300 GB a night.  That amount of storage makes the price unreachable for using someone else to back the data up to.  Doing it myself would be an option but compared to a tape, the tape seemed to make more sense.  Until this problem showed up anyhow :)

Thanks again for suggestions, once the new drive is in place I will post back.

Norbert
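
For a rough sense of scale, the 10 MB/sec test setting mentioned earlier in the thread can be set against the roughly 300 GB nightly backup. This is a back-of-the-envelope sketch only: it assumes decimal units (1 GB = 1000 MB) and a perfectly sustained rate, and the 30 MB/sec comparison figure is an arbitrary assumption, not a specification of the drive or library discussed here.

```python
def backup_hours(data_gb, rate_mb_per_s):
    """Hours needed to move data_gb gigabytes at a sustained rate (1 GB = 1000 MB)."""
    seconds = data_gb * 1000 / rate_mb_per_s
    return seconds / 3600


nightly_gb = 300  # nightly volume quoted in the thread

# 10 MB/sec is the reduced test speed from the thread; 30 MB/sec is a
# hypothetical comparison rate chosen only for illustration.
for rate in (10, 30):
    print(f"{rate} MB/sec: {backup_hours(nightly_gb, rate):.1f} hours")
```

At the reduced setting the job would not fit in a typical overnight window, so that setting only makes sense as a diagnostic step.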
I'm still thinking about a non-Adaptec controller, just to eliminate it.

Ever thought about theft?  Go give some IT guy a latte and, while he's not looking, steal a Mylex or some such...

Hopefully it's just two or three bad libs/drives - seems unlikely but sometimes that's the way it goes.
Mikelfritz,

I live in New Jersey, would you want to go for a coffee sometime?  

Joking aside, that may be a good next step.  I haven't bought too many non-Adaptec SCSI cards; any recommendations?  The Mylex or LSI LSI20160 seems to be comparable to the Adaptec 29160.  Even a used one would probably work for my purposes.

Norbert
Where in Jersey? - I'm up in Sussex County - Lake Hopatcong area.  

I think Mylex got eaten by LSI, so the 20160 maybe.  Anything that is LVD should be fine for a test.

Here's one for under $100.00

http://www.cdw.com/shop/products/default.aspx?EDC=1002219
I'm in Fairfield, so about 30 minutes away.  Funny huh?

Ok, I might have to pick up one of those cards depending on how this new drive runs.  If it's typical of how it has behaved in the past it may run fine for a few weeks and then start up again with those errors.

Norbert
Well, if it runs fine for a few weeks and then starts giving you trouble I would suspect the 224 as the problem.
Well, Exabyte shipped me a whole new library, not just a drive.  Dropped it into production last night and was able to get a successful backup.  I'll keep my fingers crossed, and keep this ticket open for a week or so.  As I said above, it is typical for it to run a while and then start flaking out.

Norbert
Going to assign the points to you: you did mention replacing the whole library, and even though they had done that before, you were the closest answer.