Mazerender
asked on
SCSI Errors communicating with Exabyte Library
Greetings,
I'm having a problem resolving Event ID 11 errors (The driver detected a controller error on \Device\Scsi\adpu160m3) when trying to get my Brightstore ARCserve software to give me a good backup. This is on an HP Proliant ML350 G3 using an Adaptec 29160 card talking to an Exabyte Magnum 224 LTO2 drive. The server is running Windows 2000 SP4. Arcserve will give E6300 NT SCSI Port errors and fail the job; if I look in the system log right before the Arcserve errors, I will have the above Event ID 11 error.
What I have tried (what haven't I tried?):
Replaced the 29160 card with an identical 29160 card (ensured the BIOS on the Adaptec card is at 3.10.0)
Uninstalled the Adaptec driver a few different times/ways, making sure it was at the current 6.4.630.100 (2/4/2004)
Replaced the SCSI cable and terminator with the recommended Adaptec-quality cable/terminator
Installed this in three different systems (2 HP Proliant ML350s and 1 Dell PowerEdge 2850). The Dell was running W2k3, and there I did not get the error in Device Manager but still had the job fail with NT SCSI errors in Arcserve
Originally this problem started with an Iomega REV Autoloader 1000. We attributed it to that device, which was very special (that's all I will say). Nevertheless, Iomega replaced the drive at least once due to this problem, and now Exabyte/Tandberg has replaced the 224 library due to this same problem.
Yesterday I ran the ltotool from Exabyte while all Arcserve services were stopped (cstop) and got the Event ID 11 just running the tool.
Iomega, Exabyte, and Adaptec thus far have not been able to resolve this. In talking with them I have:
-Played with the BIOS settings of the 29160, trying different things, mostly using all defaults except turning Domain Validation off; however, we did try configuring per Adaptec's KB article 15055 for a while
-Turned off Removable Storage
-Uninstalled the default driver for the drive so it shows up in unknown devices; Arcserve likes it like this, I'm told
-Assigned a different SCSI ID to the library/drive
At this point the boss is saying "told you so, should have gone with disk-based backup." I'm kind of out of ideas. My power is clean and I don't store large magnets in my server room. I don't get it. Any assistance is appreciated.
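The pattern described above (an Event ID 11 in the system log right before each Arcserve failure) can be checked mechanically once the log entries have been exported. The Python below is only a minimal sketch under assumptions: the record layout (`time`, `event_id`, `source`), the function name `controller_errors_before`, the 5-minute window, and the sample timestamps are all hypothetical and are not taken from the poster's logs.

```python
from datetime import datetime, timedelta


def controller_errors_before(failures, system_events, window_minutes=5):
    """For each backup failure time, list the Event ID 11 entries
    that were logged at most window_minutes before it."""
    window = timedelta(minutes=window_minutes)
    matches = {}
    for failed_at in failures:
        matches[failed_at] = [
            ev for ev in system_events
            if ev["event_id"] == 11
            and timedelta(0) <= failed_at - ev["time"] <= window
        ]
    return matches


# Hypothetical sample data (assumed layout, not real log entries).
system_events = [
    {"time": datetime(2007, 1, 10, 1, 58), "event_id": 11, "source": "adpu160m"},
    {"time": datetime(2007, 1, 10, 0, 15), "event_id": 7036, "source": "Service Control Manager"},
]
failures = [datetime(2007, 1, 10, 2, 0)]

result = controller_errors_before(failures, system_events)
for failed_at, events in result.items():
    print(failed_at, "->", len(events), "controller error(s) just before the failure")
```

If every failure has a matching controller error and the same error also appears with no backup software running (as with the ltotool run), that points away from Arcserve and toward the SCSI bus, controller, or drive.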
Hi
First off, Event ID 11 errors are related to hardware and have nothing to do with the OS or software. I see that you have already changed the SCSI card, cable, and drivers. This seems to be primarily a configuration issue: improper termination of the SCSI cable or an IRQ conflict. Did you ensure that the SCSI cables have been properly terminated?
Check whether you are using scsiport miniport drivers or storport miniport drivers for the SCSI card. The device vendor would be able to give you that information.
Please check this Microsoft article in case you have missed anything. It lists what Microsoft wants you to check to determine whether the issue is with hardware or software:
How to troubleshoot event ID 9, event ID 11, and event ID 15 error messages
http://support.microsoft.com/kb/154690/
Please check the cable connections and also try a different tape for the backup.
If you are using native Windows backup, then you need to have the RSM service running. I hope you ensured RSM was running when you tried to back up from Windows instead of Arcserve.
bhanu
ASKER
Thank you for your suggestions.
mikelfritz: I set the speed in the Adaptec BIOS to 10 MB/sec for the library's SCSI ID. Slow as this was, the error still presented as normal. I feel kind of bad buying anything else on this project; too bad I didn't think to buy a non-Adaptec SCSI card the second time around, as it's possible it could be a conflict.
bhanukir7: Yes, I have checked the termination a bunch of times (not much to check, right? Either the terminator is on or not, or am I missing something?). Everything seems secure and seated well. The light on the terminator lights up when the server is powered on. I show no IRQ or other resource conflicts; of course, they don't always show up.
I will talk to Adaptec about scsiport/storport miniport drivers and look over the Microsoft article. I have tried using a number of different tapes. Finally, I am using Arcserve, which shouldn't use RSM, but I have tried with it disabled anyhow as a test. I was able to get this same error when using the ltotool from Exabyte, so I don't feel it's an Arcserve problem.
Thank you all for your comments and I'm all ears for more suggestions.
Norbert
Hi,
What kind of tests did you try with the LTO tool? Did you try writing data without compression, and then with 1:1, 1.5:1, and 2:1 compression ratios?
I hope you have already upgraded the firmware for the library, i.e. the medium changer and the tape drive.
I would certainly consider this an issue with the tape drive or the tapes being used. But you wouldn't see the errors with all the tapes unless you are trying to use a different generation of tapes (LTO-2 or LTO-3) while the drive is an LTO-1 or an LTO-2.
That is only a conclusion I wanted to draw. The tape drive might be an LTO-4 or an LTO-3 drive.
If possible, try moving the SCSI card from the current slot to another PCI slot and see if that helps.
bhanu
ASKER
All right,
First off, I talked to Adaptec. They confirmed I am using scsiport miniport drivers, and I have carefully gone through the Microsoft article on this. I am terminating both ends correctly according to Adaptec (I tried changing SCSI Controller Termination = Enabled instead of Automatic, but that didn't seem to help).
I tried using TDK LTO2 tapes vs the SONY ones that I normally use. Both are LTO2 tapes, as the drive is an LTO2 drive. I have cycled through many different tapes in troubleshooting this, so I don't feel it's a problem with bad media.
I have attached the screen shot from when I was running the ltotool. I can say that it failed almost immediately after starting to run. Sometimes my backup jobs will run a few hours and sometimes they will fail fairly quickly.
At this point I will see if Exabyte can gather any more information from the error given when running their ltotool, perhaps run more tests using it, and also confirm that I have the latest firmware for the library and drive.
Again, thanks for the suggestions. Norbert
ltotool.bmp
ASKER
In talking to Exabyte Support and running more tests using their ltotool, Exabyte wants to replace the drive in the library as the next step. Currently, I am waiting on their RMA department to get me a new drive, which I'll swap with the one in the library.
As I told them, this is not the first time this has happened, so I'm afraid this may not do anything, but at this point I'm game to try.
I will post back with my results of working with the new drive. It will take a few days, I imagine.
Just an aside - I did bring that up in the first post:
>When they replaced the 224 I assume that is the library and drive.
If they did not replace the drive...
It certainly stinks of hardware with all you've done. Any other tape drives you could stick on the bus to vindicate yourself?
I really feel for you with the "told you so" headed your way. Not that I disagree with a disk backup, but how many copies can you keep on a disk backup as opposed to a cycle of tapes? I totally agree with your philosophy - disk backup would be a nice addition to the tape but not a substitute.
ASKER
Mikelfritz,
No, they had replaced the whole library and drive before, which is why I'm surprised they would do it again. Before, this same problem had started happening, and eventually I could not see the library in the Adaptec BIOS - I hesitated to mention that at first, as replacing the library seemed to fix it, so I'm not sure if it was related or not. If I experience the same problems with the new drive, I guess I'll see if Exabyte can bump my call to a higher tier of support - perhaps they will have a little more knowledge on this.
No other tape drives unfortunately.
I feel the same as you and would argue for a tape-based backup over disk-to-disk or remote storage for a primary solution. The big reason is the number of portable copies. Also, I have looked into remote storage; however, we back up close to 300 GB a night. That amount of storage makes the price unreachable for using someone else to back the data up to. Doing it myself would be an option, but compared to that, the tape seemed to make more sense. Until this problem showed up anyhow :)
Thanks again for the suggestions. Once the new drive is in place I will post back.
Norbert
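To put the numbers from this thread side by side (close to 300 GB a night, and the 10 MB/sec rate tried in the Adaptec BIOS as a test), here is a rough back-of-the-envelope sketch in Python. The function name `backup_hours` and the 20 and 30 MB/sec comparison rates are assumptions for illustration only; they are not measured figures for this drive, and the calculation uses 1 GB = 1000 MB and ignores compression and tape changes.

```python
def backup_hours(data_gb, rate_mb_per_s):
    """Hours needed to stream data_gb at a sustained rate (1 GB = 1000 MB)."""
    return data_gb * 1000 / rate_mb_per_s / 3600


# 10 MB/sec is the test setting mentioned in the thread;
# 20 and 30 MB/sec are hypothetical comparison rates.
for rate in (10, 20, 30):
    print(f"{rate} MB/s -> {backup_hours(300, rate):.1f} h")
```

At the 10 MB/sec test setting, 300 GB works out to roughly 8.3 hours, so that setting is useful for troubleshooting but would be tight as a permanent nightly backup window.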
I'm still thinking about a non-Adaptec controller, just to eliminate it.
Ever thought about theft? Go give some IT guy a latte and, while he's not looking, steal a Mylex or some such...
Hopefully it's just two or three bad libs/drives - seems unlikely but sometimes that's the way it goes.
ASKER
Mikelfritz,
I live in New Jersey, would you want to go for a coffee sometime?
Joking aside, that may be a good next step. I haven't bought too many non-Adaptec SCSI cards; any recommendations? The Mylex or LSI LSI20160 seems to be comparable to the Adaptec 29160. Even a used one would probably work for my purposes.
Norbert
Where in Jersey? - I'm up in Sussex County - Lake Hopatcong area.
I think Mylex got eaten by LSI, so the 20160 maybe; anything that is LVD should be fine for a test.
Here's one for under $100.00:
http://www.cdw.com/shop/products/default.aspx?EDC=1002219
ASKER
I'm in Fairfield, so about 30 minutes away. Funny huh?
Ok, I might have to pick up one of those cards depending on how this new drive runs. If it's typical of how it has behaved in the past it may run fine for a few weeks and then start up again with those errors.
Norbert
Well, if it runs fine for a few weeks and then starts giving you trouble, I would suspect the 224 as the problem.
ASKER
Well, Exabyte shipped me a whole new library, not just a drive. I dropped it into production last night and was able to get a successful backup. I'll keep my fingers crossed and keep this ticket open for a week or so. As I said above, it typically runs for a while and then starts flaking out.
Norbert
ASKER
Going to assign the points to you, as you did mention replacing the whole library. Even though they had done that before, you were the closest answer.
http://aspi.radified.com/