Solved

BackupExec 8.x / NT4 Tape Failures

Posted on 2004-08-22
Last Modified: 2013-12-01
I have been struggling with this for quite some time now.  We are experiencing an extremely high number of tape/backup failures on our servers, and have not been able to track down the source of the problem.  We are currently running:

Proliant DL380 servers
Compaq AIT 100 drives using Sony SDX3-100C tapes
Win NT Server
Veritas Backup Exec 7.3 and 8.2(?) (don't feel like running back to the farm at the moment..)  :)

Event viewer logs show numerous  ID 7's and 11's.  Mainly ID 11: 'The driver detected a controller error on .....'

Backups usually hang the server until we yank the drive, or do a hard-reboot.

We have good availability of drives, and have been able to switch them out.  However, we did find that on one of our drives that was residing on one of the worst servers (backup wise) the eject button was stuck depressed.  (might cause long-term damage to drive/tapes maybe?)

We have leaned more to a batch of bad tapes, due to the fact that we can get a good backup with different tapes.  However, the tapes that are reported good on one server, are reported bad on another and it is very unpredictable as to what tapes are good where.  All Veritas documentation that I have found points to bad media, and bad headers on the tapes.  We have to lean away from SCSI controller errors, due to the fact that this is happening on more than one server and is not an isolated problem.

Any ideas?
Question by:derenm
12 Comments
 
LVL 30

Expert Comment

by:Duncan Meyers
ID: 11867085
Check that SCSI termination is correct first of all.

If you are using an Adaptec SCSI controller, ensure that SCSI termination is set to Auto or On/Wide.

Check that you either have a terminator pack on the end of the SCSI cable or that termination is set correctly on the drive (termination is on and term power from the device). I'm having no end of trouble with the HP website at the moment, otherwise I'd be able to give you more specifics for the tape drive itself.

 
LVL 55

Expert Comment

by:andyalder
ID: 11868731
Are these the internal hotswap 50/100 AITs or external ones? If internal, is the backplane simplex or duplex? Does HP Library & Tape Tools show drive errors? (Yes, it sounds like bad media or worn heads.)
 

Author Comment

by:derenm
ID: 11872746
They are internal AIT100 drives.  We have been able to replace the drives, and I have verified that termination is on.  (Termination -can- be set on the drive itself..)  I have rarely heard of so many tapes going bad so quickly, though.  I am taking a bunch of them out of rotation as we speak just to make sure.
 
LVL 55

Expert Comment

by:andyalder
ID: 11872932
Internal in a DL380, so they replace the top left hand disk with the tape and it shares the disk backplane. The backplane is terminated already, as is the Smart 5i (or whichever other card you are using), so termination must be OFF on the tape drive. Didn't realise it could be set on the hotswap tapes without taking them out of the carrier first.
 
LVL 55

Expert Comment

by:andyalder
ID: 11873265
Even with the non-hotplug internal HP/Compaq drives you never turn termination on on the drive itself; there's a terminated cable hidden away inside the chassis, wrapped up and not connected to anything.

Can you describe your servers better? It should say G2, G3 or G4 during POST, and you do not mention whether you have additional controllers or external storage. You should be using duplex rather than the default simplex if possible.
 
LVL 3

Accepted Solution

by:
passmark earned 125 total points
ID: 11878252

PassMark BurnInTest Pro has a tape drive test built into it.
http://www.passmark.com/products/bit.htm

It might not solve your problem but may help you diagnose the issue and do some testing to determine if the problem is software, the drive or the tapes.

David

 
LVL 55

Expert Comment

by:andyalder
ID: 11890284
Huh? Why do you need third party tools when HP provide L&TT and you already know you have terminated one end of the bus twice?
 

Expert Comment

by:ITHELP-BOCS
ID: 13563662
I am experiencing the same sort of problem on a HP Proliant DL380 G3 on SBS 2003.

Symptoms are:

Veritas Backup Exec V9.1 Build 4691 hangs on loading media and the tape becomes un-ejectable (either with software eject or by pressing the eject button on the drive). When this happens, the only way to get the tape out of the drive and run a successful backup is to shut the server hardware down and power back up. This normally allows about 2 days of successful backups before the problem comes back.

HP StorageWorks Library & Tape Tools is unable to generate a support ticket, the error produced is:

The diagnostic function encountered a failure while generating the support ticket.

and 2 identical errors are recorded in the system event log:

Event ID: 11
Source: cpqcissm
Description: The driver detected a controller error on \Device\Scsi\cpqcissm1
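To see whether the ID 11 errors cluster on one controller, or arrive together with other IDs such as 7 or 9, it can help to tally them from an exported copy of the System log. This is only a rough sketch in Python: it assumes the log has been exported as comma-separated text with date, time, source, event ID and description columns. That layout, the dates and the sample rows are made up for illustration and are not the output of any particular tool.

```python
import csv
import io
from collections import Counter

# Event IDs mentioned in this thread.
WATCHED_IDS = {7, 9, 11}

# Made-up sample rows; a real export would be read from a file instead.
SAMPLE_EXPORT = "\n".join([
    r"2005-03-01,02:14:07,cpqcissm,11,The driver detected a controller error on \Device\Scsi\cpqcissm1",
    r"2005-03-01,02:14:09,cpqcissm,11,The driver detected a controller error on \Device\Scsi\cpqcissm1",
    r"2005-03-01,02:15:30,cpqcissm,7,(sample description)",
    r"2005-03-01,06:00:01,EventLog,6005,The Event log service was started.",
])

def tally_events(text):
    """Count watched event IDs per (source, event ID) pair."""
    counts = Counter()
    for row in csv.reader(io.StringIO(text)):
        if len(row) < 5:
            continue  # skip blank or malformed lines
        source, event_id = row[2], row[3]
        if event_id.isdigit() and int(event_id) in WATCHED_IDS:
            counts[(source, int(event_id))] += 1
    return counts

counts = tally_events(SAMPLE_EXPORT)
for (source, event_id), n in sorted(counts.items()):
    print(f"{source} ID {event_id}: {n}")
```

Running the same tally on each server's log makes it easier to compare like with like when several machines show the same symptoms.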

I have tried the following (neither has made any difference):
Installing SP1 for Backup Exec 9.1.
When the problem occurs, stopping all Backup Exec services and ejecting the tape.

I have also, on advice from HP tech support, downloaded and installed the most recent Proliant support pack for Windows Server 2003 so all device drivers and management software are up to date - this does seem to have improved the situation.

I think the Smart Array 5i controller firmware is up to date - it's V2.58 and HP StorageWorks L&TT is V3.5 SR1

The drive has already been replaced by HP and it's in the bottom left hand hot plug slot in the front of the server + the server has never been opened up from new so all hardware configuration is still at factory settings.

Any suggestions would be greatly appreciated.
 

Author Comment

by:derenm
ID: 13586632
Finally have been able to solve the problems!

Here is the order of what I finally had to do to solve the issues.  In some instances, the load on the affected server HAS to be taken into account, but I will go more in depth later.

1.  Upgraded from BE 8.x (and in some rare cases, 7.x) to 9.1 SP1.  (The SP update is a -must-!)
2.  Downloaded the newest Proliant Support Pack from HP, which did a barrage of driver updates.  *Keep in mind that if a Windows service pack or patch is installed, it is recommended that the Support Pack is reapplied, as per HP.
3.  Upgraded the firmware on the SA 5i controller.  (This in itself should eliminate about 50% of the SCSI bus slowdowns; it was a marked issue from HP.)
4.  Removed the Windows driver and ONLY used the Veritas driver.  (In Windows NT, go to the Control Panel and completely remove the NT tape driver.  The Device & Media service will do all the work from there.  If not, re-run the device wizard under BE and tell BE to ONLY use the Veritas drivers.)
5.  GO THROUGH YOUR MEDIA LIBRARY WITH A FINE TOOTH COMB!!!!!  Chances are, you do have a lot of bad media.  If you have a test server, go through each tape, re-label, erase, and do a test job to try to get the job to produce these errors.  If you have bad media, you are sure to see these problems again.
6.  Also upgrade the firmware on your tape drives.  This was part of our issue as well.
7.  Establish a good system for troubleshooting.  Try your best to determine what hardware is good, what tapes are good and where your problem SCSI controllers are (if applicable.)  THIS IS A MULTI-THREADED PROBLEM!  Organization is key to solving these issues one at a time.
8.  To help diagnose bad tapes, establish a good system to implement the media management system used by Veritas.  This will help you keep track of old tapes, new tapes and other media issues.
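The bookkeeping in steps 7 and 8 can be as simple as a tally of which tape passed or failed on which server. Below is a minimal sketch in Python, assuming the test-job results are recorded by hand; the tape labels, server names and the 75% threshold are all hypothetical and would need adjusting to your own rotation.

```python
from collections import defaultdict

# Hypothetical hand-recorded test jobs: (tape label, server, job passed?)
results = [
    ("AIT-001", "srv-a", True),
    ("AIT-001", "srv-b", True),
    ("AIT-002", "srv-a", False),
    ("AIT-002", "srv-b", False),
    ("AIT-003", "srv-a", True),
    ("AIT-003", "srv-b", False),
    ("AIT-004", "srv-b", False),
    ("AIT-004", "srv-a", True),
]

def failure_rates(records, key_index):
    """Return {key: fraction of failed jobs}, keyed by tape (0) or server (1)."""
    totals = defaultdict(int)
    failures = defaultdict(int)
    for record in records:
        key = record[key_index]
        totals[key] += 1
        if not record[2]:
            failures[key] += 1
    return {key: failures[key] / totals[key] for key in totals}

def suspects(records, threshold=0.75):
    """List tapes and servers whose failure rate meets the threshold."""
    by_tape = failure_rates(records, 0)
    by_server = failure_rates(records, 1)
    bad_tapes = sorted(t for t, rate in by_tape.items() if rate >= threshold)
    bad_servers = sorted(s for s, rate in by_server.items() if rate >= threshold)
    return bad_tapes, bad_servers

bad_tapes, bad_servers = suspects(results)
print("Tapes failing on most servers:", bad_tapes)
print("Servers failing most tapes:", bad_servers)
```

A tape that fails everywhere points at the media, while a server that fails most tapes points at its drive, controller or bus. Tapes that fail only on one server are better explained by that server than by the tape.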

-ONLY WHEN ALL OTHER OPTIONS HAVE BEEN ELIMINATED....
Change the SCSI bus from Simplex to Duplex.  On our Exchange server in particular, the load was just too much for the server to handle regular backups and typical use.  One or the other, but not both.  To solve this issue, we slid the whole drive array (4 drives, RAID 5) down two slots so the drive array was now living in positions 3, 4, 5 & 6 on its OWN SCSI port, and the tape drive was on its own as well.   Keep in mind that this does require a proprietary SCSI terminator from HP.

I am highly stressing organization for this problem.  Initially, I was under a lot of pressure to get these issues fixed and was moving too fast to really try to get this out of the way.  Each server of ours was presenting unique issues with backups, however, all the symptoms were the same.

I am stressing again that there is an issue with the media.  It doesn't make sense, but this is the majority of your problem.   On servers with heavy load this is more prominent for some reason.  Also, in one case of ours, a bad drive was causing our tapes to go bad.   I don't really know the details, but I am assuming that some type of header on the tape was getting overwritten, preventing the tape from being read anywhere else.  Case in point, it was yet another source of our problems.

BTW, the un-ejectable tape issues can be a pain.   Sometimes the BE services will hang when you try to stop them (then again, I am in an NT environment!)  So the quickest way is just to pull the drive, then stop the services.  Reinsert and restart.

-matt

 

Expert Comment

by:ITHELP-BOCS
ID: 13599418
Thanks for the advice Matt. The problems with Veritas loading media and un-ejectable tapes actually seem to have gone away (for now) since installing the up to date Proliant support pack, and I have closed the call with HP tech support. The only issue that remains is with HP SW L&TT generating errors when I attempt to create a support ticket, although this isn't affecting anything else as far as I can see. I will however be keeping your suggestions for use if the problem comes back.

Neil
 

Author Comment

by:derenm
ID: 13645857
Hmm...  maybe I went a little deep on that, eh?  :)

Out of curiosity, were you getting event 9's with the ID 11's?

 

Expert Comment

by:ITHELP-BOCS
ID: 13647617
No Event 9's!