NTBackup restore failure across multiple tapes: Cannot catalog media, cannot restore from previously cataloged media

Hi all,

I recently found that all my monthly backup tapes seem to be in a non-restorable state, and I'm kinda starting to get a little freaked out. Any advice or suggestions would be gratefully welcomed.

Background Info:
The backup setup in question is an HP ProLiant DL380 G4 using a single-drive HP StorageWorks Ultrium 960 and Imation LTO3 tapes. All tapes were purchased in the last year, and they have only been written to once each. All backups were done as single-job, single-tape backup sets (no backups that run across 2 or more tapes). The same symptoms are consistent across all of my backup tapes older than 3 months, and those symptoms are:
1) Attempts to restore data from a tape result in a warning message saying that necessary files from the Active Family are offline. The tape in the drive ejects, and NTBackup refuses to use the tape even after reinsertion. All the backup jobs were done with single-tape families.
2) I can traverse the files on the backup tapes thanks to the locally cached catalogs of the backups, but when I try to run a Catalog job on one of the tapes, it errors out saying there was an unexpected inconsistency on the requested media.

The files on the tapes are backup-to-disk files of several GB each.
After I first noticed the issue, I did a test backup using the same files and steps as always, but for reasons unknown I am able to restore from the new test backup tape without any issue.

Things tried:
-Un-checking the "use the catalog on the media to speed up restorations" option -> no effect
-Switching the SCSI ports that the tape drive and the MSA hard disk array are connected through -> no effect
-Purchased a new tape drive cleaning cartridge and cleaned the head -> no effect
-Ran 'rsm view /tlibrary' and then 'rsm.exe refresh /lf"library name"' to refresh RSM -> no effect
-Stopped the Removable Storage service, forced a recreate of the ntmsdata folder and rebooted the server -> all previously cataloged media disappeared, but attempts to catalog my backup tapes fail the same as outlined above
-Cataloging media on a separate Win 2003 server using NTBackup -> Catalog fails same as above
-Cataloging media on a separate Win 2003 server using BackupExec 2010 -> Fails saying "the blocksize being used is incorrect"
  - Tried changing the blocksize settings in BE 2010 -> no effect

Any advice would be greatly appreciated.

Thomas Rush Commented:
I'd suggest you download HP's free tape drive diagnostic utility -- Library & Tape Tools.  Install it and run the drive diagnostics, and if that finds nothing odd, run tape media tests on a piece of media you can afford to lose (the write tests overwrite any current data; the read tests do not affect data on the tape).  

I am assuming that the backup jobs written to these tapes completed without errors?  If there were errors, can you let us know the error messages?

It is possible that blocksize got changed somehow; have you tried all the possibilities?  

Other than running L&TT, it sounds like you've done most of the reasonable steps.  Let me know what it says, and also what trying as yet untried block sizes does.
sloutz (Author) Commented:
Thanks SelfGovern. I'll try the L&TT tests today and post the results when it's finished.

Regarding the backup jobs, yes they were completed without errors, and I was unable to find anything in the error logs related to ntbackup.

With regard to the blocksize settings, I only tried the largest 2 (64k and 32k) based on the assumption that large multi-GB files would require a larger blocksize, but I'll give the remaining settings a go after the L&TT just to be thorough.
Thomas Rush Commented:
Sounds like a plan.

As a point of interest, large files don't 'require' a large block size, but larger block sizes typically provide better performance on today's systems.  You could successfully back up TB-sized files with a 4K blocksize... it just might take a bit of time.
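
To put rough numbers on that point, here is a minimal sketch that counts how many fixed-size tape blocks a file occupies at different block sizes. The function name and the 1 TiB example file are illustrative assumptions; nothing here is an NTBackup or Backup Exec setting.

```python
def blocks_needed(file_bytes, block_bytes):
    """Return how many fixed-size blocks are needed to hold file_bytes.

    The last block may be only partly filled, so this rounds up.
    """
    if file_bytes < 0 or block_bytes <= 0:
        raise ValueError("file size must be >= 0 and block size must be > 0")
    return (file_bytes + block_bytes - 1) // block_bytes


KIB = 1024
TIB = 1024 ** 4  # hypothetical 1 TiB file

for size_kib in (4, 32, 64):
    count = blocks_needed(TIB, size_kib * KIB)
    print(f"{size_kib:>2} KiB blocks: {count:,} blocks for a 1 TiB file")
```

The file fits at any of these block sizes; the 4 KiB case simply needs 16 times as many blocks as the 64 KiB case, which is where the extra time would come from.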
sloutz (Author) Commented:
Hi SelfGovern,
The Read/Write test of L&TT went fine, but the drive diagnostic was far from pretty. The results are below, and items marked with ** were highlighted in red.  I did a little reading but didn't find much other than that this can indicate a general device failure. Since I can still perform test backups and restores without issue, I'm reluctant to pin the sole blame on my tape device, but I've arranged to borrow a similar device from a friend on Monday to get a "second opinion" so to speak.

All my catalog trials with the varying block sizes ended the same sadly. Interesting note on blocksize and file size relationships though. I'll be sure to remember that.

If you (or anyone else) happen to think of something else I can try over the weekend, I'm all ears~

|__ Analysis Results
    ||__ LTO Drive Assessment Test, version V23.01.2013
    ||__ Test run: Fri Dec 19 11:10:40 2014
    ||__ Drive serial number: HU10606
   ** ||__ There was an unexpected error condition on a Receive Diagnostic command
   ** ||__ Sense Key 0x00, Sense Code 0x0000 (No additional sense information) Error Code: 0x00 GOOD
    ||__ Sense Key 0x05, Sense Code 0x2600 (Operator selected invalid field in parameter list)
   ** ||__ There was an unexpected error condition on a Receive Diagnostic command
    ||__ Sense Key 0x05, Sense Code 0x2600 (Operator selected invalid field in parameter list) Error Code: 0x1802 DI_INVALID_PARAMETER
    ||__ This test requires that the Removable Storage service is not running.
    ||__ Please stop RSM (Computer Management/Services and Applications/Services)
    ||__ and then re-run the test.
    ||__ Test time: 1:53
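
For anyone reading the report above, here is a rough sketch of how the sense lines can be decoded. It assumes the four-digit "Sense Code" that L&TT prints is the standard SCSI additional sense code (high byte) followed by its qualifier (low byte); that matches the descriptions printed in parentheses but is not confirmed by the L&TT documentation. The lookup tables are deliberately partial, and the function name is made up for this example.

```python
import re

# Sense key names as defined by the SCSI Primary Commands standard (subset).
SENSE_KEYS = {
    0x0: "NO SENSE",
    0x1: "RECOVERED ERROR",
    0x2: "NOT READY",
    0x3: "MEDIUM ERROR",
    0x4: "HARDWARE ERROR",
    0x5: "ILLEGAL REQUEST",
    0x6: "UNIT ATTENTION",
    0x7: "DATA PROTECT",
    0x8: "BLANK CHECK",
    0xB: "ABORTED COMMAND",
}

# Only the two (ASC, ASCQ) pairs that appear in the report above.
ADDITIONAL_SENSE = {
    (0x00, 0x00): "NO ADDITIONAL SENSE INFORMATION",
    (0x26, 0x00): "INVALID FIELD IN PARAMETER LIST",
}

SENSE_LINE = re.compile(
    r"Sense Key 0x([0-9A-Fa-f]{2}), Sense Code 0x([0-9A-Fa-f]{4})"
)


def decode_sense(line):
    """Decode one report line; return None if it has no sense data."""
    match = SENSE_LINE.search(line)
    if match is None:
        return None
    key = int(match.group(1), 16)
    code = int(match.group(2), 16)
    asc, ascq = code >> 8, code & 0xFF  # assumed layout: ASC high, ASCQ low
    return {
        "key": SENSE_KEYS.get(key, "UNKNOWN"),
        "asc": asc,
        "ascq": ascq,
        "meaning": ADDITIONAL_SENSE.get((asc, ascq), "not in this table"),
    }


line = "Sense Key 0x05, Sense Code 0x2600 (Operator selected invalid field in parameter list)"
print(decode_sense(line))
```

Under that reading, sense key 0x05 is ILLEGAL REQUEST: the drive rejected the parameters of the Receive Diagnostic command, which is a different class of error from MEDIUM ERROR (0x03) or HARDWARE ERROR (0x04). On its own it does not say whether the drive is healthy.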
Thomas Rush Commented:
Did you verify that RSM was not running at the time you were running this test?

If not, and the drive is still under support, I'd recommend you call HP.

It will be interesting to see what happens with a different tape drive.  I have seen very rare cases where it seems that a drive has slowly drifted out of spec, and can read tapes it's writing now, but not ones it wrote a while back -- possibly coincident with poor quality media.  If another drive can read the old tapes but has trouble with the new tapes, this might be the issue.

And on a side note, an LTO-5 tape drive will read LTO-3 media (although it is only able to write to LTO-4 and LTO-5 tapes).  It also brings you 4x the capacity, and the ability to encrypt your tapes with no performance or capacity hit... so if this drive is on its last legs or questionable, consider upgrading to an LTO-5.
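
The compatibility rule behind that note (a drive reads its own generation and the two before it, and writes its own generation and the one before it) can be sketched as below. The table only covers LTO-1 through LTO-5, where this rule and the native capacities are well established; the function names are invented for this example.

```python
# Native (uncompressed) capacity per LTO generation, in GB.
NATIVE_CAPACITY_GB = {1: 100, 2: 200, 3: 400, 4: 800, 5: 1500}


def _check(generation):
    # This sketch only covers the generations listed in the table above.
    if generation not in NATIVE_CAPACITY_GB:
        raise ValueError(f"LTO-{generation} is outside this sketch's table")


def can_read(drive_gen, tape_gen):
    """A drive reads its own generation and the two before it."""
    _check(drive_gen)
    _check(tape_gen)
    return drive_gen - 2 <= tape_gen <= drive_gen


def can_write(drive_gen, tape_gen):
    """A drive writes its own generation and the one before it."""
    _check(drive_gen)
    _check(tape_gen)
    return drive_gen - 1 <= tape_gen <= drive_gen


for tape in (3, 4, 5):
    print(f"LTO-5 drive, LTO-{tape} tape: "
          f"read={can_read(5, tape)} write={can_write(5, tape)}")

# Native capacity gain of LTO-5 over LTO-3 (the "4x" above, rounded).
print(NATIVE_CAPACITY_GB[5] / NATIVE_CAPACITY_GB[3])
```

By the same rule an LTO-4 drive can also read LTO-3 media, so either generation would work for getting data off the existing tapes.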
sloutz (Author) Commented:
This time around I disabled RSM instead of just stopping it, then rebooted, but when running the L&TT I still got similar errors with the new addition that the tape I am using is registering as write protected though it is physically not (little switch on the front is in the open state, the same as all new tapes).  This drive is haunted.

I'll post an update in a few hours after I get my hands on the secondary device.

|__ Test 'LTO Drive Assessment test' started on device 'HP Ultrium 3-SCSI' at address '2/0.5.0'
    |__ Test aborted
    |__ Operations Log
    |    |__ LTO Drive Assessment Test Options
    |    |__ Test Coverage : Default
    |    |__ Allow Overwrite : True
    |    |__ executing LTO Drive Assessment Test...
    |    |__ adjusting boost value...
    |    |__ erasing ...
    |    |__ soft unload ...
    |    |__ loading ...
    |    |__ writing wrap 0 (1.8 m/sec.)
    |    |__ writing wrap 0 (1.8 m/sec.)
    |    |__ soft unload ...
    |    |__ loading ...
    |    |__ erasing ...
    |    |__ checking tape load ...
    |    |__ Aborted
    |__ Analysis Results
        |__ LTO Drive Assessment Test, version V23.01.2013
        |__ Test run: Mon Dec 22 10:17:28 2014
        |__ Drive serial number: HU10606CB0
**        |__ There was an unexpected error condition on a Receive Diagnostic command
        |__ Sense Key 0x05, Sense Code 0x2600 (Operator selected invalid field in parameter list) Error Code: 0x1802 DI_INVALID_PARAMETER
**        |__ There was an unexpected error condition on a Receive Diagnostic command
        |__ Sense Key 0x05, Sense Code 0x2600 (Operator selected invalid field in parameter list) Error Code: 0x1802 DI_INVALID_PARAMETER
        |__ Data Cartridge Information:
        |__     Vendor: Unknown
        |__     Format: Unknown
        |__     Serial Number: Unknown
        |__     Barcode: Unknown
**        |__ The cartridge currently loaded in the drive is write protected.
**       |__ This test cannot be performed on a write protected cartridge.
        |__ Please replace the cartridge with a writeable data cartridge and re-run the test.
        |__ Test time: 1:48
sloutz (Author) Commented:
It looks like my tape drive was the culprit all along.
I have only tested 2 so far, but both tapes could be cataloged and restored from using the LTO4 tape drive I borrowed.
It's crazy to see that my old drive can read backups it has recently written without issue, but that it sees tapes it wrote more than 3 months ago as unusable or corrupt.
Thomas Rush Commented:
Sometimes a drive slowly drifts out of spec -- alignment, for instance -- and for whatever reason, the drive's internal sensors aren't able to catch it.  Note: this is pretty rare in my experience.
It's like a gun with a scope that used to be dead on but is a bit loose and has been drifting off-target.  If you've been continually using the rifle as it gets worse, you'll compensate and shoot dead-on (i.e., your drive can read the tapes as they are now).  But if a gunsmith were to 'fix' your scope, you'd find that you couldn't hit the target anymore, even though you could before, when the scope was sighted in correctly (i.e., you can't hit the targets or read the tapes you used to be able to read).
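
The same idea as a toy model, purely for illustration: the drift rate, the tolerance, and the single "offset" number are made-up values in arbitrary units, not LTO specifications or measurements from this drive. A tape is treated as readable when the reading drive's alignment is close enough to the alignment the tape was written with.

```python
TOLERANCE = 10        # max offset difference that still reads (arbitrary units)
DRIFT_PER_MONTH = 4   # how far the hypothetical drive drifts each month


def offset_at(month):
    """Alignment offset of the drifting drive after a number of months."""
    return DRIFT_PER_MONTH * month


def readable(tape_offset, reader_offset, tolerance=TOLERANCE):
    """True if the reader is aligned closely enough to how the tape was written."""
    return abs(reader_offset - tape_offset) <= tolerance


NOW = 6                          # months since the drive was in spec
drifted_drive = offset_at(NOW)   # the old drive as it is today
in_spec_drive = 0                # a borrowed, correctly aligned drive

for written in range(NOW + 1):
    tape = offset_at(written)    # tape carries the offset it was written with
    print(f"tape written in month {written}: "
          f"old drive reads={readable(tape, drifted_drive)}, "
          f"borrowed drive reads={readable(tape, in_spec_drive)}")
```

In this model the drifted drive reads only its most recent tapes and the in-spec drive reads only the oldest ones, which is the pattern described above: old tapes fail on the original drive but work elsewhere, and recent tapes may be the ones that give a second drive trouble.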

HP has tools like TapeAssure (free), and especially TapeAssure Advanced (for which they charge), which can actively monitor tape drives for things like compression, speed, wear, and possible failures.  Other vendors may have similar diagnostics.  It's worthwhile to keep an eye on tape drives using the tools available.

sloutz (Author) Commented:
I'm in the process of procuring new equipment, but as that will take a bit of time I'm closing out this ticket for now.
Thanks for the support and the interesting analogy to help explain my equipment's unique malfunction.