sloutz

NTBackup restore failure across multiple tapes: Cannot catalog media, cannot restore from previously cataloged media

Hi all,

I recently found that all my monthly backup tapes seem to be in a non-restorable state, and I'm kinda starting to get a little freaked out. Any advice or suggestions would be gratefully welcomed.

Background Info:
The backup setup in question is an HP ProLiant DL380 G4 using a single-drive HP StorageWorks Ultrium 960 and Imation LTO3 tapes. All tapes were purchased in the last year, and each has been written to only once. All backups were done as single-job, single-tape backup sets (no backups that run across 2 or more tapes). The same symptoms are consistent across all of my backup tapes older than 3 months:
1) Attempts to restore data from a tape result in a warning message saying that necessary files from the Active Family are offline. The tape in the drive ejects, and ntbackup refuses to use the tape even after reinsertion. All the backup jobs were done with single tape families.
2) I can traverse the files on the backup tapes thanks to the locally cached catalogs of the backups, but when I try to run a Catalog job on one of the tapes, it errors out saying there was an unexpected inconsistency on the requested media.

The files on the tapes are backup-to-disk files of several GB each.
After I first noticed the issue, I did a test backup using the same files and steps as always, but for reasons unknown I am able to restore from the new test backup tape without any issue.

Things tried:
-Un-checking the "use the catalog on the media to speed up restorations" option -> no effect
-Switching the SCSI ports that the tape drive and the MSA hard disk array are connected through -> no effect
-Purchased a new tape drive cleaning cartridge and cleaned the head -> no effect
-Ran 'rsm view /tlibrary' and then 'rsm.exe refresh /lf"library name"' to refresh RSM -> no effect
-Stopped the Removable Storage service, forced a recreate of the ntmsdata folder and rebooted the server -> all previously cataloged media disappeared, but attempts to catalog my backup tapes still fail as outlined above.
-Cataloging media on a separate Win 2003 server using NTBackup -> Catalog fails same as above
-Cataloging media on a separate Win 2003 server using BackupExec 2010 -> Fails saying "the blocksize being used is incorrect"
  - Tried changing the blocksize settings in BE 2010 -> no effect

Any advice would be greatly appreciated.
Thomas Rush

I'd suggest you download HP's free tape drive diagnostic utility -- Library & Tape Tools.  Install it and run the drive diagnostics, and if that finds nothing odd, run tape media tests on a piece of media you can afford to lose (the write tests overwrite any current data; the read tests do not affect data on the tape).  

I am assuming that the backup jobs written to these tapes completed without errors? If there were errors, can you let us know the error messages?

It is possible that blocksize got changed somehow; have you tried all the possibilities?  

Other than running L&TT, it sounds like you've done most of the reasonable steps.  Let me know what it says, and also what trying as yet untried block sizes does.
sloutz

ASKER

Thanks SelfGovern. I'll try the L&TT tests today and post the results when they're finished.

Regarding the backup jobs, yes they were completed without errors, and I was unable to find anything in the error logs related to ntbackup.

With regard to the blocksize settings, I only tried the largest 2 (64k and 32k) based on the assumption that large multi-GB files would require a larger blocksize, but I'll give the remaining settings a go after the L&TT just to be thorough.

Thomas Rush

Sounds like a plan.

As a point of interest, large files don't 'require' a large block size, but larger block sizes typically provide better performance on today's systems.  You could successfully back up TB-sized files with a 4K blocksize... it just might take a bit of time.
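
To put rough numbers on that, here is a minimal Python sketch. The 4 GiB file size is just an illustrative assumption, not a figure from this thread. It only counts how many tape blocks a file occupies at each block size: the file fits either way, a smaller block size simply means many more blocks to write.

```python
def blocks_needed(file_bytes: int, block_bytes: int) -> int:
    """Return how many fixed-size blocks are needed to hold file_bytes."""
    if file_bytes < 0 or block_bytes <= 0:
        raise ValueError("file size must be >= 0 and block size must be > 0")
    # Integer ceiling division: a partially filled last block still counts.
    return -(-file_bytes // block_bytes)


# Hypothetical 4 GiB backup-to-disk file (assumed size, for illustration only).
FILE_BYTES = 4 * 1024**3

for kib in (4, 32, 64):
    count = blocks_needed(FILE_BYTES, kib * 1024)
    print(f"{kib:>2} KiB blocks: {count:,}")
```

Fewer, larger blocks generally mean less per-block overhead, which is where the performance difference tends to come from.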
sloutz

ASKER

Hi SelfGovern,
The Read/Write test of L&TT went fine, but the drive diagnostic was far from pretty. The results are below, and items marked with ** were highlighted in red.  I did a little reading but didn't find much, other than that this can indicate a general device failure. Since I can still perform test backups and restores without issue, I'm reluctant to put the sole blame on my tape device, but I've arranged to borrow a similar device from a friend on Monday to get a "second opinion", so to speak.

All my catalog trials with the varying block sizes ended the same sadly. Interesting note on blocksize and file size relationships though. I'll be sure to remember that.

If you (or anyone else) happen to think of something else I can try over the weekend, I'm all ears~

|__ Analysis Results
    ||__ LTO Drive Assessment Test, version V23.01.2013
    ||__ Test run: Fri Dec 19 11:10:40 2014
    ||__ Drive serial number: HU10606
   ** ||__ There was an unexpected error condition on a Receive Diagnostic command
   ** ||__ Sense Key 0x00, Sense Code 0x0000 (No additional sense information) Error Code: 0x00 GOOD
    ||__ Sense Key 0x05, Sense Code 0x2600 (Operator selected invalid field in parameter list)
   ** ||__ There was an unexpected error condition on a Receive Diagnostic command
    ||__ Sense Key 0x05, Sense Code 0x2600 (Operator selected invalid field in parameter list) Error Code: 0x1802 DI_INVALID_PARAMETER
    ||__ This test requires that the Removable Storage service is not running.
    ||__ Please stop RSM (Computer Management/Services and Applications/Services)
    ||__ and then re-run the test.
    ||__ Test time: 1:53

Thomas Rush

Did you verify that RSM was not running at the time you were running this test?

If not, and the drive is still under support, I'd recommend you call HP.
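
If you want to check the service state from a script rather than the Services console, something like the Python sketch below could work. It assumes the Removable Storage service's short name is `ntmssvc` and that `sc query` prints its usual `STATE : <number> <NAME>` line; the sample text in the code is illustrative, not output captured from this server.

```python
import re
import subprocess
from typing import Optional


def service_state(sc_query_output: str) -> Optional[str]:
    """Extract the state name (e.g. 'RUNNING') from `sc query` output."""
    match = re.search(r"^\s*STATE\s*:\s*\d+\s+(\w+)", sc_query_output, re.MULTILINE)
    return match.group(1) if match else None


def query_removable_storage(service: str = "ntmssvc") -> Optional[str]:
    """Run `sc query` on Windows; 'ntmssvc' is an assumed service name."""
    result = subprocess.run(["sc", "query", service], capture_output=True, text=True)
    return service_state(result.stdout)


# Illustrative sample of the usual `sc query` layout (not from this thread).
SAMPLE = """
SERVICE_NAME: ntmssvc
        TYPE               : 20  WIN32_SHARE_PROCESS
        STATE              : 4  RUNNING
        WIN32_EXIT_CODE    : 0  (0x0)
"""

print(service_state(SAMPLE))
```

On the server itself you would call `query_removable_storage()` and re-run the L&TT test only once it no longer reports `RUNNING`.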

It will be interesting to see what happens with a different tape drive.  I have seen very rare cases where it seems that a drive has slowly drifted out of spec, and can read tapes it's writing now, but not ones it wrote a while back -- possibly coincident with poor quality media.  If another drive can read the old tapes but has trouble with the new tapes, this might be the issue.

And on a side note, an LTO-5 tape drive will read LTO-3 media (although it is only able to write to LTO-4 and LTO-5 tapes).  It also brings you 4x the capacity, and the ability to encrypt your tapes with no performance or capacity hit... so if this drive is on its last legs or questionable, consider upgrading to an LTO-5.
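
The compatibility rule behind that can be sketched in a few lines of Python. This assumes the usual LTO convention for generations 1 through 7 (a drive reads its own generation and two back, and writes its own and one back) plus the published native capacities; later generations do not all follow the same rule, so the sketch rejects them.

```python
# Native (uncompressed) capacities in GB for LTO-1 through LTO-7.
NATIVE_CAPACITY_GB = {1: 100, 2: 200, 3: 400, 4: 800, 5: 1500, 6: 2500, 7: 6000}


def _check(gen: int) -> None:
    if gen not in NATIVE_CAPACITY_GB:
        raise ValueError(f"LTO-{gen} is outside the range this sketch covers")


def can_read(drive_gen: int, tape_gen: int) -> bool:
    """A drive reads its own generation and up to two generations back."""
    _check(drive_gen)
    _check(tape_gen)
    return drive_gen - 2 <= tape_gen <= drive_gen


def can_write(drive_gen: int, tape_gen: int) -> bool:
    """A drive writes its own generation and one generation back."""
    _check(drive_gen)
    _check(tape_gen)
    return drive_gen - 1 <= tape_gen <= drive_gen


print(can_read(5, 3), can_write(5, 3))              # LTO-5 drive, LTO-3 tape
print(NATIVE_CAPACITY_GB[5] / NATIVE_CAPACITY_GB[3])  # capacity ratio
```

By the same rule, an LTO-4 drive can also read LTO-3 tapes, which is handy if you need a second drive to test your existing media.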
sloutz

ASKER

This time around I disabled RSM instead of just stopping it, then rebooted, but when running the L&TT I still got similar errors with the new addition that the tape I am using is registering as write protected though it is physically not (little switch on the front is in the open state, the same as all new tapes).  This drive is haunted.

I'll post an update in a few hours after I get my hands on the secondary device.

|__ Test 'LTO Drive Assessment test' started on device 'HP Ultrium 3-SCSI' at address '2/0.5.0'
    |__ Test aborted
    |__ Operations Log
    |    |__ LTO Drive Assessment Test Options
    |    |__ Test Coverage : Default
    |    |__ Allow Overwrite : True
    |    |__ executing LTO Drive Assessment Test...
    |    |__ adjusting boost value...
    |    |__ erasing ...
    |    |__ soft unload ...
    |    |__ loading ...
    |    |__ writing wrap 0 (1.8 m/sec.)
    |    |__ writing wrap 0 (1.8 m/sec.)
    |    |__ soft unload ...
    |    |__ loading ...
    |    |__ erasing ...
    |    |__ checking tape load ...
    |    |__ Aborted
    |__ Analysis Results
        |__ LTO Drive Assessment Test, version V23.01.2013
        |__ Test run: Mon Dec 22 10:17:28 2014
        |__ Drive serial number: HU10606CB0
**        |__ There was an unexpected error condition on a Receive Diagnostic command
        |__ Sense Key 0x05, Sense Code 0x2600 (Operator selected invalid field in parameter list) Error Code: 0x1802 DI_INVALID_PARAMETER
**        |__ There was an unexpected error condition on a Receive Diagnostic command
        |__ Sense Key 0x05, Sense Code 0x2600 (Operator selected invalid field in parameter list) Error Code: 0x1802 DI_INVALID_PARAMETER
        |__ Data Cartridge Information:
        |__     Vendor: Unknown
        |__     Format: Unknown
        |__     Serial Number: Unknown
        |__     Barcode: Unknown
**        |__ The cartridge currently loaded in the drive is write protected.
**       |__ This test cannot be performed on a write protected cartridge.
        |__ Please replace the cartridge with a writeable data cartridge and re-run the test.
        |__ Test time: 1:48
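
For anyone else reading the sense data above, here is a small Python sketch of how such a pair is usually decoded. It assumes L&TT's four-digit "Sense Code" is the SCSI additional sense code (ASC) in the high byte and its qualifier (ASCQ) in the low byte. The lookup tables are deliberately partial and only cover a few standard values.

```python
# Partial table of standard SCSI sense keys.
SENSE_KEYS = {
    0x0: "NO SENSE",
    0x1: "RECOVERED ERROR",
    0x2: "NOT READY",
    0x3: "MEDIUM ERROR",
    0x4: "HARDWARE ERROR",
    0x5: "ILLEGAL REQUEST",
    0x6: "UNIT ATTENTION",
    0x7: "DATA PROTECT",
}

# Partial table of (ASC, ASCQ) pairs; only the two seen in the log above.
ASC_ASCQ = {
    (0x00, 0x00): "No additional sense information",
    (0x26, 0x00): "Invalid field in parameter list",
}


def decode_sense(sense_key: int, sense_code: int) -> str:
    """Decode a sense key plus a combined 16-bit ASC/ASCQ value."""
    asc, ascq = (sense_code >> 8) & 0xFF, sense_code & 0xFF
    key_name = SENSE_KEYS.get(sense_key, f"unknown sense key 0x{sense_key:02X}")
    detail = ASC_ASCQ.get((asc, ascq), f"unknown ASC/ASCQ 0x{asc:02X}/0x{ascq:02X}")
    return f"{key_name}: {detail}"


print(decode_sense(0x05, 0x2600))
```

Read this way, the 0x05 / 0x2600 pair means the drive rejected a parameter in the diagnostic command it was sent, rather than reporting a media or hardware error directly.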
sloutz

ASKER

It looks like my tape drive was the culprit all along.
I have only tested with 2 so far, but both tapes were able to be cataloged and restored from using the LTO4 tape drive I borrowed.
It's crazy to see that my old drive can read backups it has recently written without issue, but sees tapes that it wrote more than 3 months ago as unusable or corrupt.
ASKER CERTIFIED SOLUTION
Thomas Rush

This solution is only available to members.
sloutz

ASKER

I'm in the process of procuring new equipment, but as that will take a bit of time I'm closing out this ticket for now.
Thanks for the support and the interesting analogy to help explain my equipment's unique malfunction.