Solved

NTBackup restore failure across multiple tapes: Cannot catalog media, cannot restore from previously cataloged media

Posted on 2014-12-17
Last Modified: 2015-01-07
Hi all,

I recently found that all my monthly backup tapes seem to be in a non-restorable state, and I'm kinda starting to get a little freaked out. Any advice or suggestions would be gratefully welcomed.

Background Info:
The backup setup in question is an HP Proliant DL380 G4 using a single-drive HP StorageWorks Ultrium 960 and Imation LTO3 tapes. All tapes were purchased in the last year, and they have only been written to once each. All backups were done with single job, single tape backup sets (no backups that run across 2 or more tapes). The same symptoms are consistent across all of my backup tapes older than 3 months, and those symptoms are:
1) attempts to restore data from a tape result in a warning message saying that necessary files from the Active Family are offline. The tape in the drive ejects, and ntbackup refuses to use the tape even after reinsertion. All the backup jobs were done with single tape families.
2) I can traverse the files on the backup tapes thanks to the local cached catalogs of the backups, but when I try to run a Catalog job on one of the tapes, it errors saying there was an unexpected inconsistency on the requested media

The files on the tapes are backup-to-disk files of several GB each.
After I first noticed the issue, I did a test backup using the same files and steps as always, but for reasons unknown I am able to restore from the new test backup tape without any issue.

Things tried:
-Un-checking the "use the catalog on the media to speed up restorations" option -> no effect
-Switching the SCSI ports that the tape drive and the MSA harddisk array are connected through -> no effect  
-Purchased a new tape drive cleaning cartridge and cleaned the head -> no effect
-Ran 'rsm view /tlibrary' and then 'rsm.exe refresh /lf"library name"' to refresh RSM -> no effect
-Stopped the Removable Storage service, forced a re-creation of the ntmsdata folder, and rebooted the server -> all previously cataloged media disappeared, but attempts to catalog my backup tapes still fail as outlined above.
-Cataloging media on a separate Win 2003 server using NTBackup -> Catalog fails same as above
-Cataloging media on a separate Win 2003 server using BackupExec 2010 -> Fails saying "the blocksize being used is incorrect"
  - Tried changing the blocksize settings in BE 2010 -> no effect

Any advice would be greatly appreciated.
Question by:sloutz
 
LVL 20

Expert Comment

by:SelfGovern
I'd suggest you download HP's free tape drive diagnostic utility -- Library & Tape Tools.  Install it and run the drive diagnostics, and if that finds nothing odd, run tape media tests on a piece of media you can afford to lose (the write tests overwrite any current data; the read tests do not affect data on the tape).  

I am assuming that the backup jobs written to these tapes completed without errors?  If there were errors, can you let us know the error messages?

It is possible that blocksize got changed somehow; have you tried all the possibilities?  

Other than running L&TT, it sounds like you've done most of the reasonable steps.  Let me know what it says, and also what trying as yet untried block sizes does.
 
LVL 1

Author Comment

by:sloutz
Thanks SelfGovern. I'll try the L&TT tests today and post the results when it's finished.

Regarding the backup jobs, yes they were completed without errors, and I was unable to find anything in the error logs related to ntbackup.

With regard to the blocksize settings, I only tried the largest 2 (64k and 32k) based on the assumption that large multi-GB files would require a larger blocksize, but I'll give the remaining settings a go after the L&TT just to be thorough.
 
LVL 20

Expert Comment

by:SelfGovern
Sounds like a plan.

As a point of interest, large files don't 'require' a large block size, but larger block sizes typically provide better performance on today's systems.  You could successfully back up TB-sized files with a 4K blocksize... it just might take a bit of time.
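
To put rough numbers on that, here is a minimal Python sketch (the function name and the 1 TiB example size are just illustrative assumptions, not anything NTBackup or Backup Exec exposes). It only counts how many fixed-size blocks a file occupies at each block size; it says nothing about actual throughput, which depends on the drive and the host.

```python
def blocks_needed(file_bytes: int, block_bytes: int) -> int:
    """Count the fixed-size blocks needed to hold file_bytes (rounded up)."""
    if file_bytes < 0 or block_bytes <= 0:
        raise ValueError("file_bytes must be >= 0 and block_bytes must be > 0")
    # Integer ceiling division avoids floating-point rounding on huge sizes.
    return -(-file_bytes // block_bytes)


ONE_TIB = 2 ** 40  # hypothetical file size, chosen only for illustration

for block_kib in (4, 32, 64):
    count = blocks_needed(ONE_TIB, block_kib * 1024)
    print(f"{block_kib:>2}K blocks: {count:,}")
```

The same file fits either way; a 4K block size just means 16 times as many blocks to write as 64K, which is where the extra time would come from.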
 
LVL 1

Author Comment

by:sloutz
Hi SelfGovern,
The Read/Write test of L&TT went fine, but the drive diagnostic was far from pretty. The results are below, and items marked with ** were highlighted in red.  I did a little reading but didn't find much other than that this can indicate a general device failure. Since I can still perform test backups and restores without issue, I'm reluctant to put the sole blame on my tape device, but I've arranged to borrow a similar device from a friend on Monday to get a "second opinion" so to speak.

All my catalog trials with the varying block sizes ended the same sadly. Interesting note on blocksize and file size relationships though. I'll be sure to remember that.

If you (or anyone else) happen to think of something else I can try over the weekend, I'm all ears~

|__ Analysis Results
    ||__ LTO Drive Assessment Test, version V23.01.2013
    ||__ Test run: Fri Dec 19 11:10:40 2014
    ||__ Drive serial number: HU10606
   ** ||__ There was an unexpected error condition on a Receive Diagnostic command
   ** ||__ Sense Key 0x00, Sense Code 0x0000 (No additional sense information) Error Code: 0x00 GOOD
    ||__ Sense Key 0x05, Sense Code 0x2600 (Operator selected invalid field in parameter list)
   ** ||__ There was an unexpected error condition on a Receive Diagnostic command
    ||__ Sense Key 0x05, Sense Code 0x2600 (Operator selected invalid field in parameter list) Error Code: 0x1802 DI_INVALID_PARAMETER
    ||__ This test requires that the Removable Storage service is not running.
    ||__ Please stop RSM (Computer Management/Services and Applications/Services)
    ||__ and then re-run the test.
    ||__ Test time: 1:53
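
For anyone reading along, the sense data in a log like this can be decoded by hand. The sketch below is a small Python helper under two assumptions: that L&TT's four-digit "Sense Code" is the SCSI additional sense code (ASC) in the high byte and its qualifier (ASCQ) in the low byte, and that the lookup table only needs the two codes seen above. The function and table names are made up for illustration.

```python
# Standard SCSI sense keys (low nibble of the sense key byte).
SENSE_KEYS = {
    0x0: "NO SENSE",
    0x1: "RECOVERED ERROR",
    0x2: "NOT READY",
    0x3: "MEDIUM ERROR",
    0x4: "HARDWARE ERROR",
    0x5: "ILLEGAL REQUEST",
    0x6: "UNIT ATTENTION",
    0x7: "DATA PROTECT",
    0x8: "BLANK CHECK",
    0xB: "ABORTED COMMAND",
}

# Only the ASC/ASCQ pairs that appear in the log above; not a full table.
ASC_ASCQ = {
    (0x00, 0x00): "NO ADDITIONAL SENSE INFORMATION",
    (0x26, 0x00): "INVALID FIELD IN PARAMETER LIST",
}


def describe_sense(sense_key: int, sense_code: int) -> str:
    """Turn a sense key plus a 16-bit ASC/ASCQ value into readable text."""
    asc, ascq = (sense_code >> 8) & 0xFF, sense_code & 0xFF
    key_text = SENSE_KEYS.get(sense_key & 0x0F, f"unlisted key 0x{sense_key:02X}")
    code_text = ASC_ASCQ.get((asc, ascq), f"unlisted ASC/ASCQ 0x{asc:02X}/0x{ascq:02X}")
    return f"{key_text}: {code_text}"


print(describe_sense(0x05, 0x2600))
print(describe_sense(0x00, 0x0000))
```

Sense key 0x05 is ILLEGAL REQUEST, meaning the drive rejected the parameters of a command it was sent, as opposed to reporting a MEDIUM ERROR (0x03) or HARDWARE ERROR (0x04). That alone does not show the drive is healthy, only that this particular failure is about the command rather than the tape.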
 
LVL 20

Expert Comment

by:SelfGovern
Did you verify that RSM was not running at the time you were running this test?

If not, and the drive is still under support, I'd recommend you call HP.

It will be interesting to see what happens with a different tape drive.  I have seen very rare cases where it seems that a drive has slowly drifted out of spec, and can read tapes it's writing now, but not ones it wrote a while back -- possibly coincident with poor quality media.  If another drive can read the old tapes but has trouble with the new tapes, this might be the issue.
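
That cross-check can be written down as a tiny decision function. This is only a sketch of the reasoning in the paragraph above, in Python, with made-up names; the four yes/no inputs are what you would observe by trying old and new tapes in both the suspect drive and a second drive, and the wording of each verdict is my own assumption about how to label the outcome.

```python
def likely_cause(suspect_reads_old: bool, suspect_reads_new: bool,
                 other_reads_old: bool, other_reads_new: bool) -> str:
    """Classify the fault from read tests on two drives and two tape ages."""
    if not suspect_reads_old and other_reads_old:
        # The old tapes are readable elsewhere, so the media is fine.
        if suspect_reads_new and not other_reads_new:
            # Only the suspect drive can read what it writes today.
            return "suspect drive has probably drifted out of spec"
        return "suspect drive is at fault for old tapes; drift not confirmed"
    if not suspect_reads_old and not other_reads_old:
        # Nobody can read the old tapes.
        return "old tapes themselves are suspect (media or original write)"
    return "no fault isolated by these four checks"


# Second drive reads the old tapes but struggles with freshly written ones.
print(likely_cause(False, True, True, False))
```

If the second drive reads both old and new tapes, the suspect drive is still the problem for the old tapes, but drift is not the only possible explanation.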

And on a side note, an LTO-5 tape drive will read LTO-3 media (although it is only able to write to LTO-4 and LTO-5 tapes).  It also brings you nearly 4x the native capacity (1.5 TB vs. 400 GB), and the ability to encrypt your tapes with no performance or capacity hit... so if this drive is on its last legs or questionable, consider upgrading to an LTO-5.
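
The read/write compatibility follows a simple rule for LTO generations 1 through 7: a drive reads tapes up to two generations back and writes tapes up to one generation back. A short Python sketch of that rule follows; the function name is hypothetical, and the rule is deliberately not applied past LTO-7 because later generations are more restrictive.

```python
# Native (uncompressed) capacities in GB for the generations discussed here.
NATIVE_GB = {3: 400, 4: 800, 5: 1500}


def lto_compat(drive_gen: int) -> dict:
    """Return the tape generations an LTO drive can read and write."""
    if not 1 <= drive_gen <= 7:
        raise ValueError("rule sketched here only covers LTO-1 through LTO-7")
    oldest_read = max(1, drive_gen - 2)   # reads two generations back
    oldest_write = max(1, drive_gen - 1)  # writes one generation back
    return {
        "read": list(range(oldest_read, drive_gen + 1)),
        "write": list(range(oldest_write, drive_gen + 1)),
    }


print(lto_compat(5))                 # {'read': [3, 4, 5], 'write': [4, 5]}
print(NATIVE_GB[5] / NATIVE_GB[3])   # 3.75
```

By the same rule, an LTO-4 drive also reads LTO-3 tapes, so either generation would work for getting data off existing LTO-3 media.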
 
LVL 1

Author Comment

by:sloutz
This time around I disabled RSM instead of just stopping it, then rebooted, but when running the L&TT I still got similar errors with the new addition that the tape I am using is registering as write protected though it is physically not (little switch on the front is in the open state, the same as all new tapes).  This drive is haunted.

I'll post an update in a few hours after I get my hands on the secondary device.

|__ Test 'LTO Drive Assessment test' started on device 'HP Ultrium 3-SCSI' at address '2/0.5.0'
    |__ Test aborted
    |__ Operations Log
    |    |__ LTO Drive Assessment Test Options
    |    |__ Test Coverage : Default
    |    |__ Allow Overwrite : True
    |    |__ executing LTO Drive Assessment Test...
    |    |__ adjusting boost value...
    |    |__ erasing ...
    |    |__ soft unload ...
    |    |__ loading ...
    |    |__ writing wrap 0 (1.8 m/sec.)
    |    |__ writing wrap 0 (1.8 m/sec.)
    |    |__ soft unload ...
    |    |__ loading ...
    |    |__ erasing ...
    |    |__ checking tape load ...
    |    |__ Aborted
    |__ Analysis Results
        |__ LTO Drive Assessment Test, version V23.01.2013
        |__ Test run: Mon Dec 22 10:17:28 2014
        |__ Drive serial number: HU10606CB0
**        |__ There was an unexpected error condition on a Receive Diagnostic command
        |__ Sense Key 0x05, Sense Code 0x2600 (Operator selected invalid field in parameter list) Error Code: 0x1802 DI_INVALID_PARAMETER
**        |__ There was an unexpected error condition on a Receive Diagnostic command
        |__ Sense Key 0x05, Sense Code 0x2600 (Operator selected invalid field in parameter list) Error Code: 0x1802 DI_INVALID_PARAMETER
        |__ Data Cartridge Information:
        |__     Vendor: Unknown
        |__     Format: Unknown
        |__     Serial Number: Unknown
        |__     Barcode: Unknown
**        |__ The cartridge currently loaded in the drive is write protected.
**       |__ This test cannot be performed on a write protected cartridge.
        |__ Please replace the cartridge with a writeable data cartridge and re-run the test.
        |__ Test time: 1:48
 
LVL 1

Author Comment

by:sloutz
It looks like my tape drive was the culprit all along.
I have only tested with 2 so far, but both tapes were able to be cataloged and restored from using the LTO4 tape drive I borrowed.
It's crazy to see that my old drive can read from backups it has recently written without issue, but that it sees tapes that it wrote more than 3 months ago as unusable or corrupt.
 
LVL 20

Accepted Solution

by:
SelfGovern earned 500 total points
Sometimes a drive slowly drifts out of spec -- alignment, for instance -- and for whatever reason, the drive's internal sensors aren't able to catch it.  Note: this is pretty rare in my experience.
It's like a gun with a scope that used to be dead on but is a bit loose and has been drifting off-target.  If you've been continually using the rifle as it gets worse, you'll compensate and shoot dead-on (i.e., your drive can read the tapes as they are now).  But if a gunsmith were to 'fix' your scope, you'd find that you couldn't hit the target anymore, even though you could before, when the scope was sighted in correctly (i.e., you can't hit the targets or read the tapes you used to be able to read).

HP has tools like TapeAssure (free), and especially TapeAssure Advanced (for which they charge), which can actively monitor tape drives for things like compression, speed, wear, and possible failures.  Other vendors may have similar diagnostics.  It's worthwhile to keep an eye on tape drives using the tools available.
 
LVL 1

Author Closing Comment

by:sloutz
I'm in the process of procuring new equipment, but as that will take a bit of time I'm closing out this ticket for now.
Thanks for the support and the interesting analogy to help explain my equipment's unique malfunction.