fault SQL backup files that wont restore - how common

pma111 used Ask the Experts™
Have you ever had situations where a backup of a SQL database would not restore, and if so what kinds of issues can mean a SQL backup cannot restore into a live or test instance. we are reviewing how well our DBA's test restores, so it would be useful to identify how common it is to find that a MSSQL backup file doesn't restore and how routinely a test restore should be performed for critical database backups. For infor these are all 2008 R2 databases.
Watch Question

Do more with

Expert Office
EXPERT OFFICE® is a registered trademark of EXPERTS EXCHANGE®
John TsioumprisSoftware & Systems Engineer
Let me give some more bad situations...
A badly destination folder...not enough space...the backup fails
A backup application that has archiving option to save space....a minor corruption and everything goes ....
IT Engineer
Distinguished Expert 2017
Many cases and from the top of my head here are the more common ones I've faced:
  1. Corrupted backup file
  2. Missing differential or Transaction log backup
  3. Not compatible SQL Server versions
  4. Use the WITH RECOVERY option to soon
  5. Forget to use WITH RECOVERY option
  6. Not enough space for Restore
Steve WalesSenior Database Administrator
Other issues that could affect a restore:
* DBA not available (on a flight, in a hospital, on vacation on a beach in Mexico).  Probably only affects a smaller business where there may not be a backup resource available.
* Lack of documentation on the restore process
* If you still use tape, a broken tape can ruin your day too

I tend to restore to test or dev from a backup of each of my databases at least monthly.  The business quite often wants a data refresh more frequently than that.

In the years I've been doing this (I've been a DBA for 16 years) I've been unable to restore a database from the most recent backup twice.  

The first time was a broken tape during a restore.  We went back to the previous backup and I had redundant copies of logs so got around it (it was a production restore too, after a disk failure)

Second time was disk failure on the NAS device we backed up to and the process to copy to the cloud wasn't copying that directory.  Fortunately that was a test restore so we fixed the problem so it wouldn't be a production issue.

There's an old adage that states your backups are only as good as the last time you tested your restores.

The two main things that are going to cause you problems are human error and mechanical / technological failure.  Make sure you have redundant copies of everything and make sure your processes are well documented and frequently tested.

We also declare a mock disaster here where I work once a year. and run through a complete DR scenario.  In the last 3 years:

3 years ago: We had a customer declare a disaster.  We went through a full scenario from building a new VM, pulling down complete backups from the cloud and running a restore.
2 years ago:  I was "hit by a bus" and "in surgery" the same day we had a "disk fail".  This was a dry run as opposed to a physical case where they ran what they'd do past me as a test of my documentation.  We found a few faults in the documentation where they couldn't work out what to do (I'm the only DBA here) - and that pointed out weak spots that we corrected
1 year ago: We had an actual hardware failure that caused a short outage requiring a recovery.  We got to put our practice to the test.

There are so many things that can go wrong in this field.  Just document, test and make sure you keep going through the cycle.

Do more with

Expert Office
Submit tech questions to Ask the Experts™ at any time to receive solutions, advice, and new ideas from leading industry professionals.

Start 7-Day Free Trial