fault SQL backup files that wont restore - how common

Have you ever had situations where a backup of a SQL database would not restore, and if so what kinds of issues can mean a SQL backup cannot restore into a live or test instance. we are reviewing how well our DBA's test restores, so it would be useful to identify how common it is to find that a MSSQL backup file doesn't restore and how routinely a test restore should be performed for critical database backups. For infor these are all 2008 R2 databases.
LVL 4
pma111Asked:
Who is Participating?

[Product update] Infrastructure Analysis Tool is now available with Business Accounts.Learn More

x
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

John TsioumprisSoftware & Systems EngineerCommented:
Let me give some more bad situations...
A badly destination folder...not enough space...the backup fails
A backup application that has archiving option to save space....a minor corruption and everything goes ....
0
Vitor MontalvãoMSSQL Senior EngineerCommented:
Many cases and from the top of my head here are the more common ones I've faced:
  1. Corrupted backup file
  2. Missing differential or Transaction log backup
  3. Not compatible SQL Server versions
  4. Use the WITH RECOVERY option to soon
  5. Forget to use WITH RECOVERY option
  6. Not enough space for Restore
0

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
Steve WalesSenior Database AdministratorCommented:
Other issues that could affect a restore:
* DBA not available (on a flight, in a hospital, on vacation on a beach in Mexico).  Probably only affects a smaller business where there may not be a backup resource available.
* Lack of documentation on the restore process
* If you still use tape, a broken tape can ruin your day too

I tend to restore to test or dev from a backup of each of my databases at least monthly.  The business quite often wants a data refresh more frequently than that.

In the years I've been doing this (I've been a DBA for 16 years) I've been unable to restore a database from the most recent backup twice.  

The first time was a broken tape during a restore.  We went back to the previous backup and I had redundant copies of logs so got around it (it was a production restore too, after a disk failure)

Second time was disk failure on the NAS device we backed up to and the process to copy to the cloud wasn't copying that directory.  Fortunately that was a test restore so we fixed the problem so it wouldn't be a production issue.

There's an old adage that states your backups are only as good as the last time you tested your restores.

The two main things that are going to cause you problems are human error and mechanical / technological failure.  Make sure you have redundant copies of everything and make sure your processes are well documented and frequently tested.

We also declare a mock disaster here where I work once a year. and run through a complete DR scenario.  In the last 3 years:

3 years ago: We had a customer declare a disaster.  We went through a full scenario from building a new VM, pulling down complete backups from the cloud and running a restore.
2 years ago:  I was "hit by a bus" and "in surgery" the same day we had a "disk fail".  This was a dry run as opposed to a physical case where they ran what they'd do past me as a test of my documentation.  We found a few faults in the documentation where they couldn't work out what to do (I'm the only DBA here) - and that pointed out weak spots that we corrected
1 year ago: We had an actual hardware failure that caused a short outage requiring a recovery.  We got to put our practice to the test.

There are so many things that can go wrong in this field.  Just document, test and make sure you keep going through the cycle.
0
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
Microsoft SQL Server

From novice to tech pro — start learning today.