Solved

Backup Failures - what goes wrong

Posted on 2013-01-02
Last Modified: 2013-01-14
Regardless of which backup solution you use, can I ask:

1) Have you ever had backup jobs fail? If so, what caused them to fail, and is it common for them to fail, or more of a once-a-year scenario?

2) What mechanisms do you have in place to identify a backup job that didn't complete successfully, and what can you do about a failed backup job? Or is it a case of "if it failed, it failed" and there's not much you can do about it?
Question by:pma111
8 Comments
 
LVL 57

Accepted Solution

by:
giltjr earned 167 total points
1) Yes.  Various reasons.  Not enough space left in the backup pool.  Files being locked that should not be locked.  I would say a few times a year we may have jobs fail.

2) Any good backup tool should provide you with something that says "backup failed."  On our z/OS mainframe the jobs end with a non-zero condition code.  On our distributed systems I'm not sure about today, but they used to produce a report showing either that everything was good or that there was a problem.  Our operations group reviewed this report to verify.

It depends on the reason the backup failed, but typically if it is a very important file (or group of files) we do what is needed to get the backup run.  If it was a not so important file we would just make sure it worked the next backup.
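As a rough illustration of the "non-zero condition code" idea on a distributed system, here is a minimal Python sketch. It assumes the backup tool is a command-line program that exits with 0 on success and anything else on failure; the `run_backup` helper and the stand-in command are hypothetical, not part of any particular product.

```python
import subprocess
import sys


def run_backup(command):
    """Run a backup command and report whether it succeeded.

    Returns (ok, exit_code); ok is True only when the exit code is 0.
    """
    result = subprocess.run(command, capture_output=True, text=True)
    return result.returncode == 0, result.returncode


if __name__ == "__main__":
    # Stand-in for a real backup tool: a tiny Python process that exits with code 8.
    fake_backup = [sys.executable, "-c", "import sys; sys.exit(8)"]
    ok, code = run_backup(fake_backup)
    if not ok:
        print(f"Backup FAILED with exit code {code}")
```

In practice the failure branch would send an alert or open a ticket rather than just print, so that someone in operations actually sees it.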
 
LVL 19

Assisted Solution

by:strivoli
strivoli earned 167 total points
If you are using Windows, the low-cost solution is the built-in backup tool. The preferred solution is Symantec Backup Exec. Backups are no different from many other things in IT: analyze, test, set, check. If backups fail you usually receive a notification. Even if backups do not fail, you should run a restore periodically and check the restored data.
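One way to "check the restored data" is to compare checksums of the original files against the restored copies. The sketch below assumes you have restored into a separate folder and that the originals have not changed since the backup ran; the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_restore(original_dir, restored_dir):
    """Return relative paths that are missing or differ in the restored copy."""
    original_dir, restored_dir = Path(original_dir), Path(restored_dir)
    problems = []
    for source in sorted(original_dir.rglob("*")):
        if not source.is_file():
            continue
        relative = source.relative_to(original_dir)
        restored = restored_dir / relative
        if not restored.is_file() or file_digest(source) != file_digest(restored):
            problems.append(str(relative))
    return problems
```

An empty list means every original file was found in the restore with identical content. It does not prove an application (a database, say) will start from the restored data, so a functional test is still worthwhile.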
 
LVL 3

Author Comment

by:pma111
>>1) Yes.  Various reasons.  Not enough space left in the backup pool.  Files being locked that should not be locked.  I would say a few times a year we may have jobs fail.


Re files being locked, can you provide an example? I.e. users with files open? Or something else? (Please keep answers management / low-tech friendly.)
 
LVL 18

Assisted Solution

by:irweazelwallis
irweazelwallis earned 166 total points
Question one has a few answers.

Media Failure - if you are writing to tapes then these can fail. It doesn't happen very often because of the tape itself, but it can happen because of tape drives. This depends on the age of the hardware and media: the older they are, the more likely they are to fail.

If you are using disk as the primary backup medium then lack of free space can cause failures. If capacity planning has not been undertaken, or the budget is tight, failures due to this can happen frequently.

This is something you can control with maintenance and correct budgeting.
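For the free space case, a pre-check before the job starts can turn a mid-run failure into an early warning. This is a minimal sketch, assuming you can estimate the backup size in advance; the function name and the 10% headroom are illustrative choices, not a recommendation from any vendor.

```python
import shutil


def has_room_for_backup(destination, expected_bytes, safety_margin=0.10):
    """Return True if the destination has space for the backup plus a margin.

    safety_margin is a fraction of the expected size kept as headroom.
    """
    free_bytes = shutil.disk_usage(destination).free
    required = expected_bytes * (1 + safety_margin)
    return free_bytes >= required
```

For example, `has_room_for_backup("/backups", 500 * 1024**3)` asks whether a hypothetical `/backups` volume can take a 500 GiB job with 10% to spare.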

Backup Configuration Issue
This is dependent on the application you are backing up and on changes to the backup routines.

If you regularly change backups i.e. adding and removing sources and destinations then this can lead to failure

Some applications are less reliable than others and often have problems with backing up. It is down to system admins to document the common errors and how often they occur,
i.e. file share backups file in use errors


Question 2
Alerting is common to almost all (if not all) backup applications, and scripted ones should have alerts built in so you can note when backups fail.

Mechanisms for dealing with a failure need to be weighed up against the following (not exhaustive, just the ones I can think of):

Nature of the error - i.e. hardware error that needs to be fixed, issue with the application being backed up, issue with the backup application
Criticality of the error - i.e. with the file server example used, it could potentially be ignored
Importance of data - i.e. audit, regulatory, disaster recovery requirement
Time since last backup/last full backup
Time it would take to re-run the backup
Impact of re-running the backup at a particular time
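To make the weighing-up concrete, the factors above could be written down as a simple triage rule. The sketch below is purely illustrative: the field names, the 24-hour threshold and the three outcomes are assumptions, and a real policy would come from your own recovery requirements.

```python
from dataclasses import dataclass


@dataclass
class FailedBackup:
    data_is_critical: bool        # audit, regulatory or disaster recovery data
    hours_since_last_good: float  # time since the last successful backup
    rerun_hours: float            # estimated time to re-run the job
    hours_until_business: float   # time left before a re-run would affect users


def triage(job, max_age_hours=24.0):
    """Suggest an action for a failed job. Thresholds are illustrative only."""
    fits_window = job.rerun_hours <= job.hours_until_business
    too_old = job.hours_since_last_good > max_age_hours
    if job.data_is_critical or too_old:
        return "rerun now" if fits_window else "escalate"
    return "wait for next scheduled run"
```

The value of writing it down is less the code itself and more that the decision is agreed in advance instead of being argued out at 3am.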
LVL 3

Author Comment

by:pma111
>>If you regularly change backups i.e. adding and removing sources and destinations then this can lead to failure


Can you elaborate a little on this? Especially what you mean by sources/destinations?
 
LVL 3

Author Comment

by:pma111
>>i.e. file share backups file in use errors


And can you elaborate on this too?
 
LVL 57

Expert Comment

by:giltjr
"Files being locked that should not be locked."  

This is typically on our z/OS system.  Under z/OS, when you open a file you can open it for "exclusive" use only, meaning no other process can open the file while your process has it open.  This is because when you open a file the system has no clue whether you are going to read it only or write to it.  Exclusive use prevents two processes from trying to update the same file at the same time.

Unfortunately there is no real equivalent of this on most distributed systems, which is why you can have two or more processes update the same file.
 
LVL 18

Expert Comment

by:irweazelwallis
Say you have a backup job that does file server backups from fileserver1.

Changing the source means adding fileserver2 into that backup job.

Changing the destination means writing to different disks or a different tape drive.

This can lead to errors if it's not been tested fully.
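Part of that testing can be automated with a quick preflight check after every change to a job. This is a minimal sketch, assuming sources and the destination are reachable as ordinary paths from the machine running the check; the `preflight` helper and the example paths are hypothetical.

```python
import os


def preflight(sources, destination):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    for source in sources:
        if not os.path.exists(source):
            problems.append(f"source not found: {source}")
        elif not os.access(source, os.R_OK):
            problems.append(f"source not readable: {source}")
    if not os.path.isdir(destination):
        problems.append(f"destination not found: {destination}")
    elif not os.access(destination, os.W_OK):
        problems.append(f"destination not writable: {destination}")
    return problems
```

For example, after adding a second server you might call `preflight(["//fileserver1/share", "//fileserver2/share"], "/mnt/backup_disk")` and refuse to schedule the job until the list comes back empty. It only catches missing or inaccessible paths, so a trial run of the changed job is still needed.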

Some of the less enterprise-level backup solutions can give more issues than others.

Using Backup Exec in previous roles, there was no end of issues that caused multiple failures across multiple jobs.