
With daily backups, what's the RPO we can commit to?

If we're taking backups daily, i.e. every 24 hours (the backups can end anywhere between 1am and 4am, depending on how much there is to back up), what's the RPO (Recovery Point Objective) we can commit to? 24 hrs, 30 hrs, or 48 hrs?

In the current risk management doc, it's stated as 48 hrs, but since I joined not long ago, I'm checking whether this is correct.
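One way to check the number is to work it out from the job history. This is a minimal sketch, assuming each backup's recovery point is the moment it starts capturing data (the conservative choice) and using a made-up history in which jobs start at midnight. If a failure hits just before a night's backup completes, the newest usable copy is the previous night's, so the worst-case loss is the next completion time minus the previous recovery point. The function name `worst_case_loss` and the timings are hypothetical.

```python
from datetime import datetime, timedelta

def worst_case_loss(backups: list[tuple[datetime, datetime]]) -> timedelta:
    """Largest data-loss window implied by a backup history.

    Each entry is (recovery_point, completed_at). Assumption: the recovery
    point is when the backup starts capturing data. A failure just before
    backup n+1 completes leaves backup n as the newest usable copy, so the
    loss is completed_at[n+1] - recovery_point[n].
    """
    if len(backups) < 2:
        raise ValueError("need at least two backups")
    ordered = sorted(backups)
    return max(next_done - prev_point
               for (prev_point, _), (_, next_done) in zip(ordered, ordered[1:]))

# Hypothetical history: jobs start at midnight and finish between 1am and 4am.
history = [
    (datetime(2024, 1, 1, 0, 0), datetime(2024, 1, 1, 1, 0)),
    (datetime(2024, 1, 2, 0, 0), datetime(2024, 1, 2, 4, 0)),
    (datetime(2024, 1, 3, 0, 0), datetime(2024, 1, 3, 1, 0)),
]
print(worst_case_loss(history))  # 1 day, 4:00:00
```

With these hypothetical timings the worst case is 28 hours, i.e. 24 hours plus the longest run. A figure of 48 hours or more only follows if the commitment is also meant to absorb a failed night.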

Fractional CTO
Distinguished Expert 2018
Commented:
Three general approaches:

1) Realtime replication, where several database instances maintain realtime mirrors of each other.

RPO - Zero data loss.

2) Background replication, where a hot spare is kept current by pulling data at some frequency and overwriting the hot spare's data with it. I normally use this approach for projects generating substantial cash, usually with an hourly data pull.

RPO - 1 hour data loss.

3) Nightly backups. Best used for sites with a data access pattern of read often, write rarely.

RPO - 24 hours of data loss, which might be zero data loss if no writes have occurred.

You'll choose your approach primarily based on money throughput: how much data loss can be tolerated, as a function of lost revenue or of time lost to after-restore actions, such as manually re-entering any data lost when using approach #2 or #3.
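The trade-off above can be sketched as "pick the cheapest approach whose worst-case exposure is acceptable". This is only an illustration: exposure is approximated as revenue per hour times the approach's RPO, and the `Approach` class, the running costs, the revenue figure, and the tolerance are all hypothetical numbers, not figures from this thread.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Approach:
    name: str
    rpo_hours: float     # worst-case data-loss window for this approach
    annual_cost: float   # hypothetical running cost

def cheapest_acceptable(approaches: list[Approach], revenue_per_hour: float,
                        max_tolerated_loss: float) -> Optional[Approach]:
    """Cheapest approach whose worst-case revenue exposure
    (revenue_per_hour * rpo_hours) stays within the tolerated loss."""
    acceptable = [a for a in approaches
                  if a.rpo_hours * revenue_per_hour <= max_tolerated_loss]
    return min(acceptable, key=lambda a: a.annual_cost) if acceptable else None

options = [
    Approach("Realtime replication", 0, 100_000),
    Approach("Background replication", 1, 30_000),
    Approach("Nightly backups", 24, 5_000),
]
choice = cheapest_acceptable(options, revenue_per_hour=2_000, max_tolerated_loss=10_000)
print(choice.name)  # Background replication
```

Raising the tolerated loss (for a site that rarely writes) lets the nightly option win; a real assessment would also price the manual re-entry work mentioned above.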
kevinhsieh, Network Engineer
Commented:
Good question. I would say that the RPO you can meet is safely stated as 24 hours + the time to complete the backups.

My reasoning is that backup jobs don't start at a consistent time, especially if multiple jobs are queued and then run.

The more common definition seems to be how often you schedule your jobs. It is certainly not 2x your job schedule, unless you are building into your RPO guarantee an expectation that backup jobs may fail and need to be rerun.
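The reasoning above reduces to a small formula. A sketch, assuming backups are scheduled at a fixed interval and that `max_job_h` is the longest observed time from scheduled start to completion (so queueing delay is included); the function name and the `tolerated_failed_runs` parameter are hypothetical, the latter standing for how many consecutive failed runs the guarantee should absorb.

```python
def committed_rpo_hours(interval_h: float, max_job_h: float,
                        tolerated_failed_runs: int = 0) -> float:
    """RPO to commit to for a fixed backup schedule.

    interval_h: time between scheduled backup starts
    max_job_h: longest observed scheduled-start-to-completion time
    tolerated_failed_runs: consecutive failed runs the guarantee absorbs
    """
    return interval_h * (1 + tolerated_failed_runs) + max_job_h

print(committed_rpo_hours(24, 4))     # 28
print(committed_rpo_hours(24, 4, 1))  # 52
```

Only the second call, which explicitly budgets for a failed and rerun job, gets near the "2x your job schedule" figure.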
noci, Software Engineer
Distinguished Expert 2018
Commented:
If the backup runs over a large time window, the question is whether the backup represents a single point in time. This is crucial for databases; otherwise the data in the backup can be stale or inconsistent.
Can you create a checkpoint on storage: flush all I/O, keep the system quiescent while the snapshot is taken, then resume operation AND back up the snapshot?
The snapshot can be removed after the backup has been made.
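The ordering described above can be sketched as a small orchestration function. The `Storage` interface and its method names are hypothetical placeholders for whatever the storage backend actually provides; `FakeStorage` only records the call order so the sequence can be checked. As a design choice, the snapshot is kept if the backup fails, so it can be retried from the same point in time.

```python
from typing import Callable, Protocol

class Storage(Protocol):
    """Hypothetical backend interface; real calls depend on the array/volume manager."""
    def flush_and_freeze(self) -> None: ...
    def thaw(self) -> None: ...
    def snapshot(self) -> str: ...
    def delete_snapshot(self, snap_id: str) -> None: ...

def snapshot_backup(storage: Storage, backup: Callable[[str], None]) -> str:
    storage.flush_and_freeze()        # flush all I/O and hold new writes
    try:
        snap_id = storage.snapshot()  # point-in-time image of quiescent data
    finally:
        storage.thaw()                # quiescent only while the snapshot is taken
    backup(snap_id)                   # back up the snapshot, not the live data
    storage.delete_snapshot(snap_id)  # only reached if the backup succeeded
    return snap_id

class FakeStorage:
    """In-memory stand-in that just records the order of calls."""
    def __init__(self) -> None:
        self.calls: list[str] = []
    def flush_and_freeze(self) -> None:
        self.calls.append("freeze")
    def thaw(self) -> None:
        self.calls.append("thaw")
    def snapshot(self) -> str:
        self.calls.append("snapshot")
        return "snap-1"
    def delete_snapshot(self, snap_id: str) -> None:
        self.calls.append(f"delete {snap_id}")

store = FakeStorage()
snapshot_backup(store, lambda sid: store.calls.append(f"backup {sid}"))
print(store.calls)  # ['freeze', 'snapshot', 'thaw', 'backup snap-1', 'delete snap-1']
```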

Are restores practiced?

Author

Commented:
Thanks for the inputs.

>Are restore practiced?
Only once a year during DR or when a recovery is raised.
We back up to Data Domain (not to tape).
>Only once a year during DR or when a recovery is raised.
>We backup to Data Domain (not to tapes)

That's an extremely lax (worthless) recovery test schedule. At the very minimum, you should have a recovery test every quarter, and I would do it more frequently for the financial industry. I usually do a monthly recovery test for any place that handles financials, with an occasional extra, out-of-band "surprise test" at random times during the year. Having said that, you don't need to do a full-blown recovery test each month, just spot checks on different servers and different file systems (I do them even more frequently). You do need to do full system recoveries every year to make sure that a complete failure can be recovered from.

If you're not testing your backups on a regular basis, you are not doing backups. You're just going through the motions and pretending they work. Backups must always be tested to be certain they work. I've seen so many companies fail at this that I stress it to all of them, all the time. There are so many recovery failures where a company has simply wasted all the money it spent on backups; it might as well not have spent it. Once a year isn't enough, and you're wasting money running backups if you don't test the recovery.
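A rotating spot-check schedule like the one described above can be sketched in a few lines: shuffle the (server, file system) targets once, walk through them round-robin so each is restored in turn, and pick a random month for the extra surprise test. The function name and target names are hypothetical; the yearly full-system recovery is a separate exercise this sketch does not schedule.

```python
import random
from itertools import cycle, islice

def spot_check_plan(targets, months=12, seed=None):
    """Round-robin monthly spot checks over (server, filesystem) targets,
    shuffled once so the order is not predictable, plus one surprise month."""
    rng = random.Random(seed)
    order = list(targets)
    rng.shuffle(order)
    monthly = list(islice(cycle(order), months))
    surprise_month = rng.randrange(months)  # 0-based month for the surprise test
    return {"monthly": monthly, "surprise_month": surprise_month}

targets = [("db01", "/data"), ("web01", "/var/www"), ("file01", "/home")]
plan = spot_check_plan(targets, seed=7)
for month, (server, fs) in enumerate(plan["monthly"], start=1):
    print(f"month {month:2}: restore a sample from {server}:{fs}")
print("surprise test in month", plan["surprise_month"] + 1)
```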
noci, Software Engineer
Distinguished Expert 2018

Commented:
Also, you have to take into account that a restore may take even longer than taking the backup, and it only starts after the failure is detected, so there will be a loss of productivity DURING the restore.

Besides the RPO, the RTO (Recovery Time Objective) also needs to be known.
kevinhsieh, Network Engineer
Commented:
Yes, restore time can be significantly longer, especially with Data Domain.

Of course, if you had an RPO of 5 minutes, it would be nigh impossible to do an actual restore that quickly, so it shouldn't be a surprise that RTO can be greater than RPO.

Author

Commented:
>RTO needs to be known
Yes, we budget the restoration time at about double the time it took to restore during the last DR exercise.
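The author's rule of thumb can be written as a one-line estimate. A sketch, assuming the RTO budget is the time to detect the failure (the point noci raised) plus the last measured DR restore time multiplied by a safety factor of 2; the function name and the example durations are hypothetical.

```python
from datetime import timedelta

def rto_budget(last_dr_restore: timedelta,
               detection: timedelta = timedelta(0),
               safety_factor: float = 2.0) -> timedelta:
    """Estimated RTO: detection time plus the last measured DR restore
    time scaled by a safety factor (2x by default)."""
    return detection + last_dr_restore * safety_factor

print(rto_budget(timedelta(hours=6), detection=timedelta(hours=1)))  # 13:00:00
```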

At a previous bank, I saw a request almost every week to restore selected files/folders (not an entire server), but they lacked the capacity/resources to do a full restore quarterly (though another bank did it monthly). Can't we use the following as evidence that our backups have been tested?
a) We monitor backups for any errors.
b) Because the backups reside on Data Domain, we periodically (quarterly) use a sort of "explorer" tool to read/open them and verify them.
c) When I worked for a German bank, OpenVMS had the "backup/verify" option, which reads back the tapes after each backup. Isn't this a good alternative to doing a full restore?
kevinhsieh, Network Engineer

Commented:
Reading back/verifying is insufficient. A backup can pass a read/verify and still fail to actually restore.

You should be testing full restores of systems, as well as partial restores.
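For the partial restores, one way to make the check stronger than a read-back is to restore a sample of files to a scratch location and compare their content hashes with the source. A minimal sketch: the restore itself is done by the backup product and is out of scope here, `verify_restore` is a hypothetical helper, and the source must not have changed since the backup (e.g. compare against the same snapshot), otherwise legitimate changes show up as failures.

```python
import hashlib
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_restore(source_root: Path, restored_root: Path, sample: list[str]) -> list[str]:
    """Return the sampled relative paths whose restored copy is missing
    or whose content differs from the source."""
    failures = []
    for rel in sample:
        restored = restored_root / rel
        if not restored.is_file() or sha256_of(source_root / rel) != sha256_of(restored):
            failures.append(rel)
    return failures
```

An empty list means every sampled file came back byte-for-byte identical; anything else is a restore failure worth investigating.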
noci, Software Engineer
Distinguished Expert 2018
Commented:
OpenVMS Backup/verify is good as long as your data is stable. So you will need to make the backup from a snapshot and release the snapshot only AFTER the verify (the snapshot needs to be made in the storage backend).
OpenVMS Backup/verify makes a backup, then rewinds the tape, reads the source data again, and checks whether the data read from the tape is still the same. That is different from many backup verifies that only read back the tape and check for read errors. (Also, VMS BACKUP can lose an entire backup block in every group of blocks (10 blocks by default) and still recover it.)
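The "lose a block per group and recover it" property comes from writing a redundancy block per group. The sketch below shows the general XOR-parity idea with a group of 10 blocks; it is an illustration of the technique, not VMS BACKUP's actual on-tape format, and the function names are hypothetical.

```python
from functools import reduce
from typing import Optional

def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("blocks must be non-empty and of equal length")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def recover_block(group: list[Optional[bytes]], parity: bytes) -> bytes:
    """Rebuild the single missing block (None) from the others plus the parity block."""
    missing = [i for i, b in enumerate(group) if b is None]
    if len(missing) != 1:
        raise ValueError("XOR parity can rebuild exactly one missing block per group")
    return xor_blocks([b for b in group if b is not None] + [parity])

group = [bytes([i] * 512) for i in range(10)]  # one group of 10 data blocks
parity = xor_blocks(group)                     # written alongside the group
damaged: list[Optional[bytes]] = list(group)
damaged[3] = None                              # one block unreadable
print(recover_block(damaged, parity) == group[3])  # True
```

Losing two blocks in the same group cannot be repaired this way, which is why the group size trades tape overhead against resilience.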
When backing up Oracle Rdb, make sure an online RMU/BACKUP is made first, so that the database backup represents a single point in time.
Oracle RDBMS needs some other tooling.