mkramer777 asked:

disaster recovery plan

We do not have a disaster recovery plan in place for our servers, network, and client computers.  Here is some background on how we back up data; after that I would like some recommendations on what kind of plan I should have and where I should begin.
All server VMs are backed up by Veeam to a QNAP device off site. These include a file server where all of the users' documents reside. This is done at 2 locations, and Veeam replicates data back and forth between the QNAPs at each location.  I also have an external USB drive attached to each QNAP, which I swap weekly with 2 USB drives that are kept off site.  In addition to the users' files being backed up on the file server, I have installed Carbonite backup, which runs continuously on each computer in the company.
OK.  I know this is a huge question, but I would like some guidance on what steps should be written down in case the server that hosts the VMs (I use VMware) crashes.
Tags: Disaster Recovery, Storage Software, vSphere, Veeam, VMware

Last comment by Gerald Connolly
Ron Malmstead

In addition to your backup plan, which seems adequate from your description, you need a "RECOVERY PLAN".  You have the first part (the disaster plan), but you need the second part (the recovery plan).  Most people think of a "Disaster Recovery Plan" as simply the backup portion, but the process for recovering different types of resources (email, databases, file shares, accounting software) varies greatly, and is rarely as easy as clicking "restore" unless we are just talking about a single file.

1.  Auditing:  What is being backed up (identify business-critical resources)?  How often is it "successful", and how often does it "fail"?  What are the failures, and have they been corrected?  Who gets the notifications, and what do they do with them?  This "audit" process should be scheduled: monthly, weekly, whatever you deem necessary for business continuity.  Over time you may add resources that are not immediately included in your backup process but are nonetheless critical for business continuity. (This is why a periodic audit is very important.)
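
As a rough illustration of the audit step, here is a minimal Python sketch that tallies successes and failures per backup job. The `JobRun` record, the job names, and the status strings are all hypothetical; in practice you would fill the list from whatever job history or notification emails your backup product gives you. Note the assumption that anything other than "Success" (including warnings) counts as a run someone needs to look at.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class JobRun:
    job: str     # backup job name (hypothetical)
    status: str  # "Success", "Warning", or "Failed" (assumed labels)


def audit(runs):
    """Return {job: (success_count, other_count, success_rate)}."""
    totals = Counter(r.job for r in runs)
    ok = Counter(r.job for r in runs if r.status == "Success")
    report = {}
    for job, total in totals.items():
        succeeded = ok[job]
        # Anything that is not a clean success is counted as needing review.
        report[job] = (succeeded, total - succeeded, succeeded / total)
    return report


# Made-up history for two jobs.
runs = [
    JobRun("FileServer", "Success"),
    JobRun("FileServer", "Failed"),
    JobRun("FileServer", "Success"),
    JobRun("SQL", "Success"),
]
for job, (good, bad, rate) in sorted(audit(runs).items()):
    print(f"{job}: {good} ok, {bad} to review, {rate:.0%} success")
```

A report like this only answers "how often does it fail"; the "have they been corrected" part still needs a person to follow up on each failed run.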

2.  Documentation:  EVERYTHING involved in your backup process should be DOCUMENTED: the service accounts, the resources, the personnel involved, the step-by-step restore process, the frequency, AND ANY CHANGES TO THEM.  The entire plan should be documented in case the person(s) involved in your disaster recovery plan get hit by a bus, or the plane they were all on goes down. (Not joking.)

3.  Demonstration:  Periodically, you need to DEMONSTRATE that the backups you are taking of critical resources can actually result in a full restore of critical business processes.  Each one should be checked periodically, for example FINANCIAL DATA.  Your tech personnel should be able to demonstrate, at least quarterly, that they can restore the backups for those critical resources to full functionality.  Your process for doing so may vary, but it is critical that you test the restore process and DOCUMENT the steps so a third grader can follow them.  This is very important, because you will discover nuances, failures, and other things you "can't know" until you actually try to restore something.  You may find in this process that you are "missing something" that should be backed up.  In many cases, people don't find these things out until a failure occurs.  You don't want to be that person.
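
One way to make a file-level restore test demonstrable is to compare checksums of the restored files against a reference copy. The sketch below is only an illustration under assumptions: the function names are made up, the demo uses temporary directories with invented files, and for a live file share you would compare against a frozen copy or snapshot taken at backup time, since the live data keeps changing.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256(path):
    """Hash a file in 1 MiB chunks so large files do not fill memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def compare_trees(source_dir, restored_dir):
    """Return (missing, mismatched) lists of paths relative to source_dir."""
    source_dir, restored_dir = Path(source_dir), Path(restored_dir)
    missing, mismatched = [], []
    for src in sorted(p for p in source_dir.rglob("*") if p.is_file()):
        rel = src.relative_to(source_dir)
        dst = restored_dir / rel
        if not dst.is_file():
            missing.append(str(rel))
        elif sha256(src) != sha256(dst):
            mismatched.append(str(rel))
    return missing, mismatched


# Demo with throwaway directories standing in for "reference" and "restored".
with tempfile.TemporaryDirectory() as ref, tempfile.TemporaryDirectory() as restored:
    (Path(ref) / "ok.txt").write_text("same")
    (Path(restored) / "ok.txt").write_text("same")
    (Path(ref) / "gone.txt").write_text("never restored")
    (Path(ref) / "diff.txt").write_text("original")
    (Path(restored) / "diff.txt").write_text("corrupted")
    missing, mismatched = compare_trees(ref, restored)

print("missing:", missing)
print("mismatched:", mismatched)
```

This only proves the files came back intact. It does not prove an application (accounting software, a database) actually starts and works on the restored data, which still has to be tested by hand or with the vendor's own tools.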

My advice is based on my experience in IT SOX compliance.  Your mileage may vary, and you may find some of my suggestions to be overkill for your business size/model, but the principles are the same no matter what your company size.  Your critical resources for business continuity are what your disaster AND recovery plan is all about.
madunix

Make sure you have:
  • A DRP established and aligned with your RTO and RPO objectives
  • Detailed information system procedures to facilitate the recovery of capacity on a backup site following an incident
  • The DRP documented
  • The different threats that the plan must address identified
  • Critical services/processes identified
  • The different scenarios that the DR plan must handle defined and tested
  • A risk analysis carried out alongside the plan
  • Different DR strategies established
  • A process to maintain the DRP
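
To make the RPO point concrete: the recovery point objective is the maximum amount of data loss, measured in time, that you accept, so the newest usable backup of each system must never be older than that. A minimal sketch of such a check follows; the system names, timestamps, and 24-hour objectives are invented for illustration.

```python
from datetime import datetime, timedelta


def rpo_breaches(last_backup, rpo, now):
    """Return {system: backup_age} where the newest backup is older than the RPO."""
    return {
        name: now - taken
        for name, taken in last_backup.items()
        if now - taken > rpo[name]
    }


# Hypothetical data: when the newest good backup of each system was taken.
now = datetime(2024, 1, 8, 9, 0)
last_backup = {
    "file-server": datetime(2024, 1, 8, 1, 0),  # 8 hours old
    "accounting": datetime(2024, 1, 6, 1, 0),   # 56 hours old
}
rpo = {
    "file-server": timedelta(hours=24),
    "accounting": timedelta(hours=24),
}

breaches = rpo_breaches(last_backup, rpo, now)
for name, age in breaches.items():
    print(f"{name}: newest backup is {age} old, RPO is {rpo[name]}")
```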

Having a plan in place is one thing. Without testing, the plan is useless and a waste of resources. A quarterly, semi-annual, or annual test is required; it will show up any deficiencies.
Mr Tortu(r)e


The method and theory seem to have been given already.
From another, maybe more concrete, perspective: for DR you need another room, or better another building, or better still another site, with the right network infrastructure to replace your first site. It will also need enough compute (ESX servers) and storage to restart all of your activity (or only the main part of it, depending on your choices and budget).
Then you'll need a DR solution. You could stay with Veeam, as you already have it, if you are satisfied, or use another product, to replicate your production site to the second site and then fail over (manually or automatically, depending on your solution).

With Veeam you could create replica VMs from the production VMs, or from backups. But it depends, and you may need to reconfigure your whole backup/replication policy, because you already have several job types that depend on each other (maybe 1 backup and 2 backup copy jobs? I don't know).
madunix

When we talk about BCP, you should understand three concepts:
  • BIA (business impact analysis)
  • BCP (business continuity plan)
  • DRP (disaster recovery plan)
BCP talks about the business processes, while DRP is for specific resources needed by a business process. For example, in DRP, you define the RTO and RPO and disaster recovery strategies. DRP's RPO and RTO should support the BCP's RPO and RTO.
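
One way to read "the DRP's RPO and RTO should support the BCP's" is that every resource a business process depends on must have recovery targets at least as tight as the process itself. The sketch below checks that; the process, resource names, and hour values are made up, and it deliberately ignores the case where resources are restored one after another, in which the individual recovery times would add up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Target:
    rto_hours: float  # maximum acceptable time to recover
    rpo_hours: float  # maximum acceptable data loss, expressed as time


def unsupported(process_target, resource_targets):
    """Names of resources whose DRP target is looser than the process's BCP target."""
    return [
        name
        for name, t in resource_targets.items()
        if t.rto_hours > process_target.rto_hours
        or t.rpo_hours > process_target.rpo_hours
    ]


# Hypothetical BCP target for one business process...
order_entry = Target(rto_hours=8, rpo_hours=4)
# ...and hypothetical DRP targets for the resources it depends on.
resources = {
    "file-server VM": Target(rto_hours=4, rpo_hours=4),
    "database VM": Target(rto_hours=12, rpo_hours=1),
}

gaps = unsupported(order_entry, resources)
print(gaps)
```

Here the database VM would be flagged: its 12-hour RTO cannot support a process that has to be running again within 8 hours.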


To shed more light on the DRP: per NIST, “DRP is a written plan for processing critical applications in the event of a major hardware or software failure or destruction of facilities.
A written plan for recovering one or more information systems at an alternate facility in response to a major hardware or software failure or destruction of facilities.”
Implementing the IT Disaster Recovery Plan (DRP) is a process in itself and is the subject of a dedicated methodology. However, you should see the IT DRP as a detailed information system procedure to facilitate the recovery of capacity on a backup site following an incident. The IT disaster recovery backup site hosts the backup systems for the critical information systems.
The IT DRP is only one of the components of the IT backup system. Remember, an incident does not always lead to a site switchover or to switching production machines onto backup machines if you have built fault-tolerant systems. Fault tolerance means the capability of systems to continue to operate in the event of failure of one or more system components. The most important thing is to implement redundancy measures, using failover techniques such as active-passive and active-active, to mitigate hardware failures that would otherwise have a severe impact on availability.
You should also have data backup plans that describe how the company's data is backed up and controlled, and how it is restored after a partial failure (hardware or software fault) or a total disaster.
The data backup plan specifies the following information:
  • Technical scope
  • Backup strategy (content, scheduling)
  • Backup procedures & tools
  • Restoration procedures
  • Testing procedures
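
If the backup plan is kept as a structured document, a small check can flag sections that are missing or still empty. This is only a sketch: the section keys are my own naming of the five items listed above, and the sample plan text is illustrative, loosely based on the setup described in the question.

```python
# Keys are hypothetical names for the five sections listed above.
REQUIRED_SECTIONS = (
    "technical_scope",
    "backup_strategy",
    "backup_procedures_and_tools",
    "restoration_procedures",
    "testing_procedures",
)


def missing_sections(plan):
    """Return the required sections that are absent or left empty."""
    return [s for s in REQUIRED_SECTIONS if not str(plan.get(s, "")).strip()]


# Illustrative plan with one empty section and one section not written at all.
plan = {
    "technical_scope": "All server VMs at both sites",
    "backup_strategy": "Veeam jobs to QNAP, weekly USB drive rotation",
    "backup_procedures_and_tools": "Veeam, QNAP, Carbonite",
    "restoration_procedures": "",
}
print(missing_sections(plan))
```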

IT Disaster Recovery should be treated as part of your overall Business Continuance Planning (BCP)! @MadUnix has already covered the DR side of things and the grey area between BCP and DR.

BCP should take into account EVERYTHING that could affect your business processes, from what happens if your help desk or order takers lose power to their desks, up to the loss of a computer room or the entire building. In a previous life I had a customer (a pharmaceutical company) that included "denial of access to site, due to demonstrations" as part of their BCP planning! That could be important if you need to power up a server and you cannot get on site to do it!

You only have to look at what happened to OVH in Strasbourg, France, to see what can go wrong if the plan does not fit reality: some of their customers lost everything even though they thought they had a backup strategy in place (NB: a backup strategy, not a BCP)!

VMware, a software company founded in 1998, was one of the first commercially successful companies to offer x86 virtualization. The storage company EMC purchased VMware in 2004, and Dell Technologies acquired EMC in 2016, which made Dell Technologies VMware's parent company. VMware was later spun off from Dell (2021) and acquired by Broadcom (2023). VMware has many software products that run on desktops under Microsoft Windows, Linux, and macOS and allow virtualization of the x86 architecture. Its enterprise software hypervisor for servers, VMware vSphere Hypervisor (ESXi), is a bare-metal hypervisor that runs directly on the server hardware and does not require an additional underlying operating system.
