Using VMWare for DRP with Intersystems' Cache database

This question is about using VMWare infrastructure for DRP, for a critical machine hosting an application based on InterSystems' Cache database.

We're trying to build a disaster recovery plan for a critical machine in our organization. The physical machine runs Windows Server 2003 with a database named Cache (by InterSystems). The OS is installed on the machine's mirrored disks, and the data volume is located on a SAN LUN (on EMC CX3-40).
We have an additional EMC CX3-20 that we intend to locate on a remote DR site.
The plan is to use P2V to convert the production machine to a VM, and then place the VM on an EMC LUN.
Then we replicate this VM to the DR site, put it on the remote CX3-20, and make sure the DR VM stays in sync with the production VM.

The idea is that if something bad happens to the production VM, we *manually* start the remote DR VM. The RPO and RTO are not that critical. The RPO could be a few hours, and the RTO is one day.

The questions are as follows:
A. What VMWare infrastructure should we use to replicate the production VM to the VM on the DR site?
B. How do we sync the two machines? Should it be done using VMWare infrastructure, or is it better to use storage-to-storage replication tools (such as EMC's)?
C. How can we make sure that if the production VM stops functioning (and of course there are several scenarios in which this could happen), the DR VM *database* will be consistent?

I'll explain more about question C: I guess we could solve the 'consistency' question by going into hot backup mode on the production machine several times a day (flushing all RAM to disk, etc.), and while in backup mode we could replicate the changes to the remote DR VM. But going into hot backup mode makes the production VM database run very slowly. Is there a better way suggested by VMWare for VM-to-VM replication that still makes sure the database on the remote machine is consistent?

robocat Commented:

VMWare Infrastructure (ESX) doesn't provide replication tools, so you need an external tool.

If your EMC box already has storage replication built-in, that's probably the best and cheapest way to go.

As far as database consistency goes, it depends on the capabilities of your EMC box. You could use a scenario like this:

- put database in hotbackup mode
- take a snapshot of the storage LUN on your EMC box
- take database out of hot backup mode
- replicate the storage snapshot to the remote site

This way that database will only be in hot backup mode for a few seconds.
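The four steps above can be sketched as a small orchestration routine. This is only an illustration under stated assumptions: the four callables (`freeze_db`, `take_snapshot`, `thaw_db`, `replicate`) are hypothetical hooks that you would wire to whatever your database and storage array actually provide; none of them is a real Cache or EMC API. The point of the sketch is the ordering, and the `try/finally` that takes the database out of hot backup mode even if the snapshot fails.

```python
import time
from typing import Callable, Dict


def snapshot_with_brief_freeze(
    freeze_db: Callable[[], None],      # hypothetical: put database in hot backup mode
    take_snapshot: Callable[[], str],   # hypothetical: snapshot the LUN, return an id
    thaw_db: Callable[[], None],        # hypothetical: take database out of hot backup mode
    replicate: Callable[[str], None],   # hypothetical: ship the snapshot to the DR site
) -> Dict[str, object]:
    """Run freeze -> snapshot -> thaw -> replicate.

    The database is thawed even if the snapshot raises, and replication
    only starts after the database is back in normal mode.
    """
    started = time.monotonic()
    freeze_db()
    try:
        snapshot_id = take_snapshot()
    finally:
        # Always leave hot backup mode, whether or not the snapshot worked.
        thaw_db()
    frozen_seconds = time.monotonic() - started

    # The slow part (copying to the remote site) happens outside the freeze window.
    replicate(snapshot_id)
    return {"snapshot_id": snapshot_id, "frozen_seconds": frozen_seconds}
```

The replication call is deliberately outside the freeze window, which is why the database only needs to stay in hot backup mode for as long as the array takes to create the snapshot.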

itaym (Author) Commented:
Robocat, thanks for your answer.
I am aware of EMC's storage replication tool (and snapshot capabilities). But using these tools forces us to go into hot backup mode. In this particular system that may cause us some trouble, so I'm trying to avoid hot backup mode (unless there's no other choice).

Going back to VMWare's solutions: isn't VMWare's Site Recovery Manager capable of replicating a production VM to a DR one? And if so, does it have to put the database into hot backup mode before replicating, or does SRM have other ways of replicating the entire VM (with the DB, of course) while making sure the remote DR database is consistent?

davismisbehavis Commented:
The problem with SRM is that it's a full-site disaster recovery option. Although you say DR, you're really dealing with recovery from a single server failure.

You might want to take a look at PlateSpin Protect block-level replication. PlateSpin introduces the concept of workload protection, i.e. OS, data, whatever sits on the box.

You wouldn't even have to P2V your physical server, as it supports P2V, V2V, P2I and V2I. We actively use it to protect mission-critical SQL servers and replicate the physical server to a virtual machine each night. In the event of a server failure we can start up last night's VM replica and have a pretty good RPO and an RTO of less than one hour. This doesn't sound like too much of a problem for you, as your RPO and RTO aren't too critical.

The problem you would have on the EMC side is the fact that you're using Cache. EMC replication products utilise VSS for SQL in order to ensure DB consistency, but I doubt they have a product to deal with Cache.

Some of the new vSphere features like Fault Tolerance and Data Recovery may well be what you're looking for, but that's still a little way off.

robocat Commented:

Site Recovery Manager doesn't deal with replication, you still need a replication solution. VMWare recommends that you use some kind of storage replication.

If you don't want to put your database in hot backup mode (even if only for a few seconds), then there's no way to ensure 100% consistency at the database level.

Fortunately, this often isn't necessary. Taking a storage-based snapshot and replicating it gives you what is called a "crash consistent" copy. This means that the replicated snapshot is as good as if the server had crashed at that moment. Most databases have some kind of consistency checking and repair tool to fix any damage to the database. We've had very good results replicating VMs this way.

Can you lose a bit of data this way? Yes, but since there will always be some time between replications, you'll lose some data anyway.
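To put a rough number on "some data", here is a minimal sketch of the worst-case arithmetic. It assumes snapshots are taken at a fixed interval and that a snapshot only becomes usable at the DR site once its transfer has finished, so the worst case is a failure just before the next snapshot finishes arriving. The function names and the example figures (2-hour interval, 30-minute transfer, 4-hour RPO) are illustrative assumptions, not values from this thread.

```python
from datetime import timedelta


def worst_case_data_loss(interval: timedelta, transfer: timedelta) -> timedelta:
    """Worst-case age of the newest complete snapshot at the DR site.

    If production fails just before the next snapshot finishes transferring,
    the DR site still holds the previous one, which is then
    (interval + transfer) old.
    """
    return interval + transfer


def meets_rpo(interval: timedelta, transfer: timedelta, rpo: timedelta) -> bool:
    """True if the worst-case data loss stays within the RPO."""
    return worst_case_data_loss(interval, transfer) <= rpo


if __name__ == "__main__":
    # Illustrative figures only.
    interval = timedelta(hours=2)
    transfer = timedelta(minutes=30)
    rpo = timedelta(hours=4)
    print(worst_case_data_loss(interval, transfer))
    print(meets_rpo(interval, transfer, rpo))
```

With an RPO of a few hours, this kind of check mostly tells you how often the snapshot-and-replicate cycle needs to run, and whether the link to the DR site is fast enough to finish each transfer in time.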

itaymAuthor Commented:
Thanks for the answers, sorry for the delay.