Disaster Recovery Using VMware ESX
Posted on 2009-05-26
I'm currently implementing a DR solution using VMware ESX.
We have 15 VMs running on our two-host ESX cluster. Four of these are 'critical' (Exchange 2007, a DC, a web server and a file server), and I am going to replicate them to our DR site just across the road. The DR site has an ESX host attached to its own iSCSI SAN (same setup as the primary site).
I'm using esXpress from PHD to back up the critical VMs (both system and data disks/VMDKs) to an NFS target. It's installed on both production ESX hosts and on the third 'DR' ESX host.
The backups run three times a day. As they are block-level rather than incremental or full, they complete pretty quickly. The third ESX host is set to restore those backups from the NFS target on a schedule, so at the DR site I have a working copy of my four critical VMs.
My question is less VMware-related and more AD/Exchange-related. In the event of one or more critical VMs going down, the plan is simply to log on to the third ESX host and connect each replicated VM's virtual network adapter to the network. As the replicated machines have the same IPs, names etc. (they are exact copies), I'm pretty sure this should work, certainly in the case of the file server. What I'd like to know is: am I likely to run into issues bringing up DCs and Exchange servers from snapshots taken a few hours earlier? From an AD point of view, we have a second physical DC, so the DC that I bring up at the DR site should be brought up to date via AD replication - it's just Exchange 2007 I'm not sure about now.
For example, one thing I'm unsure about is that the clock on the replicated VM will be a few hours behind the actual time when I first bring it up. Say the main Exchange server goes down at 15:00 and I bring up the replicated Exchange VM from the backup that ran at 09:00: its system clock will initially be somewhere around 09:00. Once it syncs with the domain the clock should roll forward, but is that likely to cause issues with logging etc.?
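To put a rough number on that gap, the arithmetic can be sketched in Python. Only the 09:00 backup and the 15:00 failure come from the example above; the other two backup times, and the names `BACKUP_TIMES`, `last_backup_before` and `clock_lag`, are made up for illustration. It also assumes the restored VM's clock starts at the time the backup ran, which is a simplification.

```python
from datetime import datetime, timedelta

# Assumed schedule: only the 09:00 run is from the example above;
# 01:00 and 17:00 are hypothetical placeholders for the other two runs.
BACKUP_TIMES = ["01:00", "09:00", "17:00"]


def last_backup_before(failure, backup_times=BACKUP_TIMES):
    """Return the most recent scheduled backup at or before the failure time."""
    candidates = []
    for day_offset in (0, 1):  # check today's runs and yesterday's runs
        day = failure.date() - timedelta(days=day_offset)
        for t in backup_times:
            hour, minute = map(int, t.split(":"))
            candidates.append(datetime(day.year, day.month, day.day, hour, minute))
    return max(c for c in candidates if c <= failure)


def clock_lag(failure, backup_times=BACKUP_TIMES):
    """How far behind real time the restored VM's clock would start out."""
    return failure - last_backup_before(failure, backup_times)


# The scenario from the example: failure at 15:00, last backup at 09:00.
failure = datetime(2009, 5, 26, 15, 0)
print(clock_lag(failure))  # 6:00:00
```

With three evenly spaced runs the lag can approach eight hours, so the same gap applies to any mail or file changes made since the last backup, not just to the clock.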
Can anyone see any other potential flaws in this plan? Obviously I'm going to test it all, but it would be nice to be aware of any potential pitfalls at the outset.
Thanks for your time.