VMware: Site Recovery Manager

We currently have 3 ESXi hosts, vCenter 5.0, and 40 VMs, all sitting on top of our NetApp in our office. We also have the same standby environment in our Las Vegas datacenter. We use NetApp's SnapMirror replication to replicate the volumes.

In a disaster, our current DR plan would break the replication between the NetApps, convert the volumes from read-only to normal (read-write) volumes, and then start registering the VMs at the DR site.

A lot of manual labor.
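The manual procedure described above can at least be written down as an ordered runbook. The sketch below only builds the command list (a dry run; nothing is executed). It assumes Data ONTAP 7-Mode on the DR filer, where the usual sequence is `snapmirror quiesce` followed by `snapmirror break` on each destination volume, and NFS datastores that are already mounted on the DR ESXi hosts. The volume, datastore, and VM names and the `ProtectedVolume`/`build_runbook` helpers are hypothetical.

```python
"""Dry-run sketch of the manual DR steps (hypothetical names throughout).

Assumptions: Data ONTAP 7-Mode on the DR filer, and each SnapMirror
destination volume backs an NFS datastore already mounted on the DR hosts.
"""
from dataclasses import dataclass, field


@dataclass
class ProtectedVolume:
    volume: str                                     # SnapMirror destination volume
    datastore: str                                  # datastore name on the DR ESXi hosts
    vms: list[str] = field(default_factory=list)    # VM folder names on that datastore


def build_runbook(volumes: list[ProtectedVolume]) -> list[str]:
    """Return the commands in the order an admin would run them by hand."""
    steps: list[str] = []
    for vol in volumes:
        # Stop scheduled transfers, then break the mirror so the
        # destination volume becomes read-write.
        steps.append(f"filer> snapmirror quiesce {vol.volume}")
        steps.append(f"filer> snapmirror break {vol.volume}")
    for vol in volumes:
        # Register each VM from its .vmx file on the now-writable datastore.
        for vm in vol.vms:
            vmx = f"/vmfs/volumes/{vol.datastore}/{vm}/{vm}.vmx"
            steps.append(f"esxi# vim-cmd solo/registervm {vmx}")
    return steps


if __name__ == "__main__":
    demo = [ProtectedVolume("vol_vm01", "nfs_vm01", ["web01", "sql01"])]
    for line in build_runbook(demo):
        print(line)
```

With 40 VMs, even this checklist is long, which is the manual labor the question is trying to get rid of.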

We are now looking into VMware Site Recovery Manager. I reviewed this brief video, but it doesn't really tell you the details of SRM. http://www.youtube.com/watch?v=OBQNuT_z9ro

Questions:
1. How does SRM replicate the VMs to the DR -- vSphere Replication or SAN (NetApp) Replication? (In the video, around 7:00 min mark, they mention vSphere Replication)

2. Does SRM failover automatically, or does a human have to initiate the failover process?

3. How does SRM integrate with NetApp? (We don't want to re-invent the wheel)

4. Any helpful links to learn more about SRM and how it integrates with NetApp?
Asked by: pzozulka

Paul Solovyovsky (Senior IT Advisor) commented:
Questions:
1. How does SRM replicate the VMs to the DR -- vSphere Replication or SAN (NetApp) Replication? (In the video, around 7:00 min mark, they mention vSphere Replication)

2. Does SRM failover automatically, or does a human have to initiate the failover process?

-A human has to initiate it, but this is what you want, since the system doesn't know whether an interruption is a 10-minute outage or a hole in the ground at your HQ.


3. How does SRM integrate with NetApp? (We don't want to re-invent the wheel)

Very well. It uses an SRA (Storage Replication Adapter, middleware provided by NetApp) to integrate with NetApp, and it will perform automated failover, failback, and bubble testing.

4. Any helpful links to learn more about SRM and how it integrates with NetApp?

NetApp TR-4064:
http://www.globbtv.com/microsite/18/Adjuntos/TR-4064.PDF

I would recommend upgrading VMware to 5.1 and installing SRM 5.1, as it is more stable than 5.0.
asavener commented:
1. How does SRM replicate the VMs to the DR -- vSphere Replication or SAN (NetApp) Replication? (In the video, around 7:00 min mark, they mention vSphere Replication)
The data is replicated using SnapMirror, but SRM has a driver that controls the SnapMirror process. Stub VMs are placed on the hosts/vCenter at the recovery site.


2. Does SRM failover automatically, or does a human have to initiate the failover process?
Manually. It has a planned migration, where the VMs are shut down, migrated, and restarted, and an unplanned migration, where it brings the replicated machines online in whatever state they were in.
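To make the difference between the two modes concrete, here is a toy model, not SRM's API: each VM is reduced to a counter of committed writes, and replication to a periodic copy. `Site`, `replicate`, `planned_migration`, and `unplanned_migration` are hypothetical names. A planned migration shuts the VMs down before a final sync, so nothing is lost; an unplanned one can only start whatever the last replication cycle delivered.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    data: dict[str, int] = field(default_factory=dict)   # VM name -> committed writes
    powered_on: set[str] = field(default_factory=set)    # VMs running at this site


def replicate(src: Site, dst: Site) -> None:
    """One replication cycle: the replica catches up with the source."""
    dst.data = dict(src.data)


def planned_migration(protected: Site, recovery: Site) -> None:
    # Shut down first so no writes land after the final sync.
    protected.powered_on.clear()
    replicate(protected, recovery)
    recovery.powered_on = set(recovery.data)


def unplanned_migration(recovery: Site) -> None:
    # Protected site is gone: start the replicas exactly as last copied.
    recovery.powered_on = set(recovery.data)


if __name__ == "__main__":
    hq, dr = Site({"sql01": 10}, {"sql01"}), Site()
    replicate(hq, dr)          # scheduled replication cycle
    hq.data["sql01"] = 12      # two more writes before disaster strikes
    unplanned_migration(dr)
    print(dr.data["sql01"])    # 10: the last two writes never reached DR
```

The gap between the two numbers is, in effect, your recovery point: how much work since the last SnapMirror update an unplanned migration can lose.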

3. How does SRM integrate with NetApp? (We don't want to re-invent the wheel)
You have to install the NetApp SnapMirror driver in the SRM product.  Then you do most of your configuration through SRM.
Paul Solovyovsky (Senior IT Advisor) commented:
You want to make sure that you have a FlexClone license on the NetApp controllers; otherwise you will not be able to test easily.
johnkerry8652 commented:
SRM does not fail over automatically. You have to carry this out as a manual process, as per the above advice.

The VMware SRM product is very good.
One of the downsides, though, is that in version 4 of SRM, SRM placeholders (called stubs above) show up in the list of VMs in your second vCenter. That list shows you >all< of the VMs in the second datacenter: the SRM placeholders (powered off) as well as the live VMs (powered on) running on that vCenter.

The problem with version 4 of SRM is that these placeholders look like your original VMs, which makes it difficult (from your second vCenter) to tell which VMs are real and which belong to SRM, unless you have them all written down. This minor "issue" has been resolved in SRM 5.x.

I would take the above advice: as of August 2013, make sure your vCenter is on VMware 5.1 Update 1b, and then introduce and test SRM 5.1. All of these 5.x products are better than the 4.x products.

SRM licenses have to be purchased from VMware and cover a specific number of protected VMs, e.g. SRM licenses purchased for 25 VMs, for 50 VMs, etc.

SRM also plugs into your VMware vCenter interface. It is a nice product and something that's really worth considering.

The good thing about it is that it can be tested, to show you what happens when SRM is activated. However, as above, you do have to make the decision to invoke SRM. It's a manual process, and this is probably a good idea, e.g. "we are moving to the second site or to the second computer room (due to event X) and we are doing this NOW!"

As per your note, SRM can be configured to work with a number of storage products: Dell EqualLogic, NetApp, and others. These use replicated datastores to ensure that you have a reasonably up-to-date set of VMs available when you need to invoke SRM. You just need to keep an eye on which VMs you will need in each location, count up the number of licenses, and ensure that enough storage space is available to replicate the VMs to the second site.
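The two checks at the end of that advice are simple arithmetic. A minimal sizing sketch, assuming licenses come in fixed-size packs (25 VMs, matching the "25 VMs, 50 VMs" example; confirm current terms with VMware) and that VM sizes and DR free space are figures you supply yourself. The function names are hypothetical.

```python
import math


def license_packs(protected_vms: int, pack_size: int = 25) -> int:
    """Smallest number of license packs that covers every protected VM."""
    if protected_vms < 0 or pack_size <= 0:
        raise ValueError("protected_vms must be >= 0 and pack_size > 0")
    return math.ceil(protected_vms / pack_size)


def dr_space_shortfall_gb(vm_sizes_gb: dict[str, float], dr_free_gb: float) -> float:
    """GB still missing at the DR site; 0.0 means the replicas fit.

    Ignores snapshot reserve and SnapMirror snapshot growth, so treat
    the result as a lower bound on the real shortfall.
    """
    needed = sum(vm_sizes_gb.values())
    return max(0.0, needed - dr_free_gb)


if __name__ == "__main__":
    print(license_packs(40))   # 2 packs of 25 cover the 40 VMs in the question
```

Rounding up matters: 40 VMs need two 25-VM packs, leaving room for 10 more protected VMs before a third pack is required.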
asavener commented:
Migrations are also performed on a per-datastore basis, not a per-VM basis.  

(We have a dedicated datastore for periodically testing (and training new admins on) the migration process.)
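Because recovery works per datastore, choosing one VM effectively chooses everything that shares its datastore. A small sketch of that consequence, assuming you keep a VM-to-datastore map yourself; the layout (including a dedicated test datastore like the one described above) and the helper names are hypothetical.

```python
from collections import defaultdict


def vms_by_datastore(vm_to_datastore: dict[str, str]) -> dict[str, list[str]]:
    """Group VM names by the datastore they live on (sorted for readability)."""
    groups: dict[str, list[str]] = defaultdict(list)
    for vm, ds in sorted(vm_to_datastore.items()):
        groups[ds].append(vm)
    return dict(groups)


def failover_scope(vm_to_datastore: dict[str, str], wanted: set[str]) -> set[str]:
    """Every VM that moves when you fail over the datastores holding `wanted`."""
    datastores = {vm_to_datastore[vm] for vm in wanted}  # KeyError if a VM is unknown
    return {vm for vm, ds in vm_to_datastore.items() if ds in datastores}


if __name__ == "__main__":
    layout = {"web01": "ds1", "web02": "ds1", "sql01": "ds2", "test01": "ds_test"}
    print(sorted(failover_scope(layout, {"web01"})))   # ['web01', 'web02']
```

This is why a dedicated test datastore is handy: failing it over touches only the VMs deliberately placed on it.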
pzozulka (author) commented:
What would happen if, for some reason, our primary (protected) site goes down completely, so that I can't start the DR process manually from there and cannot gracefully shut down the VMs?
Paul Solovyovsky (Senior IT Advisor) commented:
The point of SRM is that you start the recovery from the DR site if there's a full system outage. Keep in mind that this is not for business continuity purposes: once you start using the DR site, your primary site's data is out of date, since only the data at the DR site is being updated. You would have to replicate your data back to the primary site before it can take over again. This can be done in parallel while you're running your environment from DR.
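The point about stale primary data can be pictured as a small state machine. This is a hedged sketch with hypothetical state and action names (they are not SRM's workflow names): failing back is only allowed after data has been replicated from DR back to the primary site.

```python
from enum import Enum


class State(Enum):
    PROTECTED = "primary active, replicating primary -> DR"
    FAILED_OVER = "DR active, primary copy stale"
    REVERSED = "DR active, replicating DR -> primary"


# Allowed (state, action) pairs; anything else is rejected.
TRANSITIONS = {
    (State.PROTECTED, "failover"): State.FAILED_OVER,
    (State.FAILED_OVER, "replicate_back"): State.REVERSED,
    # In this model, "failback" means moving the VMs home and
    # resuming primary -> DR replication in one step.
    (State.REVERSED, "failback"): State.PROTECTED,
}


def step(state: State, action: str) -> State:
    """Apply one action, refusing transitions the table does not allow."""
    try:
        return TRANSITIONS[(state, action)]
    except KeyError:
        raise ValueError(f"cannot {action!r} from {state.name}") from None


def authoritative_site(state: State) -> str:
    """Which site holds the current copy of the data."""
    return "primary" if state is State.PROTECTED else "dr"
```

Trying `step(State.FAILED_OVER, "failback")` raises, which mirrors the warning above: the primary copy is stale until replication has run in the reverse direction.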
asavener commented:
What would happen if, for some reason, our primary (protected) site goes down completely, so that I can't start the DR process manually from there and cannot gracefully shut down the VMs?
That would be an "unplanned migration."

You perform the migration at the recovery site (you have to have a vCenter and SRM server at both the protected and the recovery sites).

The VMs are booted from the replicated LUNs/Datastores.  Essentially, it's the same thing that happens when a host fails and HA moves the VM to a new host and boots it.

There are some IP address issues to consider.  We force all protected VMs to use DHCP (with reservations) to avoid having to manually change the IPs of the migrated guests.
pzozulka (author) commented:
Thanks. Although all of our VMs have static IPs set on each individual server, our standby (recovery) site has a cloned network topology. In other words, the recovery site has the same VLANs and subnets set up on all switches, as well as on all ESXi virtual switches. So I don't think IP addresses should be a problem.
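With a cloned topology, the thing worth verifying is that every static IP still lands in a subnet that exists on the same VLAN at the recovery site. A minimal sketch using Python's standard `ipaddress` module; the VLAN IDs, addresses, and the `unplaceable_vms` helper are hypothetical inputs you would fill from your own inventory. It checks addressing only, not routing between the two sites.

```python
import ipaddress


def unplaceable_vms(vm_ips: dict[str, tuple[int, str]],
                    dr_subnets: dict[int, str]) -> list[str]:
    """Return VMs whose static IP is not inside the DR subnet on their VLAN.

    vm_ips maps VM name -> (VLAN ID, static IP); dr_subnets maps
    VLAN ID -> CIDR configured at the recovery site.
    """
    problems = []
    for vm, (vlan, ip) in sorted(vm_ips.items()):
        cidr = dr_subnets.get(vlan)
        # Missing VLAN at DR, or IP outside that VLAN's subnet: flag it.
        if cidr is None or ipaddress.ip_address(ip) not in ipaddress.ip_network(cidr):
            problems.append(vm)
    return problems
```

An empty result means every protected VM can keep its static IP after failover, which is the premise of the cloned-network approach.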
asavener commented:
We considered that approach, but it limits our ability to perform functional tests of the failover/recovery process.
pzozulka (author) commented:
How so?
asavener commented:
If we were to migrate one or more LUNs to the recovery site, the VMs on those LUNs would be isolated, because the routing to the original site would still be active.
Paul Solovyovsky (Senior IT Advisor) commented:
What we have done for our customer is create several VLANs at the DR site for this purpose. This allows us to keep the same IPs and avoid conflicts, and it also allows us to perform bubble testing: the VMs are brought up on the test VLAN, and multiple hosts can connect to the same VLANs so the VMs can talk to each other. We can then configure the firewall for the same VLANs at the DR site and perform full-scale testing in the bubble.
asavener commented:
I don't disagree with that approach, necessarily.  When we designed this, one of the decisions made was to make it as independent of the networking team as possible.

Both approaches are valid; it just depends on what your requirements are.