

Moving 2008 Server Failover Cluster Nodes into geographically different locations

Posted on 2014-03-18
Medium Priority
Last Modified: 2014-03-27
We are currently building a secondary facility to act as our hot site for disaster recovery. The secondary site is directly connected via fiber, so speed and network connectivity are not an issue. We currently have several two-node 2008 Server failover clusters: SQL, file servers, and Exchange. We would like to physically move one node to Facility B and keep one node at the primary data center in Facility A. We have a SAN that houses data at Facility A, and we do async replication to another SAN at Facility B. The data will likely be at most 4 hours behind at any time. We know we will have to manually mount volumes and such if the primary facility gets destroyed.

Again, assuming we have the line speed and connectivity between the buildings, the nodes really can't tell whether they are 5 feet apart or 500 meters apart.

With the above said, I'm trying to see why this would be a good or bad idea. At first glance, I see this as a simple physical move of the box. The IP infrastructure will all stay the same. Again, we're just thinking of the other building as an addition to our primary site, 500 meters away.

With our setup, is there any reason to invest in other software like DoubleTake Availability? I've seen demos of it and it looks very interesting, but it almost looks like its own version of Microsoft clustering. Is this something we could leverage in our existing setup, or is it completely unrelated and unnecessary?

Any info on the potential implications or pitfalls of splitting the cluster nodes across two facilities would be appreciated.

Question by:rdelrosario

Expert Comment

ID: 39939986
I want to make sure I understand your question correctly.

You want to take your current cluster, which has two nodes at one site, keep the existing cluster but place the nodes in two physically different sites, and have the cluster fail over to the other site in a DR event...

Is this correct?

What version are you running, Windows Server 2008 or 2008 R2? What type of quorum model are you using?
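If you're not sure of the quorum configuration, you can check it from an elevated PowerShell prompt on one of the nodes. This is a generic sketch using the FailoverClusters module (available on 2008 R2; on plain 2008 you'd use cluster.exe instead):

```powershell
# Load the failover clustering cmdlets (Windows Server 2008 R2)
Import-Module FailoverClusters

# Show the current quorum model and witness resource, if any
Get-ClusterQuorum | Format-List QuorumType, QuorumResource

# Confirm both nodes are up
Get-ClusterNode | Format-Table Name, State
```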

Expert Comment

by:Ryan McCauley
ID: 39947924
DoubleTake appears to compete more with a feature like SQL AlwaysOn (mirroring to a second location with hot failover) or Oracle's GoldenGate (real-time replication) than with traditional Windows failover clustering. If you're just looking to move one of your two cluster nodes to a new location, you can do that, as long as you maintain connectivity to the same SAN.

However, you mention a four-hour lag on your data at the second datacenter - are you intending to use this location as protection against user error or accidental deletions, rather than just a disaster recovery/failover location? If so, you'd need to use something like log shipping (which can be done with a defined lag), or a product like GoldenGate or DoubleTake. Native SQL mirroring and AlwaysOn will strive to keep the replica as current as possible.

If I'm misunderstanding your requirements, please add some detail around the lag you've mentioned.

Author Comment

ID: 39954304
To be clear, yes, we have two-node clusters. We are moving one node to the failover site. We will only be using the failover site if the primary site is unavailable - disaster, fire, data center flooded... We are doing async replication with a Dell EqualLogic iSCSI SAN array. It fires off deltas every 4 hours between the two buildings. The way I see it, the EqualLogic can't be automated to mount volumes to alternate servers in case of a Windows cluster failure. So the steps I see occurring are:

1.  Facility A blows up.
2.  Run the EqualLogic admin console to "promote" the latest async'd volumes and present them to the cluster node available at the secondary site. This is where we could be at most 4 hours behind on the deltas.
3.  My assumption is that the cluster will fail because the data won't be available, but we could in short order present the volumes in Facility B to the other cluster node, perhaps having to force quorum. We will likely use Node and File Share Majority, with the file share available from either site or somewhere rented in the cloud.

I want to know if this is a viable, common approach, or just plain bad news. We have a limited budget, which is why we are just moving a node rather than adding nodes to the cluster.


Expert Comment

by:Ryan McCauley
ID: 39956712
I'd strongly encourage you to add a third node to the cluster - moving your second node to another data center will add disaster recovery, but you'll lose the HA advantages of your current configuration. With both cluster nodes in the same data center and connected to the same storage, you can do maintenance and you're protected from hardware failure. If you move one node to another data center, replicate storage (with a delay of any kind), and require manual failover, you're now wide open to hardware failure and downtime on your primary node. Additionally, how will you apply OS/SQL patches with minimal downtime if your second node can't come online for maintenance?

You'll be adding DR, but at the cost of local HA you already have. If you can spare a few thousand dollars (servers are cheap in the scheme of things), you should consider getting a new node, installing it in the new data center, and then adding it the cluster as a third (passive) node. Then it's available for disaster recovery, but you still have two primary nodes that you can use when performing maintenance and handling unexpected failures.
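For what it's worth, adding that third node is mostly a matter of installing the failover clustering feature on the new server and joining it. A rough sketch with PowerShell on 2008 R2 (the cluster and node names here are placeholders, not yours):

```powershell
Import-Module FailoverClusters

# Validate the proposed configuration including the new server first
Test-Cluster -Node Node1, Node2, DRNode3

# Join the new server in the remote data center as a third node
Add-ClusterNode -Name DRNode3 -Cluster MyCluster
```

You'd also typically adjust the preferred owners on each clustered service so the DR node is only used last, keeping day-to-day failovers local.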

Does that make sense? To answer your question directly, your failover plan to the new data center makes sense (the steps sound right).
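For reference, the force-quorum step in your plan can be sketched like this on 2008 R2 - node and share names are placeholders, and you'd only run this on the surviving node after promoting and presenting the replicated volumes:

```powershell
Import-Module FailoverClusters

# Start the cluster service without quorum on the surviving node
Start-ClusterNode -Name SurvivingNode -FixQuorum
# (legacy equivalent: net start clussvc /forcequorum)

# Once the cluster is up, point the quorum at a reachable witness share
Set-ClusterQuorum -NodeAndFileShareMajority \\WitnessServer\ClusterWitness
```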

Author Comment

ID: 39958958
I believe I follow your reply. However, I may have left out an important piece of information: both nodes are available as long as the fiber between the buildings is active. Both nodes will have access to the REAL-TIME DATA/LOGS on the LUNs on the primary SAN array in Facility A (primary). So in essence, nodes 1 and 2 are physically attached to the same equipment (SAN, switches), albeit through fiber. The only time it will act any differently is if and when the other node is unreachable. I assume maintenance, patches, and such will work the same as always, assuming no disaster has taken place.

So to reiterate: NODE 1 and NODE 2 physically attach to the SAME LUNs on the SAME PRIMARY SAN located in Facility A. Yes, node 2 will be in Facility B, but it will still attach to the primary SAN (real-time data) in Facility A. Unless I'm missing something, maintenance can proceed as usual. HA still exists, but now depends on an additional point of failure (the fiber connection/transceiver and far-end switch).

With the above clarified, am I missing something, or does this now make it a common/recommended approach? Or does it still pose unrealistically high levels of risk?

Expert Comment

by:Ryan McCauley
ID: 39959208
Ah - I was thinking you were going to move node B to a second facility with only replicated storage, and that was a huge concern. It sounds like you'll still have HA in place, but I'd encourage you to confirm that storage speed and network responsiveness will still be within tolerance when the node is remote from the SAN. If you've got dark fiber between the two data centers, you're probably well within tolerances, but you'll want to confirm that before it's needed.

That said, I'd still encourage you to add a third node in the remote location and leave the current pair intact in their current location - even though the sites are connected by fiber, you'll be without HA while you physically transport the node to the new site (potentially a few days, depending on how far it is), and then you'll have to do some failover testing ASAP once the node is racked and connected to ensure that you're not doing your failover testing when the primary node goes down and you need it.
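For that post-move testing, re-running cluster validation is a reasonable first check once the node is racked. A sketch, assuming a placeholder cluster name (the Storage and Network test categories are the ones that will surface latency or connectivity problems over the inter-site link):

```powershell
Import-Module FailoverClusters

# Re-validate the cluster after the node is relocated to Facility B
Test-Cluster -Cluster MyCluster -Include "Storage", "Network"
```

The validation report is only a baseline - you'd still want to do a controlled failover of each clustered service afterward.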

Author Comment

ID: 39959266
Ok, from the sounds of it, I assume you think this is a good approach, aside from deploying a third node being the better option.

Note we are running OM4 fiber and the buildings are only 300 meters apart. We have each segment (heartbeat, cluster, and other required segments) on its own VLAN, talking at 20 Gbit over fiber. Considering we are only using 1 Gbit Ethernet cards, we have plenty of line-speed capacity between the buildings.

With that said, any other worries beyond what you've stated?

Accepted Solution

Ryan McCauley earned 2000 total points
ID: 39959620
Ah - 300 meters is nothing at all, distance-wise. I was under the impression you were taking it to another city, for some reason :) You're probably looking at just a couple of hours of HA risk by the time you unrack, transport, and reconnect the server in its new home.

Other than that, I think you're okay - you've considered normal patching schedules as well as what your DR plan would be if you lost the primary SAN, so I think you're covered.

Author Closing Comment

ID: 39959673
Quick response
