Moving 2008 Server Failover Cluster Nodes into geographically different locations

We are currently building a secondary facility to act as our hot site for disaster recovery. The secondary site is directly connected via fiber, so speed and network connectivity are not an issue. We currently have several two-node Windows Server 2008 failover clusters: SQL, file servers, and Exchange. We would like to physically move one node to facility B and keep one node at the primary data center in facility A. We have a SAN that houses data at facility A, and we do async replication to another SAN at facility B. The data will likely be at most four hours behind at any time. We know we will have to manually mount volumes and such if the primary facility gets destroyed.

Again, assuming we have the line speed and connectivity between the buildings, the nodes really can't tell whether they are 5 feet apart or 500 meters apart.

With the above said, I'm trying to see why this would be a good or bad idea. At first glance, I see this as a simple physical move of the box. The IP infrastructure will all stay the same. Again, we're just thinking of the other building as an extension of our primary site, 500 meters away.

With our setup, is there any reason to invest in other software like DoubleTake Availability? I've seen demos of it and it looks very interesting, but it almost looks like its own version of Microsoft clustering. Is this something we could leverage in our existing setup, or is it completely unrelated and unnecessary?

Any info on the potential implications or pitfalls of splitting the cluster nodes between two facilities would be appreciated.

Ryan McCauley (Data and Analytics Manager) commented:
Ah - 300m is nothing at all, distance-wise. I was under the impression you were taking it to another city, for some reason :) You're probably looking at just a couple of hours of HA risk by the time you unrack, transport, and reconnect the server in its new home.

Other than that, I think you're okay - you've considered normal patching schedules as well as what your DR plan would be if you lost the primary SAN, so I think you're covered.
I want to make sure I understand your question correctly.

You want to take your current cluster, which has two nodes at one site, keep the existing cluster, but place the nodes in two physically different sites and have the cluster fail over to the other node in a DR event...

Is this correct?

What version are you using - Windows 2008 R2? What type of quorum model are you using?
Ryan McCauley (Data and Analytics Manager) commented:
DoubleTake appears to compete more with a feature like SQL AlwaysOn (mirroring to a second location with hot failover) or Oracle's GoldenGate (real-time replication) than with traditional Windows failover clustering. If you're just looking to move one of your two cluster nodes to a new location, you can do that as long as you maintain connectivity to the same SAN.

However, you mention a four-hour lag on your data at the second datacenter - are you intending to use this location as protection against user error or accidental deletions, rather than just as a disaster recovery/failover location? If so, you'd need to use something like log shipping (which can be done with a defined lag), or a product like GoldenGate or DoubleTake. Native SQL mirroring and AlwaysOn will strive to keep the replica as current as possible.

If I'm misunderstanding your requirements, please add some detail around the lag you've mentioned.

rdelrosario (Author) commented:
To be clear, yes, we have two-node clusters. We are moving one node to the failover site. We will only use the failover site if the primary site is unavailable - disaster, fire, data center flooded... We are doing async replication with a Dell EqualLogic iSCSI SAN array, firing off deltas every four hours between the two buildings. The way I see it, the EqualLogic can't be automated to mount volumes to alternate servers in case of a Windows cluster failure. So the steps I see occurring are:

1.  Facility A blows up.
2.  Run the EqualLogic admin console to "promote" the latest async'd volumes and present them to the cluster node available at the secondary site. This is where we could be at most 4 hours behind on the deltas.
3.  My assumption is that the cluster will fail because the data won't be available, but we could in short order present the volumes in facility B to the other cluster node - perhaps having to force quorum - but we will likely use Node and File Share Majority, with file shares available from either site, somewhere rented in the cloud.

I want to know whether this is a viable, common approach or just plain bad news. We have a limited budget, which is why we are just moving a node rather than adding nodes to the cluster.
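The quorum math behind step 3 can be sketched as follows. This is only an illustrative sketch of the majority-voting idea behind the Node and File Share Majority model, not actual cluster tooling; the function name is hypothetical:

```python
# Illustrative sketch of majority voting as used by the
# "Node and File Share Majority" quorum model (not cluster tooling).

def has_quorum(votes_online, votes_total):
    """The cluster keeps quorum while more than half of all votes are reachable."""
    return votes_online > votes_total // 2

# Two nodes plus a file share witness = 3 votes total.
TOTAL_VOTES = 3

# Normal operation: both nodes and the witness are online (3 of 3 votes).
print(has_quorum(3, TOTAL_VOTES))  # True

# Facility A destroyed: node A lost, but node B can still reach the
# witness (2 of 3 votes), so the surviving node keeps quorum without
# a forced start.
print(has_quorum(2, TOTAL_VOTES))  # True

# Node B alone, witness also unreachable (1 of 3 votes): no quorum;
# this is the case where the cluster service must be force-started.
print(has_quorum(1, TOTAL_VOTES))  # False
```

This is why hosting the witness share somewhere reachable from either site matters: it lets the surviving node form a majority on its own after the primary facility is lost.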
Ryan McCauley (Data and Analytics Manager) commented:
I'd strongly encourage you to add a third node to the cluster - moving your second node to another data center will add disaster recovery, but you'll lose the HA advantages of your current configuration. With both cluster nodes in the same data center and connected to the same storage, you can do maintenance and you're protected from hardware failure. If you move one node to another data center, replicate storage (with a delay of any kind), and require manual failover, you're now wide open to hardware failure and downtime on your primary node. Additionally, how will you apply OS/SQL patches with minimal downtime if your second node can't come online for maintenance?

You'll be adding DR, but at the cost of the local HA you already have. If you can spare a few thousand dollars (servers are cheap in the scheme of things), you should consider getting a new node, installing it in the new data center, and then adding it to the cluster as a third (passive) node. Then it's available for disaster recovery, but you still have two primary nodes that you can use when performing maintenance and handling unexpected failures.

Does that make sense? To answer your question directly: your failover plan to the new data center makes sense (the steps sound right).
rdelrosario (Author) commented:
I believe I follow your reply. However, I may have left off an important piece of information: both nodes are available as long as the fiber between the buildings is active. Both nodes will have access to the real-time data/logs on the LUNs on the primary SAN array in facility A (primary). So in essence, nodes 1 and 2 are physically attached to the same equipment (SAN, switches), albeit through fiber. The only time anything will behave differently is if and when the other node is unreachable. As for maintenance, patches, and such: it will work the same as always, assuming no disaster has taken place.

So to reiterate: node 1 and node 2 physically attach to the SAME LUNs on the SAME primary SAN located in facility A. Yes, node 2 will be in facility B, but it will still attach to the primary SAN (real-time data) in facility A. Unless I'm missing something, maintenance can proceed as usual. HA still exists, but it now depends on an additional point of failure (the fiber connection/transceiver and the far-end switch).

With the above clarified, am I missing something, or does this now make it a common/recommended approach - or does it still pose unrealistically high levels of risk?
Ryan McCauley (Data and Analytics Manager) commented:
Ah - I was thinking that you were going to move node B to a second facility with only replicated storage, and that was a huge concern. It sounds like you'll still have HA in place, but I'd encourage you to confirm that storage speed and network responsiveness will still be within tolerance when the node is remote from the SAN. If you've got dark fiber between the two data centers, you're probably well within tolerances, but you'll want to confirm it before it's needed.

That said, I'd still encourage you to add a third node in the remote location and leave the current pair intact in their current location - even though the sites are connected by fiber, you'll be without HA while you physically transport the node to the new site (potentially a few days, depending on how far it is), and then you'll have to do some failover testing ASAP once the node is racked and connected to ensure that you're not doing your failover testing when the primary node goes down and you need it.
rdelrosario (Author) commented:
OK, from the sounds of it, I assume you think this is a good approach, aside from deploying a third node being the better option.

Note we are running OM4 fiber and the buildings are only 300 meters apart. We have each segment - heartbeat, cluster, and the other required segments - on its own VLAN talking at 20Gbit over the fiber. Considering we are only using 1Gbit Ethernet cards, we have plenty of line-speed capacity between the buildings.
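For what it's worth, the propagation delay a 300-meter run adds can be estimated with a back-of-the-envelope calculation. This is a hedged sketch, assuming signal speed in fiber of roughly 2×10⁸ m/s (refractive index ~1.5) and ignoring switch/transceiver latency:

```python
# Back-of-the-envelope propagation delay over a 300 m fiber run.
# Assumptions: light travels ~2e8 m/s in glass fiber (refractive
# index ~1.5); switch and transceiver latency are ignored.

FIBER_SPEED_M_PER_S = 2.0e8   # ~2/3 of the speed of light in vacuum
distance_m = 300              # OM4 run between the two facilities

one_way_us = distance_m / FIBER_SPEED_M_PER_S * 1e6  # microseconds
round_trip_us = 2 * one_way_us

print(f"one-way propagation: {one_way_us:.2f} us")   # 1.50 us
print(f"round trip: {round_trip_us:.2f} us")         # 3.00 us
```

A few microseconds of round trip is negligible next to typical iSCSI and disk latencies, which supports the point that at this distance the nodes effectively can't tell they're in different buildings.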

With that said - any other worries beyond what you've stated?
rdelrosario (Author) commented:
Thanks for the quick response.
Question has a verified solution.
