Moving 2008 Server Failover Cluster Nodes into geographically different locations

Posted on 2014-03-18
Last Modified: 2014-03-27
We are currently building a secondary facility to act as our hot site for disaster recovery. The secondary site is directly connected via fiber, so speed and network connectivity are not an issue. We currently have several 2-node Server 2008 failover clusters: SQL, file servers, and Exchange. We would like to physically move 1 node to facility B and keep 1 node at the primary data center in facility A. We have a SAN that houses data at facility A, and we do async replication to another SAN at facility B. The data will likely be at most 4 hours behind at any time. We know we will have to manually mount volumes and such if the primary facility gets destroyed.

Again, assuming we have line speed and connectivity between the buildings, the nodes really can't tell whether they are 5 feet apart or 500 meters apart.

With the above said, I'm trying to see why this would be a good or bad idea. At first glance, I see this as a simple physical move of the box. The IP infrastructure will all stay the same. Again, I'm just thinking of the other building as an addition to our primary site, just 500 meters away.

With our setup, is there any reason to invest in other software like DoubleTake Availability? I've seen demos of it and it looks very interesting, but it almost looks like its own version of Microsoft Clustering. Is this something we could leverage in our existing setup, or is it completely unrelated and unnecessary?

Any info on the potential implications or pitfalls of splitting the cluster nodes across two facilities would be appreciated.

Question by:rdelrosario
LVL 20

Expert Comment

ID: 39939986
I want to make sure I understand your question correctly.

You want to take your current cluster, which has two nodes at one site, and keep the existing cluster but place the nodes in two physically different sites and have the cluster fail over to the other site in a DR event...

Is this correct?

Which version are you using? Windows 2008 R2? What type of quorum model are you using?
LVL 28

Expert Comment

by:Ryan McCauley
ID: 39947924
DoubleTake appears to compete more with a feature like SQL AlwaysOn (mirroring to a second location with hot failover) or Oracle's GoldenGate (real-time replication) than with traditional Windows failover clustering. If you're just looking to move one of your two cluster nodes to a new location, you can do that as long as you maintain connectivity to the same SAN.

However, you mention a four-hour lag on your data at the second datacenter. Are you intending to use this location as protection against user error or accidental deletions, rather than just as a disaster recovery/failover location? If so, you'd need to use something like Log Shipping (which can be done with a defined lag), or a product like GoldenGate or DoubleTake. Native SQL mirroring and AlwaysOn will strive to keep the replica as current as possible.

If I'm misunderstanding your requirements, please add some detail around the lag you've mentioned.

Author Comment

ID: 39954304
To be clear, yes, we have 2-node clusters. We are moving 1 node to the failover site. We will only be using the failover site if the primary site is unavailable: disaster, fire, data center flooded... We are doing async replication with a Dell EqualLogic iSCSI SAN array, which fires off deltas every 4 hours between the two buildings. The way I see it, the EqualLogic can't be automated to mount to alternate servers in case of a Windows cluster failure. So the steps I see occurring are:

1.  Facility A blows up.
2.  Run the EqualLogic admin console to "promote" the latest async'd volumes and present them to the cluster node available at the secondary site. This is where we could be at most 4 hours behind on the deltas.
3.  My assumption is that the cluster will fail because the data won't be available, but we could in short order present the volumes in facility B to the remaining cluster node, perhaps having to force quorum. We will likely use Node and File Share Majority, with the file share witness hosted somewhere rented in the cloud and reachable from either site.
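The quorum reasoning in step 3 can be sketched as simple vote counting. This is a minimal illustration, not how the cluster service is implemented: it assumes one vote per node plus one vote for the file share witness, and the names `VOTERS`, `surviving_votes`, and `has_quorum` are hypothetical.

```python
def has_quorum(votes_online, total_votes):
    """Return True when strictly more than half of all configured votes are reachable."""
    return votes_online > total_votes // 2


# Hypothetical layout based on the thread: one node per facility plus a witness in the cloud.
VOTERS = {"node_a": "facility_a", "node_b": "facility_b", "witness": "cloud"}


def surviving_votes(failed_locations):
    """Count voters whose location is not in the set of failed locations."""
    return sum(1 for location in VOTERS.values() if location not in failed_locations)


if __name__ == "__main__":
    total = len(VOTERS)
    for failed in ({"facility_a"}, {"facility_a", "cloud"}):
        online = surviving_votes(failed)
        state = "quorum" if has_quorum(online, total) else "no quorum"
        print(sorted(failed), "->", state)
```

Under these assumptions, losing facility A alone still leaves two of three votes, while losing facility A together with the witness leaves one vote, which is the case where forcing quorum would come into play. Quorum only decides whether the cluster service stays up; the disk resources would still fail until the replica volumes are promoted and presented as in step 2.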

We want to know whether this is a viable, common approach or just plain bad news. We have a limited budget, which is why we are just moving a node rather than adding nodes to the cluster.
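For the "at most 4 hours behind" figure in step 2, the data-loss window at the moment of a disaster is simply the time since the last delta finished replicating. The sketch below shows that arithmetic; the function name `data_loss_window` and the timestamps are illustrative assumptions, not values from our environment.

```python
from datetime import datetime, timedelta


def data_loss_window(last_replica_completed, failure_time):
    """Time span of writes missing from the replica when the primary is lost."""
    if failure_time < last_replica_completed:
        raise ValueError("failure_time must not precede the last completed replica")
    return failure_time - last_replica_completed


# Illustrative timestamps only.
last_delta = datetime(2014, 3, 27, 8, 0)
failure = datetime(2014, 3, 27, 11, 30)
lost = data_loss_window(last_delta, failure)
print(f"Writes from the last {lost} would be missing after promotion")
```

One caveat to keep in mind: if a delta is still in transit when the failure happens, the last usable replica is the previous one, so the window can be somewhat longer than the replication interval itself.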
LVL 28

Expert Comment

by:Ryan McCauley
ID: 39956712
I'd strongly encourage you to add a third node to the cluster - moving your second node to another data center will add disaster recovery, but you'll lose the advantages of HA that you have in your current configuration. With both cluster nodes in the same data center and connected to the same storage, you can do maintenance and you're protected from hardware failure. If you move one node to another data center and replicate storage (with a delay of any kind) and require manual failover, you're now wide open to hardware failure and downtime on your primary node. Additionally, how will you apply OS/SQL patches with minimal downtime if your second node can't come online for maintenance?

You'll be adding DR, but at the cost of the local HA you already have. If you can spare a few thousand dollars (servers are cheap in the scheme of things), you should consider getting a new node, installing it in the new data center, and then adding it to the cluster as a third (passive) node. Then it's available for disaster recovery, but you still have two primary nodes that you can use when performing maintenance and handling unexpected failures.

Does that make sense? Your failover plan to the new data center makes sense (the steps sound right), to answer your question directly.

Author Comment

ID: 39958958
I believe I follow your email. However, I may have left off an important piece of information: both nodes are available as long as the fiber between the buildings is active. Both nodes will have access to the REAL TIME DATA/LOGS on the LUNs on the primary SAN array in facility A. So in essence nodes 1 and 2 are physically attached to the same equipment (SAN, switches), albeit through fiber. The only time it will act any different is if and when the other node is unreachable. I assume maintenance, patching, and such will work the same as always, assuming no disaster has taken place.

So to reiterate: NODE 1 and NODE 2 physically attach to the SAME LUNs on the SAME PRIMARY SAN located in facility A. Yes, node 2 will be in facility B, but it will still attach to the primary SAN (real-time data) in facility A. Unless I'm missing something, maintenance can be done as usual. HA still exists, but now depends on an additional point of failure (the fiber connection/transceiver and far-end switch).

With the above clarified, am I missing something, or does this now make this a common/recommended approach? Or does it still pose unrealistically high levels of risk?
LVL 28

Expert Comment

by:Ryan McCauley
ID: 39959208
Ah - I was thinking that you were going to move node B to a second facility with only replicated storage, and that was a huge concern. It sounds like you'll still have HA in place, but I'd encourage you to confirm that storage speed and network responsiveness will still be within tolerance when the node is located remotely to the SAN. If you've got dark fiber between the two data centers, you're probably well within tolerances, but you'll want to confirm it before it's needed.

That said, I'd still encourage you to add a third node in the remote location and leave the current pair intact in their current location - even though the sites are connected by fiber, you'll be without HA while you physically transport the node to the new site (potentially a few days, depending on how far it is), and then you'll have to do some failover testing ASAP once the node is racked and connected to ensure that you're not doing your failover testing when the primary node goes down and you need it.

Author Comment

ID: 39959266
OK, from the sounds of it, I assume you think this is a good approach, aside from deploying a 3rd node being a better option.

Note we are running OM4 fiber, and the buildings are only 300 meters apart. We have each segment (heartbeat, cluster, and other required segments) on its own VLAN, talking at 20 Gbit over fiber. Considering we are only using 1 Gbit Ethernet cards, we have plenty of line speed capacity between buildings.

With that said, any other worries other than what you stated?
LVL 28

Accepted Solution

Ryan McCauley earned 500 total points
ID: 39959620
Ah - 300 m is nothing at all, distance-wise. I was under the impression you were taking it to another city, for some reason :) You're probably looking at just a couple of hours of HA risk by the time you unrack, transport, and reconnect the server in its new home.

Other than that, I think you're okay - you've considered normal patching schedules as well as what your DR plan would be if you lost the primary SAN, so I think you're covered.
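To put the 300 m figure in perspective, here is a rough calculation of the propagation delay the fiber run itself adds. It is a back-of-the-envelope sketch: the refractive index of 1.47 is an assumed typical value for glass fiber rather than a measured one, and the function name `one_way_delay_us` is hypothetical.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458
FIBER_REFRACTIVE_INDEX = 1.47  # assumed typical value, not measured on this link


def one_way_delay_us(distance_m, refractive_index=FIBER_REFRACTIVE_INDEX):
    """Propagation delay in microseconds for light travelling distance_m of fiber."""
    velocity_m_per_s = SPEED_OF_LIGHT_M_PER_S / refractive_index
    return distance_m / velocity_m_per_s * 1_000_000


delay = one_way_delay_us(300)
print(f"one-way: {delay:.2f} us, round trip: {2 * delay:.2f} us")
```

The result is on the order of a microsecond or two, so the distance itself should be negligible next to switch, transceiver, and iSCSI stack latency. It doesn't replace measuring actual storage response times from the relocated node, which is still worth doing before the node is needed.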

Author Closing Comment

ID: 39959673
Quick response

Question has a verified solution.