Solved

Moving 2008 Server Failover Cluster Nodes into geographically different locations

Posted on 2014-03-18
Last Modified: 2014-03-27
We are currently building a secondary facility to act as our hot site for disaster recovery. The secondary site is directly connected via fiber, so speed and network connectivity are not an issue. We currently have several two-node 2008 Server failover clusters: SQL, file servers, and Exchange. We would like to physically move one node to Facility B and keep one node at the primary data center in Facility A. We have a SAN that houses data at Facility A, and we do async replication to another SAN at Facility B. The data will likely be at most 4 hours behind at any time. We know we will have to manually mount volumes and such if the primary facility gets destroyed.

Again, assuming we have the line speed and connectivity between the buildings, the nodes really can't tell whether they are 5 feet apart or 500 meters apart.

With the above said, I'm trying to see why this would be a good or bad idea. At first glance, I see this as a simple physical move of the box. The IP infrastructure will all stay the same. Again, I'm just thinking of the other building as an extension of our primary site, 500 meters away.

With our setup, is there any reason to invest in other software like DoubleTake Availability? I've seen demos of it and it looks very interesting, but it almost looks like its own version of Microsoft clustering. Is this something we could leverage in our existing setup, or is it completely unrelated and unnecessary?

Any info on the potential implications or pitfalls of splitting the cluster nodes between two facilities would be appreciated.

Thanks
Question by:rdelrosario
9 Comments
 
LVL 19

Expert Comment

by:compdigit44
ID: 39939986
I want to make sure I understand your question correctly.

You want to take your current cluster, which has two nodes at one site, keep the existing cluster but place the nodes in two physically different sites, and have the cluster fail over to the other node in a DR event...

Is this correct?

What version are you running - Windows 2008 R2? What quorum model are you using?
 
LVL 28

Expert Comment

by:Ryan McCauley
ID: 39947924
DoubleTake appears to compete more with a feature like SQL AlwaysOn (mirroring to a second location with hot failover) or Oracle's GoldenGate (real-time replication) than with traditional Windows failover clustering. If you're just looking to move one of your two cluster nodes to a new location, you can do that as long as you maintain connectivity to the same SAN.

However, you mention a four-hour lag on your data at the second datacenter - are you intending to use this location as protection against user error or accidental deletions, rather than just as a disaster recovery/failover location? If so, you'd need to use something like log shipping (which can be done with a defined lag), or a product like GoldenGate or DoubleTake. Native SQL mirroring and AlwaysOn will strive to keep the replica as current as possible.

If I'm misunderstanding your requirements, please add some detail around the lag you've mentioned.
 

Author Comment

by:rdelrosario
ID: 39954304
To be clear, yes, we have two-node clusters. We are moving one node to the failover site. We will only be using the failover site if the primary site is unavailable - disaster, fire, data center flooded... We are doing async replication with a Dell EqualLogic iSCSI SAN array, firing off deltas every 4 hours between the two buildings. The way I see it, the EqualLogic can't be automated to mount to alternate servers in case of a Windows cluster failure. So the steps I see occurring are:

1.  Facility A blows up.
2.  Run the EqualLogic admin console to "promote" the latest async'd volumes and present them to the cluster node available at the secondary site. This is where we could be at most 4 hours behind on the deltas.
3.  My assumption is that the cluster will fail because the data won't be available, but we could in short order present the volumes in Facility B to the other cluster node - perhaps having to force quorum - but we will likely use node and file share majority, with file shares available from either site or somewhere rented in the cloud.
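For what it's worth, the recovery in steps 2-3 could be sketched roughly as follows on a 2008 R2 node with the FailoverClusters PowerShell module. This is only an illustration, not a tested runbook: the node name NODE-B and the witness share path \\cloudhost\witness are placeholders, and the EqualLogic volume promotion itself still happens in the EqualLogic console beforehand.

```powershell
# Sketch only: run on the surviving node at Facility B, AFTER the
# replicated volumes have been promoted in the EqualLogic console
# and presented to this node. Names below are hypothetical.

Import-Module FailoverClusters

# Start the cluster service with forced quorum on the surviving node
Start-ClusterNode -Name "NODE-B" -FixQuorum

# Bring the promoted LUNs online as cluster disk resources
Get-ClusterResource |
    Where-Object { $_.ResourceType -eq "Physical Disk" } |
    Start-ClusterResource

# If the file share witness was lost along with Facility A, repoint
# the quorum at a share that is still reachable
Set-ClusterQuorum -NodeAndFileShareMajority "\\cloudhost\witness"
```

After forcing quorum you'd fail the SQL/file/Exchange groups over to the surviving node and verify the data against the last delta.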

I want to know if this is a viable, common approach, or just plain bad news. We have a limited budget, which is why we are just moving a node rather than adding nodes to the cluster.

 
LVL 28

Expert Comment

by:Ryan McCauley
ID: 39956712
I'd strongly encourage you to add a third node to the cluster - moving your second node to another data center will add disaster recovery, but you'll lose the advantages of HA that you have in your current configuration. With both cluster nodes in the same data center and connected to the same storage, you can do maintenance and you're protected from hardware failure. If you move one node to another data center, replicate storage (with a delay of any kind), and require manual failover, you're now wide open to hardware failure and downtime on your primary node. Additionally, how will you apply OS/SQL patches with minimal downtime if your second node can't come online for maintenance?

You'll be adding DR, but at the cost of the local HA you already have. If you can spare a few thousand dollars (servers are cheap in the scheme of things), you should consider getting a new node, installing it in the new data center, and adding it to the cluster as a third (passive) node. Then it's available for disaster recovery, but you still have two primary nodes that you can use when performing maintenance and handling unexpected failures.

Does that make sense? To answer your question directly: your failover plan for the new data center makes sense - the steps sound right.
 

Author Comment

by:rdelrosario
ID: 39958958
I believe I follow you. However, I may have left out an important piece of information: both nodes are available as long as the fiber between the buildings is active. Both nodes will have access to the REAL-TIME DATA/LOGS on the LUNs on the primary SAN array in Facility A (primary). So in essence, nodes 1 and 2 are physically attached to the same equipment (SAN, switches), albeit through fiber. The only time it will act any differently is if and when the other node is unreachable. I assume maintenance, patches, and such will work the same as always, assuming no disaster has taken place.

So to reiterate: node 1 and node 2 physically attach to the SAME LUNs on the SAME primary SAN located in Facility A. Yes, node 2 will be in Facility B, but it will still attach to the primary SAN (real-time data) in Facility A. Unless I'm missing something, maintenance can be done as usual. HA still exists, but now depends on an additional point of failure (the fiber connection/transceiver and far-end switch).

With the above clarified, am I missing something, or does this now make it a common/recommended approach - or does it still pose unrealistically high levels of risk?
 
LVL 28

Expert Comment

by:Ryan McCauley
ID: 39959208
Ah - I was thinking that you were going to move node B to a second facility with only replicated storage, and that was a huge concern. It sounds like you'll still have HA in place, but I'd encourage you to confirm that storage speed and network responsiveness will still be within tolerance when the node is remote from the SAN. If you've got dark fiber between the two data centers, you're probably well within tolerances, but you'll want to confirm it before it's needed.

That said, I'd still encourage you to add a third node in the remote location and leave the current pair intact where they are. Even though the sites are connected by fiber, you'll be without HA while you physically transport the node to the new site (potentially a few days, depending on how far it is). You'll also want to do failover testing as soon as the node is racked and connected, so that your first real failover test isn't the day the primary node goes down and you need it.
 

Author Comment

by:rdelrosario
ID: 39959266
OK, from the sounds of it, I assume you think this is a good approach, aside from deploying a third node being the better option.

Note we are running OM4 fiber, and the buildings are only 300 meters apart. We have each segment - heartbeat, cluster, and other required segments - on its own VLAN, talking at 20 Gbit over fiber. Considering we are only using 1 Gbit Ethernet cards, we have plenty of line-speed capacity between the buildings.

With that said - any other worries besides what you've stated?
 
LVL 28

Accepted Solution

by:Ryan McCauley (earned 500 total points)
ID: 39959620
Ah - 300 meters is nothing at all, distance-wise. I was under the impression you were taking it to another city, for some reason. :) You're probably looking at just a couple of hours of HA risk by the time you unrack, transport, and reconnect the server in its new home.

Other than that, I think you're okay - you've considered normal patching schedules as well as what your DR plan would be if you lost the primary SAN, so I think you're covered.
 

Author Closing Comment

by:rdelrosario
ID: 39959673
Quick response
