Exchange 2010 File Share Witness and Alternate FSW

Situation:  
•      Exchange 2010 needs to be able to fail over to a secondary datacenter if the primary datacenter loses power or experiences some kind of catastrophic failure.

Current Scenario:
•      Four VM mailbox servers (A, B, C, D)
•      Two VM CAS servers (Cas-A, Cas-B)
•      Two hardware network load balancers (NLB1, NLB2)
•      All MBX servers, CAS servers and NLBs are in one AD site.
•      One DC, NLB2, Cas-B and MBX server “D” are all running in the secondary datacenter.
•      Secondary datacenter is connected to the primary datacenter via a 1GB fiber connection.
•      Currently running a DAG with one active DB and three passive DBs on each MBX server.  The three passive DBs on each server are copies of the active DBs on the other three servers.
•      File Share Witness (FSW) is currently set to Cas-A in the primary datacenter, per what I understand to be the best practice of putting your FSW in the physical location that contains the greatest number of MBX servers.
•      Additionally, the IT Director has mandated that the FSW remain in the primary datacenter due to the likelihood of the link between the

Problem:
•      I can set the FSW to CAS-B in the secondary datacenter.  If I do this and the primary datacenter goes down, then nothing else needs to be done.  The DBs in the DAG will fail over to server D and both NLB2 and Cas-B will handle all the client traffic.
•      However, if I set the FSW to the secondary datacenter and the connection between the two goes down (disaster, fiber cut, etc.), then we lose the ability to connect to the DBs in the primary datacenter.
•      I know I can simply change the FSW to Cas-A and it will work.  However, if I do this and the secondary datacenter comes back up, then I am going to wind up with the system in a “split-brain” scenario.
•      So…I can set an Alternate FSW.  But if I do that, then it is my understanding that I need to enable the Datacenter Activation Coordination (DAC) mode, so that when the primary datacenter comes back up, I don’t wind up with a “split-brain” scenario.  I would do that with this PS command:
•      Set-DatabaseAvailabilityGroup -Identity DAG1 -DatacenterActivationMode DagOnly

Okay…so the steps I have been able to piece together for using the Alternate Witness Server in case the primary datacenter goes down are as follows.  Please correct me if they’re wrong:
1.      Stop the clustering service on Server D
2.      Run these PS commands from Server D:
a.      Stop-DatabaseAvailabilityGroup -Identity DAG1 -MailboxServer ServerA -ConfigurationOnly
b.      Stop-DatabaseAvailabilityGroup -Identity DAG1 -MailboxServer ServerB -ConfigurationOnly
c.      Stop-DatabaseAvailabilityGroup -Identity DAG1 -MailboxServer ServerC -ConfigurationOnly
3.      Run this PS command from Server D:
a.      Restore-DatabaseAvailabilityGroup -Identity DAG1 -MailboxServer ServerD -ConfigurationOnly

4.      Mount the databases on Server D.

At this point, all DBs should be running on Server D.
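
As a rough way to reason about what the cluster looks like after these steps, here is a small Python sketch. It is not Exchange code and every name in it is hypothetical. It assumes that Restore-DatabaseAvailabilityGroup evicts the servers previously marked as stopped, and that the alternate witness is configured when the number of surviving members is even or is exactly one; that second rule is my reading of Microsoft's description of the cmdlet and should be verified against the documentation.

```python
def surviving_cluster(members, stopped):
    """Model the cluster left after the stopped servers are evicted.

    Assumption (not taken from the thread): the witness gets a vote when
    the surviving member count is even or is exactly one.
    """
    survivors = [m for m in members if m not in stopped]
    count = len(survivors)
    uses_witness = count == 1 or count % 2 == 0
    total_votes = count + (1 if uses_witness else 0)
    votes_needed = total_votes // 2 + 1  # strict majority of all votes
    return survivors, uses_witness, votes_needed


# Hypothetical names matching the servers in the question.
dag_members = ["ServerA", "ServerB", "ServerC", "ServerD"]
stopped_members = {"ServerA", "ServerB", "ServerC"}

survivors, uses_witness, votes_needed = surviving_cluster(dag_members, stopped_members)
print("Survivors:", survivors)
print("Witness gets a vote:", uses_witness)
print("Votes needed for quorum:", votes_needed)
```

Under those assumptions, Server D would depend on the alternate witness being reachable in the secondary datacenter, which is why it matters where the alternate FSW lives.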

To fail back to the primary datacenter, I will need to:
1.      Bring up Servers A, B, and C.
2.      Connect to one of the servers.
3.      Run these PS commands:
a.      Move-ActiveMailboxDatabase ExchDB1 -ActivateOnServer ServerA -MountDialOverride:GoodAvailability
b.      Move-ActiveMailboxDatabase ExchDB2 -ActivateOnServer ServerB -MountDialOverride:GoodAvailability
c.      Move-ActiveMailboxDatabase ExchDB3 -ActivateOnServer ServerC -MountDialOverride:GoodAvailability
4.      Restart the databases on the primary servers (A, B, C).
5.      Disable the Activation bit on the databases on Server D:
a.      Suspend-MailboxDatabaseCopy -Identity ExchDB1\ServerA -ActivationOnly
b.      Suspend-MailboxDatabaseCopy -Identity ExchDB2\ServerB -ActivationOnly
c.      Suspend-MailboxDatabaseCopy -Identity ExchDB3\ServerC -ActivationOnly

So what do you folks think?  Am I overthinking it or did I get it right?

Thank you,

Jim
Captain Tact (Infrastructure Operations, Senior) Asked:

Akhater (Solutions Architect) Commented:
1) You should know that a DAG does NOT take link failures into consideration, so no matter what you do, if the link between the main and the secondary datacenter goes down, one of the two will go down (the one without the FSW).

2) An Alternate File Share Witness will NOT solve your problem. An Alternate File Share Witness becomes effective only when you perform a datacenter switchover; it is NOT used automatically.

3) You should enable DAC mode in all cases.
Captain Tact (Infrastructure Operations, Senior) Author Commented:
@Akhater - Yes, I was aware of point #1, that's why I was proposing setting the FSW on the CAS server in the primary datacenter and setting the Alternate FSW on the CAS server in the secondary datacenter.

I know the Alternate is not an automatic thing, hence the PS commands to shut down the DAG, tell the DAG to ignore the non-responding DAG members and restart the DAG.  Then the last set of PS commands were (I thought) to bring the original DAG members back online.
Captain Tact (Infrastructure Operations, Senior) Author Commented:
BTW...so if my proposed solution WON'T solve the problem, then what do you suggest that WILL solve my problem?
Captain Tact (Infrastructure Operations, Senior) Author Commented:
@Akhater - You know, I re-read my postings and it sounds like I'm being confrontational, but I'm not trying to be...so please accept my apology if it sounded like I was trying to pick a fight.

I'm new to Exchange 2010, so I'm still trying to wrap my head around several concepts and I'm getting frustrated that I can't seem to find any one site that documents exactly what I would need to do in this scenario.

Thanks in advance, for all the help you can give me.  :-)
Akhater (Solutions Architect) Commented:
Don't worry, I didn't take it wrong; I was just away.

OK, it looks like points 1, 2 and 3 are covered, so maybe I misunderstood the initial question. I thought you were wondering how you can keep the primary and DR datacenters working if the link goes down.

Are you trying to figure out how to do a datacenter switchover and failback?
Captain Tact (Infrastructure Operations, Senior) Author Commented:
Exactly!  We have offices in 11 different geographical locations, connected in a Hub/Spoke Topology.  The primary site is the center, with the secondary site as one of the spokes.  We have BGP in place, so if something catastrophic happened at the main branch, the other 10 locations would connect to the secondary site.

I'm simply trying to figure out how to keep Exchange 2010 running, if the primary site or the connection into the primary site goes down, then fail back to the primary site once it's back up and running, without encountering a split-brain situation.
Akhater (Solutions Architect) Commented:
Do you have active users connecting to both locations at the same time?

I.e., are there live databases in both datacenters, or are all DBs up in one datacenter and will switch to the second if you have a failure?
Captain Tact (Infrastructure Operations, Senior) Author Commented:
We have active users in both locations, but we can change that if we need to.

Currently MBX servers A, B, and C, as well as CAS-A, are all in the primary datacenter.  MBX server D and CAS-B are in the secondary datacenter.  Each MBX server has one active DB and 3 passive DBs.

All four MBX servers have live users connecting to them.
Akhater (Solutions Architect) Commented:
The issue with this design is that if the link goes down, MBX D will go down; there is nothing you can do about it.

Since you have 3 MBX servers, the FSW will not be used at all.

Enabling DAC mode will prevent split brain from happening, but not in the case of the link going down.

How to perform a datacenter switchover:

http://technet.microsoft.com/en-us/library/dd351049.aspx
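
To illustrate how DAC mode is meant to block split brain, here is a minimal Python sketch of the startup check as I understand it from Microsoft's description of the Datacenter Activation Coordination Protocol (DACP). It is a simplified model, not Exchange code, and the function and server names are hypothetical. The assumed rule: a starting member begins with its DACP bit at 0 and may mount databases only once it finds another member whose bit is already 1, or once it can contact every other member it believes should be running.

```python
def resolve_dacp_bit(me, reachable_bits, started_servers):
    """Return the DACP bit a starting DAG member would settle on.

    reachable_bits: dict mapping each peer we can contact to its DACP bit.
    started_servers: members this server believes should be running.
    Returns 1 if mounting is allowed, else 0 (simplified model).
    """
    # Another member is already allowed to mount, so we can join it.
    if any(bit == 1 for bit in reachable_bits.values()):
        return 1
    # Every other expected member is reachable, so no hidden partition exists.
    others = set(started_servers) - {me}
    if others <= set(reachable_bits):
        return 1
    return 0


expected = ["ServerA", "ServerB", "ServerC", "ServerD"]

# Primary servers power back up while the link to Server D is still down.
print(resolve_dacp_bit("ServerA", {"ServerB": 0, "ServerC": 0}, expected))

# Link restored: Server D is running with its bit set, so ServerA may mount.
print(resolve_dacp_bit("ServerA", {"ServerB": 0, "ServerC": 0, "ServerD": 1}, expected))
```

In the first call the restored primary servers can only see each other, so in this model they stay dismounted instead of competing with the copies already active on Server D.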

Captain Tact (Infrastructure Operations, Senior) Author Commented:
Okay...so let me reiterate and make sure I understand what it is you are saying.

1.  Enable DAC mode, regardless of anything else I do, in order to avoid split-brain syndrome in the future.

2.  Since I have an odd number of servers in the primary datacenter, I will always have a quorum, even if I don't have an FSW.

3.  Set the FSW to the CAS server in the secondary datacenter, so that if the primary datacenter blows up, the MBX server in the secondary datacenter can continue to run.
Akhater (Solutions Architect) Commented:
1. Yes.

2. You have the majority of nodes, and it is even, not odd: 2 out of 3 mbx.

3. You have 3 nodes, so the FSW will not be used. When the primary datacenter blows up, perform a datacenter switchover as in the article above.

4. If the link between primary and secondary is down, you still have problems, since the secondary datacenter will lose quorum and shut down; bringing it up manually is you FORCING split brain. The only solution to this is having all active DBs in the primary one.
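
The quorum argument can be checked with a quick vote count. The Python sketch below is a simplified model, not Exchange code: it assumes one vote per DAG member plus one for the file share witness, and that a side of a broken link stays up only if it can reach a strict majority of all votes. The layout (A, B, C in the primary datacenter, D in the secondary) comes from the question; the function names are made up for illustration.

```python
PRIMARY_NODES = 3    # Servers A, B, C
SECONDARY_NODES = 1  # Server D
TOTAL_VOTES = PRIMARY_NODES + SECONDARY_NODES + 1  # four members plus the witness


def has_quorum(votes_reachable, total_votes):
    """A partition keeps quorum only with a strict majority of all votes."""
    return votes_reachable > total_votes // 2


def link_failure(fsw_in_primary):
    """Report which datacenter keeps quorum when the inter-site link is cut."""
    primary_votes = PRIMARY_NODES + (1 if fsw_in_primary else 0)
    secondary_votes = SECONDARY_NODES + (0 if fsw_in_primary else 1)
    return {
        "primary": has_quorum(primary_votes, TOTAL_VOTES),
        "secondary": has_quorum(secondary_votes, TOTAL_VOTES),
    }


for in_primary in (True, False):
    label = "FSW in primary" if in_primary else "FSW in secondary"
    print(label, link_failure(in_primary))
```

In this model the secondary side never reaches a majority on its own, wherever the witness is placed, which matches the point that Server D goes down on a link failure and that losing the primary datacenter requires a manual switchover.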
Captain Tact (Infrastructure Operations, Senior) Author Commented:
@Akhater - So...I have one more question that I can't remember the answer to.

If I have 4 active databases, am I required to have 4 MBX servers?
Akhater (Solutions Architect) Commented:
No. Where did you get this idea from? You could very easily have 4 active databases on 2 or 3 servers (even on 1 if there is no DAG).