Captain Tact


Exchange 2010 File Share Witness and Alternate FSW

Situation:  
•      Exchange 2010 needs to be able to fail over to a secondary datacenter if the primary datacenter loses power or experiences some kind of catastrophic failure.

Current Scenario:
•      Four VM mailbox servers (A, B, C, D)
•      Two VM CAS servers (Cas-A, Cas-B)
•      Two hardware network load balancers (NLB1, NLB2)
•      All MBX servers, CAS servers and NLBs are in one AD site.
•      One DC, NLB2, Cas-B and MBX server “D” are all running in the secondary datacenter.
•      Secondary datacenter is connected to the primary datacenter via a 1GB fiber connection.
•      Currently running a DAG that consists of one active DB on each MBX server and three passive DBs on each MBX server.  The three passive DBs are copies of the active DBs on the other three servers.
•      File Share Witness (FSW) is currently set to Cas-A in the primary datacenter, per what I understand to be the best practice of putting your FSW in the physical location that contains the greatest number of MBX servers.
•      Additionally, the IT Director has mandated that the FSW remain in the primary datacenter due to the likelihood of the link between the

Problem:
•      I can set the FSW to CAS-B in the secondary datacenter.  If I do this and the primary datacenter goes down, then nothing else needs to be done.  The DBs in the DAG will fail over to server D and both NLB2 and Cas-B will handle all the client traffic.
•      However, if I set the FSW to the secondary datacenter and the connection between the two goes down (disaster, fiber cut, etc), then we lose the ability to connect to the DBs in the primary datacenter.
•      I know I can simply change the FSW to Cas-A and it will work.  However, if I do this and the secondary datacenter comes back up, then I am going to wind up with the system in a “split-brain” scenario.
•      So…I can set an Alternate FSW.  But if I do that, then it is my understanding that I need to enable the Datacenter Activation Coordination (DAC) mode, so that when the primary datacenter comes back up, I don’t wind up with a “split-brain” scenario.  I would do that with this PS command:
•      Set-DatabaseAvailabilityGroup -Identity DAG1 -DatacenterActivationMode DagOnly

Okay…so the steps I have been able to piece together for using the Alternate Witness Server in case the primary datacenter goes down are as follows.  Please correct me if they’re wrong:
1.      Stop the clustering service on Server D
2.      Run these PS commands from Server D:
a.      Stop-DatabaseAvailabilityGroup -Identity DAG1 -MailboxServer ServerA -ConfigurationOnly
b.      Stop-DatabaseAvailabilityGroup -Identity DAG1 -MailboxServer ServerB -ConfigurationOnly
c.      Stop-DatabaseAvailabilityGroup -Identity DAG1 -MailboxServer ServerC -ConfigurationOnly
3.      Run this PS command from Server D:
a.      Restore-DatabaseAvailabilityGroup -Identity DAG1 -MailboxServer ServerD -ConfigurationOnly

4.      Mount the databases on Server D.

At this point, all DBs should be running on Server D.

To fail back to the primary datacenter, I will need to:
1.      Bring up Servers A, B, and C.
2.      Connect to one of the servers.
3.      Run these PS commands:
a.      Move-ActiveMailboxDatabase ExchDB1 -ActivateOnServer ServerA -MountDialOverride:GoodAvailability
b.      Move-ActiveMailboxDatabase ExchDB2 -ActivateOnServer ServerB -MountDialOverride:GoodAvailability
c.      Move-ActiveMailboxDatabase ExchDB3 -ActivateOnServer ServerC -MountDialOverride:GoodAvailability
4.      Restart the databases on the primary servers (A, B, C).
5.      Disable the Activation bit on the databases on Server D:
a.      Suspend-MailboxDatabaseCopy -Identity ExchDB1\ServerA -ActivationOnly
b.      Suspend-MailboxDatabaseCopy -Identity ExchDB2\ServerB -ActivationOnly
c.      Suspend-MailboxDatabaseCopy -Identity ExchDB3\ServerC -ActivationOnly

So what do you folks think?  Am I overthinking it or did I get it right?

Thank you,

Jim
Akhater

1) You should know that a DAG does NOT take link failures into consideration, so no matter what you do, if the link between the main and the secondary datacenter goes down, one of the two will go down (the one without the FSW).

2) An Alternate File Share Witness will NOT solve your problem. An alternate witness becomes effective only when you perform a datacenter switchover; it is NOT used automatically.

3) You should enable DAC mode in all cases.
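
The vote counting behind point 1 can be illustrated with a small sketch. This is a simplified model, not Exchange or cluster code: it assumes a plain majority rule in which every mailbox server has one vote and the witness adds one vote only when the number of servers is even. The function name and the site labels are made up for the illustration.

```python
def quorum_after_link_failure(primary_nodes, secondary_nodes, witness_site):
    """Return, per site, whether it still holds a strict majority of votes
    once the inter-site link is cut (simplified majority model)."""
    total_nodes = primary_nodes + secondary_nodes
    # Assumption: the witness only votes when the node count is even.
    witness_vote = 1 if total_nodes % 2 == 0 else 0
    total_votes = total_nodes + witness_vote
    needed = total_votes // 2 + 1  # strict majority

    votes = {
        "primary": primary_nodes + (witness_vote if witness_site == "primary" else 0),
        "secondary": secondary_nodes + (witness_vote if witness_site == "secondary" else 0),
    }
    return {site: count >= needed for site, count in votes.items()}


# Compare a symmetric 2+2 layout with the 3+1 layout described in the question.
for layout in [(2, 2), (3, 1)]:
    for witness_site in ("primary", "secondary"):
        print(layout, witness_site, quorum_after_link_failure(*layout, witness_site))
```

Under this model, a 2+2 layout behaves as point 1 describes: the side holding the witness stays up. In the 3+1 layout from the question, the single server in the secondary datacenter cannot reach a majority with either witness placement, so it is the side that stops when the link fails.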

ASKER

@Akhater - Yes, I was aware of point #1, that's why I was proposing setting the FSW on the CAS server in the primary datacenter and setting the Alternate FSW on the CAS server in the secondary datacenter.

I know the Alternate is not an automatic thing, hence the PS commands to shut down the DAG, tell the DAG to ignore the non-responding DAG members and restart the DAG.  Then the last set of PS commands was (I thought) to bring the original DAG members back online.
BTW...so if my proposed solution WON'T solve the problem, then what do you suggest that WILL solve my problem?
@Akhater - You know, I re-read my postings and it sounds like I'm being confrontational, but I'm not trying to be...so please accept my apology if it sounded like I was trying to pick a fight.

I'm new to Exchange 2010, so I'm still trying to wrap my head around several concepts and I'm getting frustrated that I can't seem to find any one site that documents exactly what I would need to do in this scenario.

Thanks in advance, for all the help you can give me.  :-)
Don't worry, I didn't take it wrong; I was just away.

OK, it looks like points 1, 2 and 3 are covered, so maybe I misunderstood the initial question. I thought you were wondering how you can keep the primary and DR datacenters working if the link goes down.

Are you trying to figure out how to do a datacenter switchover and failback?
Exactly!  We have offices in 11 different geographical locations, connected in a Hub/Spoke Topology.  The primary site is the center, with the secondary site as one of the spokes.  We have BGP in place, so if something catastrophic happened at the main branch, the other 10 locations would connect to the secondary site.

I'm simply trying to figure out how to keep Exchange 2010 running, if the primary site or the connection into the primary site goes down, then fail back to the primary site once it's back up and running, without encountering a split-brain situation.
Do you have active users connecting to both locations at the same time?

I.e., are there live databases in both datacenters, or are all DBs up in one datacenter and will switch to the second if you have a failure?
We have active users in both locations, but we can change that if we need to.

Currently MBX servers A, B, and C, as well as CAS-A, are all in the primary datacenter.  MBX server D and CAS-B are in the secondary datacenter.  Each MBX server has one active DB and 3 passive DBs.

All four MBX servers have live users connecting to them.
ASKER CERTIFIED SOLUTION
Akhater

This solution is only available to members.
Okay...so let me reiterate and make sure I understand what it is you are saying.

1.  Enable DAC mode, regardless of anything else I do, in order to avoid split-brain syndrome in the future.

2.  Since I have an odd number of servers in the primary datacenter, I will always have a quorum, even if I don't have an FSW.

3.  Set the FSW to the CAS server in the secondary datacenter, so that if the primary datacenter blows up, the MBX server in the secondary datacenter can continue to run.
1. Yes

2. You have the majority of nodes, and it is even, not odd: 2 out of 3 mbx

3. You have 3 nodes, so the FSW will not be used. When the primary datacenter blows, perform a datacenter switchover like in the article above.

4. If the link between primary and secondary is down you still have problems, since the secondary datacenter will lose quorum and shut down; bringing it up manually is you FORCING split-brain. The only solution to this is having all active DBs in the primary one.
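
Why the surviving server needs a manual switchover rather than failing over on its own can be sketched with the same kind of vote arithmetic. This is a simplified model, assuming one vote per mailbox server, a witness vote only when the member count is even, and that the switchover shrinks the cluster membership to the surviving servers. The function name is hypothetical; the counts use the four mailbox servers from the question.

```python
def has_quorum(reachable_nodes, member_nodes, witness_reachable):
    """Strict-majority check for the nodes that can still see each other."""
    # Assumption: a witness vote exists only when the member count is even.
    witness_configured = member_nodes % 2 == 0
    total_votes = member_nodes + (1 if witness_configured else 0)
    reachable_votes = reachable_nodes + (
        1 if witness_configured and witness_reachable else 0
    )
    return reachable_votes >= total_votes // 2 + 1


# Primary datacenter lost: only Server D is left out of 4 members.
# Even with a witness next to it, D holds 2 of 5 votes.
before_switchover = has_quorum(reachable_nodes=1, member_nodes=4, witness_reachable=True)

# After the switchover removes A, B and C from the membership,
# D is 1 of 1 and can form quorum by itself.
after_switchover = has_quorum(reachable_nodes=1, member_nodes=1, witness_reachable=False)

print(before_switchover, after_switchover)
```

The same arithmetic is why a manually restored secondary site is dangerous while the primary is still running: both sides then believe they hold quorum, which is the split-brain case DAC mode is meant to guard against.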
@Akhater - So...I have one more question that I can't remember the answer to.

If I have 4 active databases, am I required to have 4 MBX servers?
No, where did you get this idea from? You could very easily have 4 active databases on 2 or 3 servers (even on 1 if no DAG).
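
As a small illustration that the number of active databases is independent of the number of mailbox servers, the sketch below spreads databases over whatever servers are available in round-robin order. The helper name and the database name ExchDB4 are made up for the example; it only models placement and does not touch Exchange.

```python
def place_active_databases(databases, servers):
    """Assign each database an active server in round-robin order."""
    placement = {server: [] for server in servers}
    for index, database in enumerate(databases):
        placement[servers[index % len(servers)]].append(database)
    return placement


databases = ["ExchDB1", "ExchDB2", "ExchDB3", "ExchDB4"]  # ExchDB4 is hypothetical

# Four active databases fit on two, three, or four servers alike.
print(place_active_databases(databases, ["ServerA", "ServerB"]))
print(place_active_databases(databases, ["ServerA", "ServerB", "ServerC"]))
```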