How to fix Exchange 2010 Split Brain

Hi folks, I was going through some DAG documentation, and it mentioned that if the DAG is not configured correctly (using DAC), then after a datacentre failover the primary datacentre coming back online could cause a split brain scenario.

So my question is: what is the procedure to fix a split brain in an Exchange 2010 DAG?

Thanks
Raymond Brooks Asked:

Amit Kumar Commented:
Please follow this article

DAC is helpful when an Exchange DAG has been extended across multiple datacenters.

Datacenter Activation Coordination (DAC) mode is a property setting for a database availability group (DAG). DAC mode is disabled by default and should be enabled for all DAGs with two or more members that use continuous replication. DAC mode shouldn't be enabled for DAGs in third-party replication mode unless specified by the third-party vendor.
If a catastrophic failure occurs that affects the DAG (for example, a complete failure of one of the datacenters), DAC mode is used to control the startup database mount behavior of a DAG. When DAC mode isn't enabled, and a failure occurs that affects multiple servers in the DAG, when a majority of the DAG members are restored after the failure, the DAG will restart and attempt to mount databases. In a multi-datacenter configuration, this behavior could cause split brain syndrome, a condition that occurs when all networks fail, and DAG members can't receive heartbeat signals from each other. Split brain syndrome can also occur when network connectivity is severed between the datacenters. Split brain syndrome is prevented by always requiring a majority of the DAG members (and in the case of DAGs with an even number of members, the DAG's witness server) to be available and interacting for the DAG to be operational. When a majority of the members are communicating, the DAG is said to have quorum.
For example, consider a scenario where the first datacenter contains two DAG members and the witness server, and the second datacenter contains two other DAG members. If the first datacenter loses power and you activate the DAG in the second datacenter (for example, by activating the alternate witness server in the second datacenter), if the first datacenter is restored without network connectivity to the second datacenter, the active databases within the DAG may enter a split brain condition.
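
The vote arithmetic behind this example can be sketched in plain PowerShell. This only illustrates majority quorum and is not how the cluster service is implemented. `Test-Majority` is a hypothetical helper, and the idea that the activated second datacenter ends up as a three-voter cluster (its two members plus the alternate witness) is an assumption made for the illustration.

```powershell
# Hypothetical helper: does a partition hold a strict majority of the
# voters it believes are in the cluster?
function Test-Majority {
    param(
        [int]$TotalVoters,      # voters in the membership as this partition sees it
        [int]$ReachableVoters   # voters this partition can currently contact
    )
    $needed = [math]::Floor($TotalVoters / 2) + 1
    return ($ReachableVoters -ge $needed)
}

# Restored first datacenter: still sees the original 5 voters
# (4 members + witness) and reaches its own 2 members plus the witness.
Test-Majority -TotalVoters 5 -ReachableVoters 3   # True

# Activated second datacenter (assumed: membership shrunk to
# 2 members + alternate witness, all reachable).
Test-Majority -TotalVoters 3 -ReachableVoters 3   # True
```

Both partitions conclude that they have quorum, which is the split brain condition described above.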
Raymond Brooks Author Commented:
Thanks for the explanation; however, it doesn't answer the question: how do you fix a split brain situation should it occur?
Amit Kumar Commented:
If DAC is enabled, split brain will not occur, because under DACP a mailbox server's Active Manager always starts with its DACP bit set to 0. Until that server learns from other live members whether it may mount databases, it does not mount them.

Yes, there is manual effort involved in bringing the secondary datacenter online. In Exchange 2013 this can be automatic as well, if you place the FSW in a third site that remains available when the primary site is down.
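
To check whether DAC mode is actually on for a DAG, something along these lines should work from the Exchange Management Shell. The DAG name is a hypothetical placeholder; the properties listed are ones `Get-DatabaseAvailabilityGroup` returns when called with `-Status`.

```powershell
$dag = "DAG1"   # hypothetical DAG name, replace with your own

# -Status requests live values in addition to what is stored in Active Directory.
# DatacenterActivationMode shows Off or DagOnly; the started/stopped lists show
# which members the DAG currently treats as allowed to mount databases.
Get-DatabaseAvailabilityGroup -Identity $dag -Status |
    Format-List Name,DatacenterActivationMode,StartedMailboxServers,StoppedMailboxServers,PrimaryActiveManager
```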

Raymond Brooks Author Commented:
I've requested that this question be deleted for the following reason:

did not find solution
Amit Kumar Commented:
You must accept the solution, as we have provided the required input for your question.
Raymond Brooks Author Commented:
I will accept the solution to close this post, but you did not answer my question.

Instead you told me what DAC is and what would happen if it were on.

My question is how to FIX split brain after it has occurred, not how to prevent it from occurring.
Amit Kumar Commented:
DAC is the only option to prevent split brain in a multisite Exchange infrastructure. Can you explain your infrastructure in detail and what is happening?

There are many possible circumstances; for example, a problem with the Windows cluster can also cause this. Please give me at least a better idea of your issue.
Raymond Brooks Author Commented:
I was recently doing a lab where I was able to pause the Exchange VM and do a datacenter failover. When I unpaused it, the Exchange servers believed they still had quorum, and within Exchange you could observe the databases listed as mounted on both DAG members, instead of one healthy and one mounted. I essentially achieved split brain. So I was just curious: if that were to happen in a real-life scenario, however unlikely, what would I need to do to get out of that state and fix my data?
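
One way to see the state described above is to list every copy of a database and its status, running the same command from a server in each site and comparing the results. The database name is a hypothetical placeholder. In a healthy DAG exactly one copy should report Mounted.

```powershell
$db = "DB1"   # hypothetical database name, replace with your own

# Passing only the database name returns the status of all of its copies.
# Run this from a mailbox server in each site; in a split brain the two
# sides may each report their local copy as Mounted.
Get-MailboxDatabaseCopyStatus -Identity $db |
    Format-Table Name,Status,CopyQueueLength,ReplayQueueLength,LastInspectedLogTime -AutoSize
```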
Amit Kumar Commented:
Until replication reaches the paused site, it will show you the previous state, but once replication completes it will fix everything automatically. You just need to follow the appropriate steps to bring the secondary site online. The replication I am talking about here is AD site replication.

Here are some steps which I have followed in my lab with successful results, and honestly I did fall into split brain in my case.

For DAG Creation:
---------------------------

Set-DatabaseAvailabilityGroup -Identity <DAG Name> -WitnessServer <Witness/HTC Server Name> -WitnessDirectory C:\FSW -AlternateWitnessServer <Witness/HTC Server Name from other site> -AlternateWitnessDirectory C:\FSW

Set-DatabaseAvailabilityGroup -Identity <DAG Name> -DatabaseAvailabilityGroupIPAddresses <Virtual IP from Primary Site,Virtual IP from Secondary Site>

Set-DatabaseAvailabilityGroup -Identity <DAG Name> -DatacenterActivationMode DagOnly


To allow RPC access while the database is active on the secondary site (after applying Exchange 2010 SP2 RU3/4):
----------------------------------------------

Set-DatabaseAvailabilityGroup -ID <DAG Name>  -AllowCrossSiteRpcClientAccess:$true


To allow silent OWA redirection between both sites (after applying Exchange 2010 SP2):
-------------------------------------------------------------

Set-OWAVirtualDirectory -Identity "Contoso\owa (Default Web site)" -CrossSiteRedirectType <Silent/Manual>


To block or unrestrict automatic activation of database copies on the secondary site's mailbox servers:
-----------------------------------------------

Set-MailboxServer -Identity <mailbox server 1 in passive site> -DatabaseCopyAutoActivationPolicy:<Blocked/Unrestricted>
Set-MailboxServer -Identity <mailbox server 2 in passive site> -DatabaseCopyAutoActivationPolicy:<Blocked/Unrestricted>

“Blocked” is used on the passive site during normal operation, when no site has failed.
“Unrestricted” is used to activate the passive site when the primary site has failed.


Switchover while the primary site is down, temporarily or permanently:
----------------------------------------

Set-MailboxServer -Identity <mailbox server 1 in passive site> -DatabaseCopyAutoActivationPolicy:Unrestricted
Set-MailboxServer -Identity <mailbox server 2 in passive site> -DatabaseCopyAutoActivationPolicy:Unrestricted

Stop-DatabaseAvailabilityGroup -Identity <DAG Name> -MailboxServer <mailbox server 1 in failed active site> -ConfigurationOnly
Stop-DatabaseAvailabilityGroup -Identity <DAG Name> -MailboxServer <mailbox server 2 in failed active site> -ConfigurationOnly
Stop-DatabaseAvailabilityGroup -Identity <DAG Name> -ActiveDirectorySite <Failed Site Name> -ConfigurationOnly

Restore-DatabaseAvailabilityGroup -Identity <DAG Name> -ActiveDirectorySite <Passive Site Name> -AlternateWitnessServer <CAS Server in passive site> -AlternateWitnessDirectory C:\FSW

Set-MailboxDatabase -Identity <DB Name> -RpcClientAccessServer <CAS Array in passive site>


Switchback to the primary site:
----------------------------------------------

cluster node <mailbox server 1 in failed active site> /forcecleanup
cluster node <mailbox server 2 in failed active site> /forcecleanup

Start-DatabaseAvailabilityGroup -Identity <DAG Name> -ActiveDirectorySite <Primary Site Name>

Set-MailboxDatabase -Identity <DB Name> -RpcClientAccessServer <CAS Array in active site>
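
If a copy in the restored site diverged while it was isolated, one possible cleanup is to discard that copy and reseed it from the copy you decide to keep. This is only a sketch under assumptions: the names are hypothetical placeholders, the DAG already agrees on a single active copy, and the copy being discarded is passive. Anything written only to the discarded copy is lost when its files are deleted, so take a backup of those files first if the data matters.

```powershell
$copy   = "DB1\MBX1"   # hypothetical: <database>\<server holding the copy to discard>
$server = "MBX1"       # hypothetical: the same server, for the health check

# Stop replication to the copy that will be thrown away.
Suspend-MailboxDatabaseCopy -Identity $copy -SuspendComment "Reseed after split brain" -Confirm:$false

# Reseed from the active copy, deleting the diverged database and log files.
Update-MailboxDatabaseCopy -Identity $copy -DeleteExistingFiles

# Check that the copy returns to Healthy and that replication checks pass.
Get-MailboxDatabaseCopyStatus -Identity $copy
Test-ReplicationHealth -Identity $server
```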
Raymond Brooks Author Commented:
Yup, that's the answer I was looking for. Thank you very much =D