Exchange Site Resilience Procedure

AID: 5325
  • Status: Published

1180 points

  • By: Osmoze
  • Type: Tutorial
  • Posted on: 2011-05-02 at 01:56:23

The context:

One of the problems I ran into when deploying an Exchange 2010 DAG was how to get site resilience with only two nodes. I looked for such a procedure in the Microsoft documentation and on the TechNet forums, but none of the procedures I found worked for my customer's case. So I wrote my own, based on an understanding of how a DAG works and of the benefit of enabling DAC mode.

The main project was to set up a backup site for Exchange. My customer had a limited budget, so buying another server for site resilience was not an option and I had to work with the existing resources. Having accepted that challenge, it was up to me to find a way to do it at no additional cost.

At first, I thought it was not possible to set up a DAG with site resilience in this case (two nodes), because the documentation indicated that a minimum of three nodes was required to enable DAC (Datacenter Activation Coordination) mode. Exchange 2010 SP1 lowered that requirement to two nodes, so it can be done!
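
As a minimal sketch of enabling DAC mode from the Exchange Management Shell, assuming the DAG is simply named DAG (the placeholder name used throughout this article):

```powershell
# Enable DAC mode on the DAG ("DAG" is a placeholder name).
Set-DatabaseAvailabilityGroup -Identity DAG -DatacenterActivationMode DagOnly

# Confirm the setting; DatacenterActivationMode should read DagOnly.
Get-DatabaseAvailabilityGroup -Identity DAG | Format-List Name,DatacenterActivationMode
```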

The principle:

  • Two sites
  • Two Exchange servers with all roles installed (MBX, HT, CAS)
  • Two witness servers, one in each location (one primary; the second is the alternate, used in the activation procedure)
  • Both Exchange servers are also domain controllers (even though MS TechNet says this configuration is not supported)
  • Installation of Exchange (refer to the earlier post here)
  • Setting up DAG (refer to the earlier post here)
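
The witness and alternate witness in the list above could be preconfigured roughly as follows. This is only a sketch: WS1 and WS2 are the witness servers from this article, while the C:\DAGWitness directory is an assumed path chosen for illustration.

```powershell
# Set the primary witness (WS1) and preconfigure the alternate one (WS2).
# C:\DAGWitness is a hypothetical directory; use whatever path suits your servers.
Set-DatabaseAvailabilityGroup -Identity DAG `
    -WitnessServer WS1 -WitnessDirectory C:\DAGWitness `
    -AlternateWitnessServer WS2 -AlternateWitnessDirectory C:\DAGWitness

# Review the result.
Get-DatabaseAvailabilityGroup -Identity DAG | Format-List Name,WitnessServer,AlternateWitnessServer
```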

 
Site Resilience procedure

Scenario:

  • SRV1 and WS1 are in the primary datacenter (DC); SRV2 and WS2 are in the backup DC.
  • A leased line (LL) with 1 Mb of bandwidth links the two sites.
  • SRV1 holds the active DB copy; SRV2 holds the passive DB copy in a Healthy state.
  • Users connect to the CAS on SRV1.
  • Manual activation of the copy works.
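
Before relying on this scenario, the starting state can be checked with something like the following sketch, where DB is the placeholder database name used in this article:

```powershell
# Show the state of every copy of the database "DB" (placeholder name).
Get-MailboxDatabaseCopyStatus -Identity DB | Format-List Name,Status,CopyQueueLength,ReplayQueueLength

# Test a manual switchover to SRV2, then move the active copy back to SRV1.
Move-ActiveMailboxDatabase DB -ActivateOnServer SRV2
Move-ActiveMailboxDatabase DB -ActivateOnServer SRV1
```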


If the leased line goes down, users still have access to their mail system, because SRV1 can still get quorum and keep its DB mounted as long as WS1 is accessible. If WS1 becomes unreachable as well, SRV1 dismounts its DB within a minute and shows it in the Disconnected/Healthy state.

This is normal DAG behavior that protects against DB corruption: only the server that holds quorum has the right to mount its DB.

On the other side (backup DC), SRV2 is isolated while the LL is down: it has no access to WS1 and cannot mount its DB (the copy stays in the Dismounted/Healthy state).

Once the LL is re-established, the only noticeable change is that SRV2 (backup DC) resynchronizes its DB copy with the DB on SRV1. If SRV1 goes down for any reason while the LL is up and WS1 is reachable, the DAG's automatic failover simply activates the copy on SRV2. In this case, however, users will not have access to their mail system, because they were using the CAS role on SRV1, which is no longer available. We have two alternatives here:

The first is to change the RpcClientAccessServer attribute on the database so that it points to the CAS on SRV2, using the cmdlet: Set-MailboxDatabase -Identity DB -RpcClientAccessServer FQDN-SRV2. The second is to change the DNS record for CAS1 so that it points to CAS2. With the first alternative, the Outlook profiles need to be repaired so that they pick up the new configuration. I recommend the DNS change; it is the simplest way.
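
For the first alternative, a short sketch of checking the attribute before and after the change. The FQDN srv2.example.local is a hypothetical stand-in for FQDN-SRV2:

```powershell
# See which CAS the database currently hands out to Outlook clients.
Get-MailboxDatabase -Identity DB | Format-List Name,RpcClientAccessServer

# Point the database at the CAS on SRV2 (srv2.example.local is a made-up FQDN).
Set-MailboxDatabase -Identity DB -RpcClientAccessServer srv2.example.local
```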

Site Resilience

Now let's consider that the primary DC suffers a disaster and is no longer available.

SRV1 and WS1 are down and users have lost their connectivity to Exchange. In the backup DC, SRV2 cannot mount its DB as long as it has no access to a witness server and cannot get quorum. The following procedure uses the alternate witness server WS2 and forces quorum, so that SRV2 is able to mount its DB copy.

First, we need to stop SRV1 in the DAG so that it can be excluded later, using the command: Stop-DatabaseAvailabilityGroup -Identity DAG -MailboxServer FQDN-SRV1

The command takes a while because it tries to contact SRV1, which is unreachable. In the end it reports an error saying that it could not update the configuration on SRV1. Don't worry, this is not important here; only the configuration on SRV2 has to be updated.
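
When the server is known to be unreachable, one possible variation is the ConfigurationOnly switch, which updates Active Directory without trying to contact the member. This is a sketch of that option, not the command the procedure above was tested with; srv1.example.local is a hypothetical FQDN:

```powershell
# Mark SRV1 as stopped in the DAG configuration without contacting it.
# srv1.example.local stands in for FQDN-SRV1.
Stop-DatabaseAvailabilityGroup -Identity DAG -MailboxServer srv1.example.local -ConfigurationOnly
```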

We can verify that SRV1 is marked as stopped in the DAG with the cmdlet: Get-DatabaseAvailabilityGroup -Identity DAG | fl Name,*Server*

The result lists the names and the state of the DAG members. If the previous cmdlet worked, SRV1 should appear in the StoppedMailboxServers list and SRV2 in the StartedMailboxServers list. The next step is to stop the Cluster service on SRV2 so that it can be restarted with quorum forced.
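
A sketch of the check and of stopping the Cluster service, to be run on SRV2. ClusSvc is the standard service name of the Windows Cluster service:

```powershell
# List only the started/stopped member lists.
Get-DatabaseAvailabilityGroup -Identity DAG | Format-List Name,StartedMailboxServers,StoppedMailboxServers

# Stop the Cluster service on SRV2 and confirm that it is stopped.
Stop-Service ClusSvc
Get-Service ClusSvc
```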

The most important step comes next; once it completes, SRV2 should mount its DB copy. The Restore-DatabaseAvailabilityGroup command performs a series of actions:
  • Evicting the failed member from the DAG
  • Using the preconfigured alternate witness server to establish quorum
  • Starting the Cluster service with quorum forced
  • Rebuilding the DAG with WS2 and SRV2


The syntax is: Restore-DatabaseAvailabilityGroup -Identity DAG -ActiveDirectorySite Default-First-Site-Name

We can add other parameters to this command, but this simple form should do the job in our case. Once the command has run, it takes SRV2 a few seconds to mount its database copy automatically.
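
As an illustration of those extra parameters, the alternate witness can also be named explicitly on the restore command. This is a sketch only; C:\DAGWitness is an assumed directory, not a value from the original setup:

```powershell
# Same restore, naming the alternate witness explicitly.
# C:\DAGWitness is a hypothetical witness directory on WS2.
Restore-DatabaseAvailabilityGroup -Identity DAG `
    -ActiveDirectorySite Default-First-Site-Name `
    -AlternateWitnessServer WS2 -AlternateWitnessDirectory C:\DAGWitness

# Check the state of the database copies on SRV2 afterwards.
Get-MailboxDatabaseCopyStatus -Server SRV2 | Format-List Name,Status
```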

At this point users still cannot access their mail system. We need to point them to the CAS on SRV2, either by changing the DNS record for the CAS to point to SRV2 or by changing the RpcClientAccessServer attribute on the DB: Set-MailboxDatabase DB -RpcClientAccessServer FQDN-SRV2

Once that is done, you need to repair the Outlook profile so that it picks up the new configuration.

That's it: you now have a fully working backup DC until the primary DC comes back. The remaining question is which procedure to apply when the primary DC is back online, after you have restored your servers from tape or any other backup media.

It's simple; here is the answer:

SRV2 in the backup DC now holds the active DB copy, and a lot of mail will have flowed while the primary DC was offline. This is where DAC mode is useful: it ensures that when the primary DC comes back online, SRV1 will not take over and mount its DB!

First you have to start SRV1 in the DAG using: Start-DatabaseAvailabilityGroup -Identity DAG -MailboxServer FQDN-SRV1

Once that is done, SRV1 shows its DB copy as synchronizing (from the SRV2 copy). When the synchronization finishes, the DB copy on SRV1 will be Healthy. You can monitor the copy queue and the replay queue using: Get-MailboxDatabaseCopyStatus -Identity DB\SRV1
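
To watch the resynchronization, the output can be narrowed to the relevant fields. A small sketch, assuming the database is named DB and that the copy being caught up is the one on SRV1:

```powershell
# Status and queue lengths for the copy on SRV1; both queues should shrink
# toward 0 and Status should end up as Healthy.
Get-MailboxDatabaseCopyStatus -Identity DB\SRV1 |
    Format-List Name,Status,CopyQueueLength,ReplayQueueLength
```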

Now you can manually activate the DB on SRV1 to get things back to normal. Finally, the last thing to do is to point the DNS record for the CAS back to SRV1, or to revert the RpcClientAccessServer attribute the same way we did before.
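
The failback could look roughly like this. It is a sketch under the article's placeholder names; srv1.example.local is a hypothetical FQDN for SRV1:

```powershell
# Move the active copy back to SRV1 once its copy is Healthy.
Move-ActiveMailboxDatabase DB -ActivateOnServer SRV1

# If the attribute was changed earlier, point clients back at the CAS on SRV1.
Set-MailboxDatabase -Identity DB -RpcClientAccessServer srv1.example.local
```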

Hope this helps you.
Tags: Exchange 2010, DAG, DATACENTER, switchover
Topic: Exchange Email Server
