wcgplc

DC replication and failure question

Hello Experts,

I want to test DC site failure between two sites connected via a leased line. Each site has two DCs running Windows Server 2008, and the sites can communicate with each other over the leased line.
Both sites have their own DNS and DHCP servers.

The plan is to unplug both DCs at one site and see if those at the other site take over, and vice versa. This will simulate a complete DC site failure.

As I understand it, we will have to move our FSMO roles over to the active DCs when the others are taken down, and also DHCP and DNS. If we did have a complete DC failure at site 1, can I set up DHCP and DNS in a disabled state at site 2, ready to take over in an emergency?
Aard Vark

You don't really have to move FSMO roles unless the partner will be down for some time. Whatever you do, don't turn off the DC in a site and seize its FSMO roles while it's offline unless you plan on performing a metadata cleanup and rebuilding the OS. To be honest, don't put DHCP, a CA, or any other services on a DC. DCs are disposable assets in the AD world; you don't want something critical and hard to replace running on a server designed to be replaced on a whim.

Hopefully you're running DHCP separately from your DCs? DCs are DCs; they should not be used for other purposes. For DHCP, look at DHCP high availability (either load-balanced or failover mode; note that the built-in DHCP failover feature requires Windows Server 2012 or later) and set up one DHCP server at each site. That way the loss of one site's server should not cause DHCP downtime.

AD isn't going to care if it can't talk to its partners for even long periods of time (weeks or even months), as long as the outage does not exceed the tombstone lifetime. As long as you don't leave a DC offline too long, when you power it back up or reconnect it to the network the DCs will catch up like old friends and replicate everything they missed.
wcgplc

ASKER

Thanks for the info @learnctx. The main purpose of the exercise is to prepare for a complete DC failure at one site, where the failed DCs would have to be rebuilt from scratch. Yes, I have DHCP set up on one DC at each site. I'll definitely be looking at "DHCP high availability".
Should I also keep DNS off the DCs?

Below are the site configs:

Site 1:
1 x DC(A) with AD services, DNS, DHCP. DHCP only allocates addresses to our Windows 10 machines from the subnet 12.13.2.x.
1 x DC(B) with AD services and DNS

Site 2:
1 x DC(C) with AD services, DNS, DHCP. DHCP only allocates addresses to our Windows 10 machines from the subnet 12.14.2.x.
1 x DC(D) with AD services and DNS
ASKER CERTIFIED SOLUTION
footech

wcgplc

ASKER

Sorry by "site" I meant physical offices. Thanks for the info!
So something like:
 - a meteor crashes into the city where site 2 is, taking out the office including the DCs and the client machines (so you don't have to worry about maintaining function for the clients).

In this case everything in site1 should continue humming along.  If any FSMO roles were running on one of the now-destroyed DCs in site2 then the roles would have to be seized by a DC in site1.  Metadata cleanup should be performed for the site2 DCs.


Alternative scenario:
 - DCs at site2 are kept in a room all by themselves.  A hole in the roof over that room allows water through during a storm and the hardware running the DCs is destroyed, however all other equipment in the office is fine, leaving you with a bunch of clients at site2 that need to still do their work.

If they need to make use of resources on AD, all site2 clients would need to be reconfigured to use a different DNS server that has your AD records until a new DC is stood up in site2 (and then reconfigured again to use the new DC in site2 once it's available).  If any FSMO roles were running on one of the now-destroyed DCs in site2 then the roles would have to be seized by a DC in site1.  Metadata cleanup should be performed for the site2 DCs.


Alternative scenario 2:
 - DCs at site2 are VMs running on the same hardware.  The motherboard on that machine goes kaput.  A replacement is sent but won't be there for a few days, but as soon as it arrives (in this scenario), the machine will be able to be powered on and all VMs will function fine.  All other equipment in the office is fine, leaving you with a bunch of clients at site2 that need to still do their work.

If they need to make use of resources on AD, all site2 clients would need to be reconfigured to use a different DNS server that has your AD records until the DCs in site2 come back online (and then reconfigured again to use the DCs in site2).  When the DCs come back online they happily resume their relationship (as long as it hasn't been longer than the tombstone period).


Alternative scenario 3:
 - the leased line becomes non-functional for a period of time

Except for direct communication between site1 and site2, everything keeps on working.  Perhaps you would set up a site-to-site VPN if an internet connection is still available until the private line is restored.  When communication between sites is restored, DCs happily resume their relationship (as long as it hasn't been longer than the tombstone period).
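
The four scenarios above boil down to a small decision table. The Python sketch below is only an illustration of that reasoning: the `DcState` enum, the `recovery_actions` function, and the action strings are hypothetical names made up for this example and are not part of any AD tooling.

```python
from enum import Enum


class DcState(Enum):
    ONLINE = "online"        # DCs are up; only the inter-site link is down
    OFFLINE = "offline"      # DCs are down but will come back unchanged
    DESTROYED = "destroyed"  # DCs are gone for good and must be rebuilt


def recovery_actions(dc_state, clients_remain, held_fsmo_roles):
    """Return the follow-up steps for the affected site, per the scenarios above."""
    actions = []
    if dc_state is DcState.ONLINE:
        # Leased line failure: each site keeps working on its own DCs.
        actions.append("optionally set up a site-to-site VPN until the link is restored")
        return actions
    if clients_remain:
        # Clients still at the site need a reachable DNS server with the AD records.
        actions.append("point site clients at a surviving DNS server that holds the AD records")
    if dc_state is DcState.DESTROYED:
        if held_fsmo_roles:
            actions.append("seize FSMO roles on a surviving DC")
        actions.append("perform metadata cleanup for the destroyed DCs")
    else:
        # Temporary outage: no seizure, just wait for the DCs to return.
        actions.append("wait for the DCs to return within the tombstone lifetime")
    return actions


# Example: the leaking-roof scenario, with FSMO roles held at the failed site.
for step in recovery_actions(DcState.DESTROYED, clients_remain=True, held_fsmo_roles=True):
    print(step)
```

Keeping the temporary-outage branch free of any FSMO seizure reflects the earlier advice not to seize roles from a DC that is expected to come back.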

BTW - the default tombstone lifetime is 60 days if your forest was built with Server 2003 (pre-SP1) or earlier, 180 days if built later.  If in any doubt, check the tombstoneLifetime attribute on the CN=Directory Service,CN=Windows NT,CN=Services object in the Configuration partition - if it's not set then it's 60 days.
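
As a rough illustration of that rule, here is a minimal Python sketch. It assumes you have already read the attribute value (tombstoneLifetime) by whatever means you prefer and pass it in as an integer, or as `None` when the attribute is not set. The function names are hypothetical, and treating an outage of exactly the tombstone lifetime as too long is a conservative assumption of this sketch.

```python
DEFAULT_TOMBSTONE_DAYS = 60  # applies when the attribute is not set


def effective_tombstone_days(attr_value=None):
    """Return the tombstone lifetime in days, falling back to the 60-day default."""
    return DEFAULT_TOMBSTONE_DAYS if attr_value is None else attr_value


def can_resume_replication(days_offline, attr_value=None):
    """True if a DC offline for days_offline days is still inside the tombstone lifetime."""
    # Strictly less than: an outage equal to the limit is treated as too long.
    return days_offline < effective_tombstone_days(attr_value)


print(can_resume_replication(45))        # attribute not set, 60-day default -> True
print(can_resume_replication(90))        # attribute not set, 60-day default -> False
print(can_resume_replication(90, 180))   # attribute set to 180 -> True
```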

P.S.  Learnctx should also be awarded points.