Solved

Active Directory Inter-Site Replication Recommendations Needed

Posted on 2013-12-30
Last Modified: 2014-01-13
I have attached a generic diagram of what these sites look like now. My intent is to optimize the replication based on cutting down unnecessary traffic and setting up site-links/costs based on WAN links.  As you will see some of these settings were customized, and without much (or any) documentation I am trying to analyze "why" and make changes accordingly.
As you can see, some servers were set as bridgeheads, and that is a concern, especially at MainSite.  From what I know, this puts all the "replication eggs" in one basket for this site, and that probably isn't good. I am thinking of setting at least one or two more DCs here to be bridgeheads.  I'm not sure any of the other sites need their servers set as bridgeheads, as they all have single DCs.
There are site links for: Site4 to MainSite (includes all sites but Site2), Site4 to Site3 (includes all but Site2), Site5 to Site4 (all but Site2), Mainsite to Site4 (all but Site2), MainSite to Site6 (all sites), MainSite to Site5 (all but Site2), Mainsite to Site3 (all but Site2).  All these links use the default 100/15 cost/repl interval with the exception of "MainSite to Site6" which uses 120/180.  This does have the slowest WAN link and is geographically the furthest.  "Bridge all site links" is enabled but I would like to disable this and potentially set this up manually.
Based on this info, how would you go about optimizing these site links/bridging as well as bridgehead placement?   Should we go with making 2 more DCs bridgeheads at "MainSite", maybe bring up another DC at DRsite, and have Site2-6 replicate only with that, and MainSite only with DRsite? Thoughts?
AD-Sites-Repl-Generic.pdf
Question by:mcburn13
14 Comments
 
LVL 57

Expert Comment

by:Mike Kline
ID: 39746330
What OS are you running on your DCs?  I ask because of the bridgehead question.   Does every site have connectivity back to Main Site?

Are you currently having any replication problems?

How big is your AD?

Thanks

Mike
 
LVL 1

Author Comment

by:mcburn13
ID: 39746356
The DCs are at least 2008, and we will be trying to go to a 2008 R2 forest functional level soon (currently the domain level is 2008 but the forest is 2003). Each site can talk to MainSite.  I've been told of some lag between MainSite and DRsite; repadmin testing shows everything is OK, but I want to do closer monitoring.   The ntds.dit is 245MB on MainSiteDC1.
 
LVL 53

Assisted Solution

by:Will Szymkowski
Will Szymkowski earned 214 total points
ID: 39746383
Based on the diagram you have posted, what are the hardware specs for the DC's in the main site? It seems to me that you have too many DC's in your main site. Depending on the number of users you are authenticating, each DC with 12/16 GB of RAM and 4/6 cores can manage up to 10,000 users (roughly).

There is no need to have this many DC's in the main office. As you have stated, all remote sites can talk to the head office, so why not use a hub/spoke topology? Right now, with your remote sites having only 1 DC per site, you have a single point of failure for DC replication.

Things I would change
- Add 2 DC's per remote site (if site needs to be highly available)
- Do not set preferred bridgehead server at any site (let KCC handle this)
- Decommission some of the lower-spec DC's in the main office, as this many should not be needed (you are creating more admin work for yourself)
- For any sites that are relaying from other remote sites make sure that you have 2 DC's in the site that is getting replicated from

Always allow the KCC to create "automatic connections". If you don't, when a DC fails or goes offline (for whatever reason) the KCC will not re-establish new connections to other DC's that are online, which means replication will fail for other DC's that are using that site as a relay site.

Will.

 
LVL 6

Assisted Solution

by:Brad Held
Brad Held earned 143 total points
ID: 39747556
Also, replication should only go from the primary site out to the remote sites.

More information can be found here: http://technet.microsoft.com/en-us/library/cc757117(v=ws.10).aspx

When you define your site links, there should be no more than 2 sites per site link, such as:
1) Mainsite - Site1
2) MainSite - Site2
3) Mainsite - Site3
4) Mainsite - Site4
5) MainSite - Site5
6) MainSite - DR
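
A minimal sketch of the two-sites-per-link layout listed above, assuming the site names from that list. The helper name `hub_and_spoke_links` and the "Hub-Spoke" link naming are made up for illustration; this only builds the list of pairs and does not touch Active Directory.

```python
def hub_and_spoke_links(hub, spokes):
    """Return one (link_name, (hub, spoke)) entry per spoke site.

    Each link contains exactly two sites: the hub and a single spoke.
    """
    return [(f"{hub}-{spoke}", (hub, spoke)) for spoke in spokes]


# Site names taken from the list above.
links = hub_and_spoke_links(
    "MainSite", ["Site1", "Site2", "Site3", "Site4", "Site5", "DR"]
)

for name, members in links:
    print(name, members)
```

Keeping every link to two sites makes it easier to reason about which WAN path a given link represents when costs are assigned later.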

You should also create subnet objects and associate them with the correct site. This will help keep authentication traffic from traversing the site links as well.

Another thing you can do is limit the number of times per hour replication happens and the times at which replication can happen. By default, inter-site replication happens every 180 minutes, or 8 times per day. You can change this based on how up to date the information needs to be and how many changes occur. If you decide replication should only happen off hours, keep in mind that once replication starts it will continue until it finishes.
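
The "180 minutes, or 8 times per day" figure is simple arithmetic, and the same calculation shows what the 15-minute interval mentioned in the question works out to. This is a rough sketch only: the function name and the `window_hours` parameter are illustrative, and it ignores how the real schedule aligns replication within the window.

```python
def replications_per_day(interval_minutes, window_hours=24):
    """Approximate number of replication cycles that fit in the daily window."""
    return (window_hours * 60) // interval_minutes


print(replications_per_day(180))                 # 180-minute interval, all day
print(replications_per_day(15))                  # 15-minute interval, all day
print(replications_per_day(180, window_hours=8)) # 180-minute interval, 8-hour off-hours window
```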

I agree with spec01: let the KCC decide the bridgeheads for you. Once you decide which server is the bridgehead, the KCC will assume you know better than it does and will not adjust replication connections based on the availability of a DC, so if the bridgehead is down, replication stops.

Although your database is 245 MB, only changes are replicated, so after initial replication the size really doesn't matter.
 
LVL 1

Author Comment

by:mcburn13
ID: 39751453
Thanks for the responses.  Do you have any documentation that gives guidance on "users per DC"?  10k users per DC seems like A LOT. I have heard anywhere from 100 per DC and up. We will also be looking into possibly putting 2nd DCs at branch offices, but supposedly the way the WAN is set up (MPLS), users would just authenticate against another branch's DC if the single DC is unavailable.

I will definitely be taking the manual bridgehead settings off and getting all the DCs up to 2008 R2; from what Microsoft says, R2 has improved load balancing of replication.  After that, the plan will be to redesign the site links so each link contains a branch office, MainSite and DRSite.  We MAY consider doing only branch-MainSite links and possibly a lower-cost replication link from MainSite to DR. In addition, I think we at least initially want to set one of our geographically further sites with lower WAN speed to replicate LATE as a "lag site", which gives redundancy in case of some kind of catastrophic AD event. At least until we get the AD Recycle Bin with the 2008 R2 forest functional level.

Now I am testing PowerShell scripting for removing/adding site links so that I can do it quickly without having to clunk through the Sites and Services GUI. It also gives me the option to quickly fail back to the previous configuration.
 
LVL 53

Assisted Solution

by:Will Szymkowski
Will Szymkowski earned 214 total points
ID: 39751463
The DC sizing based on number of users is all based around what hardware specs your DC's have. Take a look at the below link which helps outline this in detail.

http://blogs.metcorpconsulting.com/tech/?p=740

Will.
 
LVL 1

Author Comment

by:mcburn13
ID: 39751552
Thanks for that.   My team also needs to consider that, in addition to users, there are a plethora of devices and web apps that authenticate against AD all day, and I need to get a handle on how to measure that.  I am going to run Performance Monitor against AD on one of the DCs to get a better look.
 
LVL 53

Assisted Solution

by:Will Szymkowski
Will Szymkowski earned 214 total points
ID: 39751686
Depending on the app you are running: when the user initially logs in, the KDC sends a TGT to the user. Once the client decrypts the KDC's reply using a key derived from the user's password, the TGT is cached on the user's machine and is available to other applications or SSO (Single Sign-On) apps without having to re-authenticate.

Will.
 
LVL 1

Author Comment

by:mcburn13
ID: 39752127
Here is my latest design (attached); it only has the proposed site links and site link bridges.  I'd set the cost lower on the 100MB link and higher on the slower links; each site link bridge includes the site link plus the link to the DR site.  I'm still toying with the idea of keeping the transitivity between all the sites, but perhaps this will reduce the replication traffic by only having it go from the satellite to the hub and then bridging that link with the DR site link for redundancy. Thoughts?
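
To reason about what bridging does to costs, it can help to model the links as a graph: when two site links are bridged (or all links are transitive), the cost of the route between the end sites is the sum of the link costs along the cheapest path. The sketch below is a plain shortest-path calculation, not anything the KCC exposes. The 100 and 120 costs come from the original question; the cost of 50 on the MainSite to DRsite link is a hypothetical value chosen for illustration.

```python
import heapq


def cheapest_route_cost(links, source, target):
    """Return the lowest total cost between two sites, or None if unreachable.

    `links` maps a (site_a, site_b) pair to the cost of that site link.
    Links are treated as bidirectional and transitive.
    """
    graph = {}
    for (a, b), cost in links.items():
        graph.setdefault(a, []).append((b, cost))
        graph.setdefault(b, []).append((a, cost))

    best = {source: 0}
    queue = [(0, source)]
    while queue:
        cost, site = heapq.heappop(queue)
        if site == target:
            return cost
        if cost > best.get(site, float("inf")):
            continue  # stale queue entry, a cheaper path was already found
        for neighbor, link_cost in graph.get(site, []):
            new_cost = cost + link_cost
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                heapq.heappush(queue, (new_cost, neighbor))
    return None


# Hypothetical hub-and-spoke costs (only 100 and 120 appear in the thread).
site_link_costs = {
    ("MainSite", "DRsite"): 50,
    ("MainSite", "Site3"): 100,
    ("MainSite", "Site6"): 120,
}

print(cheapest_route_cost(site_link_costs, "Site3", "DRsite"))
print(cheapest_route_cost(site_link_costs, "Site6", "Site3"))
```

With these numbers, a spoke reaches the DR site at the spoke link cost plus 50, so any spoke-to-spoke route is always more expensive than spoke-to-hub.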
Generic-Hub-Spoke-Link-Bridges.pdf
 
LVL 53

Expert Comment

by:Will Szymkowski
ID: 39752147
That does look cleaner than the original one you had posted. I would still also include additional DC's in each respective site as well for redundancy.

Will.
 
LVL 37

Assisted Solution

by:Mahesh
Mahesh earned 143 total points
ID: 39757273
I have seen both of your diagrams.

In my opinion, you have pretty good network bandwidth.

If you could post how many users you have per site in the diagram above, it would help.
Also, where are your production applications and infrastructure servers located?

If you have all infrastructure servers and application servers in the hub site, and the user count is not high in the branch sites (Microsoft recommends an RODC or R/W DC only above a certain user base), you can even remove the domain controllers from the branches, since you have good network bandwidth.
Users can log on with cached credentials in case of a WAN link failure.
If there are local file servers in the branches, you can make them available offline.
If internet connectivity is provided to the branches through the hub site, then they will not be able to access it in any case.
But if they have local internet access, they can still access webmail in case of a link failure.

In short, in a hub-and-spoke model you should not put DCs at the branches unless you have a genuine reason to do so; this will minimize the DC footprint and maintenance as well.

You can check the Microsoft AD IPD (Infrastructure Planning and Design) guide, as suggested by others, to identify dependencies.

Mahesh
 
LVL 1

Author Comment

by:mcburn13
ID: 39775452
All very good recommendations.  I would still feel more comfortable having at least one DC in each of these branch offices, as they have up to a couple of hundred users each (local and satellite VPN'd), and I don't want to rely solely on the WAN link across the country.

Does anyone know what repadmin command or PowerShell script I can run that will force synchronization between sites? Apparently /syncall only replicates DCs within the site itself...
 
LVL 6

Assisted Solution

by:Brad Held
Brad Held earned 143 total points
ID: 39775620
repadmin /syncall /Aed

http://technet.microsoft.com/en-us/library/cc835086.aspx

• /A Synchronizes all naming contexts that are held on the home server.
• /e Synchronizes domain controllers across all sites in the enterprise. By default, this command does not synchronize domain controllers in other sites.
• /d Identifies servers by distinguished name in messages.
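
If you want to script this rather than type it, a small wrapper along these lines is one option. It is only a sketch: the function names are made up, `MainSiteDC1` is the DC name mentioned earlier in the thread and stands in for whichever DC you target, and it assumes `repadmin` is on the PATH of the machine running it.

```python
import subprocess


def syncall_command(dc_name):
    """Build the repadmin command line for a cross-site sync started from dc_name.

    /A = all naming contexts, /e = enterprise (cross-site), /d = DNs in messages.
    """
    return ["repadmin", "/syncall", dc_name, "/Aed"]


def force_sync(dc_name, run=subprocess.run):
    """Run the sync and return the completed process (stdout captured as text)."""
    return run(syncall_command(dc_name), capture_output=True, text=True)


# Print the command only; call force_sync() on a machine that has repadmin.
print(" ".join(syncall_command("MainSiteDC1")))
```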
 
LVL 37

Accepted Solution

by:Mahesh
Mahesh earned 143 total points
ID: 39777547
Actually, you don't need to manually force replication between all sites every time unless you have a real emergency.
The replication schedule in AD Sites and Services is there to take care of that, and you can vary it depending on your needs and available bandwidth.

The following example targets all domain controllers in the forest to retrieve summary replication status from each. The example lists the output in a table that has columns for source and destination, and sorts the results based on the longest time since the last successful replication:

repadmin /replsum * /bysrc /bydest /sort:delta
You can redirect its output to a text file and run it daily as a batch file.
http://technet.microsoft.com/en-us/library/cc835092.aspx
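
As an alternative to a batch file, the daily capture could be sketched as follows. The file naming scheme (`replsum-YYYY-MM-DD.txt`), the function names, and the `reports` folder are assumptions for illustration; the repadmin arguments are the ones quoted above, and the script assumes `repadmin` is available where it runs.

```python
import datetime
import subprocess
from pathlib import Path

# Same command as above, split into arguments.
REPLSUM_COMMAND = ["repadmin", "/replsum", "*", "/bysrc", "/bydest", "/sort:delta"]


def report_path(folder, day):
    """Return the dated text file path for one day's summary."""
    return Path(folder) / f"replsum-{day.isoformat()}.txt"


def save_daily_report(folder, day=None, run=subprocess.run):
    """Run the summary and write its output to a dated text file."""
    day = day or datetime.date.today()
    result = run(REPLSUM_COMMAND, capture_output=True, text=True)
    path = report_path(folder, day)
    path.write_text(result.stdout)
    return path


# Show where today's report would be written; scheduling is left to Task Scheduler.
print(report_path("reports", datetime.date.today()))
```

Keeping one file per day makes it easy to compare the largest replication deltas before and after a site link change.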

Mahesh