Active Directory Inter-Site Replication Recommendations Needed

I have attached a generic diagram of what these sites look like now. My intent is to optimize replication by cutting down unnecessary traffic and setting up site links and costs based on the WAN links. As you will see, some of these settings were customized, and without much (or any) documentation I am trying to work out "why" and make changes accordingly.
As you can see, some servers were set as preferred bridgeheads, and that is a concern, especially at MainSite. From what I know, this puts all the "replication eggs" in one basket for this site, and that probably isn't good. I am thinking of making one or two more DCs here bridgeheads. I'm not sure any of the other sites need their servers set as bridgeheads, since each of them has a single DC.
There are site links for: Site4 to MainSite (includes all sites but Site2), Site4 to Site3 (includes all but Site2), Site5 to Site4 (all but Site2), MainSite to Site4 (all but Site2), MainSite to Site6 (all sites), MainSite to Site5 (all but Site2), and MainSite to Site3 (all but Site2). All of these links use a cost of 100 (the default) with a 15-minute replication interval, except "MainSite to Site6", which uses a cost of 120 and a 180-minute interval. That link has the slowest WAN connection and is geographically the furthest. "Bridge all site links" is enabled, but I would like to disable it and possibly set up bridges manually.
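To see what "Bridge all site links" does to the links listed above, here is a minimal Python sketch that treats the sites as a graph. It is a simplified model, not how the KCC is implemented. With bridging on, costs add up along a chain of links and the cheapest total wins; with bridging off and no manual bridges, a site only gets direct replication connections to sites it shares a link with, and changes reach other sites hop by hop through intermediate sites. The names SITE_LINKS, build_graph, and replication_costs are hypothetical; the costs are the ones from the post, the duplicate MainSite/Site4 link is folded into one, and only the links actually listed are modeled.

```python
import heapq
from collections import defaultdict

# Site links as listed above: (link name, sites in the link, cost).
# The duplicate "MainSite to Site4" link is folded into Site4-MainSite.
SITE_LINKS = [
    ("Site4-MainSite", {"Site4", "MainSite"}, 100),
    ("Site4-Site3", {"Site4", "Site3"}, 100),
    ("Site5-Site4", {"Site5", "Site4"}, 100),
    ("MainSite-Site6", {"MainSite", "Site6"}, 120),
    ("MainSite-Site5", {"MainSite", "Site5"}, 100),
    ("MainSite-Site3", {"MainSite", "Site3"}, 100),
]


def build_graph(site_links):
    """Map each site to {neighbouring site: cheapest direct link cost}."""
    graph = defaultdict(dict)
    for _name, sites, cost in site_links:
        for a in sites:
            for b in sites:
                if a != b:
                    # If two links join the same pair of sites, keep the cheaper one.
                    graph[a][b] = min(cost, graph[a].get(b, cost))
    return graph


def replication_costs(site_links, source, bridge_all=True):
    """Return {site: total cost} for sites `source` can get a direct connection to.

    bridge_all=True approximates "Bridge all site links": link costs add up
    along a chain of links (Dijkstra's shortest path). bridge_all=False
    approximates bridging turned off with no manual bridges: only sites that
    share a site link with `source` qualify.
    """
    graph = build_graph(site_links)
    if not bridge_all:
        return dict(graph.get(source, {}))
    best = {source: 0}
    queue = [(0, source)]
    while queue:
        cost, site = heapq.heappop(queue)
        if cost > best[site]:
            continue  # stale entry; a cheaper path was already found
        for neighbour, link_cost in graph.get(site, {}).items():
            new_cost = cost + link_cost
            if new_cost < best.get(neighbour, float("inf")):
                best[neighbour] = new_cost
                heapq.heappush(queue, (new_cost, neighbour))
    del best[source]
    return best


print(replication_costs(SITE_LINKS, "Site3"))                    # bridged
print(replication_costs(SITE_LINKS, "Site3", bridge_all=False))  # not bridged
```

In this model Site3 reaches Site6 at a total cost of 220 (100 to MainSite plus 120 to Site6) when bridging is on, and only Site4 and MainSite qualify when it is off.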
Based on this, how would you go about optimizing these site links and bridges, as well as bridgehead placement? Should we make two more DCs at MainSite bridgeheads, or maybe bring up another DC at DRsite, have Site2-6 replicate only with it, and have MainSite replicate only with DRsite? Thoughts?

Mike Kline commented:
What OS are you running on your DCs?  I ask because of the bridgehead question.   Does every site have connectivity back to Main Site?

Are you currently having any replication problems?

How big is your AD?


mcburn13 (Author) commented:
The DCs are all at least 2008, and we will be trying to move to the 2008 R2 forest functional level soon (currently the domain functional level is 2008 but the forest is 2003). Each site can talk to MainSite. I've been told of some lag between MainSite and DRsite; repadmin testing shows everything is OK, but I want to monitor it more closely. The ntds.dit is 245 MB on MainSiteDC1.
Will Szymkowski (Senior Solution Architect) commented:
Based on the diagram you have posted, what are the hardware specs for the DCs in the main site? It seems to me that you have too many DCs there. It depends on the number of users each DC authenticates, but a DC with 12-16 GB of RAM and 4-6 cores can handle roughly 10,000 users.
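As a rough way to turn that figure into a DC count per site, here is a small Python sketch. The helper dcs_needed is hypothetical, the 10,000-users-per-DC default is only the rule of thumb quoted above, and the one-spare (N+1) margin is an assumption so that a single DC can fail without overloading the rest.

```python
import math


def dcs_needed(users, users_per_dc=10_000, spare=1):
    """Rough DC count for a site: enough DCs for the load, plus spares.

    users_per_dc defaults to the rough 10,000-user figure quoted above for a
    12-16 GB, 4-6 core DC; it is a rule of thumb, not a measured capacity.
    spare=1 is an assumed N+1 margin so one DC can fail without overload.
    """
    if users < 0 or users_per_dc <= 0 or spare < 0:
        raise ValueError("users and spare must be >= 0; users_per_dc must be > 0")
    load_dcs = max(1, math.ceil(users / users_per_dc))
    return load_dcs + spare


print(dcs_needed(200))     # small branch: 1 for load + 1 spare
print(dcs_needed(25_000))  # 3 for load + 1 spare
```

Devices and applications that authenticate against AD add load beyond the user count, so measured DC load is a better input than this estimate where it is available.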

There is no need to have this many DCs in the main office. As you have stated, all remote sites can talk to the head office, so why not use a hub-and-spoke topology? Right now, with your remote sites having only one DC each, you have a single point of failure for DC replication.

Things I would change:
- Add two DCs per remote site (if the site needs to be highly available)
- Do not set a preferred bridgehead server at any site (let the KCC handle this)
- Decommission some of the lower-spec DCs in the main office, as this many should not be needed (you are creating more admin work for yourself)
- For any sites that relay replication from other remote sites, make sure the site being replicated from has two DCs

Always allow the KCC to create automatic connections. If you don't, then when a DC fails or goes offline (for whatever reason), the KCC will not establish new connections to other DCs that are online, which means replication will fail to any other DCs that use that site as a relay site.

Brad Held commented:
Also, replication should only flow from the primary site out to the remote sites.

More information can be found here:

When you define your site links, there should be no more than two sites per site link, for example:
1) MainSite - Site1
2) MainSite - Site2
3) MainSite - Site3
4) MainSite - Site4
5) MainSite - Site5
6) MainSite - DR
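The two-sites-per-link rule is easy to generate rather than hand-build. This is a minimal Python sketch with a hypothetical helper, hub_and_spoke_links, that produces one link per spoke, each joining that spoke to the hub. The cost of 100 and 180-minute interval are the AD defaults and are placeholders to tune per WAN link.

```python
def hub_and_spoke_links(hub, spokes, cost=100, interval_minutes=180):
    """Build one two-site link per spoke, each joining that spoke to the hub.

    Returns (link name, (hub, spoke), cost, interval in minutes) tuples.
    Cost 100 and a 180-minute interval are the AD defaults; tune them per
    WAN link afterwards.
    """
    links = []
    for spoke in spokes:
        if spoke == hub:
            raise ValueError("the hub cannot also be a spoke")
        links.append((f"{hub}-{spoke}", (hub, spoke), cost, interval_minutes))
    return links


for link in hub_and_spoke_links("MainSite", ["Site1", "Site2", "Site3", "Site4", "Site5", "DR"]):
    print(link)
```

Because every link contains the hub and exactly one spoke, spoke-to-spoke replication can only happen through MainSite (or through any bridges you add on purpose).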

You should also create subnet objects and associate them with the correct sites; this helps keep authentication traffic from traversing the site links as well.
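Subnet objects matter because a client is placed in the site whose subnet most specifically contains its IP address, and it then prefers DCs in that site. Below is a minimal Python sketch of that longest-prefix match using the standard ipaddress module. The subnets and the site_for_client helper are hypothetical examples. A client that matches no subnet gets no site, which is how authentication ends up crossing WAN links.

```python
import ipaddress

# Hypothetical subnet-to-site assignments; replace with your own subnets.
SUBNETS = {
    "10.0.0.0/16": "MainSite",
    "10.0.50.0/24": "DRsite",
    "10.3.0.0/16": "Site3",
}


def site_for_client(ip, subnets=SUBNETS):
    """Return the site of the most specific subnet containing `ip`, or None."""
    addr = ipaddress.ip_address(ip)
    best = None  # (network, site) with the longest prefix seen so far
    for cidr, site in subnets.items():
        net = ipaddress.ip_network(cidr)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, site)
    return best[1] if best else None


print(site_for_client("10.0.50.12"))   # DRsite: the /24 beats the /16
print(site_for_client("10.0.1.5"))     # MainSite
print(site_for_client("192.168.1.1"))  # None: no subnet object covers it
```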

Another thing you can do is limit how often replication happens and the times when it is allowed to happen. By default, inter-site replication happens every 180 minutes, or 8 times per day. You can change this based on how up to date the information needs to be and how many changes occur. If you decide replication should only happen off-hours, note that once a replication cycle starts, it will continue until it finishes.
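The "every 180 minutes is 8 times per day" arithmetic generalizes to any interval and schedule window. This Python sketch uses a hypothetical helper, replications_per_day. Inter-site intervals range from 15 to 10,080 minutes (one week) in 15-minute steps; this sketch simply rejects values that are not multiples of 15 rather than guessing how AD would round them, and it ignores cycles that run past the end of the window.

```python
MIN_INTERVAL = 15      # minutes; smallest inter-site replication interval
MAX_INTERVAL = 10_080  # minutes; one week


def replications_per_day(interval_minutes, window_hours=24):
    """Number of replication cycles that start within a daily window."""
    if not MIN_INTERVAL <= interval_minutes <= MAX_INTERVAL:
        raise ValueError("interval must be between 15 and 10080 minutes")
    if interval_minutes % 15:
        # Rejected here instead of guessing how AD would round the value.
        raise ValueError("interval must be a multiple of 15 minutes")
    return (window_hours * 60) // interval_minutes


print(replications_per_day(180))                  # default interval, all day
print(replications_per_day(15))                   # fastest inter-site interval
print(replications_per_day(180, window_hours=6))  # off-hours window only
```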

I agree with spec01: let the KCC decide the bridgeheads for you. Once you choose which server is the bridgehead, the KCC assumes you know better than it does and will not adjust replication connections based on DC availability, so if the bridgehead is down, replication stops.

Although your database is 245 MB, only changes are replicated, so after the initial replication its size doesn't really matter.
mcburn13 (Author) commented:
Thanks for the responses. Do you have any documentation that gives guidance on "users per DC"? 10,000 users per DC seems like a lot; I have heard anywhere from 100 per DC and up. I will also be looking into putting second DCs at the branch offices, but supposedly, given the way the WAN is set up (MPLS), users would just authenticate against another branch's DC if the single local DC is unavailable.

I will definitely be removing the manual bridgehead settings and getting all the DCs up to 2008 R2; from what Microsoft says, R2 has improved load balancing of replication. After that, the plan is to redesign the site links so each link contains a branch office, MainSite, and DRsite. I MAY consider doing only branch-to-MainSite links, and possibly a lower-cost replication link from MainSite to DR. In addition, at least initially, we want to set one of our slower, geographically distant sites to replicate LATE as a "lag site", which gives redundancy in case of some kind of catastrophic AD event, at least until we get the AD Recycle Bin with the 2008 R2 forest functional level.
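A "replicate LATE" site is normally set up with a site link schedule. One thing to keep in mind: when links are bridged, replication across the combined path can only happen while every link on it is open, so the effective schedule is the intersection of the link schedules. This Python sketch models that on an hourly weekly grid like the Sites and Services schedule dialog (AD itself tracks finer steps). The helpers and the 01:00-04:00 window are hypothetical examples.

```python
# Weekly schedule: set of (day, hour) slots when replication is allowed.
ALWAYS = {(day, hour) for day in range(7) for hour in range(24)}


def nightly_window(start_hour, end_hour):
    """Slots in [start_hour, end_hour) on every day; no wrap past midnight."""
    if not 0 <= start_hour < end_hour <= 24:
        raise ValueError("need 0 <= start_hour < end_hour <= 24")
    return {(day, hour) for day in range(7) for hour in range(start_hour, end_hour)}


def path_schedule(*link_schedules):
    """Effective schedule of a bridged path: the intersection of its links."""
    result = set(ALWAYS)
    for schedule in link_schedules:
        result &= schedule
    return result


hub_link = ALWAYS                # hub link open all week
lag_link = nightly_window(1, 4)  # hypothetical 01:00-04:00 window for the lag site
effective = path_schedule(hub_link, lag_link)
print(len(effective), "open hours per week")
```

A schedule-limited site only protects against changes made since its last window, so it is a partial safeguard rather than a substitute for backups or the AD Recycle Bin.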

Now I am testing PowerShell scripts that remove and add site links so I can make the changes quickly without clunking through the Sites and Services GUI. This also gives me the option to quickly fail back to the previous configuration.
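One way to keep the change and the fail-back in step is to generate both from the same list of link definitions. This Python sketch (hypothetical helper site_link_commands) emits New-ADReplicationSiteLink and Remove-ADReplicationSiteLink command lines. As far as I know these cmdlets ship with the ActiveDirectory module from Windows Server 2012 onward, so on 2008 R2 they would have to run from a newer management machine; check them against your environment before use. The rollback list only removes the links it created, so restoring the old layout also means re-creating the old links (for example, by feeding the old definitions through the same function).

```python
def site_link_commands(links):
    """Return (apply, rollback) lists of PowerShell command strings.

    `links` is a list of (name, [site, site, ...], cost, interval_minutes).
    Site names are joined unquoted, so names containing spaces would need
    extra quoting; this sketch assumes they have none.
    """
    apply_cmds, rollback_cmds = [], []
    for name, sites, cost, interval in links:
        apply_cmds.append(
            f'New-ADReplicationSiteLink -Name "{name}" '
            f'-SitesIncluded {",".join(sites)} '
            f'-Cost {cost} -ReplicationFrequencyInMinutes {interval} '
            f'-InterSiteTransportProtocol IP'
        )
        # Undo step for this link; -Confirm:$false suppresses the prompt.
        rollback_cmds.append(
            f'Remove-ADReplicationSiteLink -Identity "{name}" -Confirm:$false'
        )
    return apply_cmds, rollback_cmds


apply_cmds, rollback_cmds = site_link_commands(
    [("MainSite-Site3", ["MainSite", "Site3"], 100, 180)]
)
print("\n".join(apply_cmds))
print("\n".join(rollback_cmds))
```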
Will Szymkowski (Senior Solution Architect) commented:
DC sizing based on the number of users depends on the hardware specs of your DCs. Take a look at the link below, which outlines this in detail.

mcburn13 (Author) commented:
Thanks for that. My team also needs to consider that, in addition to users, there are a plethora of devices and web apps that authenticate against AD all day, and I need to get a handle on how to measure that. I'm going to collect some Performance Monitor counters for AD on one of the DCs to get a better look.
Will Szymkowski (Senior Solution Architect) commented:
It depends on the app you are running, but when the user initially logs on, the KDC issues a TGT to the user. The client decrypts the KDC's reply with a key derived from the user's password and caches the TGT on the user's machine, where it is used to request tickets for other applications and SSO (single sign-on) apps without the user having to re-authenticate.

mcburn13 (Author) commented:
Here is my latest design (attached); it only shows the proposed site links and site link bridges. I'd set the cost lower on the 100 Mbps link and higher on the slower links; each site link bridge includes the site link plus the link to the DR site. I'm still toying with the idea of keeping transitivity between all the sites, but perhaps this will reduce replication traffic by having it go only from the satellite to the hub, then bridging that link with the DR site link for redundancy. Thoughts?
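For "lower cost on the fast link, higher on the slow ones", a consistent formula avoids ad hoc numbers. One heuristic I recall from Microsoft's older site-topology design guidance is cost = 1024 / log10(bandwidth in kbps); treat it as a starting point, since only the relative order of costs matters to the KCC. The helper cost_from_bandwidth and the example bandwidths are hypothetical.

```python
import math


def cost_from_bandwidth(kbps):
    """Site link cost from bandwidth: 1024 / log10(kbps), rounded.

    Faster links get lower costs, so least-cost paths prefer them.
    """
    if kbps <= 1:
        raise ValueError("bandwidth must be greater than 1 kbps")
    return round(1024 / math.log10(kbps))


# Hypothetical example links: a 100 Mbps link and a 1.544 Mbps (T1) link.
for label, kbps in [("100 Mbps", 100_000), ("T1", 1_544)]:
    print(label, cost_from_bandwidth(kbps))
```

A cost scheme like this also interacts with bridging: with transitivity on, the summed costs decide which indirect paths the KCC prefers, so keeping costs proportional to real bandwidth keeps those choices sensible.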
Will Szymkowski (Senior Solution Architect) commented:
That does look cleaner than the original one you posted. I would still add more DCs in each site for redundancy.

I have seen both of your diagrams.

In my view, you have pretty good network bandwidth.

If you could post how many users you have per site in the diagram above, it would help.
Also, where are your production applications and infrastructure servers located?

If all infrastructure and application servers are in the hub site and the user count in the branch sites is low (Microsoft recommends an RODC or writable DC once a branch reaches a certain user count), you could even remove the domain controllers from the branches, since you have good network bandwidth.
Users can still log on with cached credentials if the WAN link fails.
If there are local file servers in the branches, you can make their content available offline.
If the branches get internet connectivity through the hub site, they will lose internet access during a link failure anyway.
But if they have local internet access, they can still reach webmail during a link failure.

In short, in a hub-and-spoke model you should not put DCs at the branches unless you have a genuine reason to, as this minimizes the DC footprint and maintenance as well.

You can refer to Microsoft's AD Infrastructure Planning and Design (IPD) guide, as suggested by others, to identify dependencies.

mcburn13 (Author) commented:
All very good recommendations. I would still feel more comfortable having at least one DC in each of these branch offices, as they have up to a couple of hundred users each (local and VPN'd satellite users), and I don't want to rely solely on a WAN link across the country.

Does anyone know what repadmin command or PowerShell script I can run to force synchronization between sites? /syncall apparently only replicates with DCs in the site itself...
Brad Held commented:
repadmin /syncall /Aed

- /A: Synchronizes all naming contexts that are held on the home server.
- /e: Synchronizes domain controllers across all sites in the enterprise. By default, this command does not synchronize domain controllers in other sites.
- /d: Identifies servers by distinguished name in messages.
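If you want to script that command (for example, against each hub DC in turn), this is a minimal Python sketch that builds and runs it with subprocess. The helpers syncall_command and run_syncall are hypothetical; the flags are the ones described above, and repadmin must be on the PATH of a Windows machine with the AD DS tools. With no DC argument, I believe repadmin targets the DC you run it on. Adding /P to the flags pushes changes outward from that DC instead of pulling.

```python
import subprocess


def syncall_command(dc=None, flags="/Aed"):
    """Build the argument list for `repadmin /syncall [DC] /Aed`."""
    cmd = ["repadmin", "/syncall"]
    if dc:
        cmd.append(dc)  # DC whose replication partners get synchronized
    cmd.append(flags)
    return cmd


def run_syncall(dc=None):
    """Run the command (Windows only) and return (exit code, output text)."""
    result = subprocess.run(syncall_command(dc), capture_output=True, text=True, check=False)
    return result.returncode, result.stdout + result.stderr


print(syncall_command("MainSiteDC1"))
```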
Actually, you don't need to force replication between all sites manually unless you have a real emergency.
The replication schedule in AD Sites and Services is there to take care of that, and you can vary it depending on your needs and available bandwidth.

The following example targets all domain controllers in the forest to retrieve summary replication status from each. The example lists the output in a table that has columns for source and destination, and sorts the results based on the longest time since the last successful replication:

repadmin /replsum * /bysrc /bydest /sort:delta
You can redirect its output to a text file and run this daily from a batch file.
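The same daily log can be produced from a small Python script, which makes it easy to keep one dated file per day for comparison. This is a sketch: the helpers log_path and save_replsum and the replsum-YYYY-MM-DD.txt naming scheme are hypothetical, and it assumes repadmin is on the PATH of the Windows machine that runs it (for example, from a scheduled task).

```python
import datetime
import pathlib
import subprocess

# The repadmin command from the example above.
REPLSUM = ["repadmin", "/replsum", "*", "/bysrc", "/bydest", "/sort:delta"]


def log_path(folder, day):
    """Hypothetical naming scheme: one file per day, e.g. replsum-2024-01-31.txt."""
    return pathlib.Path(folder) / f"replsum-{day.isoformat()}.txt"


def save_replsum(folder, day=None):
    """Run repadmin /replsum and write its output to that day's log file."""
    day = day or datetime.date.today()
    result = subprocess.run(REPLSUM, capture_output=True, text=True, check=False)
    path = log_path(folder, day)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(result.stdout + result.stderr)
    return path
```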

