Hallidays

Replication to one domain controller not working, others are fine.

Hi all,

A little background will help here.

We have multiple DCs across multiple sites. Site 1 (head office) has two DCs, DCSite1-001 and DCSite1-002. Our branch offices each have a single DC and are connected to head office via a full MPLS network. The site links are great; pings typically complete in under 1 ms.

I have a problem between the main DC at our main site (DCSite1-001) and one of the remote site DCs (DCSite2-001). All of the other sites' DCs can replicate to and from the main site DC; it is only these two machines that have a problem. DCSite2-001 can replicate with DCSite1-002, just not DCSite1-001. Again, all other servers in other sites can replicate fine with DCSite1-001.

I have checked the obvious things like time sync, updates etc. and everything looks OK. I have run PortQry and it gets stuck when connecting from DCSite2-001 to DCSite1-001 using TCP over port 389. What puzzles me is that this server can connect to another server in site 1, just not the main DC. Servers in other sites can connect to the main DC, so it can't be that the port is blocked on the server.
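
For anyone wanting to repeat the port test without PortQry, here is a minimal sketch of a plain TCP connect check in Python. The port list is an assumption (the commonly documented AD ports); only 389 is mentioned in this thread. The function names are illustrative.

```python
import socket

# Commonly documented AD-related TCP ports (an assumption, not taken from
# the thread, which only mentions 389). Adjust for your environment.
AD_PORTS = {
    88: "Kerberos",
    135: "RPC endpoint mapper",
    389: "LDAP",
    445: "SMB",
    3268: "Global Catalog",
}


def check_tcp_port(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


def check_ports(host, ports=AD_PORTS, timeout=3.0):
    """Return a {port: reachable} mapping for every port in `ports`."""
    return {port: check_tcp_port(host, port, timeout) for port in ports}
```

Running `check_ports("DCSite1-001")` on DCSite2-001 and again on a DC from a working site should show whether the failure is specific to this one path. A successful connect only proves the TCP handshake completes; it does not prove LDAP or RPC traffic gets through afterwards.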

Any ideas?
Radhakrishnan

Hi,

Are there any DFS replication errors in the event logs? If so, can you post them here? Also, download the MS tool and see if you can pinpoint anything about the replication:

http://www.microsoft.com/en-in/download/details.aspx?id=30005
Hallidays

ASKER

Hi,

There is a warning rather than an error, but it relates to DC2 in site 1, as this is where DFS replicates to. AD replication is not working to DC1 in site 1. I am running the AD Replication tool now.


The File Replication Service is having trouble enabling replication from DCSite1-002 to DCSite2-001 for c:\windows\sysvol\domain using the DNS name DCSite1-002.domainname.local. FRS will keep retrying.
 Following are some of the reasons you would see this warning.
 
 [1] FRS can not correctly resolve the DNS name  DCSite1-002.domainname.local from this computer.
 [2] FRS is not running on  DCSite1-002.domainname.local
 [3] The topology information in the Active Directory Domain Services for this replica has not yet replicated to all the Domain Controllers.
 
 This event log message will appear once per connection, After the problem is fixed you will see another event log message indicating that the connection has been established.
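
Reason [1] in the warning above can be ruled in or out with a quick name lookup run on the affected DC. The following is a minimal sketch in Python; the function name is illustrative, and it uses whatever resolver the machine is configured with, so it needs to be run on DCSite2-001 itself to be meaningful.

```python
import socket


def resolve_host(name):
    """Return the sorted unique IP addresses `name` resolves to, or [] on failure."""
    try:
        infos = socket.getaddrinfo(name, None)
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})
```

For example, `resolve_host("DCSite1-002.domainname.local")` returning an empty list, or an address that is not the DC's real IP, would point at DNS rather than FRS.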
Interesting - I ran the AD Replication Tool from DCSite1-001 (the main DC) and checked replication with DCSite2-001 and it says:

Domain Controller "DCSite2-001.domain.local" does not exist or could not be contacted.

I can ping the server by both name and IP.
Hi,

Which replication method are you using? The error indicates that it's FRS. It also looks to me as though some servers are using DFS and others FRS?

If that is the case, I would suggest using the same method on all sites and seeing whether it makes any difference.

If that doesn't work, you may need to perform a BurFlags restore as per https://support.microsoft.com/en-us/kb/290762/
Hi, the error is with AD replication rather than file replication. We use DFS for file replication and that seems to be working fine at the moment, it is just the AD that isn't.
What happens when you run repadmin /replsum? Does it fail?

Do you have connections set up between both of these DCs? Have you let the KCC (Knowledge Consistency Checker) make these connections for you? (They should be automatic connections.)

Will.
Hi,

Okay. Just to make sure that the NTDS.DIT is fine, could you run an integrity check on the DCs and see if it throws any failures? You can follow this procedure:

Open command prompt and type>>ntdsutil
ntdsutil: files
file maintenance: Integrity

PS - Sometimes it may ask you to activate an instance. If so, type "activate instance ntds" and then run the files command again.

Perform this on all the DCs and make sure that the NTDS.DIT is fine on every domain controller.
Hi Will,

repadmin /replsum shows an error between the two DCs in question:

(1726) The remote procedure call failed

It then says "Experienced the following operational errors trying to retrieve replication information: 1053 - DCSite1-001"

All other servers can replicate fine
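
When comparing output pasted from several DCs, it can help to pull the numeric error codes out automatically. Here is a minimal sketch in Python; the two text patterns are assumptions based only on the lines quoted in this post, not on a documented repadmin output format, and the lookup table lists just three standard Win32 codes.

```python
import re

# Standard Win32 error codes; only a few relevant ones are listed here.
KNOWN_ERRORS = {
    1053: "The service did not respond to the start or control request in a timely fashion",
    1722: "The RPC server is unavailable",
    1726: "The remote procedure call failed",
}

# Matches "(1726)" or ": 1053 -" as seen in the pasted output (assumed patterns).
CODE_PATTERN = re.compile(r"\((\d{3,5})\)|:\s*(\d{3,5})\s+-")


def extract_error_codes(text):
    """Return the distinct numeric codes found in `text`, in order of appearance."""
    codes = []
    for match in CODE_PATTERN.finditer(text):
        code = int(match.group(1) or match.group(2))
        if code not in codes:
            codes.append(code)
    return codes


def describe(code):
    """Return a known description, or a hint to look the code up locally."""
    return KNOWN_ERRORS.get(code, "unknown (try: net helpmsg <code>)")
```

Fed the two lines quoted above, `extract_error_codes` picks up 1726 and 1053, and `describe` maps each to its message text.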
Hi Rad,

Integrity check is fine.
We are still having this issue - it looks like the DC isn't allowing RPC calls from our remote DC. All other servers are fine and the site links are ok, it is just this remote network that is having the issue.
Have you tried rebooting the DC in the site? If this is something that you cannot correct, you might need to demote this domain controller and re-promote it.

Will.
Rebooting the DC with the master roles seems to allow this to work, but this specific remote DC is the only one that cannot connect. We have 4 other remote DCs that continue to communicate fine while this one is having an issue. I have demoted, promoted, removed and re-added it to the domain, but it isn't making any difference.
Additionally, when the problem reoccurs, which is at random intervals, I get the following in the event log:

The Knowledge Consistency Checker (KCC) has detected problems with the following directory partition.
 
Directory partition:
CN=Configuration,DC=Domain,DC=local
 
There is insufficient site connectivity information for the KCC to create a spanning tree replication topology. Or, one or more directory servers with this directory partition are unable to replicate the directory partition information. This is probably due to inaccessible directory servers.
 
User Action
Perform one of the following actions:
- Publish sufficient site connectivity information so that the KCC can determine a route by which this directory partition can reach this site. This is the preferred option.
- Add a Connection object to a directory service that contains the directory partition in this site from a directory service that contains the same directory partition in another site.
 
If neither of the tasks correct this condition, see previous events logged by the KCC that identify the inaccessible directory servers.


Sites are all still connected and everything works fine via IP.
This is the event that is logged right after the above:

The Knowledge Consistency Checker (KCC) was unable to form a complete spanning tree network topology. As a result, the following list of sites cannot be reached from the local site.
 
Sites:
CN=Remote-Site,CN=Sites,CN=Configuration,DC=Domain,DC=local
ASKER CERTIFIED SOLUTION
Will Szymkowski
Hi Will,

All of the DCs are connected via MPLS, so the default site link is correct. We are still having trouble with this issue.

Apologies for the late reply.
Well, based on the error message, it cannot contact some DCs that are listed in the Default Site Link.

I would suggest checking this again.

Will.
This question has been classified as abandoned and is closed as part of the Cleanup Program. See the recommendation for more details.