jakert50 asked:

Active Directory not replicating between sites

I am running into problems with our two sites not replicating with each other. Here's a quick background story:
SITE1 has 2 domains with 2 domain controllers each (DOMAIN.COM and SITE1.DOMAIN.COM), each part of the same site in AD Sites and Services
SITE2 has 1 domain controller (SITE2.DOMAIN.COM) in its own site in AD Sites and Services.

The DC at SITE2 was having issues booting from time to time, so a temporary DC was set up to transfer roles to. Once the roles were transferred, the original DC was reformatted/reinstalled and rejoined the domain. The roles were transferred back from the temp DC to the original DC (now with a different name, and now running Windows Server 2008 instead of Windows Server 2003 R2).
(This was also the first 2008 DC in the forest. All adprep commands were run successfully)

About a week later, I checked the logs on all the servers. I noticed several alarming events:

On the DC at SITE2, the following entry was found:
Log Name:      Application
Source:        Microsoft-Windows-CertificateServicesClient-CertEnroll
Date:          1/26/2009 10:11:39 AM
Event ID:      13
Task Category: None
Level:         Error
Keywords:      Classic
User:          SYSTEM
Computer:      DC1.SITE2.DOMAIN.COM
Description:
Certificate enrollment for Local system failed to enroll for a DomainController certificate from PDC1.FORESTDOMAIN.COM\DC1 (The RPC server is unavailable. 0x800706ba (WIN32: 1722)).
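
For reference, the error code in that message can be unpacked by hand. The sketch below assumes the standard HRESULT bit layout (failure bit, 11-bit facility, 16-bit code); the function name decode_hresult is made up for illustration. It shows that 0x800706ba is just a wrapper around Win32 error 1722 (0x6ba), "The RPC server is unavailable".

```python
def decode_hresult(value: int) -> dict:
    """Split a 32-bit HRESULT into its failure bit, facility, and code fields."""
    value &= 0xFFFFFFFF
    return {
        "failure": bool(value >> 31),       # top bit set means failure
        "facility": (value >> 16) & 0x7FF,  # 11-bit facility; 7 is FACILITY_WIN32
        "code": value & 0xFFFF,             # low 16 bits: the underlying error code
    }


fields = decode_hresult(0x800706BA)
print(fields)
print(hex(fields["code"]))
```

Facility 7 means the low 16 bits are a plain Win32 error, so the certificate enrollment failure points at RPC connectivity to the CA rather than at the certificate template itself.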

The following is the contents of dcdiag on this server:
Directory Server Diagnosis

Performing initial setup:
   Trying to find home server...
   Home Server = DC1
   * Identified AD Forest.
   Done gathering initial info.

Doing initial required tests

   Testing server: SITE2\DC1
      Starting test: Connectivity
         ......................... DC1 passed test Connectivity

Doing primary tests

   Testing server: SITE2\DC1
      Starting test: Advertising
         ......................... DC1 passed test Advertising
      Starting test: FrsEvent
         ......................... DC1 passed test FrsEvent
      Starting test: DFSREvent
         ......................... DC1 passed test DFSREvent
      Starting test: SysVolCheck
         ......................... DC1 passed test SysVolCheck
      Starting test: KccEvent
         ......................... DC1 passed test KccEvent
      Starting test: KnowsOfRoleHolders
         ......................... DC1 passed test KnowsOfRoleHolders
      Starting test: MachineAccount
         ......................... DC1 passed test MachineAccount
      Starting test: NCSecDesc
         ......................... DC1 passed test NCSecDesc
      Starting test: NetLogons
         ......................... DC1 passed test NetLogons
      Starting test: ObjectsReplicated
         ......................... DC1 passed test ObjectsReplicated
      Starting test: Replications
         ......................... DC1 passed test Replications
      Starting test: RidManager
         ......................... DC1 passed test RidManager
      Starting test: Services
         ......................... DC1 passed test Services
      Starting test: SystemLog
         ......................... DC1 passed test SystemLog
      Starting test: VerifyReferences
         ......................... DC1 passed test VerifyReferences


   Running partition tests on : DomainDnsZones
      Starting test: CheckSDRefDom
         ......................... DomainDnsZones passed test CheckSDRefDom
      Starting test: CrossRefValidation
         ......................... DomainDnsZones passed test
         CrossRefValidation

   Running partition tests on : ForestDnsZones
      Starting test: CheckSDRefDom
         ......................... ForestDnsZones passed test CheckSDRefDom
      Starting test: CrossRefValidation
         ......................... ForestDnsZones passed test
         CrossRefValidation

   Running partition tests on : SITE2
      Starting test: CheckSDRefDom
         ......................... SITE2 passed test CheckSDRefDom
      Starting test: CrossRefValidation
         ......................... SITE2 passed test CrossRefValidation

   Running partition tests on : Schema
      Starting test: CheckSDRefDom
         ......................... Schema passed test CheckSDRefDom
      Starting test: CrossRefValidation
         ......................... Schema passed test CrossRefValidation

   Running partition tests on : Configuration
      Starting test: CheckSDRefDom
         ......................... Configuration passed test CheckSDRefDom
      Starting test: CrossRefValidation
         ......................... Configuration passed test CrossRefValidation

   Running enterprise tests on : DOMAIN.com
      Starting test: LocatorCheck
         ......................... DOMAIN.com passed test LocatorCheck
      Starting test: Intersite
         ......................... DOMAIN.com passed test Intersite


This error was reported on the forest domain controller (also the CA) :
Event Type:      Error
Event Source:      NTDS Replication
Event Category:      Replication
Event ID:      1864
Date:            1/25/2009
Time:            11:22:27 PM
User:            NT AUTHORITY\ANONYMOUS LOGON
Computer:      PDC1
Description:
This is the replication status for the following directory partition on the local domain controller.
 
Directory partition:
DC=PDC1,DC=DOMAIN,DC=com
 
The local domain controller has not recently received replication information from a number of domain controllers.   The count of domain controllers is shown, divided into the following intervals.
 
More than 24 hours:
1
More than a week:
1
More than one month:
0
More than two months:
0
More than a tombstone lifetime:
0
Tombstone lifetime (days):
60
 Domain controllers that do not replicate in a timely manner may encounter errors. It may miss password changes and be unable to authenticate. A DC that has not replicated in a tombstone lifetime may have missed the deletion of some objects, and may be automatically blocked from future replication until it is reconciled.
 
To identify the domain controllers by name, install the support tools included on the installation  CD and run dcdiag.exe.
You can also use the support tool repadmin.exe to display the replication latencies of the domain controllers in the forest.   The command is "repadmin /showvector /latency <partition-dn>".

For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.

...And this is the contents of dcdiag on this server:
Domain Controller Diagnosis

Performing initial setup:
   Done gathering initial info.

Doing initial required tests

   Testing server: SITE1\PDC1
      Starting test: Connectivity
         ......................... PDC1 passed test Connectivity

Doing primary tests

   Testing server: SITE1\PDC1
      Starting test: Replications
         REPLICATION-RECEIVED LATENCY WARNING
         PDC1:  Current time is 2009-01-26 11:04:42.
            DC=ForestDnsZones,DC=DOMAIN,DC=com
               Last replication recieved from FURY at 2009-01-03 11:15:28.
            CN=Schema,CN=Configuration,DC=DOMAIN,DC=com
               Last replication recieved from FURY at 2009-01-03 11:15:28.
            CN=Configuration,DC=DOMAIN,DC=com
               Last replication recieved from FURY at 2009-01-03 11:15:28.
            DC=SITE2,DC=DOMAIN,DC=com
               Last replication recieved from FURY at 2009-01-03 11:15:28.
         ......................... PDC1 passed test Replications
      Starting test: NCSecDesc
         ......................... PDC1 passed test NCSecDesc
      Starting test: NetLogons
         ......................... PDC1 passed test NetLogons
      Starting test: Advertising
         ......................... PDC1 passed test Advertising
      Starting test: KnowsOfRoleHolders
         ......................... PDC1 passed test KnowsOfRoleHolders
      Starting test: RidManager
         ......................... PDC1 passed test RidManager
      Starting test: MachineAccount
         ......................... PDC1 passed test MachineAccount
      Starting test: Services
         ......................... PDC1 passed test Services
      Starting test: ObjectsReplicated
         ......................... PDC1 passed test ObjectsReplicated
      Starting test: frssysvol
         ......................... PDC1 passed test frssysvol
      Starting test: frsevent
         ......................... PDC1 passed test frsevent
      Starting test: kccevent
         ......................... PDC1 passed test kccevent
      Starting test: systemlog
         ......................... PDC1 passed test systemlog
      Starting test: VerifyReferences
         ......................... PDC1 passed test VerifyReferences

   Running partition tests on : ForestDnsZones
      Starting test: CrossRefValidation
         ......................... ForestDnsZones passed test CrossRefValidation

      Starting test: CheckSDRefDom
         ......................... ForestDnsZones passed test CheckSDRefDom

   Running partition tests on : DomainDnsZones
      Starting test: CrossRefValidation
         ......................... DomainDnsZones passed test CrossRefValidation

      Starting test: CheckSDRefDom
         ......................... DomainDnsZones passed test CheckSDRefDom

   Running partition tests on : Schema
      Starting test: CrossRefValidation
         ......................... Schema passed test CrossRefValidation
      Starting test: CheckSDRefDom
         ......................... Schema passed test CheckSDRefDom

   Running partition tests on : Configuration
      Starting test: CrossRefValidation
         ......................... Configuration passed test CrossRefValidation
      Starting test: CheckSDRefDom
         ......................... Configuration passed test CheckSDRefDom

   Running partition tests on : domain
      Starting test: CrossRefValidation
         ......................... domain passed test CrossRefValidation
      Starting test: CheckSDRefDom
         ......................... domain passed test CheckSDRefDom

   Running enterprise tests on : domain.com
      Starting test: Intersite
         ......................... domain.com passed test Intersite
      Starting test: FsmoCheck
         ......................... domain.com passed test FsmoCheck

(Note that FURY is the name of the original DC in SITE2 that was reinstalled and renamed.)
I've also noticed that once a week for the past two weeks, DC1 in SITE2 has DNS errors with SITE1 and DOMAIN.COM (I need to manually reload from master, then DNS is OK for another week).
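
To put a number on the latency warning in the dcdiag output above, the timestamps can be subtracted directly. This is a minimal sketch, assuming the dcdiag text has been saved to a string; the function name replication_latency and the SAMPLE excerpt (two of the four partitions) are illustrative only.

```python
import re
from datetime import datetime

STAMP = "%Y-%m-%d %H:%M:%S"
CURRENT_TIME = re.compile(r"Current time is (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})")
# dcdiag itself spells the word "recieved"; accept either spelling.
LAST_RECEIVED = re.compile(
    r"Last replication rec(?:ie|ei)ved from (\S+) at (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})"
)


def replication_latency(dcdiag_text):
    """Map each partition DN to (source DC, time since last inbound replication)."""
    now = None
    previous_line = ""
    latencies = {}
    for raw in dcdiag_text.splitlines():
        line = raw.strip()
        current = CURRENT_TIME.search(line)
        if current:
            now = datetime.strptime(current.group(1), STAMP)
        last = LAST_RECEIVED.search(line)
        if last and now is not None:
            # The partition DN is printed on the line just before the timestamp.
            source, stamp = last.groups()
            latencies[previous_line] = (source, now - datetime.strptime(stamp, STAMP))
        previous_line = line
    return latencies


SAMPLE = """\
         PDC1:  Current time is 2009-01-26 11:04:42.
            DC=ForestDnsZones,DC=DOMAIN,DC=com
               Last replication recieved from FURY at 2009-01-03 11:15:28.
            DC=SITE2,DC=DOMAIN,DC=com
               Last replication recieved from FURY at 2009-01-03 11:15:28.
"""

for partition, (source, age) in replication_latency(SAMPLE).items():
    print(f"{partition}: {age.days} days since last replication from {source}")
```

Every partition was last received from FURY on 2009-01-03, a little under 23 days before the test ran, which is consistent with event 1864 counting one DC in the "More than a week" bucket and none in "More than one month".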

To test that the two sites weren't talking to each other, I added a new domain controller to SITE2.DOMAIN.COM. Within SITE2, AD Sites and Services seemed to look normal--it found the new server, and NTDS settings were automatically generated.
When I switched back to the PDC1 in SITE1, Sites and Services still showed the original FURY server--not showing the replaced DC1 server, nor the new temporary DC that I added to test this.

My original assumption was FSMO roles weren't transferred correctly. However, I checked DC1 via ntdsutil and it looks like everything is in order:
Server "DC1" knows about 5 roles
Schema - CN=NTDS Settings,CN=PDC1,CN=Servers,CN=Site1,CN=Sites,CN=Configuration,DC=domain,DC=com
Naming Master - CN=NTDS Settings,CN=PDC1,CN=Servers,CN=Site1,CN=Sites,CN=Configuration,DC=domain,DC=com
PDC - CN=NTDS Settings,CN=DC1,CN=Servers,CN=Site2,CN=Sites,CN=Configuration,DC=domain,DC=com
RID - CN=NTDS Settings,CN=DC1,CN=Servers,CN=Site2,CN=Sites,CN=Configuration,DC=domain,DC=com
Infrastructure - CN=NTDS Settings,CN=DC1,CN=Servers,CN=Site2,CN=Sites,CN=Configuration,DC=domain,DC=com
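
The role listing above is easier to read once it is reduced to "role, server, site". A small sketch, assuming the ntdsutil text is available as a string; the name role_holders is hypothetical and the regular expression only handles lines shaped like the ones quoted here.

```python
import re

ROLE_LINE = re.compile(
    r"^(?P<role>.+?) - CN=NTDS Settings,CN=(?P<server>[^,]+),CN=Servers,CN=(?P<site>[^,]+),"
)


def role_holders(ntdsutil_text):
    """Return {role name: (server, site)} from ntdsutil role output."""
    holders = {}
    for line in ntdsutil_text.splitlines():
        match = ROLE_LINE.match(line.strip())
        if match:
            holders[match.group("role")] = (match.group("server"), match.group("site"))
    return holders


SAMPLE = """\
Schema - CN=NTDS Settings,CN=PDC1,CN=Servers,CN=Site1,CN=Sites,CN=Configuration,DC=domain,DC=com
Naming Master - CN=NTDS Settings,CN=PDC1,CN=Servers,CN=Site1,CN=Sites,CN=Configuration,DC=domain,DC=com
PDC - CN=NTDS Settings,CN=DC1,CN=Servers,CN=Site2,CN=Sites,CN=Configuration,DC=domain,DC=com
RID - CN=NTDS Settings,CN=DC1,CN=Servers,CN=Site2,CN=Sites,CN=Configuration,DC=domain,DC=com
Infrastructure - CN=NTDS Settings,CN=DC1,CN=Servers,CN=Site2,CN=Sites,CN=Configuration,DC=domain,DC=com
"""

for role, (server, site) in role_holders(SAMPLE).items():
    print(f"{role:15} {server:5} {site}")
```

Schema and Naming Master exist once per forest, while PDC, RID, and Infrastructure exist once per domain, so seeing the first two on PDC1 and the other three on DC1 is what one would expect when DC1 is queried for SITE2.DOMAIN.COM.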

I had read online that the CertEnroll error may be caused by the correct accounts not being members of the CERTSVC_DCOM_ACCESS group. I have since added Domain Computers, Domain Controllers, and Domain Users for all domains to this group.

What else am I missing that I should check into?
Mike Kline

"The DC at SITE2 was having issues booting from time to time, so a temporary DC was set up to transfer roles to. Once the roles were transferred, the original DC was reformatted/reinstalled and rejoined the domain"
Did you gracefully demote the old DC before you reformatted it (or did you have to run a metadata cleanup)?
jakert50 (ASKER)

It was a graceful demote.
Actually, I take that back. The temporary domain controller was gracefully demoted. The original old DC would not demote nicely, so I had to seize the roles.

I don't recall running the cleanup, though.
I would run a metadata cleanup to see whether that old DC is still stuck in Active Directory:
http://technet.microsoft.com/en-us/library/cc736378.aspx
Before I finish the cleanup, will this affect any existing connectivity? These are two live sites, and I don't want to take down the other site.

My other concern is how I add the new DC once I remove the old one. It doesn't show up when I list the servers in ntdsutil.

ntdsutil: metadata cleanup
metadata cleanup: remove selected server fury
Binding to localhost ...
Connected to localhost using credentials of locally logged on user.
LDAP error 0x22(34 (Invalid DN Syntax).
Ldap extended error message is 0000208F: NameErr: DSID-031001BA, problem 2006 (BAD_NAME), data 8350, best match of:
        'CN=Ntds Settings,fury'

Win32 error returned is 0x208f(The object name has bad syntax.)
)
Unable to determine the domain hosted by the DC (5). Please use the connection menu to specify it.
Disconnecting from localhost...
metadata cleanup: connect to server fury
Error 80070057 parsing input - illegal syntax?
metadata cleanup: connection
server connections: connect to server avenger
Binding to avenger ...
Connected to avenger using credentials of locally logged on user.
server connections: select operation target
Error 80070057 parsing input - illegal syntax?
server connections: quit
metadata cleanup: select operation target
select operation target: list sites
Found 3 site(s)
0 - CN=Default-First-Site-Name,CN=Sites,CN=Configuration,DC=domain,DC=com

1 - CN=site1,CN=Sites,CN=Configuration,DC=domain,DC=com
2 - CN=site2,CN=Sites,CN=Configuration,DC=domain,DC=com
select operation target: select site 2
Site - CN=site2,CN=Sites,CN=Configuration,DC=domain,DC=com
No current domain
No current server
No current Naming Context
select operation target: list domains
Found 3 domain(s)
0 - DC=domain,DC=com
1 - DC=site2,DC=domain,DC=com
2 - DC=site1,DC=domain,DC=com
select operation target: list domains in site
Found 1 domain(s)
0 - DC=site2,DC=domain,DC=com
select operation target: select domain 0
Site - CN=site2,CN=Sites,CN=Configuration,DC=domain,DC=com
Domain - DC=site2,DC=domain,DC=com
No current server
No current Naming Context
select operation target: list servers in site
Found 1 server(s)
0 - CN=FURY,CN=Servers,CN=site2,CN=Sites,CN=Configuration,DC=domain,DC=com
select operation target: select server 0
Site - CN=site2,CN=Sites,CN=Configuration,DC=domain,DC=com
Domain - DC=site2,DC=domain,DC=com
Server - CN=FURY,CN=Servers,CN=site2,CN=Sites,CN=Configuration,DC=domain,DC=com
        DSA object - CN=NTDS Settings,CN=FURY,CN=Servers,CN=site2,CN=Sites,CN=Configuration,DC=domain,DC=com
        DNS host name - fury.site2.domain.com
        Computer object - CN=FURY,OU=Domain Controllers,DC=site2,DC=domain,DC=com
No current Naming Context
select operation target: quit
metadata cleanup: remove selected server
To properly remove the requested server from Active Directory, please connect
to a server in the domain site2.domain.com; for example \\agentsmith.site2.domain.com.
metadata cleanup:

Oddly enough, that last message mentions agentsmith, which is the temporary secondary DC that I set up to test replication in the first place.
It will not take down your sites; I have done this many times with clients and in testing.
Like Ryan said, the metadata cleanup will not affect connectivity. It just gets the old references to the dead server out of AD.
I have attempted to remove the server. I believe the server isn't being removed via ntdsutil on PDC1.

On PDC1 in SITE1, I receive the following message:
select operation target: list servers in site
Found 1 server(s)
0 - CN=FURY,CN=Servers,CN=Site2,CN=Sites,CN=Configuration,DC=domain,DC=com
select operation target: select server 0
Site - CN=Site2,CN=Sites,CN=Configuration,DC=domain,DC=com
Domain - DC=site2,DC=domain,DC=com
Server - CN=FURY,CN=Servers,CN=Site2,CN=Sites,CN=Configuration,DC=domain,DC=com
        DSA object - CN=NTDS Settings,CN=FURY,CN=Servers,CN=Site2,CN=Sites,CN=Configuration,DC=domain,DC=com
        DNS host name - fury.site2.domain.com
        Computer object - CN=FURY,OU=Domain Controllers,DC=site2,DC=domain,DC=com
No current Naming Context
select operation target: quit
metadata cleanup: remove selected server
To properly remove the requested server from Active Directory, please connect
to a server in the domain site2.domain.com; for example \\agentsmith.site2.domain.com.

When I connect to agentsmith and attempt to run the same command, the old server FURY does not show up in the server list. Rather, it shows the new DC1 server (as it should be).

From PDC1, if I try to connect to DC1, I get an error message:
server connections: connect to server dc1
Disconnecting from agentsmith...
Binding to dc1...
DsBindW error 0x6ba(The RPC server is unavailable.)

Should I try deleting the reference in Sites and Services?
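
Error 0x6ba from DsBindW means the bind never reached an RPC listener on the target. One quick thing to rule out is basic reachability of the RPC endpoint mapper, which listens on TCP port 135. The following is a minimal sketch, assuming Python is available on a machine in SITE1; the helper name tcp_port_open is hypothetical. A successful connect only proves that the name resolves and the port is reachable, not that replication will work, since RPC also uses dynamically assigned ports.

```python
import socket


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


# Example call (host name taken from this thread; run it from PDC1's side):
# print(tcp_port_open("dc1.site2.domain.com", 135))
```

If this returns False for the fully qualified name but True for DC1's IP address, the problem is more likely DNS than a firewall between the sites.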
If there is a reference there in ADSS, then yes, try removing it and rerun the cleanup.
The server reference has been removed in ADSS. Now when I try to run the metadata cleanup, I get the following:

ntdsutil: metadata cleanup
metadata cleanup: connection
server connections: connect to server pdc1
Binding to pdc1 ...
Connected to pdc1 using credentials of locally logged on user.
server connections: quit
metadata cleanup: select operation target
select operation target: list sites
Found 3 site(s)
0 - CN=Default-First-Site-Name,CN=Sites,CN=Configuration,DC=domain,DC=com

1 - CN=Site1,CN=Sites,CN=Configuration,DC=domain,DC=com
2 - CN=Site2,CN=Sites,CN=Configuration,DC=domain,DC=com
select operation target: select site 2
Site - CN=Site2,CN=Sites,CN=Configuration,DC=domain,DC=com
No current domain
No current server
No current Naming Context
select operation target: list domains in site
Found 0 domain(s)
select operation target:
You're doing this from a working DC, right?
Yes, I'm doing this from PDC1.DOMAIN.COM (in site1)--this is the forest domain controller.

Note that if I log into DC1.SITE2.DOMAIN.COM and agentsmith.site2.domain.com, the correct information is still there:
ntdsutil: metadata cleanup
metadata cleanup: connection
server connections: connect to server dc1
Binding to dc1 ...
Connected to dc1 using credentials of locally logged on user.
server connections: quit
metadata cleanup: select operation target
select operation target: list sites
Found 3 site(s)
0 - CN=site1,CN=Sites,CN=Configuration,DC=domain,DC=com
1 - CN=Default-First-Site-Name,CN=Sites,CN=Configuration,DC=domain,DC=com

2 - CN=site2,CN=Sites,CN=Configuration,DC=domain,DC=com
select operation target: select site 2
Site - CN=site2,CN=Sites,CN=Configuration,DC=domain,DC=com
No current domain
No current server
No current Naming Context
select operation target: list domains in site
Found 1 domain(s)
0 - DC=site2,DC=domain,DC=com
select operation target: select domain 0
Site - CN=site2,CN=Sites,CN=Configuration,DC=domain,DC=com
Domain - DC=site2,DC=domain,DC=com
No current server
No current Naming Context
select operation target: list servers in site
Found 1 server(s)
0 - CN=dc1,CN=Servers,CN=site2,CN=Sites,CN=Configuration,DC=domain,DC=com
select operation target:
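
The listings quoted in this thread show the two sides holding different copies of the Configuration partition: PDC1 listed FURY under Site2 (before the reference was deleted in ADSS), while DC1 lists only itself. A short sketch that makes the difference explicit, assuming each "list servers in site" result is pasted in as text; servers_listed is an illustrative name, and the pattern only looks at the start of each entry, so console line wrapping does not matter.

```python
import re

SERVER_ENTRY = re.compile(r"^\d+ - CN=([^,]+),CN=Servers,", re.IGNORECASE)


def servers_listed(ntdsutil_text):
    """Collect upper-cased server names from 'list servers in site' output."""
    names = set()
    for line in ntdsutil_text.splitlines():
        match = SERVER_ENTRY.match(line.strip())
        if match:
            names.add(match.group(1).upper())
    return names


seen_from_pdc1 = servers_listed(
    "0 - CN=FURY,CN=Servers,CN=Site2,CN=Sites,CN=Configuration,DC=domain,DC=com"
)
seen_from_dc1 = servers_listed(
    "0 - CN=dc1,CN=Servers,CN=site2,CN=Sites,CN=Configuration,DC=domain,DC=com"
)

print("Only in PDC1's view:", sorted(seen_from_pdc1 - seen_from_dc1))
print("Only in DC1's view:", sorted(seen_from_dc1 - seen_from_pdc1))
```

A server that appears on only one side is a sign that Configuration changes are not crossing the site link in that direction.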
ASKER CERTIFIED SOLUTION
jakert50
