doboszb

AD replication error

Over the last week, replication has been failing on our W2003 domain to one of our 3 DCs. The event log shows event 13508, which indicates an FRS or DNS issue. All 3 DCs are global catalog servers, and the server that is not replicating also holds all the FSMO roles. When I try to check the status of those roles, it shows ERROR as the role holder instead of the server name. Running dcdiag, everything passes except FRSevent, which says that there are warning events within the last 24 hours after the SYSVOL has been shared, and that failing SYSVOL replication problems may cause Group Policy problems. Any ideas what might be causing this issue? All three DCs are W2003 SP2.
cmackles

Two more diagnostic tools you may want to use are repadmin and netdiag:

repadmin /showreps >rep.txt
(this will output the replication info to rep.txt)

netdiag /v >net.txt

Another thing you may want to check is if the Kerberos tickets have somehow gotten messed up. This happened to one of my client's networks just recently, and it took me about four hours to figure it out.

Go to the failing server, try to access a share on one of the other DCs. Go to the other DCs and try to access a share on the failing server. Even try it from a workstation. If any of them report "The login name is incorrect" or something like that, then the Kerberos tickets are messed up.

Give that a try and see what those tools tell you.
doboszb

ASKER

I was able to connect to shares on all of the DCs from each of the servers.
However, the netdiag test showed some errors. Everything passed on the DC that isn’t replicating, but the other two DCs showed these errors when I ran the utility:

DNS Test:
[WARNING] The DNS entries for this DC cannot be verified right now on DNS server 192.168.42.80 (the DC that isn’t replicating) Error_timeout

LDAP Test:
[FATAL] Cannot open an LDAP session to 192.168.42.80 (non replicating DC )
[WARNING] Failed to query SPN registration on DC 192.168.42.80 (non-replicating DC)
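
The netdiag failures above (a DNS timeout and no LDAP session to 192.168.42.80) suggest the other DCs cannot reach basic services on that server. A minimal Python sketch of a TCP reachability check, meant to be run from one of the healthy DCs, could look like the following. The port list and the names `probe` and `probe_dc` are assumptions made for illustration; they are not part of netdiag or any other Microsoft tool.

```python
import socket

# TCP ports a domain controller is normally expected to answer on.
# This selection is an assumption for illustration, not an exhaustive list.
DC_PORTS = {
    53: "DNS",
    88: "Kerberos",
    135: "RPC endpoint mapper",
    389: "LDAP",
    445: "SMB",
}


def probe(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def probe_dc(host, ports=DC_PORTS, timeout=3.0):
    """Map each port number to True (reachable) or False (no answer)."""
    return {port: probe(host, port, timeout) for port in ports}
```

Calling `probe_dc("192.168.42.80")` returns a dictionary keyed by port number. If ports such as 135 or 389 come back `False` while file shares still work, a firewall or filtering change on the failing DC would be one thing worth ruling out.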
Have all 3 DCs use one DNS server that you know is good.
Make sure all "Automatic" services on the bad DC are running (compare the service status with the other two).
You may also want to restart the netlogon service. This is not strictly necessary, but do it just in case the SRV records for the bad DC were somehow lost. (netdiag /fix does the same thing.)

Is there anything you think is worth mentioning in the System log and Directory Service log on the bad DC?
A "dcdiag /v" on the bad DC would be helpful.
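
To see whether the SRV records mentioned above are still present, one option is to query them by name against the DNS server all DCs now point to. The sketch below only builds the record names and the matching `nslookup` command lines; it does not run them. The function names are hypothetical, and the domain name CQC.com is inferred from the naming contexts in this thread.

```python
def dc_srv_names(domain, forest=None):
    """Return SRV record names that netlogon normally registers for a DC.

    The pdc and gc names only exist for DCs holding those roles.
    If forest is omitted, a single-domain forest is assumed.
    """
    forest = forest or domain
    return [
        f"_ldap._tcp.{domain}",
        f"_kerberos._tcp.{domain}",
        f"_ldap._tcp.dc._msdcs.{domain}",
        f"_kerberos._tcp.dc._msdcs.{domain}",
        f"_ldap._tcp.pdc._msdcs.{domain}",
        f"_ldap._tcp.gc._msdcs.{forest}",
        f"_gc._tcp.{forest}",
    ]


def nslookup_commands(domain, dns_server, forest=None):
    """Build one nslookup argument list per SRV name, aimed at dns_server."""
    return [
        ["nslookup", "-type=SRV", name, dns_server]
        for name in dc_srv_names(domain, forest)
    ]
```

Each argument list can be run by hand or passed to `subprocess.run`. An answer that lists the other DCs but not the bad one would point to missing registrations, which restarting netlogon is meant to repair.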
doboszb

ASKER

There isn't anything too significant in the Event Logs, just the replication errors referring to FRS and DNS possibly being wrong. I restarted the services and repointed DNS on all the DCs to the same server. In terms of changes to the network, W2003 SP2 is the only thing that has happened in this time frame. Below are the errors from the dcdiag:



Doing primary tests
   
   Testing server: Middleton\CQC-DB01
      Starting test: Replications
         * Replications Check
         [Replications Check,CQC-DB01] A recent replication attempt failed:
            From CQC-ADC01 to CQC-DB01
            Naming Context: DC=ForestDnsZones,DC=CQC,DC=com
            The replication generated an error (1256):
            The remote system is not available. For information about network troubleshooting, see Windows Help.
            The failure occurred at 2007-03-29 09:53:05.
            The last success occurred at 2007-03-15 18:55:17.
            328 failures have occurred since the last success.
         [CQC-ADC01] DsBindWithSpnEx() failed with error 1722,
         The RPC server is unavailable..
         Printing RPC Extended Error Info:
         Error Record 1, ProcessID is 3084 (DcDiag)        
            System Time is: 3/29/2007 15:45:32:791
            Generating component is 8 (winsock)
            Status is 1722: The RPC server is unavailable.

            Detection location is 323
         Error Record 2, ProcessID is 3084 (DcDiag)        
            System Time is: 3/29/2007 15:45:32:791
            Generating component is 8 (winsock)
            Status is 1237: The operation could not be completed. A retry should be performed.

            Detection location is 313
         Error Record 3, ProcessID is 3084 (DcDiag)        
            System Time is: 3/29/2007 15:45:32:791
            Generating component is 8 (winsock)
            Status is 10060: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.

            Detection location is 311
            NumberOfParameters is 3
            Long val: 135
            Pointer val: 0
            Pointer val: 0
         Error Record 4, ProcessID is 3084 (DcDiag)        
            System Time is: 3/29/2007 15:45:32:791
            Generating component is 8 (winsock)
            Status is 10060: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.

            Detection location is 318
         [Replications Check,CQC-DB01] A recent replication attempt failed:
            From CQC-ADC01 to CQC-DB01
            Naming Context: DC=DomainDnsZones,DC=CQC,DC=com
            The replication generated an error (1256):
            The remote system is not available. For information about network troubleshooting, see Windows Help.
            The failure occurred at 2007-03-29 09:53:05.
            The last success occurred at 2007-03-15 18:55:17.
            328 failures have occurred since the last success.
         [Replications Check,CQC-DB01] A recent replication attempt failed:
            From CQC-ADC01 to CQC-DB01
            Naming Context: CN=Schema,CN=Configuration,DC=CQC,DC=com
            The replication generated an error (1722):
            The RPC server is unavailable.
            The failure occurred at 2007-03-29 09:53:48.
            The last success occurred at 2007-03-15 18:55:17.
            328 failures have occurred since the last success.
            The source remains down. Please check the machine.
         [Replications Check,CQC-DB01] A recent replication attempt failed:
            From CQC-ADC01 to CQC-DB01
            Naming Context: CN=Configuration,DC=CQC,DC=com
            The replication generated an error (1722):
            The RPC server is unavailable.
            The failure occurred at 2007-03-29 09:53:26.
            The last success occurred at 2007-03-15 19:40:45.
            656 failures have occurred since the last success.
            The source remains down. Please check the machine.
         [Replications Check,CQC-DB01] A recent replication attempt failed:
            From CQC-ADC01 to CQC-DB01
            Naming Context: DC=CQC,DC=com
            The replication generated an error (1722):
            The RPC server is unavailable.
            The failure occurred at 2007-03-29 10:41:33.
            The last success occurred at 2007-03-15 19:47:05.
            1005 failures have occurred since the last success.
            The source remains down. Please check the machine.
         * Replication Latency Check
         REPLICATION-RECEIVED LATENCY WARNING
         CQC-DB01:  Current time is 2007-03-29 10:45:11.
            DC=ForestDnsZones,DC=CQC,DC=com
               Last replication recieved from CQC-ADC01 at 2007-03-15 18:55:17.
               Latency information for 1 entries in the vector were ignored.
                  1 were retired Invocations.  0 were either: read-only replicas and are not verifiably latent, or dc's no longer replicating this nc.  0 had no latency information (Win2K DC).  
            DC=DomainDnsZones,DC=CQC,DC=com
               Last replication recieved from CQC-ADC01 at 2007-03-15 18:55:17.
               Latency information for 1 entries in the vector were ignored.
                  1 were retired Invocations.  0 were either: read-only replicas and are not verifiably latent, or dc's no longer replicating this nc.  0 had no latency information (Win2K DC).  
            CN=Schema,CN=Configuration,DC=CQC,DC=com
               Last replication recieved from CQC-ADC01 at 2007-03-15 18:59:13.
               Latency information for 5 entries in the vector were ignored.
                  5 were retired Invocations.  0 were either: read-only replicas and are not verifiably latent, or dc's no longer replicating this nc.  0 had no latency information (Win2K DC).  
            CN=Configuration,DC=CQC,DC=com
               Last replication recieved from CQC-ADC01 at 2007-03-15 19:40:45.
               Latency information for 5 entries in the vector were ignored.
                  5 were retired Invocations.  0 were either: read-only replicas and are not verifiably latent, or dc's no longer replicating this nc.  0 had no latency information (Win2K DC).  
            DC=CQC,DC=com
               Last replication recieved from CQC-ADC01 at 2007-03-15 19:47:08.
               Latency information for 5 entries in the vector were ignored.
                  5 were retired Invocations.  0 were either: read-only replicas and are not verifiably latent, or dc's no longer replicating this nc.  0 had no latency information (Win2K DC).  
         * Replication Site Latency Check
         ......................... CQC-DB01 passed test Replications
     
      Starting test: KnowsOfRoleHolders
         Role Schema Owner = CN=NTDS Settings,CN=CQC-ADC01,CN=Servers,CN=Middleton,CN=Sites,CN=Configuration,DC=CQC,DC=com
         Warning: CQC-ADC01 is the Schema Owner, but is not responding to DS RPC Bind.
         RPC Extended Error Info not available. Use group policy on the local machine at "Computer Configuration/Administrative Templates/System/Remote Procedure Call" to enable it.
         [CQC-ADC01] LDAP search failed with error 58,
         The specified server cannot perform the requested operation..
         Warning: CQC-ADC01 is the Schema Owner, but is not responding to LDAP Bind.
         Role Domain Owner = CN=NTDS Settings,CN=CQC-ADC01,CN=Servers,CN=Middleton,CN=Sites,CN=Configuration,DC=CQC,DC=com
         Warning: CQC-ADC01 is the Domain Owner, but is not responding to DS RPC Bind.
         RPC Extended Error Info not available. Use group policy on the local machine at "Computer Configuration/Administrative Templates/System/Remote Procedure Call" to enable it.
         Warning: CQC-ADC01 is the Domain Owner, but is not responding to LDAP Bind.
         Role PDC Owner = CN=NTDS Settings,CN=CQC-ADC01,CN=Servers,CN=Middleton,CN=Sites,CN=Configuration,DC=CQC,DC=com
         Warning: CQC-ADC01 is the PDC Owner, but is not responding to DS RPC Bind.
         RPC Extended Error Info not available. Use group policy on the local machine at "Computer Configuration/Administrative Templates/System/Remote Procedure Call" to enable it.
         Warning: CQC-ADC01 is the PDC Owner, but is not responding to LDAP Bind.
         Role Rid Owner = CN=NTDS Settings,CN=CQC-ADC01,CN=Servers,CN=Middleton,CN=Sites,CN=Configuration,DC=CQC,DC=com
         Warning: CQC-ADC01 is the Rid Owner, but is not responding to DS RPC Bind.
         RPC Extended Error Info not available. Use group policy on the local machine at "Computer Configuration/Administrative Templates/System/Remote Procedure Call" to enable it.
         Warning: CQC-ADC01 is the Rid Owner, but is not responding to LDAP Bind.
         Role Infrastructure Update Owner = CN=NTDS Settings,CN=CQC-ADC01,CN=Servers,CN=Middleton,CN=Sites,CN=Configuration,DC=CQC,DC=com
         Warning: CQC-ADC01 is the Infrastructure Update Owner, but is not responding to DS RPC Bind.
         RPC Extended Error Info not available. Use group policy on the local machine at "Computer Configuration/Administrative Templates/System/Remote Procedure Call" to enable it.
         Warning: CQC-ADC01 is the Infrastructure Update Owner, but is not responding to LDAP Bind.
         ......................... CQC-DB01 failed test KnowsOfRoleHolders
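
Output like the above is long, so it can help to reduce it to one line per naming context. The following is a rough Python sketch that extracts the "A recent replication attempt failed" blocks from saved dcdiag text. The function name and the regular expressions are assumptions based only on the output format shown in this thread; other dcdiag versions may word these lines differently.

```python
import re

# One pattern per line of interest inside a failure block.
_PATTERNS = [
    ("partners", re.compile(r"^From (\S+) to (\S+)$")),
    ("naming_context", re.compile(r"^Naming Context: (.+)$")),
    ("error", re.compile(r"^The replication generated an error \((\d+)\):$")),
    ("last_success", re.compile(r"^The last success occurred at (.+?)\.$")),
    ("failures", re.compile(r"^(\d+) failures have occurred since the last success\.$")),
]


def parse_replication_failures(text):
    """Return one dict per failed replication attempt found in dcdiag text."""
    results = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.endswith("A recent replication attempt failed:"):
            current = {}
            results.append(current)
            continue
        if current is None:
            continue  # not inside a failure block
        for key, pattern in _PATTERNS:
            match = pattern.match(line)
            if not match:
                continue
            if key == "partners":
                current["source"], current["destination"] = match.groups()
            elif key in ("error", "failures"):
                current[key] = int(match.group(1))
            else:
                current[key] = match.group(1)
            if key == "failures":
                current = None  # the failure count closes the block
            break
    return results
```

Applied to the output above, each resulting dictionary carries the source and destination DC, the naming context, the error code, the last success time and the failure count, which makes it easier to see that every partition stopped replicating from CQC-ADC01 on 2007-03-15.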
ASKER CERTIFIED SOLUTION
strongline

This solution is only available to members.
doboszb

ASKER

I was able to reset the secure channel, but it looks like the same issues persist. The only new error in the Event Viewer is that the directory partition has not been backed up in a number of days.
doboszb

ASKER

More information:  I just tried to look at the DC's through REPLMON and when I try to add the failing DC I get this error:

The server could not be contacted or you had insufficient permissions to read the status of the server.  
Did you reboot the DC after resetting the secure channel? Please do if not.