doboszb
AD replication error
Over the last week, replication to one of the 3 DCs in our W2003 domain has been failing. The event log shows event 13508, which indicates an FRS or DNS issue. All 3 DCs are global catalog servers, and the server that is not replicating also holds all the FSMO roles. When I try to check the status of those roles, it shows ERROR as the role holder instead of the server name. Running dcdiag, everything passes except FRSEvent, which says that there are warning events within the last 24 hours after the SYSVOL has been shared, and that failing SYSVOL replication problems may cause Group Policy problems. Any ideas what might be causing this issue? All three DCs are W2003 SP2.
ASKER
I was able to connect to shares on all of the DCs from each of the servers.
However, the netdiag test showed some errors. Everything passed on the DC that isn’t replicating, but the other two DCs showed these errors when I ran the utility:
DNS Test:
[WARNING] The DNS entries for this DC cannot be verified right now on DNS server 192.168.42.80 (the DC that isn’t replicating), ERROR_TIMEOUT.
LDAP Test:
[FATAL] Cannot open an LDAP session to 192.168.42.80 (non-replicating DC)
[WARNING] Failed to query SPN registration on DC '192.168.42.80' (non-replicating DC)
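The [FATAL] LDAP line means the other DCs could not open a TCP session to 192.168.42.80. One way to narrow this down from a working DC or a workstation is to test the TCP ports a DC normally listens on. The following is a minimal Python sketch and not part of the original thread; the port list and the `check_ports` helper are assumptions for illustration, and a successful connect only shows that something is listening, not that the service is healthy.

```python
import socket

# TCP ports a Windows Server 2003 DC normally listens on.
# This list is an assumption for illustration; adjust it to your environment.
DC_PORTS = {
    53: "DNS",
    88: "Kerberos",
    135: "RPC endpoint mapper",
    389: "LDAP",
    445: "SMB",
    3268: "Global catalog",
}


def check_ports(host, ports=DC_PORTS, timeout=3.0):
    """Return {port: True/False} for a plain TCP connect to each port."""
    results = {}
    for port in ports:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = True
        except OSError:
            # Covers refused connections, timeouts and unreachable hosts.
            results[port] = False
    return results


# Example call against the non-replicating DC from the netdiag output:
#   for port, is_open in check_ports("192.168.42.80").items():
#       print(port, DC_PORTS[port], "open" if is_open else "NO RESPONSE")
```

If every port fails from the other DCs but works locally on the DC itself, a firewall or network filter on or in front of that server would be one thing to look at.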
Have all 3 DCs use one DNS server that you know is good.
Make sure all "Automatic" services on the bad DC are running (compare service status with the other two).
You may also want to restart the Netlogon service. This is not strictly necessary, but it helps in case the SRV records for the bad DC were somehow lost (netdiag /fix does the same).
Is there anything worth mentioning in the System log and Directory Service log on the bad DC?
A "dcdiag /v" on the bad DC will be helpful.
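To check whether the SRV records Netlogon registers are actually present on the DNS server, the record names can be queried with nslookup. Below is a small Python sketch that only builds the nslookup command lines; it does not run them. The domain name CQC.com is inferred from the DC=CQC,DC=com naming context in the dcdiag output later in this thread, the forest root is assumed to be the same domain, and the DNS server address is a hypothetical placeholder.

```python
def srv_record_names(domain, forest=None):
    """Common SRV records that Netlogon registers for a domain controller."""
    forest = forest or domain  # assumption: single-domain forest
    return [
        f"_ldap._tcp.{domain}",
        f"_ldap._tcp.dc._msdcs.{domain}",
        f"_ldap._tcp.pdc._msdcs.{domain}",
        f"_kerberos._tcp.{domain}",
        f"_kerberos._tcp.dc._msdcs.{domain}",
        f"_kpasswd._tcp.{domain}",
        f"_gc._tcp.{forest}",
    ]


def nslookup_commands(domain, dns_server, forest=None):
    """Build one nslookup command line per SRV record name."""
    return [
        f"nslookup -type=SRV {name} {dns_server}"
        for name in srv_record_names(domain, forest)
    ]


# KNOWN_GOOD_DNS is a hypothetical placeholder; substitute the address of
# the DNS server all three DCs now point to.
for command in nslookup_commands("CQC.com", "KNOWN_GOOD_DNS"):
    print(command)
```

Each answer should list the DCs that registered the record; if the bad DC is missing from the answers, restarting Netlogon or running netdiag /fix as suggested above is the usual way to re-register.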
ASKER
There isn’t anything too significant in the event logs, just the replication errors referring to FRS and DNS possibly being wrong. I restarted the services and repointed DNS on all the DCs to the same server. In terms of changes to the network, W2003 SP2 is the only thing that has happened in this time frame. Below are the errors from the dcdiag:
Doing primary tests
Testing server: Middleton\CQC-DB01
Starting test: Replications
* Replications Check
[Replications Check,CQC-DB01] A recent replication attempt failed:
From CQC-ADC01 to CQC-DB01
Naming Context: DC=ForestDnsZones,DC=CQC,DC=com
The replication generated an error (1256):
The remote system is not available. For information about network troubleshooting, see Windows Help.
The failure occurred at 2007-03-29 09:53:05.
The last success occurred at 2007-03-15 18:55:17.
328 failures have occurred since the last success.
[CQC-ADC01] DsBindWithSpnEx() failed with error 1722,
The RPC server is unavailable..
Printing RPC Extended Error Info:
Error Record 1, ProcessID is 3084 (DcDiag)
System Time is: 3/29/2007 15:45:32:791
Generating component is 8 (winsock)
Status is 1722: The RPC server is unavailable.
Detection location is 323
Error Record 2, ProcessID is 3084 (DcDiag)
System Time is: 3/29/2007 15:45:32:791
Generating component is 8 (winsock)
Status is 1237: The operation could not be completed. A retry should be performed.
Detection location is 313
Error Record 3, ProcessID is 3084 (DcDiag)
System Time is: 3/29/2007 15:45:32:791
Generating component is 8 (winsock)
Status is 10060: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
Detection location is 311
NumberOfParameters is 3
Long val: 135
Pointer val: 0
Pointer val: 0
Error Record 4, ProcessID is 3084 (DcDiag)
System Time is: 3/29/2007 15:45:32:791
Generating component is 8 (winsock)
Status is 10060: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
Detection location is 318
[Replications Check,CQC-DB01] A recent replication attempt failed:
From CQC-ADC01 to CQC-DB01
Naming Context: DC=DomainDnsZones,DC=CQC,DC=com
The replication generated an error (1256):
The remote system is not available. For information about network troubleshooting, see Windows Help.
The failure occurred at 2007-03-29 09:53:05.
The last success occurred at 2007-03-15 18:55:17.
328 failures have occurred since the last success.
[Replications Check,CQC-DB01] A recent replication attempt failed:
From CQC-ADC01 to CQC-DB01
Naming Context: CN=Schema,CN=Configuration,DC=CQC,DC=com
The replication generated an error (1722):
The RPC server is unavailable.
The failure occurred at 2007-03-29 09:53:48.
The last success occurred at 2007-03-15 18:55:17.
328 failures have occurred since the last success.
The source remains down. Please check the machine.
[Replications Check,CQC-DB01] A recent replication attempt failed:
From CQC-ADC01 to CQC-DB01
Naming Context: CN=Configuration,DC=CQC,DC=com
The replication generated an error (1722):
The RPC server is unavailable.
The failure occurred at 2007-03-29 09:53:26.
The last success occurred at 2007-03-15 19:40:45.
656 failures have occurred since the last success.
The source remains down. Please check the machine.
[Replications Check,CQC-DB01] A recent replication attempt failed:
From CQC-ADC01 to CQC-DB01
Naming Context: DC=CQC,DC=com
The replication generated an error (1722):
The RPC server is unavailable.
The failure occurred at 2007-03-29 10:41:33.
The last success occurred at 2007-03-15 19:47:05.
1005 failures have occurred since the last success.
The source remains down. Please check the machine.
* Replication Latency Check
REPLICATION-RECEIVED LATENCY WARNING
CQC-DB01: Current time is 2007-03-29 10:45:11.
DC=ForestDnsZones,DC=CQC,DC=com
Last replication recieved from CQC-ADC01 at 2007-03-15 18:55:17.
Latency information for 1 entries in the vector were ignored.
1 were retired Invocations. 0 were either: read-only replicas and are not verifiably latent, or dc's no longer replicating this nc. 0 had no latency information (Win2K DC).
DC=DomainDnsZones,DC=CQC,DC=com
Last replication recieved from CQC-ADC01 at 2007-03-15 18:55:17.
Latency information for 1 entries in the vector were ignored.
1 were retired Invocations. 0 were either: read-only replicas and are not verifiably latent, or dc's no longer replicating this nc. 0 had no latency information (Win2K DC).
CN=Schema,CN=Configuration,DC=CQC,DC=com
Last replication recieved from CQC-ADC01 at 2007-03-15 18:59:13.
Latency information for 5 entries in the vector were ignored.
5 were retired Invocations. 0 were either: read-only replicas and are not verifiably latent, or dc's no longer replicating this nc. 0 had no latency information (Win2K DC).
CN=Configuration,DC=CQC,DC=com
Last replication recieved from CQC-ADC01 at 2007-03-15 19:40:45.
Latency information for 5 entries in the vector were ignored.
5 were retired Invocations. 0 were either: read-only replicas and are not verifiably latent, or dc's no longer replicating this nc. 0 had no latency information (Win2K DC).
DC=CQC,DC=com
Last replication recieved from CQC-ADC01 at 2007-03-15 19:47:08.
Latency information for 5 entries in the vector were ignored.
5 were retired Invocations. 0 were either: read-only replicas and are not verifiably latent, or dc's no longer replicating this nc. 0 had no latency information (Win2K DC).
* Replication Site Latency Check
......................... CQC-DB01 passed test Replications
Starting test: KnowsOfRoleHolders
Role Schema Owner = CN=NTDS Settings,CN=CQC-ADC01,CN=Servers,CN=Middleton,CN=Sites,CN=Configuration,DC=CQC,DC=com
Warning: CQC-ADC01 is the Schema Owner, but is not responding to DS RPC Bind.
RPC Extended Error Info not available. Use group policy on the local machine at "Computer Configuration/Administrative Templates/System/Remote Procedure Call" to enable it.
[CQC-ADC01] LDAP search failed with error 58,
The specified server cannot perform the requested operation..
Warning: CQC-ADC01 is the Schema Owner, but is not responding to LDAP Bind.
Role Domain Owner = CN=NTDS Settings,CN=CQC-ADC01,CN=Servers,CN=Middleton,CN=Sites,CN=Configuration,DC=CQC,DC=com
Warning: CQC-ADC01 is the Domain Owner, but is not responding to DS RPC Bind.
RPC Extended Error Info not available. Use group policy on the local machine at "Computer Configuration/Administrative Templates/System/Remote Procedure Call" to enable it.
Warning: CQC-ADC01 is the Domain Owner, but is not responding to LDAP Bind.
Role PDC Owner = CN=NTDS Settings,CN=CQC-ADC01,CN=Servers,CN=Middleton,CN=Sites,CN=Configuration,DC=CQC,DC=com
Warning: CQC-ADC01 is the PDC Owner, but is not responding to DS RPC Bind.
RPC Extended Error Info not available. Use group policy on the local machine at "Computer Configuration/Administrative Templates/System/Remote Procedure Call" to enable it.
Warning: CQC-ADC01 is the PDC Owner, but is not responding to LDAP Bind.
Role Rid Owner = CN=NTDS Settings,CN=CQC-ADC01,CN=Servers,CN=Middleton,CN=Sites,CN=Configuration,DC=CQC,DC=com
Warning: CQC-ADC01 is the Rid Owner, but is not responding to DS RPC Bind.
RPC Extended Error Info not available. Use group policy on the local machine at "Computer Configuration/Administrative Templates/System/Remote Procedure Call" to enable it.
Warning: CQC-ADC01 is the Rid Owner, but is not responding to LDAP Bind.
Role Infrastructure Update Owner = CN=NTDS Settings,CN=CQC-ADC01,CN=Servers,CN=Middleton,CN=Sites,CN=Configuration,DC=CQC,DC=com
Warning: CQC-ADC01 is the Infrastructure Update Owner, but is not responding to DS RPC Bind.
RPC Extended Error Info not available. Use group policy on the local machine at "Computer Configuration/Administrative Templates/System/Remote Procedure Call" to enable it.
Warning: CQC-ADC01 is the Infrastructure Update Owner, but is not responding to LDAP Bind.
......................... CQC-DB01 failed test KnowsOfRoleHolders
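The dcdiag output above is long, and the useful facts are spread out: which naming contexts fail, with which error, and since when. As an aid to reading it, here is a minimal Python sketch that pulls those fields out of saved dcdiag text. It is not part of the original thread; the regular expression assumes the English-language layout shown above, and `summarize_failures` is a hypothetical helper name.

```python
import re

# Matches one "A recent replication attempt failed" block as printed by
# dcdiag in the output above (English-language layout assumed).
FAILURE_RE = re.compile(
    r"From (?P<source>\S+) to (?P<dest>\S+)\s+"
    r"Naming Context: (?P<nc>.+?)\s+"
    r"The replication generated an error \((?P<error>\d+)\):"
    r".*?The last success occurred at (?P<last_success>[\d-]+ [\d:]+)\."
    r"\s+(?P<failures>\d+) failures have occurred",
    re.DOTALL,
)


def summarize_failures(dcdiag_text):
    """Return one dict per failed replication attempt found in the text."""
    rows = []
    for m in FAILURE_RE.finditer(dcdiag_text):
        rows.append({
            "source": m["source"],
            "dest": m["dest"],
            # Strip stray spaces in case the pasted DN was wrapped or split.
            "naming_context": m["nc"].replace(" ", ""),
            "error": int(m["error"]),
            "last_success": m["last_success"],
            "failures": int(m["failures"]),
        })
    return rows


SAMPLE = """\
[Replications Check,CQC-DB01] A recent replication attempt failed:
From CQC-ADC01 to CQC-DB01
Naming Context: DC=CQC,DC=com
The replication generated an error (1722):
The RPC server is unavailable.
The failure occurred at 2007-03-29 10:41:33.
The last success occurred at 2007-03-15 19:47:05.
1005 failures have occurred since the last success.
"""

for row in summarize_failures(SAMPLE):
    print(row)
```

Read this way, the output above shows every naming context failing from CQC-ADC01 to CQC-DB01 with error 1256 or 1722, and the last successful replication for each was on 2007-03-15.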
ASKER
I was able to reset the secure channel, but it looks like the same issues persist. The only new error in the event viewer is that the directory partition has not been backed up in a number of days.
ASKER
More information: I just tried to look at the DC's through REPLMON and when I try to add the failing DC I get this error:
The server could not be contacted or you had insufficient permissions to read the status of the server.
Did you reboot the DC after resetting the secure channel? If not, please do. Then run these on the bad DC and post the output:
repadmin /showreps >rep.txt
(this will output the replication info to rep.txt)
netdiag /v >net.txt
Another thing you may want to check is if the Kerberos tickets have somehow gotten messed up. This happened to one of my client's networks just recently, and it took me about four hours to figure it out.
Go to the failing server, try to access a share on one of the other DCs. Go to the other DCs and try to access a share on the failing server. Even try it from a workstation. If any of them report "The login name is incorrect" or something like that, then the Kerberos tickets are messed up.
Give that a try and see what those tools tell you.