AD Replication fails to Server 2008 for ~12 hours after reboot

Posted on 2008-10-15
Last Modified: 2008-10-25

First, let me give you some information on our current setup: We have a main site and two branch offices. Two DCs are installed at our main site, and one is at each branch office. The main site and the two branch offices are each in a different subnet and in different sites on Active Directory Sites and Services.

The DCs are:
DC1 (main site, Server 2003 x86)
DC2 (main site, Server 2003 R2 x64)
DC3 (branch office 1, Server 2003 x86)
DC4 (branch office 2, Server 2008 x64)

The two branch offices are connected to the main site over a 10 Mbit fibre connection. The connection is very stable and all ports are open between the sites.

DC4 is the newest DC in our setup, and I've had this problem from the moment I ran dcpromo on DC4. I have even reinstalled DC4, but it did not change anything.

For debugging purposes I have set DC4 to replicate with all other DCs. I first tried replicating with only one DC at our main site, but this had the very same effect. Once I reboot DC4, it will not have any INBOUND replication for roughly 12 hours. OUTBOUND replication works fine. Running repadmin /showrepl on DCs 1-3 gives nothing but successes (replications with DC4 are also reported as successful), but on DC4 every attempt fails.

repadmin /showrepl run against DC4 gives two different errors. For the DC=domain,DC=tld, Configuration and Schema partitions, the error is "result 1396 (0x574): Logon Failure: The target account name is incorrect.". For DomainDnsZones and ForestDnsZones, the error is "result 1256 (0x4e8): The remote system is not available. For information about network troubleshooting, see Windows Help."
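
When the same errors repeat every 15 minutes, it can help to pull the numeric result codes out of saved output and tally them. The following is only a minimal sketch, assuming the messages keep the "result NNNN (0xHHH)" shape quoted above; the function name and the sample text are illustrative and not part of repadmin itself.

```python
import re

# Matches fragments such as 'result 1396 (0x574)' as quoted in the post.
RESULT_RE = re.compile(r"result\s+(\d+)\s+\(0x([0-9a-fA-F]+)\)")


def extract_result_codes(text):
    """Return (decimal_code, hex_string) pairs found in the given text.

    Hypothetical helper: it also checks that the decimal and hex forms
    agree, which catches copy/paste mistakes in saved output.
    """
    codes = []
    for dec_part, hex_part in RESULT_RE.findall(text):
        dec_value = int(dec_part)
        hex_value = int(hex_part, 16)
        if dec_value != hex_value:
            raise ValueError(f"code mismatch: {dec_part} vs 0x{hex_part}")
        codes.append((dec_value, f"0x{hex_value:x}"))
    return codes


# Sample text built from the two messages quoted in the question.
sample = (
    "result 1396 (0x574): Logon Failure: The target account name is incorrect.\n"
    "result 1256 (0x4e8):The remote system is not available."
)

print(extract_result_codes(sample))
```

Both codes quoted in the question pass the consistency check: 1396 is 0x574 and 1256 is 0x4e8.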

Additionally, I have a number of 1311 and 1645 errors in Event Viewer, along with 1925 and 1566 warnings (they pop up every 15 minutes, so I assume once per replication attempt).

As mentioned, this only happens for the first 12 hours or so after a reboot of DC4. Once this time (or number of retries?) has passed, all event errors and warnings stop and changed items are replicated. repadmin /showrepl no longer shows any errors and replication works fine as far as I can tell.

Any ideas?
Question by:Scripting_Guy

Author Comment

ID: 22723013
OK, I once again had this effect and it just went from "all bad" to "working smoothly". I rebooted the machine this morning and it came back up at 8:44 am. The first replication error was at 9:04 am. Retries were made every 15 minutes.

The last replication error was at 6:04 pm, and 6:19 pm was the first successful replication. If my math is correct, that is 37 failed attempts at 15-minute intervals, or roughly nine and a half hours after the reboot, until it started working.
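
The retry count can be cross-checked from the timestamps above. This is a quick sketch, assuming one attempt exactly every 15 minutes with none skipped; the calendar date is an arbitrary placeholder, since only the times of day were given.

```python
from datetime import datetime, timedelta

# Placeholder date; only the times of day come from the comment above.
DAY = (2008, 1, 1)

boot = datetime(*DAY, 8, 44)            # DC4 back up after reboot
first_error = datetime(*DAY, 9, 4)      # first replication error
last_error = datetime(*DAY, 18, 4)      # last replication error
first_success = datetime(*DAY, 18, 19)  # first successful replication

RETRY_INTERVAL = timedelta(minutes=15)  # assumed fixed retry interval


def count_attempts(first, last, interval):
    """Count evenly spaced attempts from first to last, both inclusive."""
    return (last - first) // interval + 1


failed_attempts = count_attempts(first_error, last_error, RETRY_INTERVAL)
time_to_recovery = first_success - boot

print(f"failed attempts: {failed_attempts}")
print(f"boot to first success: {time_to_recovery}")
```

Under those assumptions the failing window covers 37 attempts, and replication recovers 9 hours 35 minutes after the reboot.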

Expert Comment

by:Henrik Johansson
ID: 22736058
Ensure that site link replication is configured in both directions for all DCs to avoid dead ends.

Author Comment

ID: 22752716
This is the case for all connections.

Author Comment

ID: 22752730
Apparently nobody has an idea about this issue, so I will open a support call with Microsoft on Monday morning and let you know what they figure out (provided my boss is willing to spend the cash, but I assume he is).

Accepted Solution

by:Scripting_Guy
ID: 22759212
Microsoft called me back this afternoon and they are aware of the problem. It took them about 1/2 year to fix it, and the solution is this hotfix here:

The problem occurs if you have ever done an authoritative restore of the krbtgt user (the Kerberos account). In our case, some time ago we deleted a couple of users that should not have been deleted, and we restored the whole User OU. The authoritative restore increases the version number of all restored items by 100,000, making it 100,002 instead of 2 for the krbtgt user. This causes the problem.
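
To illustrate the version arithmetic described above, here is a small sketch. It assumes the 100,000 increment stated in this post and uses hypothetical helper names; the actual version numbers would have to be read from the replication metadata of the object (to my knowledge `repadmin /showobjmeta` can display them), which this sketch does not do.

```python
# Increment applied by the authoritative restore, as stated in the post.
RESTORE_INCREMENT = 100_000


def version_after_restore(version, restores=1):
    """Version number after the given number of authoritative restores."""
    return version + restores * RESTORE_INCREMENT


def looks_restored(version, threshold=RESTORE_INCREMENT):
    """Heuristic only: a version at or above the increment suggests that
    the object went through an authoritative restore at some point."""
    return version >= threshold


krbtgt_before = 2  # value mentioned in the post
krbtgt_after = version_after_restore(krbtgt_before)

print(krbtgt_after)
print(looks_restored(krbtgt_before), looks_restored(krbtgt_after))
```

The threshold check is a rough heuristic, not a test Microsoft documents; an attribute could in principle reach a high version number through ordinary changes.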

Although the problem described in the KB article has absolutely nothing to do with my problem, this hotfix solves it. Note that the hotfix in the article cannot be downloaded directly from the Microsoft website; you have to call them or write an email so that they send you the links and passwords via email. You have to install the hotfix on all 2003 DCs and reboot them, then reboot the 2008 server afterwards, and replication works.

Maybe this will help someone who has the same issue as we had.

Expert Comment

by:Henrik Johansson
ID: 22759510
Good to hear it was solved.
