Active Directory Replication Issue

We have a single domain that had two sites for over three years, connected via ISA Server 2006 VPN. DC1 and DC2 have been replicating without issues. We added a third site, also with ISA Server 2006 VPN, and all three sites can communicate with each other's servers without issues. Just last week we put a new server in site 3 to be a DC. I created the site in Active Directory Sites and Services (ADSS) and then ran DCPROMO on DC3. DC3 is getting updates from both DC1 and DC2, but neither DC1 nor DC2 gets any updates from DC3.

We have AD-integrated DNS, there is an A record in DNS at all three sites, and DC1 and DC2 can ping DC3. From DC1 and DC2 you can view the SYSVOL and Netlogon shares. But under site 3 in ADSS on DC1 and DC2, the NTDS connection will not populate. DCDIAG on DC3 shows no errors. If I try to manually create a connection to DC3 in ADSS for site 3 on DC1 or DC2, I get one of two errors: "the directory property cannot be found in the cache" when I check the topology, or "the RPC server is unavailable" when I manually add a connection to site 3 and then replicate.

Not sure what to check next. I think it is a DNS issue somewhere on DC1 and DC2. But DC3 is getting the replication of users, computers, etc. in AD from DC1 and DC2, just not the other way.

Have you tried staggering your DNS servers in your TCP/IP properties? The primary DNS server for each DC is itself and the secondary is an alternate DC. So DNS would be:

Then make sure your site links are configured in Sites and Services. The topology checker will create the replication topology for you once you have the site built and the link created.
The problem may be occurring because of time synchronization.

Have you checked the time? It may not be synchronized between DC2 and DC3.

net time \\mypdc /set /y
AIMAviation (Author) Commented:
I have run net time three times already, and the DNS setup is already the way you described: the primary DNS is self and the secondary is a different DC. When I use the topology checker on both DC1 and DC2 for site 3, I get the "directory property cannot be found in the cache" error.

I am wondering if it is a communication issue. Could it be the firewall, VPN, or routing? Have you tested the connection thoroughly from both sides, such as pinging from DC3 to DC1 and from DC1 to DC3? Maybe even copy a file both ways.
Oh, P.S. try the above with both names and IP addresses, that could uncover a name resolution issue.
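
To make the name-versus-IP comparison repeatable, here is a minimal Python sketch. It is an illustration only and not something used in this thread: it asks the local resolver for a DC's name and reports whether the address you expect comes back. The host names and addresses in the usage note are hypothetical.

```python
import socket


def resolve_ipv4(hostname, resolver=socket.getaddrinfo):
    """Return the sorted IPv4 addresses the local resolver gives for hostname."""
    try:
        results = resolver(hostname, None, socket.AF_INET, socket.SOCK_STREAM)
    except socket.gaierror:
        # The name could not be resolved at all.
        return []
    return sorted({info[4][0] for info in results})


def check_name(hostname, expected_ip, resolver=socket.getaddrinfo):
    """Classify a lookup as 'ok', 'mismatch', or 'unresolved'."""
    addresses = resolve_ipv4(hostname, resolver)
    if not addresses:
        return "unresolved"
    return "ok" if expected_ip in addresses else "mismatch"
```

For example, running check_name("dc3.domain.local", "10.0.3.10") on DC1 and on DC2 (both values are placeholders) and getting "unresolved" or "mismatch" would point at name resolution rather than at the VPN itself.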

Please check the FRS logs on DC3. I am sure you are not getting event 13516. It may be giving you 13508.
Please let us know what you find.

-- If this is so, then see if you can see Sysvol and Netlogon on DC3. (Command: Net Share).
-- Download PortQryUI.exe and run it on DC3 against DC1 and DC2 by selecting Domain and Replication. It will tell you if there are any ports blocked.
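
If PortQryUI is not at hand, a rough TCP-only check can be sketched in Python. This is an assumption-laden stand-in, not a replacement for PortQry: the port list below is a commonly cited subset, it ignores UDP, and it cannot tell "filtered" from "not listening" the way PortQry does.

```python
import socket

# Commonly cited TCP ports for DC-to-DC traffic. Not exhaustive: AD replication
# also uses a dynamic RPC port negotiated through the endpoint mapper on 135.
AD_TCP_PORTS = {
    53: "DNS",
    88: "Kerberos",
    135: "RPC endpoint mapper",
    389: "LDAP",
    445: "SMB (SYSVOL and NETLOGON shares)",
    3268: "Global Catalog",
}


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, unreachable, or the name did not resolve.
        return False


def scan(host, ports=AD_TCP_PORTS, timeout=3.0):
    """Map each port number to True (connect succeeded) or False."""
    return {port: tcp_port_open(host, port, timeout) for port in ports}
```

Run scan() from DC1 against DC3 and again from DC3 against DC1; a port that opens in one direction but not the other is worth checking in the ISA rules.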


AIMAviation (Author) Commented:
OK, for Encrypted: pings by name and IP from all DCs to all DCs work fine. I am able to copy files to shares between DCs by name and IP with no issue. It may be a firewall block, but that seems unlikely, since all the ISA servers have almost identical configs and DC3 is getting new info from DC1 and DC2 populated into Active Directory fine. It is just that changes made to AD or DNS on DC3 don't go to DC1 or DC2.

You are correct, it is giving 13508 errors. Of the options, either DNS or number 3, about the replication topology, seems the most likely cause. On DC3, Net Share shows both Sysvol and Netlogon. I will download PortQryUI and report the results.

Thanks for the suggestions.
It's just weird that DC3 seems to get updated for DNS and AD but can't replicate any updates from it to DC1 or DC2.
AIMAviation (Author) Commented:

Downloaded PortQryUI 1.0. There was no option for Domain and Replication, just Domains and Trusts. When I compared the PortQryUI results from DC1 to DC2 and from DC1 to DC3, the return codes and listening/filtered statuses looked identical except for TCP port 42, the nameserver service: it shows DC3 as not listening, but DC2 is. I have not used this utility before, so I'm not sure exactly what I am looking at. Nothing jumped out as blocked, but once again that could just be my lack of experience with this. If you want, I can attach the results from all three DCs.
Try running dcdiag from each DC from the command line. It will test your replication and tell you if there are any errors. It may point you at the cause.
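Port 42 is the legacy nameserver port, which Windows uses for WINS replication; DNS itself listens on port 53. If DC3 is not answering DNS queries, you could certainly have the problems you are seeing.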

Look up  DNS rules for ISA to make sure the ISA firewall isn't blocking DNS.

Also go to DC3 or the new DC and navigate to the command prompt. Now type these commands:

IPconfig /flushdns
IPconfig /registerdns
Net stop Netlogon
Net start Netlogon

Furthermore, if you are using Service Pack 1 on any servers, download and install SP2. There is a discrepancy in the SP1 code that can knock down the DNS Server service.

With that said, your sites are connected by a VPN tunnel. Make sure that the DCs at your external sites also list the primary site's DNS server as an alternate DNS server on their NICs. Then these sites will see the corporate site for DNS and be able to replicate DNS and AD data between them.

For your VPN routes, does your VPN connection see the route to the outer site without it going through the default route?
AIMAviation (Author) Commented:
Ran all the commands listed in your post. All servers are Windows 2003 SP2. Ran PortQryUI against localhost on DC3 and it showed not listening on port 42. Just to be safe, I checked all rules in ISA and verified that DNS and other services were on the allow list. Made sure DNS on the local NIC points to self, with an external site DC as the alternate.

VPN is handled through ISA VPN connection routing and the servers don't have static routes to each other.  The routing table shows the routes all routing through the default gateway of the respective ISA servers.

Ran a netdiag on DC3 and all results came back passed.
Hmm, DCDIAG is usually pretty good at pointing out errors. Did you try any additional switches, like verbose or comprehensive? It may also be beneficial to run it on all DCs; the issue could be on a replication partner and not necessarily on DC3. It may be a good idea to add all of your DCs to the Name Servers tab in DNS if they are not already there.

It sounds like it may not be a configuration issue, you may have something broken. I have run into similar symptoms when I got a corrupted directory partition. Stuff would replicate one way sometimes.... Unfortunately I had to place a ticket with MS to fix that one. They fixed it right up no problem though.
The default tests for DCdiag will not usually look at DNS. What you usually get is DNS tests omitted.

So, what you need to do is type:

DCdiag /test:dns

If DCdiag shows no problems, then it's not a DC error. We should move onto troubleshooting routing through the VPN tunnel. If I am not mistaken, through a VPN tunnel, it's best to have a static route, rather than use the default route. Otherwise you may see a problem with the route not being established.
Hmm. That default route thing could be something to look at. With ISA, you set a default gateway only on the external NIC and leave it blank on the internal NIC. You need to add routes internally for VPN traffic. Type route print at the command line on the ISA server to show the routes. You can add static routes with the route add command. You could also install a routing protocol like RIP.

Did you add the new subnets for your new site to the "Internal Network" in ISA on both of your existing sites?

Quick question. You mentioned not being able to receive computer/user updates; does DNS replicate properly?
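
To illustrate the default-route question, here is a small Python sketch of longest-prefix matching over a route table. It is a simplification (Windows also considers the route metric), and the subnets and gateways in the usage note are made up, not taken from this thread.

```python
import ipaddress


def best_route(destination, routes):
    """Return the (network, gateway) pair with the longest matching prefix.

    routes is a list of (CIDR string, gateway string) tuples, for example
    copied by hand from the output of `route print`.
    """
    dest = ipaddress.ip_address(destination)
    matches = [
        (ipaddress.ip_network(net), gateway)
        for net, gateway in routes
        if dest in ipaddress.ip_network(net)
    ]
    if not matches:
        return None
    # The most specific route (largest prefix length) wins.
    return max(matches, key=lambda match: match[0].prefixlen)


def uses_default_route(destination, routes):
    """True if the only route covering destination is 0.0.0.0/0."""
    route = best_route(destination, routes)
    return route is not None and route[0].prefixlen == 0
```

With a hypothetical table of [("0.0.0.0/0", "192.168.1.1"), ("192.168.1.0/24", "192.168.1.10")], uses_default_route("192.168.3.10", ...) is True; once a static route for 192.168.3.0/24 is added, it becomes False.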
AIMAviation (Author) Commented:
DC3 is receiving new DNS and AD information from DC1 and DC2. Today I added a new user and computer in both site 1 (DC1) and site 2 (DC2), and both are showing in DC3's AD and DNS records. I ran dcdiag /v on all three DCs. It shows replication errors to DC3 from all servers, and on DC1 and DC2 I also get the following errors:

[Replications Check,DC1] A recent replication attempt failed:
   From DC3 to DC1
   Naming Context: DC=Domain,DC=local
   The replication generated an error (8524):
   The DSA operation is unable to proceed because of a DNS lookup failure.
   The failure occurred at 2010-04-05 11:57:54.
   The last success occurred at (never).
   15 failures have occurred since the last success.
   The guid-based DNS name 3bd9b321-fc54-4daa-a7ba-878c073e5eef._msdcs.Domain
    is not registered on one or more DNS servers.

ISA has the VPN subnets listed as internal, and everything else appears to be flowing properly. I will test whether an A record added to DNS on DC3 appears on DC1 or DC2, but I doubt it will.
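
Error 8524 in the dcdiag output above says DC1 could not resolve DC3's GUID-based name under _msdcs. When there are several DCs and a lot of verbose output, a short Python sketch like the following can pull out the failing pairs and the names to test. The regular expressions are my own guess at the layout, based only on the excerpt pasted above, so treat them as assumptions.

```python
import re

# A DSA GUID followed by ._msdcs.<zone>, as printed by dcdiag.
GUID_NAME = re.compile(
    r"\b([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"
    r"\._msdcs\.[\w.-]+)",
    re.IGNORECASE,
)

# Source DC, destination DC, naming context, and numeric error code.
FAILURE = re.compile(
    r"From\s+(\S+)\s+to\s+(\S+)\s+Naming Context:\s*(\S+)\s+"
    r"The replication generated an error \((\d+)\)"
)


def parse_failures(text):
    """Return (list of failure dicts, list of GUID-based DNS names)."""
    failures = [
        {"source": src, "destination": dst, "naming_context": nc, "error": int(code)}
        for src, dst, nc, code in FAILURE.findall(text)
    ]
    return failures, GUID_NAME.findall(text)
```

Each extracted name can then be tested on the failing DC with nslookup -type=CNAME followed by the name and the DNS server to query. Note that dcdiag wraps long names, so the zone part may be cut off in the output, as it appears to be in the excerpt.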
Look in the event logs for journal wrap errors. They can be found among the FRS events in the 13000s.

Another thing we should check is the SRV records. Look at your MSDCS folders within the forward lookup zone and see if they are greyed out. I don't believe that will be the case with a clean DCdiag.

You may simply be in journal wrap with good DNS. Unless you have problems communicating with hosts by name, your DNS may be fine, and a connection timeout or other connection problem simply left you with a partial replication set. When you get a partial replication set, the FRS service stops and you can end up in journal wrap. To fix this you might have to use the BurFlags method.

I wrote an article that pertains to this, on how replication problems are most often caused by DNS, but not always.

To check your SRV records look for this:

Also, if this replication hasn't happened within 90 days, I believe, then you may have a tombstoned server. That should show up on DCdiag reports, though.

Furthermore, a greyed-out DNS SRV delegation record looks like this, but it will also show up on DCdiag reports as "can't find DC".

It's the SRV records that are used to replicate this data. Host records are used for the initial connection.
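
As a rough aid for the SRV check, this Python sketch builds the main DC locator record names for a domain and site so they can be looked up one by one. The list is a subset of what Netlogon registers, and the site name in the usage note is a hypothetical placeholder.

```python
def dc_locator_records(domain, site, forest_root=None):
    """Return a subset of the SRV record names Netlogon registers for a DC.

    forest_root defaults to domain, which is right for a single-domain forest.
    """
    forest_root = forest_root or domain
    return [
        f"_ldap._tcp.dc._msdcs.{domain}",
        f"_kerberos._tcp.dc._msdcs.{domain}",
        f"_ldap._tcp.{site}._sites.dc._msdcs.{domain}",
        f"_ldap._tcp.pdc._msdcs.{domain}",
        f"_ldap._tcp.gc._msdcs.{forest_root}",
    ]


def dsa_cname(dsa_guid, forest_root):
    """Return the GUID-based CNAME that replication partners look up."""
    return f"{dsa_guid}._msdcs.{forest_root}"
```

For example, dc_locator_records("domain.local", "Site3") gives names you can pass to nslookup -type=SRV against each DC's DNS server; DC3 should appear in the answers on all three.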

AIMAviation (Author) Commented:
I'll look into that, ChiefIT. I'll report on what I find.

AIMAviation (Author) Commented:
OK, this is going to sound weird. I added WINS to DC3 and updated the NIC config on DC3 to include WINS pointing to self. I rebooted DC1 and DC3 and then noticed an automatically generated connection in site 3 on DC1 that was not originally there. I will monitor this.

I also found a white paper on AD replication over firewalls. I may follow its suggestion of the limited RPC method to fix the ports for AD and FRS. Back to more testing first; then I will try this.

Let's define what services do what, to make things a little easier for you.

WINS will replicate the MASTER BROWSER LIST between sites. This is the list of printer and file shares that you see in "My Network Places" and usually use when mapping to a share. So it deals with printers and shares, but not the AD database, the DNS database, or distributed file share replication (like the SYSVOL and Netlogon shares).

DNS is a pointer for FRS. FRS is used for AD, DNS and SYSVOL/Netlogon shares.

DNS is a pointer for DFSR. DFSR (in 2003 R2 and newer) is used for most CIFS (Common Internet File System, i.e. network share) replication.

NetBIOS broadcasts are used for the browse list. NetBIOS broadcasts the locations of the files and printers to everything within the broadcast domain. This means NetBIOS broadcasts are NOT routable. They will not go through a VPN tunnel, across NAT, across most firewalls, to different subnets, to a different VLAN, etc. WINS makes those NetBIOS names reachable across routers. Alternatively, an LMHOSTS entry linking the site master browsers to the DOMAIN master browser makes the browse list routable.

1) FRS and DFS replication replicate the actual data.
2) NetBIOS is used as a pointer to distributed shares and printers on that broadcast domain.
WINS also does host name resolution. If dns was broken across the WAN it is possible that WINS could be masking that issue. I find DNS is often almost worthless across WAN links and I always use WINS in addition to DNS for name resolution.

Of course it is also possible that the addition of WINS just happened at the same time that AD started working correctly and it is a total fluke.
AIMAviation (Author) Commented:
I doubt it was a fluke; I've been testing things one at a time and waiting a day or so between tests. But any way you put it, replication has happened. I still see event log issues and still need to look into the errors and warnings that keep showing up, but after a full day of changes all three DCs in all three sites appear to replicate.

Thank you, Chief and Encrypted, for the help.
AIMAviation (Author) Commented:
Thanks for the experts' knowledge and help with this issue.