GregBooth

asked on

Intersite AD Replication Issue

I have a site that is connected to our main network via VPN.
I have configured the subnets for the site and everything seemed to be working fine. Foolishly, I never checked the replication status, and now when I run DCDIAG on the site domain controller I get latency errors and also warnings that the Domain Owner, PDC Owner, RID Owner and Infrastructure Owner are not responding.

When running Netdiag on the site DC it passes but warns "Failed to query SPN registration on DC <DC1>" and "Failed to query SPN registration on DC <DC2>"

So I've obviously got a replication/topology problem. Any ideas how to resolve it?

Thanks in advance.
James

Have you got reverse DNS zones set up in DNS? I would suggest setting up reverse DNS zones with PTR records pointing to all DCs.
I'm assuming the DCs can ping/traceroute to each other (i.e. your networking is sound)? If not, where does it get stuck? If you can't ping DC to DC, what about site to site generally? You might want to try pinging with a smaller packet size to see if that gets through.
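For reference, the name of the PTR record that a reverse zone needs for a given DC address can be derived mechanically. A minimal Python sketch; the address 192.168.1.10 is made up for illustration and does not come from this thread:

```python
import ipaddress

def ptr_name(ip: str) -> str:
    """Return the reverse-lookup (PTR) record name for an IP address."""
    return ipaddress.ip_address(ip).reverse_pointer

# Hypothetical DC address, for illustration only.
print(ptr_name("192.168.1.10"))  # 10.1.168.192.in-addr.arpa
```

Each DC should have a record with that name in the matching reverse zone, pointing back at the DC's host name.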

If networking is fine.....

Most AD issues are caused by DNS. Are you getting any errors relating to service (SRV) records? Are the DCs registered correctly on both DNS servers?

It may also be an RPC endpoint issue. Try logging on to the DC consoles locally, open ntdsutil, and try to connect to the DC's NTDS instance:

run NTDSUtil
type Metadata cleanup, press enter
type Connections, press enter
type Connect to server localhost

Do you get a BindDSw error?
GregBooth

ASKER

DNS and reverse DNS are set up, pinging the site works with no problem, and the VPN is working with no problem.
When I run the NTDSUTIL on the remote site DC and connect to localhost it connects ok. If I run NTDSUTIL on the site DC and connect to my DC at head office it says "The RPC server is unavailable"

:(
Have you checked the firewalls on the DCs to make sure they're not blocking communication?
You could try from the command prompt netstat -a to make sure the DCs are listening on the port number for RPC.
Re firewalls: I have an IPsec VPN configured between a DrayTek Vigor2710 ADSL router and a Billion BiGuard S10 firewall. I am assuming that RPC will be tunnelled through the VPN and not be filtered. Is that a wrong assumption?
That's correct. I was referring to the firewalls on the DCs "Windows Firewall".
Have you any Stale Domain Controllers in your Domain that you might have demoted recently?
Has the ISTG (Inter-Site Topology Generator) generated/removed the replication links in Sites and Services?
When I go to AD Sites and Services on the site DC and try to manually start replication to a DC in Default-First-Site, I get "The naming context is in the process of being removed or is not replicated from the specified server".

For some reason it seems that the replication issues are caused by the VPN connection.
If you do a "net share" on each DC are the replication shares (sysvol etc) listed?
In Active Directory Sites and Services you should move each DC to its correct site and subnet. They should not be in Default-First-Site.
The site DC is in its own site with the correct subnet. The HQ domain controllers are in Default-First-Site with the correct subnet. According to the Microsoft info I found, this should not be a problem. Is it?
Each DC should be in its own site with its own subnet assigned.... if the DC is promo'd onsite this should happen automatically... if not it will require moving...
@GregBooth, would it be possible for you to upload the output from the DCDIAG report? Also, can you try using a combination of portqry and telnet to see if there is connectivity for Active Directory, such as ports 389 and 135, between the 2 sites?


Thank you,

JBond2010
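Where portqry or telnet is not to hand, the same basic TCP reachability test can be approximated in a few lines of Python. This is only a sketch: it tests the ports named in this thread, the function and dictionary names are made up for illustration, and unlike portqry it cannot tell a filtered port from one that is simply not listening.

```python
import socket

# Ports mentioned in this thread and their usual roles.
AD_PORTS = {
    135: "RPC endpoint mapper",
    389: "LDAP",
    3268: "Global Catalog",
    3269: "Global Catalog over SSL",
}

def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False

def check_dc(host: str) -> dict:
    """Map each port in AD_PORTS to True (reachable) or False."""
    return {port: tcp_port_open(host, port) for port in AD_PORTS}
```

Running `check_dc("<problem_server>")` from each site against the DC at the other site shows which of these ports get through the VPN. RPC also uses dynamically assigned ports after the initial contact on 135, which this sketch does not cover.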
When doing the following

portqry -n <problem_server> -o 1094,1025,1029,6004
Each Port Query returns NOT LISTENING

To me this is more evidence that it's an RPC issue and that RPC traffic is not getting through my VPN?
This is definitely the problem.
If you have any filters applied try disabling them.
Also, check the Windows Firewall on the DCs and make sure AD ports are not blocked.
Windows Firewall is not an issue. Where are filters applied?
The site DC was set up at HQ prior to going onsite, and there were no issues when it was at HQ.

Thanks everyone for their help so far.
Is Windows Firewall enabled on the DCs? If so, turn it off to test the replication. Also, were any changes made on the router/firewall VPN? This is where you need to check if there are any filters applied. Check any firewall rules or filters on your router/firewall and see if they are blocking the AD ports.
Windows Firewall disabled, no filters on firewall/router.

When checking Operations Masters in ADUC RID, PDC and Infrastructure marked as ERROR

Starting to lose the will to live. LOL
Can you run DCDIAG on both DCs and upload the output from both?


Thank you,

JBond2010
I've attached the DCDIAG outputs from my 3 DCs.

I have 2 DCs at HQ: DC1 holds the PDC, RID, GC etc. roles and is the main DC, and a second DC runs in case DC1 is ever down.

I then have the DC that is connected via VPN to our remote Site.

 HQDC1.log HQDC2.log site-dc.log
OK GregBooth, I have found the source of the problem by looking through site-dc.log. The issue is this error:

This latency is over the Tombstone Lifetime of 60 days!
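The check behind that warning is a date comparison between the last successful replication and the tombstone lifetime. A minimal Python sketch, assuming the 60-day lifetime quoted in the error; the function name and timestamps are hypothetical and not taken from the attached logs:

```python
from datetime import datetime, timedelta

TOMBSTONE_LIFETIME_DAYS = 60  # value quoted in the DCDIAG error above

def exceeds_tombstone_lifetime(last_success: datetime, now: datetime,
                               lifetime_days: int = TOMBSTONE_LIFETIME_DAYS) -> bool:
    """True if the time since the last successful replication exceeds the lifetime."""
    return now - last_success > timedelta(days=lifetime_days)

# Hypothetical timestamps, for illustration only.
now = datetime(2010, 6, 1)
print(exceeds_tombstone_lifetime(datetime(2010, 3, 1), now))  # True (92 days)
print(exceeds_tombstone_lifetime(datetime(2010, 5, 1), now))  # False (31 days)
```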
SOLUTION
James

This solution is only available to members.
MTU size on both sides is 1500. I assume I'd have to make the MTU registry change on all DCs?

Thanks.
Pings to all my DCs get through at sizes up to 1472.
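The 1472 figure lines up with a 1500-byte MTU once the headers are added back. A small Python sketch of the arithmetic, assuming IPv4 without options and assuming the pings were sent with the don't-fragment flag set (on Windows, `ping -f -l 1472 <host>`); without that flag a larger ping can still succeed by being fragmented:

```python
IPV4_HEADER_BYTES = 20  # IPv4 header without options
ICMP_HEADER_BYTES = 8   # ICMP echo header

def max_ping_payload(mtu: int) -> int:
    """Largest ping payload that fits in one unfragmented packet."""
    return mtu - IPV4_HEADER_BYTES - ICMP_HEADER_BYTES

def packet_size(payload: int) -> int:
    """Total IP packet size for a given ping payload."""
    return payload + IPV4_HEADER_BYTES + ICMP_HEADER_BYTES

print(max_ping_payload(1500))  # 1472
print(packet_size(1472))       # 1500
```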
When I run portqry against 3268 and 3269 locally on my site DC, it reports NOT LISTENING. I'm assuming it needs to listen on 3268 and 3269 for AD replication?
Ports 3268 and 3269 are for the Global Catalog. Are the DCs Global Catalogs?
portqry 135 to my site DC is now LISTENING as are the GC ports.

I decided to try to demote the site DC and then promote it again, but DCPROMO fails with "Logon Failure: The target account name is incorrect"
The fact that you can portqry 135 and GC ports is good news.
To get around the issue of "Logon Failure: The target account name is incorrect" wait for replication to complete. Replication should now be working because the ports are responding.
Still no joy replicating.

DCDIAG is still displaying tombstone warnings and Domain Owner, PDC Owner, RID Owner and Infrastructure Owner not responding!

:-(

I can't run DCPROMO because it hasn't replicated. Should I demote it and specify it as the last server in the forest, then manually delete it from the HQ DCs?
ASKER CERTIFIED SOLUTION
This solution is only available to members.