Solved

Active Directory Replication & DFS Not Working

Posted on 2016-10-24
Last Modified: 2016-10-26
I have a domain with three domain controllers at three different sites. Two of the domain controllers have DFS set up between them.

The servers are :

WIN-ADC-SRVR1, which is in the main office, holds the FSMO roles and has DFS set up to WIN-ADC-SRVR2
WIN-ADC-SRVR2, which is in the factory and has DFS set up to WIN-ADC-SRVR1
WIN-ADC-SRVR3, which is located in another office and is used just for local logon authentication

Since Sunday, AD replication has stopped working between WIN-ADC-SRVR1 and WIN-ADC-SRVR2, and so has DFS replication of the data folder that is set up between them. Active Directory replication does seem to be working between WIN-ADC-SRVR1 and WIN-ADC-SRVR3.

I have checked DNS and all seems to be working there: the servers can resolve each other by IP address, by FQDN, and even by the SID listed in the DNS server.

Nothing has changed with the VPN links between the sites. Although the links are just ADSL, the amount of data sent between the office and the factory server is very small, and there has never been a problem of this scale.

If anyone can provide any information that could help me with this issue, it would be greatly appreciated.  If logs are needed please let me know.

I have already attached the dcdiag log as well as the dcdiagDNS and repadmin tests from each server

Thanks Andy
LOGS.zip
Question by:AndyBooker1
10 Comments
 
LVL 40

Expert Comment

by:footech
ID: 41857866
I would investigate the errors you're seeing in regards to KccEvent.  Dig down in the details and result codes from the replication errors (those should also be in the event logs).

If I were you I would configure the NIC settings on each DC to use itself as preferred DNS, and another DC as alternate.  Normally I'd recommend putting another as preferred and itself as alternate, but you've only got one DC per site.
 

Author Comment

by:AndyBooker1
ID: 41857933
The DNS setting on the NIC of the DC at every site is 127.0.0.1, which is what it has always been.

I have had a look at the errors in the logs and most of them say "the remote procedure call has failed", but the service is running on all of the servers and is set to automatic.  When I googled the issue, the results mostly talked about DNS.

DNS is working fine, as I can ping each of the servers by name and it resolves.  I can also ping each server by its SID in DNS, which also resolves to the correct IP address.
 
LVL 40

Expert Comment

by:footech
ID: 41858577
That doesn't change my recommendation.

Just a minor detail, but it's a GUID for the DC which is in DNS, not a SID.
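
To illustrate the GUID point: each DC registers a CNAME record of the form <DSA GUID>._msdcs.<forest root domain>, and replication partners locate each other through that name. Below is a minimal Python sketch that builds that name and tries to resolve it. The GUID and the forest name example.local in the usage note are made up for illustration (the thread does not give the real ones), and the helper names are hypothetical.

```python
import socket
import uuid

def dsa_guid_cname(dsa_guid, forest_root):
    """Build the GUID-based DNS name a DC registers under _msdcs.

    dsa_guid may be written with or without braces; uuid.UUID validates it
    and normalises it to lowercase hyphenated form.
    """
    guid = uuid.UUID(dsa_guid)
    return f"{guid}._msdcs.{forest_root.strip('.').lower()}"

def resolve(name):
    """Return the addresses the name resolves to, or an empty list on failure."""
    try:
        infos = socket.getaddrinfo(name, None)
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})
```

For example, dsa_guid_cname("{3F2504E0-4F89-11D3-9A0C-0305E82C3301}", "example.local") gives the name to pass to resolve(); an empty list from resolve() on one DC but not on another would point at DNS rather than RPC.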

Everything I've seen so far points to RPC not working between WIN-ADC-SRVR1 and WIN-ADC-SRVR2.  You can try using PortQry from MS to test.  Also try this command from WIN-ADC-SRVR1
repadmin /bind WIN-ADC-SRVR1 (and the reverse).  Since WIN-ADC-SRVR3 doesn't seem to have any trouble communicating, that would rule out a service issue.  It's more likely that a firewall or other connection issue is involved.
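
If PortQry isn't handy, a rough equivalent of the TCP part of that test can be sketched in Python. This is only an illustration, not a replacement for PortQry: the port list is my assumption of the ports commonly involved in AD replication, RPC additionally uses dynamically assigned high ports that this does not probe, and the function names are hypothetical.

```python
import socket

# Ports commonly involved in AD replication (an assumed, non-exhaustive list).
# RPC also negotiates dynamic high ports via the endpoint mapper on 135.
AD_PORTS = {
    53: "DNS",
    88: "Kerberos",
    135: "RPC endpoint mapper",
    389: "LDAP",
    445: "SMB",
}

def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False

def check_host(host, ports=AD_PORTS, timeout=3.0):
    """Map each port number to True/False for reachability from this machine."""
    return {port: tcp_port_open(host, port, timeout) for port in ports}
```

Calling check_host("WIN-ADC-SRVR2") on WIN-ADC-SRVR1, and then check_host("WIN-ADC-SRVR1") on WIN-ADC-SRVR2, shows which ports each side can reach. A port that is open in one direction but not the other would suggest a filtering device on the path.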

It might be helpful if you post the actual event IDs (and their sources) which you are seeing.

Also, it appears that you don't have all sites defined in Sites and Services.

Author Comment

by:AndyBooker1
ID: 41858713
Hi,

Thanks for the correction of GUID not SID.  

I have run repadmin /bind WIN-ADC-SRVR1 on win-adc-srvr1 and have run the same command on win-adc-srvr2.
I will update on any changes if any.  I also did think it might be a firewall issue but it is disabled on all servers with GP and the service isn't even running.

You also mention that I don't have all the sites defined. There are three: 1. Office, 2. Factory, 3. Hornchurch. As far as I can see these are all defined as they should be, but please correct me if I am wrong.
 

Author Comment

by:AndyBooker1
ID: 41858728
Hi,

OK, after running repadmin /bind WIN-ADC-SRVR1 on all the servers and then running repadmin /showrepl and
repadmin /replsummary, all the servers are now replicating as they should.

As I said in my last comment, if you could explain where I have not set up Sites and Services correctly it would be appreciated, and also what the command repadmin /bind WIN-ADC-SRVR1 does, if that's not too much trouble.

Thanks for your help.

Andy
25-10-2016-15-44-03.png
 
LVL 40

Accepted Solution

by:
footech earned 500 total points
ID: 41859011
Sorry, I meant to say "subnets", not sites.
The reason is this event in the system log:
"During the past 4.00 hours there have been 173 connections to this Domain Controller from client machines whose IP addresses don't map to any of the existing sites in the enterprise. Those clients, therefore, have undefined sites and may connect to any Domain Controller including those that are in far distant locations from the clients."

Don't you have any network/edge firewalls between the sites?

I didn't expect the repadmin /bind command to actually fix anything, but more as a test for communication.  All it does is try to establish an LDAP connection and display some info on features.
I also had a slight mistype.  I meant run
repadmin /bind WIN-ADC-SRVR2 on win-adc-srvr1
and
repadmin /bind WIN-ADC-SRVR1 on win-adc-srvr2

But if all's working now, then great (and I won't worry further)!
I could only speculate that somehow communication got interrupted, and the repadmin command re-established it.  But why it wouldn't re-establish by itself I don't know.
 

Author Comment

by:AndyBooker1
ID: 41859136
Hi,

The subnets are defined in Sites and Services (see the attached screenshot), so I'm not too sure what you mean. If there is anything missing, can you point it out, as I would like the configuration to be correct?  It has always been as it is, though, and has not changed.

There is a firewall at each site, on the Cisco routers which also provide the VPN tunnels.  But on the site-to-site links themselves there is no firewall protection.

If you can elaborate on Sites and Services, it would be greatly appreciated. I will then close off the question.

Thanks very much for your help; I was getting to the stage where I was at a bit of a loss.  Let's hope this is a permanent fix and it doesn't break in a couple of weeks' time, which I have seen before.
25-10-2016-19-12-17.png
 
LVL 40

Assisted Solution

by:footech
footech earned 500 total points
ID: 41859706
I can't tell you any more than the event text.  You would have to examine the log file to see what the IPs are that are connecting.  If those IPs should always try to authenticate to a DC in a particular site, then define the subnet and attach it to the site.  Just by selecting "Subnets" in the left pane in Sites and Services you can see in the right pane which site they are attached to.
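
Once you have the list of client IPs from the log, checking them against your defined subnets is easy to script. Here is a minimal Python sketch of that comparison. The subnet ranges in the usage note are made up (only the site names come from this thread), the function names are hypothetical, and picking the most specific matching subnet is my understanding of how a client's site is chosen.

```python
import ipaddress

def find_site(ip, subnet_sites):
    """Return the site of the most specific subnet containing ip, or None."""
    addr = ipaddress.ip_address(ip)
    best = None  # (network, site) of the longest-prefix match so far
    for cidr, site in subnet_sites.items():
        net = ipaddress.ip_network(cidr)
        if addr.version == net.version and addr in net:
            if best is None or net.prefixlen > best[0].prefixlen:
                best = (net, site)
    return best[1] if best else None

def unmapped_clients(ips, subnet_sites):
    """Return the distinct IPs, in first-seen order, that match no subnet."""
    seen = []
    for ip in ips:
        if ip not in seen and find_site(ip, subnet_sites) is None:
            seen.append(ip)
    return seen
```

With a hypothetical mapping such as {"192.168.1.0/24": "Office", "192.168.2.0/24": "Factory", "192.168.3.0/24": "Hornchurch"}, any address returned by unmapped_clients() belongs to a range that still needs a subnet object attached to a site.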
 

Author Closing Comment

by:AndyBooker1
ID: 41860388
Thanks for your input on this issue. Your help has been gratefully received and is appreciated.
 
LVL 40

Expert Comment

by:footech
ID: 41860561
Glad things are working for you.