Multi-site AD Replication Issues


The DC holding all the FSMO roles failed, and the roles were seized to a new server.  Metadata cleanup was performed, with the exception of one entry.  The old server object would not delete from AD Sites and Services (ADSS), failing with "Access denied" and "Insufficient privileges to delete site".  The user account was an Enterprise Admin, and the object would not delete with either ADSS or ADSIEdit.

Because of this, I renamed the new server to the original name of the failed DC.  I transferred the PDC role successfully.  The other four roles were on another server at the same site.

There are nine sites in total, including the main hub.  The main hub hosts the FSMO roles and the primary DNS zones.  The other eight sites are connected via VPN tunnels.

There are two remaining remote sites, which were never able to replicate from the main hub.  They still "think" the old servers are holding all the FSMO roles.  Therefore, they are failing the dcdiag KnowsOfRoleHolders test.  The fSMORoleOwner attributes cannot be modified on the bad DCs via ADSIEdit or LDP.

These servers are also not replicating the AD-integrated primary DNS zones.  They had old zone data, so I deleted the zones and rebuilt the primary DNS zone from scratch.

After this, the majority of the domain controllers replicated perfectly.  I have not figured out how to fix the two stubborn ones.

I'm pretty sure this issue is being caused by multiple factors, based on dcdiag, netdiag, and many other tools.  I need to get the DNS zone to replicate to the two bad DCs.

I also believe there are KCC inconsistencies (based on event log errors), such as "kerberos client received a KRB_AP_ERR_MODIFIED error".

In ADSS or repadmin /replicate, I receive the error "Naming context is in the process of being removed..."

How do I get the Kerberos ticket issues resolved?  I've done some basic work with klist and tickets, but need further guidance.  I've also reset machine accounts with netdom, with some degree of success.

I'm positive this is largely DNS.  So, how do I get DNS set up as a primary zone using the current AD-integrated zone?  There could even be other copies of the same zone stored in AD; how do I purge these?

I'm also receiving some "The default SPN registration ... is missing" warnings from netdiag.

I want to resolve these issues without demoting/re-promoting.  These servers will not demote cleanly, and there has to be a better way than to go that route. The 60 day tombstone lifetime has not been met.

Just some more info to save you guys some time: the dcdiag runs on the role holder DCs come back clean.  It's just these two that are being a pain (both 2003).  The two bad DCs give me lots of KCC errors and DNS issues, but it's mostly because they have not replicated.

How do I get these servers to pull the replication data from the main hub?  I even made the bad servers hold secondary copies of the primary zone, but they still will not replicate.

One last note: there are some SPN errors with netdiag, and there is the possibility of duplicate or missing SPN records in DNS.  I've dabbled with setspn to list SPNs and to attempt to add the ones reported missing, but could use further guidance here as well if necessary.
DynamicQuestAuthor Commented:

For the DNS issue, I created a secondary copy of the zone on the remote servers.  I then changed them to AD-integrated primary zones.  They transferred the zone from the master.

I was then able to track down the KRB_AP_ERR_MODIFIED errors in the event logs and reset the machine accounts with netdom resetpwd, after which replication worked.
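
For anyone tracking down the same errors, here is a rough sketch of one way to pull the affected server names out of an event log that has been exported to text. It assumes each message contains the phrase "KRB_AP_ERR_MODIFIED error from the server <name>"; that wording, and the function name, are assumptions for illustration, so check them against your own log export before relying on it.

```python
import re

# Assumed message shape; the real event text may differ between OS versions.
PATTERN = re.compile(r"KRB_AP_ERR_MODIFIED error from the server (\S+)")

def servers_with_modified_errors(lines):
    """Return distinct server names found in matching lines, in first-seen order."""
    seen = []
    for line in lines:
        match = PATTERN.search(line)
        if match:
            # Drop the full stop that ends the sentence in the message text.
            name = match.group(1).rstrip(".")
            if name not in seen:
                seen.append(name)
    return seen
```

Each name returned is a candidate for a machine account password reset, which is what fixed it in my case.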

Thanks for everyone's input!
Have you tried setting the broken DCs to use the good master as their DNS server, to see if that gets them to recognize the new settings?
DynamicQuestAuthor Commented:
Yes, I just now changed the secondary zone to primary, and AD integrated.  They are both pointing to the master.
I mean the actual DNS setting on the network cards of the servers, pointing at the master rather than at the server's own IP address as the DNS server?
DynamicQuestAuthor Commented:
No, not set to loopback address.
But are they set so that the faulty machines reference a good DNS server?
DynamicQuestAuthor Commented:
Correct.  I actually backed up a copy of the primary zone and deleted it.  I started fresh, and ran ipconfig /registerdns on all the remote sites to generate the necessary records.

So as far as I can tell, the DNS server is good.  I ran dnslint and got good results.

I'm doing some more dcdiags now, let me know if you can think of any command output that can help.

By the way, the replication error I'm receiving is "The naming context is in the process of being removed..."  which can mean a lot of things.

I appreciate the quick responses.
Well, assuming you have good DNS data in the remote sites, I would try running the following commands on a DC in the 'bad' spoke site.

Firstly run...
'repadmin /KCC' which should force the domain controller to review the replication topology.

Secondly run...
repadmin /replicate <badspoke-DC-FQDN> <goodhub-DC-FQDN> CN=Configuration,DC=<YOURDOMAIN>,DC=<YOURSUFFIX>

As an example
repadmin /replicate badDC1.fred.local goodDC1.fred.local CN=Configuration,DC=fred,DC=local
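
If it helps, here is a minimal sketch of how the naming-context argument is derived from the DNS domain name, with one repadmin /replicate line built per standard naming context. The function names are made up for illustration, the code only builds the command text and does not run anything, and it assumes a single-domain forest (so the Configuration and Schema partitions sit under the same root as the domain).

```python
def domain_to_dn(dns_domain):
    """Convert a DNS domain name such as 'fred.local' to 'DC=fred,DC=local'."""
    labels = [label for label in dns_domain.strip(".").split(".") if label]
    if not labels:
        raise ValueError("empty domain name")
    return ",".join("DC=" + label for label in labels)

def replicate_commands(dest_dc, source_dc, dns_domain):
    """Build (not run) one 'repadmin /replicate' line per naming context."""
    root = domain_to_dn(dns_domain)
    # Assumption: single-domain forest, so all three contexts share one root.
    contexts = [
        "CN=Configuration," + root,
        "CN=Schema,CN=Configuration," + root,
        root,
    ]
    return [
        "repadmin /replicate {} {} {}".format(dest_dc, source_dc, nc)
        for nc in contexts
    ]

for line in replicate_commands("badDC1.fred.local", "goodDC1.fred.local", "fred.local"):
    print(line)
```

The first line printed matches the example above; the other two cover the Schema and domain partitions in the same way.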

It may also be worth reading this...
Have you reset the failing DC's computer account password?
DynamicQuestAuthor Commented:
Yes - using these commands

net stop kdc

netdom resetpwd /server:brokendc /userd:...

net start kdc

This resolved the "kerberos client received a KRB_AP_ERR_MODIFIED error" entries I was getting in the event log after fixing DNS.
DynamicQuestAuthor Commented:
I was able to resolve this issue, and want to share my fix with everyone else.
Question has a verified solution.
