chriscrisp asked:

Problems with Domain Controller replication

Where do I start?

I am having problems with replication to a secondary DC. My network consists of two campuses. The primary campus has one DC for our single domain; it is the PDC, since that campus is larger and the server is faster. The satellite campus has the problem DC. The two campuses are connected by a VPN over DSL. Replication is expected to be slow, but recently it has quit working altogether. Some changes were made, but I did not log them.

So far I have demoted and re-promoted the problem DC several times. I was running DNS on this server, but in my efforts to resolve the problem I have removed the DNS service. I can communicate with the PDC (ping, authentication, etc.), but I am unable to complete the promotion to a DC: the SYSVOL and NETLOGON shares have not been created. I have received replication errors in my logs. I have run DCDIAG and receive this error:

     Starting test: Advertising
         Warning: DsGetDcName returned information for \\TBSEXCHANGE.TBSDOMAIN.COM, when we were trying to reach NH-DC1.
         Server is not responding or is not considered suitable.
         The DC NH-DC1 is advertising itself as a DC and having a DS.
         The DC NH-DC1 is advertising as an LDAP server
         The DC NH-DC1 is advertising as having a writeable directory
         The DC NH-DC1 is advertising as a Key Distribution Center
         The DC NH-DC1 is advertising as a time server
         ......................... NH-DC1 failed test Advertising

When I run netdiag, I receive this error:

Global results:

Domain membership test . . . . . . : Failed
    [WARNING] The system volume has not been completely replicated to the local machine. This machine is not working properly as a DC.
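To pick the failed tests out of a long dcdiag run, a small filter over the saved output can help. This is a minimal Python sketch, assuming the summary-line format shown in the dcdiag output above ("......................... NH-DC1 failed test Advertising"); the function name `failed_tests` is hypothetical and not part of any Microsoft tool.

```python
import re

# dcdiag ends each test with a summary line such as
# "......................... NH-DC1 failed test Advertising"
FAILED_RE = re.compile(r"\.{3,}\s+(\S+)\s+failed test\s+(\S+)")


def failed_tests(dcdiag_output: str) -> list[tuple[str, str]]:
    """Return (server, test) pairs for every test dcdiag reports as failed."""
    return FAILED_RE.findall(dcdiag_output)
```

For example, save the output with `dcdiag > dcdiag.txt` and pass the file's contents to `failed_tests`; for the output above it returns `[("NH-DC1", "Advertising")]`, while "passed test" lines are ignored.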

I am unsure what is wrong, or what to do.  Any help would be greatly appreciated.
 
jwilding

After multiple promotions and demotions, you will likely have a scrambled satellite system; don't try to make it work. Wipe the OS and reinstall.

Delete the original machine from Active Directory Users and Computers on the main campus DC. Delete ALL references to the original machine from the main campus DNS, including all resource records (SRV) that refer to the satellite DC in the _msdcs folder in the DNS console on the main campus DC.

An hour or so after doing this, rejoin the satellite to the domain, remembering to point the satellite at the main DC's DNS. I would give the satellite DC a different name and IP address before joining the domain; it can make life a lot easier later, but it's up to you. Make the satellite a Global Catalog server. Install the DNS service on the satellite, but (assuming your main campus DNS is AD-integrated) do not run the zone wizard during setup; just cancel at that point, and the AD-integrated zones will replicate to it shortly. Once this is done, check in AD Sites and Services that you have two sites (you can rename "Default-First-Site-Name"), each with the appropriate subnet ranges attached, and that your two DCs are in their respective sites.

Keep your satellite DC pointing to the main site as its primary DNS server, but add itself as the secondary. All clients at the satellite, however, should point to the satellite as their primary DNS server. Set up DNS forwarders on the satellite to point to your ISP's DNS.

J
Netman66
Check the logs on the first DC.  I think you may find a Journal Wrap error up there.

If not, it may be a network issue.


See what errors you can dig up from the main server.

Is the SYSVOL folder structure there on the replica DC?
Is the SYSVOL folder shared? (See the registry entry at the bottom.)

Good Info
What Happens When SYSVOL is Created During Domain Controller Promotion

The SYSVOL shared folder is built by the Active Directory Installation Wizard (Dcpromo) during the installation of Active Directory. The process is as follows:

1. Dcpromo calls FRS to prepare for promotion. If FRS is already running on the server to be promoted, Dcpromo stops the FRS service.

2. Dcpromo deletes information from previous demotions or promotions (primarily FRS-related registry keys).

3. The Net Logon service stops sharing the SYSVOL shared folder (if it exists), and the SysvolReady registry entry is set to 0 (false).

4. Dcpromo creates the SYSVOL folder and the necessary subfolders and junction points.

5. The FRS service is started.

6. Dcpromo makes a call to FRS to start a promotion thread that sets the necessary registry keys.

7. Dcpromo reboots the server.

8. When the server is restarted, FRS detects that the server is a domain controller and then checks the registry for the SysVol Information is Committed registry entry. Because this entry is set to 0 (false), FRS creates the necessary Active Directory objects, populates them with information from the registry, and creates the reference attributes as necessary.

9. FRS begins to source the SYSVOL content from the computer that is identified in the Replica Set Parent registry entry. (This key is temporary and is deleted after SYSVOL has been successfully replicated.) The connection to the server specified in Replica Set Parent is also temporary. This connection, called a volatile connection, is used to perform the initial vvjoin so that the new domain controller does not need to rely on Active Directory replication for the new connection objects to be replicated.

10. When SYSVOL is finished replicating, FRS sets the SysvolReady registry entry to 1 (true), and then the Net Logon service shares the SYSVOL folder and publishes the computer as a domain controller.
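The registry entries named in steps 8 to 10 can be read as markers of how far promotion has got. The sketch below is a hypothetical Python helper, not a Microsoft tool: it assumes you have already read SysvolReady, whether Replica Set Parent still exists, and the value of SysVol Information is Committed, and it assumes (as step 8 implies but does not state) that the last one stops being 0 once FRS has created its Active Directory objects.

```python
from typing import Optional


def sysvol_stage(sysvol_ready: int,
                 replica_set_parent_present: bool,
                 sysvol_info_committed: Optional[int]) -> str:
    """Map observed registry values onto steps 8-10 of the sequence above."""
    if sysvol_ready == 1:
        # Step 10: FRS has set SysvolReady, so Net Logon shares SYSVOL.
        return "step 10: SYSVOL replicated and shared"
    if sysvol_info_committed == 0:
        # Step 8 (assumed marker): FRS has not yet created its AD objects.
        return "step 8: SYSVOL information not yet committed to AD"
    if replica_set_parent_present:
        # Step 9: the temporary Replica Set Parent entry is deleted only
        # after the initial SYSVOL replication has succeeded.
        return "step 9: still sourcing SYSVOL from the replica set parent"
    return "unknown: SysvolReady is 0 but neither step 8 nor step 9 applies"
```

A result stuck at step 9 for a long time points at replication from the parent DC rather than at the promotion itself.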

More of the same info
http://redmondmag.com/features/article.asp?EditorialsID=373

Check this registry entry; for SYSVOL to be shared it must be 1:
HKLM\System\CurrentControlSet\Services\Netlogon\Parameters\SysvolReady
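Reading the value is safe even if you decide not to change it. A minimal Python sketch that parses the output of `reg query HKLM\System\CurrentControlSet\Services\Netlogon\Parameters /v SysvolReady`; the exact column layout of that output is an assumption (spacing varies between Windows versions), and `parse_sysvol_ready` is a hypothetical helper name.

```python
import re
from typing import Optional

# Matches the value line printed by:
#   reg query HKLM\System\CurrentControlSet\Services\Netlogon\Parameters /v SysvolReady
# Columns may be separated by tabs or by runs of spaces, so accept any whitespace.
_SYSVOL_READY = re.compile(r"SysvolReady\s+REG_DWORD\s+0x([0-9a-f]+)", re.IGNORECASE)


def parse_sysvol_ready(reg_output: str) -> Optional[int]:
    """Return the SysvolReady DWORD from `reg query` output, or None if absent."""
    match = _SYSVOL_READY.search(reg_output)
    return int(match.group(1), 16) if match else None
```

A result of 0 on a DC whose SYSVOL content is still missing matches the netdiag warning above; None means the value (or the key) was not found.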

chriscrisp (asker)

So are you saying I should change the value of HKLM\System\CurrentControlSet\Services\Netlogon\Parameters\SysvolReady to 1, or should I verify that all the steps leading up to that point have completed and then change the registry entry? If I need to verify the steps, how? Also, I noticed that the referenced article discusses Windows 2000 Server, but I am using Windows Server 2003. Will this procedure still apply?

I do like the direction of this possible solution.  

Thank you.

I did check the Replication log on the PDC, and there was a journal wrap error on the 22nd. I also found this error in the Replication log on the 23rd:

Event Type:      Warning
Event Source:      NtFrs
Event Category:      None
Event ID:      13562
Date:            2/23/2006
Time:            5:59:16 PM
User:            N/A
Computer:      TBSEXCHANGE
Description:
Following is the summary of warnings and errors encountered by File Replication Service while polling the Domain Controller TBSEXCHANGE.TBSDOMAIN.COM for FRS replica set configuration information.
 
 The nTDSConnection object cn=360116c3-aa39-4360-9198-b88f8d916781,cn=ntds settings,cn=nh-dc1,cn=servers,cn=nh,cn=sites,cn=configuration,dc=tbsdomain,dc=com is conflicting with cn=tbsexchange,cn=ntds settings,cn=nh-dc1,cn=servers,cn=nh,cn=sites,cn=configuration,dc=tbsdomain,dc=com. Using cn=360116c3-aa39-4360-9198-b88f8d916781,cn=ntds settings,cn=nh-dc1,cn=servers,cn=nh,cn=sites,cn=configuration,dc=tbsdomain,dc=com

I also found this in the Directory Service Log:

Event Type:      Warning
Event Source:      NTDS KCC
Event Category:      Knowledge Consistency Checker
Event ID:      1925
Date:            2/24/2006
Time:            11:46:51 AM
User:            NT AUTHORITY\ANONYMOUS LOGON
Computer:      TBSEXCHANGE
Description:
The attempt to establish a replication link for the following writable directory partition failed.
 
Directory partition:
DC=TBSDOMAIN,DC=COM
Source domain controller:
CN=NTDS Settings,CN=NH-DC1,CN=Servers,CN=NewHope,CN=Sites,CN=Configuration,DC=TBSDOMAIN,DC=COM
Source domain controller address:
0e852ee9-bddf-47d5-bb7d-c3b0be24b48c._msdcs.TBSDOMAIN.COM
Intersite transport (if any):
CN=IP,CN=Inter-Site Transports,CN=Sites,CN=Configuration,DC=TBSDOMAIN,DC=COM
 
This domain controller will be unable to replicate with the source domain controller until this problem is corrected.  
 
User Action
Verify if the source domain controller is accessible or network connectivity is available.
 
Additional Data
Error value:
8524 The DSA operation is unable to proceed because of a DNS lookup failure.

However, this was last logged on the 24th.
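Error 8524 means the DNS lookup of the source DC failed, and the event names the source by its `<GUID>._msdcs.<forest root>` alias rather than by its host name. Each DC registers such an alias (a CNAME pointing at its host name) in the `_msdcs` zone. A minimal Python sketch, with hypothetical helper names, that rebuilds that alias from the event and checks whether the local resolver can find it:

```python
import socket
import uuid


def dsa_cname(dsa_guid: str, forest_root: str) -> str:
    """Build the <DSA GUID>._msdcs.<forest root> alias that replication resolves."""
    guid = str(uuid.UUID(dsa_guid))  # raises ValueError for a malformed GUID
    return f"{guid}._msdcs.{forest_root}"


def resolves(name: str) -> bool:
    """True if this machine's resolver returns at least one address for name."""
    try:
        return len(socket.getaddrinfo(name, None)) > 0
    except socket.gaierror:
        return False
```

Run on the DC that logged the event, `resolves(dsa_cname("0e852ee9-bddf-47d5-bb7d-c3b0be24b48c", "TBSDOMAIN.COM"))` returning False would be consistent with the 8524 error. The lookup goes to whatever DNS servers that machine's network settings point at, which is one reason the DNS client configuration matters so much here.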

And finally, it looks like I don't have NTP set up correctly:

Event Type:      Warning
Event Source:      W32Time
Event Category:      None
Event ID:      12
Date:            2/24/2006
Time:            8:08:15 PM
User:            N/A
Computer:      TBSEXCHANGE
Description:
Time Provider NtpClient: This machine is configured to use the domain hierarchy to determine its time source, but it is the PDC emulator for the domain at the root of the forest, so there is no machine above it in the domain hierarchy to use as a time source.  It is recommended that you either configure a reliable time service in the root domain, or manually configure the PDC to synchronize with an external time source.  Otherwise, this machine will  function as the authoritative time source in the domain hierarchy.  If an external  time source is not configured or used for this computer, you may choose to disable  the NtpClient.

I only mention this because I noticed some posts linking NTP problems to replication failures. Could that be a factor here?

I like the direction of this too.

Thanks

OK, you need to step through these things to make sure the basics are correct.

1) Check DNS on the main site. Make sure that the forward and reverse lookup zones are present. If this is pure Windows 2003 DNS, there should be two forward lookup zones: _msdcs.forestrootdomain.com and forestrootdomain.com. The replication scope of the _msdcs zone should be "All DNS servers in the forest" and the scope of the domain zone should be "All DNS servers in the domain". Each zone should be AD-integrated and allow secure dynamic updates.

2) Check that the DC at the main site appears in all the DNS containers it should. For each role it holds (think FSMO and GC), it should be listed in the corresponding containers under _msdcs.

3)  Make sure the new DC only points to the main site for DNS until you are finished with everything.

4) You must fix all replication errors before the new server will start sharing SYSVOL. Sharing happens automatically, so don't force anything in the registry.

5) Time must be correct on both servers. If they are in different time zones, make sure the zone is set properly on both servers, and don't forget the Daylight Saving Time box. You can (and should) set the main site server to use an Internet time source to stay current. It's important that the clocks on these controllers be no more than 5 minutes apart, or the new DC won't function properly.
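Comparing clocks is easiest in UTC, which also shows why time zone and daylight saving settings matter: two clocks can show different local times and still agree, or show the same local time and be an hour apart. A minimal Python sketch using the 5-minute limit from step 5 (which is also the usual Kerberos default tolerance); the helper names are hypothetical and the example readings are made up.

```python
from datetime import datetime, timedelta, timezone

# The 5-minute limit from step 5.
MAX_SKEW = timedelta(minutes=5)


def clock_skew(a: datetime, b: datetime) -> timedelta:
    """Absolute difference between two timezone-aware clock readings."""
    if a.tzinfo is None or b.tzinfo is None:
        raise ValueError("use timezone-aware datetimes so zone/DST settings are accounted for")
    return abs(a - b)


def within_tolerance(a: datetime, b: datetime) -> bool:
    """True if the two readings are no more than MAX_SKEW apart."""
    return clock_skew(a, b) <= MAX_SKEW
```

For example, a main-site reading of 22:21 at UTC-8 and a satellite reading of 06:24 UTC the next day are 3 minutes apart, so `within_tolerance` returns True; the same satellite clock with a wrong DST setting would be an hour off and fail the check.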

If you've added and removed this server a number of times, then you need to stop for a period of time and let AD get settled down.  Every time you add new replication objects then it takes some time to fully sync - if you keep removing them just before they sync then AD is never in a completely steady state.

If the server is now not part of the domain or a DC then you can clean up AD metadata and start again.  If it is part of the domain as a DC then leave it like that and work on fixing the replication issues.

Let us know.

Okay,

I had some problems with DNS (probably more now that I've messed with it). Although I did have a zone named tbsdomain.com, I did not have a zone named _msdcs.tbsdomain.com, so I added it. I then began to receive errors in the DNS log about being unable to create an entry in tbsdomain.com. I wrestled with this for a while and wound up deleting both zones and recreating them. I am worried that I may not have recreated them correctly, as I just added each zone and everything else seemed to happen automatically. That said, dcdiag now reports no errors on the new DC, but the SYSVOL share is still not there, and the folder structure is only partially intact (tbsdomain is listed, but there are no policies).

I also setup W32Time appropriately.  

I am highly concerned about my actions in DNS, as I am not entirely familiar with DNS and AD. External resolution works, which is good since I have no references to outside DNS except inside the DNS console. I am worried about things like authentication and domain joins, which seem to work a little differently from simple Internet name resolution. So, was there a better way to handle rebuilding DNS on the primary DC (which was also my only DNS server)? I am also thinking there may be a "wait and see" element with the SYSVOL share, but how long should I wait before becoming concerned? I would imagine an hour would be more than enough.

No problem. If you can break it, it can also be fixed.

You should have 2 Forward lookup zones (if this is on a 2003 server with no 2000 DNS peers).

_msdcs.forestrootdomain.com
forestrootdomain.com

The first zone (_msdcs) is initially created as a Primary zone, is AD Integrated, allows Secure Dynamic Updates and will be set to replicate with all DNS servers in the FOREST.
The second zone (forest root DNS namespace) is also initially created as a Primary zone, is AD Integrated, allows Secure Dynamic Updates and will be set to replicate with all DNS servers in the DOMAIN.
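Once both zones exist, the DC locator records should reappear in them after Netlogon re-registers. The Python sketch below lists a partial set of the standard record names and which of the two zones holds each, assuming a single-domain forest (so the domain is also the forest root). It is not exhaustive; on each DC the file %SystemRoot%\System32\Config\netlogon.dns lists every record that DC tries to register. The function name is hypothetical.

```python
def expected_locator_records(domain: str, site: str,
                             is_pdc: bool = False,
                             is_gc: bool = False) -> dict[str, list[str]]:
    """A partial list of DC locator records, grouped by the zone that holds them.

    Assumes a single-domain forest, so the domain is also the forest root.
    """
    msdcs_zone = f"_msdcs.{domain}"
    in_msdcs = [
        f"_ldap._tcp.dc.{msdcs_zone}",
        f"_kerberos._tcp.dc.{msdcs_zone}",
        f"_ldap._tcp.{site}._sites.dc.{msdcs_zone}",
    ]
    if is_pdc:
        in_msdcs.append(f"_ldap._tcp.pdc.{msdcs_zone}")
    if is_gc:
        in_msdcs.append(f"_ldap._tcp.gc.{msdcs_zone}")
    in_domain = [
        f"_ldap._tcp.{domain}",
        f"_kerberos._tcp.{domain}",
        f"_kpasswd._tcp.{domain}",
    ]
    if is_gc:
        in_domain.append(f"_gc._tcp.{domain}")
    return {msdcs_zone: in_msdcs, domain: in_domain}
```

For example, `expected_locator_records("tbsdomain.com", "NewHope")` covers NH-DC1 (the site name is taken from the KCC event earlier in the thread); each name can then be checked with `nslookup -type=SRV <name>`.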

Once these two are created, restart the Netlogon service on each DC.  This assumes (of course) that the NICs on all machines inside your firewall point exclusively to this DNS server until all this is fixed.

In the Reverse Lookup Zones, make sure there is a zone for each subnet you use. If they represent sites within the same domain, they only need a replication scope of the DOMAIN. If they represent sites from multiple domains in this forest, the scope should be set to FOREST. Note that zones set to replicate to all DNS servers in the FOREST are stored in an application partition in AD, which only Windows 2003 servers can use. That means that if you have ANY Windows 2000 servers hosting DNS in your organization, they will not get a copy of this zone. Make sure you think it through before simply selecting forest replication.
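A reverse zone name is the subnet's network octets in reverse order under in-addr.arpa. A minimal Python sketch for the common /8, /16 and /24 cases; the function name is hypothetical, and the subnets in the usage note are made up, since the thread doesn't give the campus address ranges.

```python
import ipaddress


def reverse_zone(subnet: str) -> str:
    """Name of the in-addr.arpa reverse lookup zone for an IPv4 subnet.

    Only /8, /16 and /24 subnets are handled; other prefix lengths are out of
    scope for this sketch.
    """
    net = ipaddress.IPv4Network(subnet)  # strict: host bits must be zero
    if net.prefixlen not in (8, 16, 24):
        raise ValueError(f"{subnet}: only /8, /16 and /24 are handled")
    octets = str(net.network_address).split(".")[: net.prefixlen // 8]
    return ".".join(reversed(octets)) + ".in-addr.arpa"
```

For example, `reverse_zone("192.168.1.0/24")` returns `"1.168.192.in-addr.arpa"`, so a main campus on one /24 and a satellite on another would need two such zones.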

As for advertising this DC (once SYSVOL finally shares itself out), it may take some time, but it shouldn't take unreasonably long.

Let's fix DNS.  Once this is correct, we should make sure everything gets registered properly before moving forward with troubleshooting.  Make zone corrections now - don't keep removing and replacing the zones - especially if there are peer DNS servers out there.  Replication from other servers hosting AD Integrated zones will continue to screw things up.

Let us know.

I only have one AD/DNS server on the network, and it's Windows 2003. DNS looks correct. I am receiving no errors in the DNS log, the Replication log, or the Directory Service log. I have stopped and started Netlogon on both servers. Still no SYSVOL share, though.

I tried to put the screws to the server and force replication. I receive an error when trying to replicate data from the PDC to the new DC:

The following error occurred during the attempt to contact the domain controller NH-DC1:
The RPC server is unavailable.

Sounds like I've still got problems in DNS?
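"The RPC server is unavailable" often comes down to name resolution or a blocked port rather than the RPC service itself. One quick check from the new DC is whether the RPC endpoint mapper on the PDC (TCP port 135) can be reached at all. A minimal Python sketch with a hypothetical helper name; replication itself then uses dynamically assigned ports, so a successful connection to 135 rules out only the first hop, not every firewall problem on the VPN.

```python
import socket

RPC_ENDPOINT_MAPPER_PORT = 135  # well-known TCP port of the RPC endpoint mapper


def tcp_reachable(host: str, port: int = RPC_ENDPOINT_MAPPER_PORT,
                  timeout: float = 3.0) -> bool:
    """True if a TCP connection to host:port can be opened within timeout seconds."""
    try:
        # create_connection resolves host first, so a DNS failure also returns False.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused, timed-out and name-resolution errors
        return False
```

For example, `tcp_reachable("TBSEXCHANGE.TBSDOMAIN.COM")` resolves the name before connecting, so a False result can mean either a DNS failure or a blocked port; trying the PDC's IP address directly helps tell the two apart.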

I followed this article:

http://www.microsoft.com/windows2000/dns/tshoot/dns_tshoot2C.asp#Repl_FWD_Properties

and now when I force replication I don't get any errors.  I still don't have the SYSVOL Share though.

ASKER CERTIFIED SOLUTION
Netman66

[The text of the accepted solution is not included in this copy.]

I did what you said, the time completed was 2/27/06 22:21 PST.  I will check for errors in the morning.

Thanks, you really are going above and beyond!

Okay,

I couldn't wait until tomorrow. It looks like that worked. SYSVOL is shared, and there are no errors in any logs.

Man that was slick.

I don't mean to pry, but where in the world did you come up with that?  That sounds like an MS internal KB.

Thanks again... way above and beyond!

No, no internal MS KB. It's here: http://support.microsoft.com/kb/315457/en-us (however, it's pretty intimidating for anyone who doesn't like gory details).

I simply gave you the abbreviated version.  I've had lots of experience with replication problems so it wasn't new to me.

The key to correctly diagnosing replication issues is to make absolutely certain that DNS is 100% before you get into things like I told you.  Otherwise, you risk making things worse.

Glad to assist.
NM

Itforall and Netman66 are stars. I'd had a new DC added to a domain that wouldn't share or replicate its SYSVOL from a recently upgraded DC (2000 to 2003). The new one had all the roles, etc., and DNS was on the money, but there were replication errors, GPOs wouldn't open on either server, etc.

Under Replmon, both servers were flagged with an X for 'Server is a Primary Domain Controller for Domain', as I guess I had transferred the role from the old one but the new one wasn't working properly.

So I followed two postings here (after eventually finding you, ha) and noted that the SysvolReady registry entry was set to 0 on the new server. I did Netman66's steps (stopping FRS, doing D4, D2, etc.), but added Itforall's suggestion of setting SysvolReady to 1 as well.

It seems fine: the new server has flagged itself as 'Server is a Primary Domain Controller for Domain', and GPO is working. Now for a few more coffees while I watch the event log.

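The "D4, D2" mentioned above refers to the FRS BurFlags registry value: 0xD4 requests an authoritative restore of the replica set on that DC, and 0xD2 a non-authoritative one (the DC sets aside its local copy and rebuilds it from a partner). Only one DC should ever be set to D4. The accepted solution is not reproduced in this copy, so this Python sketch is not Netman66's procedure; it only builds, for review, the commands commonly used to set the value around an FRS restart. The helper name is hypothetical; check the KB article linked above before running anything.

```python
# Registry location of the FRS BurFlags value, read when the NtFrs service starts.
BURFLAGS_KEY = (r"HKLM\SYSTEM\CurrentControlSet\Services\NtFrs\Parameters"
                r"\Backup/Restore\Process at Startup")

# D4 = authoritative restore (this DC's SYSVOL becomes the source),
# D2 = non-authoritative restore (this DC re-pulls SYSVOL from a partner).
BURFLAGS = {"authoritative": 0xD4, "nonauthoritative": 0xD2}


def burflags_commands(mode: str) -> list[str]:
    """Return, for review only, the commands to stop FRS, set BurFlags and restart it."""
    if mode not in BURFLAGS:
        raise ValueError(f"mode must be one of {sorted(BURFLAGS)}")
    value = BURFLAGS[mode]  # written in decimal, which reg add accepts for REG_DWORD
    return [
        "net stop ntfrs",
        f'reg add "{BURFLAGS_KEY}" /v BurFlags /t REG_DWORD /d {value} /f',
        "net start ntfrs",
    ]
```

For a DC like the one in this thread, which needed to pull SYSVOL from a healthy partner, the non-authoritative variant is the usual choice; the value 0xD2 appears as `/d 210` in the generated command.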
Netman66's ultimate solution solved a problem I was having for days!