Brandon_V
asked on
Active Directory File replication service not working
Hello, the company I work for is having an odd issue where the netlogon and sysvol folders are shared and working, but they aren't replicating.
When I check the File Replication Service logs I don't see any errors, but I was looking at http://forums.techarena.in/active-directory/704189.htm#post4123495 and saw that it describes a similar symptom. However, when I type ntfrsutl ds |findstr /i "root stage" it returns a blank line rather than a folder.
I typed that command on pretty much all AD servers and all show blank. About 90% of them have a copy of the sysvol and netlogon folders, but the copies are pretty old. Should I be typing Linkd %systemroot%\SYSVOL\SYSVOL\Contoso.com (I know contoso.com should be whatever my current domain is)? I just want to know what my next best steps are.
What do you see when you type repadmin /syncall for a DC?
What level of AD are you on? Domain Functional level?
If 03, you can use repadmin or replmon. If 08, use repadmin /replsummary.
FRS replicates SYSVOL and NETLOGON in 03; in 08 it can be replaced by DFSR.
The FRSDiag tool is also a good MS tool to use.
ASKER
When I do repadmin /syncall I get
CALLBACK MESSAGE: SyncAll Finished.
SyncAll terminated with no errors.
I'm on AD 2003. When I run repadmin /replsummary I get the output below (so no errors):
Replication Summary Start Time: 2012-04-10 09:23:06
Beginning data collection for replication summary, this may take awhile:
...................
Source DC largest delta fails/total %% error
Server 27m:08s 0 / 10 0
Server 25m:13s 0 / 73 0
Server 27m:02s 0 / 10 0
Server 05m:05s 0 / 8 0
Server 29m:49s 0 / 63 0
Server 27m:09s 0 / 3 0
Server 27m:05s 0 / 10 0
Server 05m:15s 0 / 5 0
Server 27m:03s 0 / 10 0
Server 27m:09s 0 / 10 0
Server 27m:07s 0 / 10 0
Server 27m:07s 0 / 10 0
Server 27m:02s 0 / 10 0
Server 05m:16s 0 / 5 0
Server 27m:06s 0 / 10 0
Server 27m:03s 0 / 10 0
Destination DC largest delta fails/total %% error
Server 02m:00s 0 / 10 0
Server 05m:23s 0 / 70 0
Server 01m:44s 0 / 10 0
Server 29m:54s 0 / 10 0
Server 27m:14s 0 / 58 0
Server :26s 0 / 9 0
Server 04m:24s 0 / 10 0
Server 01m:55s 0 / 5 0
Server 23m:54s 0 / 10 0
Server 06m:47s 0 / 10 0
Server 25m:25s 0 / 10 0
Server 24m:18s 0 / 10 0
Server :11s 0 / 10 0
Server 13m:36s 0 / 5 0
Server 24m:03s 0 / 10 0
Server 04m:58s 0 / 10 0
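With this many rows it is easy to miss one bad line, so here is a minimal Python sketch that parses pasted repadmin /replsummary text and lists any DC with failures or a large delta. The names `parse_replsummary` and `needs_attention` and the one-hour threshold are assumptions made for illustration, not part of repadmin. Note that this summary covers AD directory replication only, so a clean result here says nothing about FRS.

```python
import re
from typing import List, NamedTuple


class ReplRow(NamedTuple):
    dc: str
    delta_seconds: int
    fails: int
    total: int


# Matches data rows such as "Server 27m:08s 0 / 10 0" or "Server :26s 0 / 9 0".
# Header and progress lines do not match and are skipped.
ROW = re.compile(
    r"^\s*(?P<dc>\S+)\s+"
    r"(?:(?P<h>\d+)h:)?(?:(?P<m>\d+)m)?:(?P<s>\d+)s\s+"
    r"(?P<fails>\d+)\s*/\s*(?P<total>\d+)\s+\d+"
)


def parse_replsummary(text: str) -> List[ReplRow]:
    """Turn pasted summary text into one ReplRow per data line."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if not m:
            continue
        seconds = (
            int(m.group("h") or 0) * 3600
            + int(m.group("m") or 0) * 60
            + int(m.group("s"))
        )
        rows.append(
            ReplRow(m.group("dc"), seconds, int(m.group("fails")), int(m.group("total")))
        )
    return rows


def needs_attention(rows: List[ReplRow], max_delta_seconds: int = 3600) -> List[ReplRow]:
    """Rows with any failure, or a delta above the (hypothetical) threshold."""
    return [r for r in rows if r.fails > 0 or r.delta_seconds > max_delta_seconds]


# A few rows copied from the paste above.
SAMPLE = """\
Source DC largest delta fails/total %% error
Server 27m:08s 0 / 10 0
Server 25m:13s 0 / 73 0
Server :26s 0 / 9 0
"""

if __name__ == "__main__":
    parsed = parse_replsummary(SAMPLE)
    print(len(parsed), "rows parsed;", len(needs_attention(parsed)), "need attention")
```

For the output posted here every row has 0 failures and a delta under 30 minutes, so nothing would be flagged.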
Alright, how do you know the problem is happening?
ASKER
If I go to \\servername\netlogon and change a file on one DC, the change doesn't show up on any of the others.
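That manual check can be made repeatable. Below is a rough Python sketch that hashes every file under two folders and reports what is missing, extra, or different. The function names `snapshot` and `diff_shares` and the server names in the usage comment are hypothetical; it only reads files and assumes the account running it can reach both shares.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def snapshot(root) -> Dict[str, str]:
    """Map each file's path (relative to root) to the SHA-256 of its contents."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def diff_shares(reference, other) -> Dict[str, List[str]]:
    """Compare 'other' against 'reference' and list the files that disagree."""
    ref, oth = snapshot(reference), snapshot(other)
    return {
        "missing": sorted(set(ref) - set(oth)),    # in reference only
        "extra": sorted(set(oth) - set(ref)),      # in other only
        "different": sorted(k for k in set(ref) & set(oth) if ref[k] != oth[k]),
    }


# Example call with hypothetical server names:
#   diff_shares(r"\\DC1\netlogon", r"\\DC2\netlogon")
```

Running it with the DC you trust as the reference against each other DC shows which copies have gone stale.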
What level of Domain are you? 03 or 08?
ASKER
03
ASKER
Alright, I can do that. However, under HKLM\System\CurrentControlSet\Services\NtFrs\Parameters\Cumulative Replica Sets there are no values at all, so there is no entry called BurFlags.
Should it be created? Also, I find it odd that when I type ntfrsutl ds |findstr /i "root stage" it returns a blank line.
Post dcdiag
Yes, post dcdiag /c /v. And again, the FRSDiag tool will be a big help too:
http://www.microsoft.com/download/en/details.aspx?id=8613
ASKER
Alright, so I found the root cause; I just need help with the fix. The FRSDiag tool shows:
NtFrs 1/11/2012 7:55:13 PM Error 13568
The File Replication Service has detected that the replica set "DOMAIN SYSTEM VOLUME (SYSVOL SHARE)" is in JRNL_WRAP_ERROR.
Replica set name is : "DOMAIN SYSTEM VOLUME (SYSVOL SHARE)"
Replica root path is : "c:\windows\sysvol\domain"
Replica root volume is : "\\.\C:"
A Replica set hits JRNL_WRAP_ERROR when the record that it is trying to read from the NTFS USN journal is not found. This can occur because of one of the following reasons.
And then it lists a ton of reasons below:
[1] Volume "\\.\C:" has been formatted.
[2] The NTFS USN journal on volume "\\.\C:" has been deleted.
[3] The NTFS USN journal on volume "\\.\C:" has been truncated. Chkdsk can truncate the journal if it finds corrupt entries at the end of the journal.
[4] File Replication Service was not running on this computer for a long time.
[5] File Replication Service could not keep up with the rate of Disk IO activity on "\\.\C:".
Setting the "Enable Journal Wrap Automatic Restore" registry parameter to 1 will cause the following recovery steps to be taken to automatically recover from this error state.
[1] At the first poll, which will occur in 5 minutes, this computer will be deleted from the replica set. If you do not want to wait 5 minutes, then run "net stop ntfrs" followed by "net start ntfrs" to restart the File Replication Service.
[2] At the poll following the deletion this computer will be re-added to the replica set. The re-addition will trigger a full tree sync for the replica set.
WARNING: During the recovery process data in the replica tree may be unavailable. You should reset the registry parameter described above to 0 to prevent automatic recovery from making the data unexpectedly unavailable if this error condition occurs again.
To change this registry parameter, run regedit.
Click on Start, Run and type regedit.
Expand HKEY_LOCAL_MACHINE.
Click down the key path: "System\CurrentControlSet\Services\NtFrs\Parameters"
Double click on the value name "Enable Journal Wrap Automatic Restore" and update the value.
If the value name is not present you may add it with the New->DWORD Value function under the Edit Menu item. Type the value name exactly as shown above.
Now, on the main DC, which holds 4 of the 5 FSMO roles, dcdiag shows the following under the KCC event log test:
An Error Event occured. EventID: 0xC0000470
Time Generated: 04/10/2012 13:01:57
(Event String could not be retrieved)
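Since the event string could not be retrieved, it helps to know which event number to look up by hand. Windows event identifiers pack the severity into the top two bits and the event code into the low 16 bits; the sketch below splits the value accordingly. The function name `decode_event_id` is made up for this example.

```python
# Severity occupies bits 31-30 of a Windows event identifier.
SEVERITY = {0: "Success", 1: "Informational", 2: "Warning", 3: "Error"}


def decode_event_id(raw: int) -> dict:
    """Split a 32-bit event identifier into its documented bit fields."""
    return {
        "severity": SEVERITY[(raw >> 30) & 0x3],  # bits 31-30
        "customer": bool((raw >> 29) & 0x1),      # bit 29
        "facility": (raw >> 16) & 0xFFF,          # bits 27-16
        "code": raw & 0xFFFF,                     # bits 15-0, the event number
    }


if __name__ == "__main__":
    print(decode_event_id(0xC0000470))
```

For 0xC0000470 the code field is 0x0470, which is 1136 in decimal, with severity Error. That is the event number to search for in the Directory Service event log on that DC.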
It shows that a bunch of times. Now, under Starting test: VerifyEnterpriseReferences, it shows:
[1] Problem: Missing Expected Value
Base Object:
CN=DCSERVER,CN=Servers,CN=CAL,CN=Sites,CN=Configuration,DC=company,DC=ca
Base Object Description: "Server Object"
Value Object Attribute: serverReference
Value Object Description: "DC Account Object"
Recommended Action: This could hamper authentication (and thus
replication, etc). Check if this server is deleted, and if so
clean up this DCs Account Object. If the problem persists and
this is not a deleted DC, authoratively restore the DSA object from
a good copy, for example the DSA on the DSA's home server.
It shows one of those for every DC. To add to the story, we had an AD issue back in January and opened a Microsoft ticket, which ended in an authoritative AD restore. That restore was done on a different server, not on this one.
Alright, so the steps in this post should fix it. First determine which DC has the most up-to-date SYSVOL data, which is usually the one that holds all the roles.
Take a backup of the policies and scripts folders from both servers, from c:\Windows\Sysvol\domain.
Stop the NTFRS service on both DCs.
Make one of the DCs the authoritative server by modifying the registry: navigate to HKLM\System\CurrentControlSet\Services\NtFrs\Parameters\Cumulative Replica Sets and set the BurFlags value to D4. Do this on the server that has the up-to-date, correct data.
Go to the other DC and make it non-authoritative: navigate to the same registry location, HKLM\System\CurrentControlSet\Services\NtFrs\Parameters\Cumulative Replica Sets, and set the BurFlags value to D2.
Restart the NTFRS service on both servers and force replication; you should then see event 13516 in the FRS event log.
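To keep the order straight across several DCs during the maintenance window, the steps above can be sketched as a small Python helper that only builds the command list and runs nothing. The function `burflags_plan` and the DC names are hypothetical. The registry key is passed in as a parameter because the exact location of BurFlags on your servers is an assumption you should confirm against Microsoft's FRS documentation first. The D4 and D2 values are hexadecimal, written to `reg add` here as decimal 212 and 210.

```python
from typing import List, Tuple

AUTHORITATIVE = 0xD4      # value the thread gives for the DC with good data
NON_AUTHORITATIVE = 0xD2  # value the thread gives for every other DC


def reg_add_burflags(key: str, value: int) -> str:
    """Build (not run) a reg add command that sets BurFlags as a DWORD."""
    return f'reg add "{key}" /v BurFlags /t REG_DWORD /d {value} /f'


def burflags_plan(key: str, good_dc: str, other_dcs: List[str]) -> List[Tuple[str, str]]:
    """Return (server, command) pairs in the order described in the thread.

    Backing up the policies and scripts folders is a manual step that
    must happen before any of these commands are run.
    """
    all_dcs = [good_dc, *other_dcs]
    plan = [(dc, "net stop ntfrs") for dc in all_dcs]
    plan.append((good_dc, reg_add_burflags(key, AUTHORITATIVE)))
    plan += [(dc, reg_add_burflags(key, NON_AUTHORITATIVE)) for dc in other_dcs]
    plan += [(dc, "net start ntfrs") for dc in all_dcs]
    return plan


if __name__ == "__main__":
    # Key path as quoted in the thread; verify it before use.
    key = r"HKLM\System\CurrentControlSet\Services\NtFrs\Parameters\Cumulative Replica Sets"
    for server, command in burflags_plan(key, "DC1", ["DC2"]):
        print(f"{server}: {command}")
```

Printing the plan first and reviewing it is the whole point; each command would still be run by hand on the named server.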
Journal Wrap errors are not unusual and usually not hard to fix. The burflags registry change should resolve it for you.
ASKER
Thanks, guys. I will probably have to wait for a weekend maintenance window to apply the fix, but I'll post the results and close the question once it's done.