File Replication is failing between domain controllers

I just found out that FRS between my DCs is failing every night.  See attached.  Where do I begin to fix this?
file-replication.txt
J.R. Sitman Asked:

abhijitwaikar Commented:
I forgot to mention that netdiag is no longer available in Windows Server 2008, so that error is expected.

The DCDIAG and IPCONFIG reports also look fine.

Event 13568 on the 2008 DC means its replica set is in a journal wrap state. To resolve this, perform a D4 and a D2 restore.

Run D4 on the healthy DC (the 2003 server) and D2 on the 2008 server, since that is the one logging the 13568 error. Perform D4 first, then D2.

Steps:
D4, also known as authoritative:
To complete an authoritative restore, stop the FRS service, configure the BurFlags registry key, and then restart the FRS service.
To do so:
1.Click Start, and then click Run.
2.In the Open box, type cmd and then press ENTER.
3.In the Command box, type net stop ntfrs.
4.Click Start, and then click Run.
5.In the Open box, type regedit and then press ENTER.
6.Locate the following subkey in the registry:
HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\NtFrs\Parameters\Backup/Restore\Process at Startup
7.In the right pane, double click BurFlags.
8.In the Edit DWORD Value dialog box, type D4 and then click OK.
9.Quit Registry Editor, and then switch to the Command box.
10.In the Command box, type net start ntfrs.
11.Quit the Command box.
When the FRS service is restarted, the following actions occur:
•The value for the BurFlags registry key is set back to 0.
•An event 13566 is logged to signal that an authoritative restore is started.
•Files in the reinitialized FRS replicated directories remain unchanged and become authoritative on direct replication partners. Additionally, the files become authoritative on indirect replication partners through transitive replication.
•The FRS database is rebuilt based on current file inventory.
•When the process is complete, an event 13516 is logged to signal that FRS is operational. If the event is not logged, there is a problem with the FRS configuration.
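
For anyone who prefers the command line to regedit, the steps above could be sketched roughly as follows. This is only a sketch, not part of the KB article: the function name burflags_commands is made up for illustration, and the script builds the commands without running them. It assumes the built-in reg add command is used to set the value, with the hex values D4 and D2 passed as decimal 212 and 210.

```python
# Sketch only: builds the command sequence for a BurFlags restore.
# Nothing is executed; review the commands before running them by hand.

# Registry key from step 6, using the HKLM abbreviation that reg.exe accepts.
BURFLAGS_KEY = (
    r"HKLM\System\CurrentControlSet\Services\NtFrs\Parameters"
    r"\Backup/Restore\Process at Startup"
)

# The values typed into the Edit DWORD dialog are hexadecimal.
BURFLAGS_VALUES = {"D4": 0xD4, "D2": 0xD2}


def burflags_commands(mode):
    """Return the stop / set BurFlags / start commands as argument lists.

    mode is "D4" (authoritative) or "D2" (nonauthoritative).
    The function name is hypothetical, chosen for this example.
    """
    value = BURFLAGS_VALUES[mode.upper()]
    return [
        ["net", "stop", "ntfrs"],
        ["reg", "add", BURFLAGS_KEY, "/v", "BurFlags",
         "/t", "REG_DWORD", "/d", str(value), "/f"],
        ["net", "start", "ntfrs"],
    ]


for command in burflags_commands("D4"):
    print(command)
```

Returning argument lists rather than one long string avoids quoting problems with the spaces in "Process at Startup".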


D2, also known as non-authoritative:
To perform a nonauthoritative restore, stop the FRS service, configure the BurFlags registry key, and then restart the FRS service. To do so:
1.Click Start, and then click Run.
2.In the Open box, type cmd and then press ENTER.
3.In the Command box, type net stop ntfrs.
4.Click Start, and then click Run.
5.In the Open box, type regedit and then press ENTER.
6.Locate the following subkey in the registry:
HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\NtFrs\Parameters\Backup/Restore\Process at Startup
7.In the right pane, double-click BurFlags.
8.In the Edit DWORD Value dialog box, type D2 and then click OK.
9.Quit Registry Editor, and then switch to the Command box.
10.In the Command box, type net start ntfrs.
11.Quit the Command box.
When the FRS service restarts, the following actions occur:
•The value for BurFlags registry key returns to 0.
•Files in the reinitialized FRS folders are moved to a Pre-existing folder.
•An event 13565 is logged to signal that a nonauthoritative restore is started.
•The FRS database is rebuilt.
•The member performs an initial join of the replica set from an upstream partner or from the computer that is specified in the Replica Set Parent registry key if a parent has been specified for SYSVOL replica sets.
•The reinitialized computer runs a full replication of the affected replica sets when the relevant replication schedule begins.
•When the process is complete, an event 13516 is logged to signal that FRS is operational. If the event is not logged, there is a problem with the FRS configuration.
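
To keep track of which event means what while watching the File Replication Service log, a small lookup like the one below may help. It is a rough sketch that only covers the event IDs mentioned in this post; the function names are invented for the example, and the list of event IDs is assumed to be read off the log by hand, oldest first.

```python
# Event IDs and meanings as described in this post.
FRS_EVENTS = {
    13516: "FRS is operational; the restore has completed",
    13565: "nonauthoritative (D2) restore started",
    13566: "authoritative (D4) restore started",
    13568: "replica set is in journal wrap state",
}


def describe_event(event_id):
    """Return a short description for an FRS event ID (hypothetical helper)."""
    return FRS_EVENTS.get(event_id, "not an event covered in this post")


def restore_finished(event_ids):
    """True once a 13516 appears after a 13565 or 13566 start event.

    event_ids is a list of IDs in the order they were logged, oldest first.
    """
    started = False
    for event_id in event_ids:
        if event_id in (13565, 13566):
            started = True
        elif event_id == 13516 and started:
            return True
    return False
```

For example, restore_finished([13568, 13565, 13516]) is True, while a log that shows 13565 with no later 13516 means the restore has not completed yet.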

If the steps are unclear, refer to this KB article for the D2/D4 process: http://support.microsoft.com/kb/290762

abhijitwaikar Commented:
There could be many reasons for the File Replication Service to experience problems replicating.

Check this: http://www.eventid.net/display.asp?eventid=13508&eventno=349&source=ntfrs&phase=1

Lester_Clayton Commented:
Seems like you have DNS issues.  Can you just verify that both of your domain controllers can talk to a valid DNS server, and that this DNS server is a Domain Controller in the same domain?

Do some NSLOOKUP tests on both servers to ensure each one can resolve the other.

FRS needs to do SRV record lookups, and if you're all using a third party DNS server, it's not going to work.
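
As a rough illustration of the SRV lookup, the sketch below builds an nslookup command for the standard domain controller locator record, _ldap._tcp.dc._msdcs followed by the domain name. The domain example.local is a placeholder and the function name is made up; substitute your own AD domain.

```python
def srv_lookup_command(domain):
    """Build an nslookup command for the DC locator SRV record.

    domain is the Active Directory DNS domain name; a trailing dot is ignored.
    The function name is hypothetical, chosen for this example.
    """
    domain = domain.strip().rstrip(".")
    record = "_ldap._tcp.dc._msdcs." + domain
    return ["nslookup", "-type=SRV", record]


# example.local is a placeholder domain, not one from this thread.
print(" ".join(srv_lookup_command("example.local")))
```

If the DNS server in use does not hold these records, the query comes back empty or with an error, which is the situation described above.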
 
Amit (IT Architect) Commented:
Do you see FRS event ID 13509? If you do, you don't need to worry. Otherwise, stop the File Replication Service and start it again. Also run repadmin /replsum and check the result.

J.R. Sitman (Author) Commented:
@Lester Clayton.  I'm not a DNS expert or even close.  How do I test if they can talk to each other?  Details please.  Both my DCs are DNS servers.
Anything else you want me to do, please send detailed steps.  I'd really appreciate it.

abhijitwaikar Commented:
Just run dcdiag /q, netdiag /q, repadmin /replsum and ipconfig /all on both servers and post the results.

Also check whether there is a 13568 error event on either server.
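
If it is easier to gather everything in one go, something along these lines could collect the output of those commands into a single report. Treat it as a sketch under assumptions: the function names are made up, it needs Python installed on the server, and the has_netdiag switch is there because netdiag is not present on Windows Server 2008, as comes up elsewhere in this thread.

```python
import subprocess


def diagnostic_commands(has_netdiag=True):
    """Return the diagnostic commands as argument lists.

    Pass has_netdiag=False on servers where netdiag is not installed.
    """
    commands = [["dcdiag", "/q"]]
    if has_netdiag:
        commands.append(["netdiag", "/q"])
    commands.append(["repadmin", "/replsum"])
    commands.append(["ipconfig", "/all"])
    return commands


def collect(commands):
    """Run each command and return the combined output as one text report."""
    parts = []
    for command in commands:
        try:
            result = subprocess.run(command, capture_output=True, text=True)
            body = result.stdout + result.stderr
        except FileNotFoundError:
            # The tool is not installed or not on the PATH.
            body = "(command not found on this server)\n"
        parts.append("> " + " ".join(command) + "\n" + body)
    return "\n".join(parts)
```

On each DC, collect(diagnostic_commands()) returns text that can be saved to a file and attached to the thread.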


Lester_Clayton Commented:

First Test:


Run a command prompt, and at the command prompt type the following

NSLOOKUP laspca.corp

It should reply with a list of IPs, which should be the addresses of your domain controllers.

Second Test:


Now try the following:

NSLOOKUP SPCALA16.laspca.corp

and

NSLOOKUP SPCALA20.laspca.corp

Do all the above tests from both Domain Controllers.

Expected Responses:


This is what a good response for the first test looks like:

C:\Users\lclayton>nslookup mgmt.local
Server:  mgmt01.mgmt.local
Address:  10.110.176.11

Name:    mgmt.local
Addresses:  10.110.176.12
          10.110.176.11

This is what a good response for the second test looks like:


C:\Users\lclayton>nslookup mgmt01.mgmt.local
Server:  mgmt01.mgmt.local
Address:  10.110.176.11

Name:    mgmt01.mgmt.local
Address:  10.110.176.11

This is what a bad response looks like:

C:\Users\lclayton>nslookup mgmt31.mgmt.local
Server:  mgmt01.mgmt.local
Address:  10.110.176.11

*** mgmt01.mgmt.local can't find mgmt31.mgmt.local: Non-existent domain
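
If you want to check a batch of replies without eyeballing each one, the good and bad cases above could be told apart with something like this. It is a minimal sketch: the function name is made up, it only recognises the IPv4 output format shown in my examples, and it treats any reply without an answer section as bad.

```python
import re

GOOD_REPLY = """Server:  mgmt01.mgmt.local
Address:  10.110.176.11

Name:    mgmt.local
Addresses:  10.110.176.12
          10.110.176.11
"""

BAD_REPLY = """Server:  mgmt01.mgmt.local
Address:  10.110.176.11

*** mgmt01.mgmt.local can't find mgmt31.mgmt.local: Non-existent domain
"""


def parse_nslookup(output):
    """Return (ok, addresses) for the text of one nslookup reply."""
    if "Non-existent domain" in output or "can't find" in output:
        return False, []
    # Everything before "Name:" describes the DNS server that answered,
    # so only the part after it is searched for resolved addresses.
    _, _, answer = output.partition("Name:")
    addresses = re.findall(r"\b\d{1,3}(?:\.\d{1,3}){3}\b", answer)
    return bool(addresses), addresses


print(parse_nslookup(GOOD_REPLY))
print(parse_nslookup(BAD_REPLY))
```

Skipping the server block matters because the DNS server's own address appears in every reply, including the failed ones.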

J.R. Sitman (Author) Commented:
Here are the results.  The first 4 are from the 2003 DC; 5-7 are from the 2008 DC.  On the 2008 DC, netdiag was reported as an invalid command.

No 13568 on the 2003 DC, but yes on the 2008 DC.

frs1.png
frs2.png
frs3.png
frs4.png
frs16a.png
frs16b.png
frs16c.png

J.R. Sitman (Author) Commented:
Here are the NSLOOKUP results.  Not good.  The first 3 are from the 2008 server; the rest are from the 2003 server.
nslookup.png

abhijitwaikar Commented:
To be on the safe side, before performing any of the provided steps, please take a system state backup or a backup of the %systemroot%\SYSVOL folder.

J.R. Sitman (Author) Commented:
Can I just copy the folder to another location?

J.R. Sitman (Author) Commented:
I'm backing them up with Arcserve

abhijitwaikar Commented:
Backing up and copying the folder are both valid options.

J.R. Sitman (Author) Commented:
thanks, I'll do both

J.R. Sitman (Author) Commented:
It failed on the 2008 server because of disk space.  I cleared up space and now I'm getting the 13516 event.  How do I test that this is actually fixed?

And thanks for the "simple" instructions

J.R. Sitman (Author) Commented:
Thanks to all.  The program that was getting the FRS is now working so all is good.  I love Experts-Exchange