2008 R2 DFS Replication Error

I am getting this error on my 2008 R2 boxes.  They all connect over VPNs.  I have applied all of the hotfixes that are currently available, and none of them has fixed the issue.

The DFS Replication service is stopping communication with partner Server1 for replication group mydomain\shares\documents due to an error. The service will retry the connection periodically.

Additional Information:
Error: 1726 (The remote procedure call failed.)
Connection ID: 2875A428-38DA-4E70-AC7E-F9F234E5F163
Replication Group ID: 9FBCE96A-0E91-438E-94C1-C0EB3F83BF8B

I am aware that the problem may be on my network.  We have Cisco 1811s doing the WAN VPN.  The only timeout policy in the routers is

ip http timeout-policy idle 60 life 86400 requests 10000

The service is crashing every minute, so does the 60 here refer to seconds or minutes?  If this refers to seconds then this could be the culprit.

The weird thing is, there were never any issues when we were on 2003 R2.  We recently began upgrading all our servers to Server 2008 R2.

Any ideas what may be the root cause here?
considerscsAsked:
 
considerscsAuthor Commented:
I have solved this issue.

It turns out the ISP had to come and install some hardware to resolve timeout issues.  They upgraded our speed a few weeks ago, but we had not been getting anywhere near that speed, and we were seeing a lot of network latency.

The timeouts lined up almost exactly with the errors.  Once they installed the new hardware, DFS finished replicating as it should, and it has not thrown any more errors.
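
For anyone who wants to run the same comparison on their own logs, here is a minimal Python sketch of the idea: line up the DFSR error times against the WAN timeout times and count how many fall close together. The timestamps and the 5-second tolerance below are made up for illustration and are not taken from my event logs; `matched_errors` is a hypothetical helper name.

```python
from datetime import datetime, timedelta


def matched_errors(error_times, timeout_times, tolerance=timedelta(seconds=5)):
    """Return the error times that fall within `tolerance` of any timeout time."""
    return [
        err
        for err in error_times
        if any(abs(err - t) <= tolerance for t in timeout_times)
    ]


# Hypothetical timestamps (assumption): replace with values copied from the
# DFS Replication event log and from the ISP/router logs.
fmt = "%H:%M:%S"
errors = [datetime.strptime(s, fmt) for s in ("10:00:02", "10:01:03", "10:05:30")]
timeouts = [datetime.strptime(s, fmt) for s in ("10:00:00", "10:01:00", "10:02:00")]

hits = matched_errors(errors, timeouts)
print(f"{len(hits)} of {len(errors)} errors line up with a WAN timeout")
```

If most of the errors match, the link is a more likely suspect than the DFS configuration.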
 
Rich WeisslerProfessional Troublemaker^h^h^h^h^hshooterCommented:
I don't have the answer, but looking at what you have -- let's see if we can't find an answer.

I don't think the http timeout policy should be affecting this.  We're looking at an RPC timeout... I think if the PIX configuration was working in 2003, it should be working now.

There are some folks over on the TechNet forums who indicate that the problem could be a permission problem on the DFS computer object in Active Directory, which might be worth a look.

You've been upgrading the DFS servers to 2008R2.  Have you already upgraded the domain controllers as well?
 
considerscsAuthor Commented:
The domain is 2008 still.  We plan to move that but much later on.  

But we do have 2008 R2 domain controllers in another company and their 2008 R2 DFS servers are having the same problem.

And the weird thing is that this is only happening on the 2008 R2 servers.  The ones that are still 2003 R2 do not have this issue.

I tried the suggestions from that post previously.  All the permissions are correct on the computer object in Active Directory.

The boxes are replicating sometimes, but very slowly.  And then when I get the error, DFS crashes out and sits idle and does not replicate even though it says it is.
 
Rich WeisslerProfessional Troublemaker^h^h^h^h^hshooterCommented:
Okay, additional information collection... not suggesting any changes...

I assume because you still have some 2003 R2 DFS servers that the namespaces haven't been migrated to 2008 yet.

Was the upgrade an in-place upgrade of the operating system to the new version, or a fresh install of Windows 2008 R2?

I think the 1726 error is a symptom.  It's letting you know it can't communicate, and it would make sense for that instance to stop replicating at that point, because it has said it can't... but the service will continue to try periodically.  Is it possible there is a corresponding message on the other server, or another message in the system or application log?  Or even an audit failure in the security log on either server participating in replication?
 
considerscsAuthor Commented:
This was a fresh install of 2008 R2.  The old folder target and replication membership was removed prior to disjoining the old box from the domain.

Below is the error I receive on the Hub member.  The Hub member is also a 2008 R2 server.

The DFS Replication service is stopping communication with partner server2 for replication group mycompany\shares\documents due to an error. The service will retry the connection periodically.
 
Additional Information:
Error: 1727 (The remote procedure call failed and did not execute.)
Connection ID: 1AC65D8D-5CF5-4670-A0D9-F5F9532C8F32
Replication Group ID: 9FBCE96A-0E91-438E-94C1-C0EB3F83BF8B
 
Rich WeisslerProfessional Troublemaker^h^h^h^h^hshooterCommented:
http://support.microsoft.com/kb/832017
Looking down at 'Distributed File System Replication'.

Is it possible that tcp/5722 is being blocked or filtered on the router/firewall/vpn boxes?  

One other possible change between 2003 and 2008 -- apparently in 2003, it would start its random port allocation at tcp/1025, and count up from there.  2008 starts at tcp/49152.  (And there are pointers to instructions on changing/customizing that port range at the bottom of the Microsoft support document.)  Could the router/firewall/vpn be intercepting the traffic?
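
If you want a quick way to test this from the spoke server, a rough Python sketch like the one below would do; it only tries a plain TCP connect to each port and says nothing about whether RPC itself is healthy. The function name `check_ports` is hypothetical, and adding tcp/135 (the RPC endpoint mapper) next to tcp/5722 is my own assumption about what is worth probing.

```python
import socket


def check_ports(host, ports, timeout=3.0, connect=socket.create_connection):
    """Map each TCP port to True if a connection to host:port succeeds."""
    results = {}
    for port in ports:
        try:
            conn = connect((host, port), timeout)
        except OSError:
            # Covers refused connections, timeouts, and name-resolution failures.
            results[port] = False
        else:
            conn.close()
            results[port] = True
    return results


# Example call (host name taken from the event text in this thread):
# print(check_ports("server2.mycompany", [135, 5722]))
```

A port that shows False from the spoke but True from a machine on the hub's LAN would point at something in the VPN path.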
 
considerscsAuthor Commented:
The firewalls are all disabled due to software requirements by our EMR.  But I went in and added rules to Windows Firewall on the servers for 5722 just in case.

I am waiting to see if it continues to throw the error.  For now I am getting the following error.

The DFS Replication service failed to communicate with partner server2 for replication group mycompany\shares\documents. The partner did not recognize the connection or the replication group configuration.
 
Partner DNS Address: server2.mycompany
 
Optional data if available:
Partner WINS Address: server2
Partner IP Address: x.x.x.x
 
 
The service will retry the connection periodically.
 
Additional Information:
Error: 9026 (The connection is invalid)
Connection ID: 547F88E8-BBF4-45C3-9B6F-F7CAE91D37C4
Replication Group ID: 9FBCE96A-0E91-438E-94C1-C0EB3F83BF8B
 
considerscsAuthor Commented:
I did a dfsrdiag pollad on the hub member and now I am back to getting the original error.
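
In case it helps anyone scripting this, here is a small Python sketch that builds the `dfsrdiag pollad` command and a matching `dfsrdiag backlog` check. It assumes Python is available on a machine with the DFS management tools installed; the helper names are hypothetical, and the member, group, and folder names passed in are placeholders you would replace with your own.

```python
import subprocess


def pollad_command(member):
    """Build the command that makes `member` poll Active Directory for DFSR config."""
    return ["dfsrdiag", "pollad", f"/member:{member}"]


def backlog_command(group, folder, sending, receiving):
    """Build the command that reports the backlog from `sending` to `receiving`."""
    return [
        "dfsrdiag", "backlog",
        f"/rgname:{group}",
        f"/rfname:{folder}",
        f"/smem:{sending}",
        f"/rmem:{receiving}",
    ]


def run(command):
    """Run a dfsrdiag command and return its text output (Windows only)."""
    completed = subprocess.run(command, capture_output=True, text=True)
    return completed.stdout


# Example (placeholder names, not verified against a real environment):
# print(run(pollad_command("server1")))
```

Running the backlog command before and after a poll gives a rough idea of whether replication is actually moving.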
 
Rich WeisslerProfessional Troublemaker^h^h^h^h^hshooterCommented:
Alrighty then.  I think we can probably eliminate ports and firewalls.
You mentioned you've installed all the hotfixes.  That makes me nervous.  Do you still have a list of the hotfixes which were installed?  (Not critical... probably unrelated to the problem, but could certainly be exacerbating the issue.  And I assume the hotfixes weren't applied on ALL the servers... but some... and there are others at your location and another that don't have all the hotfixes installed?)

I assume you aren't encountering any other Active Directory replication issues?

Do you have a single, or multiple AD sites?  (Are the servers across the VPN in another site from the Hub?)

>> But we do have 2008 R2 domain controllers in another company and their 2008 R2 DFS servers are having the same problem.

In the same Site/Domain/Tree/Forest?  Or different?

Just to double check -- for the site in which the DFS Hub Member server resides -- open AD Sites and Services, select that site, open the NTDS Site Settings, and check the identity of the Inter-Site Topology Generator.  Check that server for any possible communication problems, or errors in its logs.
 
considerscsAuthor Commented:
I have gotten it to replicate now for 30 minutes.

The RPC Locator service was set to Manual by the Roles installation.  I set it to Automatic and started the service, and it replicated as it should for 30 minutes.

Then, just a couple of minutes ago, it went back to crashing every minute.

The Hub member is not showing any of the issues on its side now.  Just the spoke member is crashing every minute.
 
considerscsAuthor Commented:
>> Do you still have a list of the hotfixes which were installed?

Yes.  At this link http://support.microsoft.com/kb/968429

I did not apply them all to every server.  I was a little reserved at applying to all in case something went wrong due to one of the hotfixes.

No Active Directory replication issues showing in the logs.

I have a single AD site.  All other sites are connected through VPN.  Hub and spoke topology.

The other client's buildings belong to a different organization that we support.  They have their own VPN to every building and a hub and spoke topology.  Their domain sits at the Hub location.

>> Just to double check -- for the site in which the DFS Hub Member server resides -- open AD Sites and Services, select that site, open the NTDS Site Settings, and check the identity of the Inter-Site Topology Generator.  Check that server for any possible communication problems, or errors in its logs.

There is no server listed here for this site.  But this site is the domain site.  The domain server sits on the same VM box as the Hub DFS member.
 
considerscsAuthor Commented:
I am out of ideas on this.  I am getting this error nearly every 30 seconds now.

This successfully replicated for a little while earlier, but has since gone back to crashing.
 
Rich WeisslerProfessional Troublemaker^h^h^h^h^hshooterCommented:
It's a single spoke member that is crashing every 30 seconds?  Or is every 2008 DFS server in the environment other than the Hub crashing?

Is it worth attempting to remove the namespace(s) relevant to the crashing server, and recreating them in Windows 2008?
 
considerscsAuthor Commented:
So far it is just the single spoke member, but it is the only 2008 R2 server in the environment other than the Hub member.

I have removed the namespace a couple different times and even uninstalled DFS on the trouble box and reinstalled, but still nothing.

I have never seen something like this continue to happen even after all the troubleshooting.
 
considerscsAuthor Commented:
All of the responses here were vital to help pinpoint the issue.