snyderkv

DFSr not replicating

I have a simple setup.

Two servers in two sites replicating 20 gigs of data.

They never did replicate except for a few files before stopping.

To troubleshoot, I did a dfsrdiag proptest and successfully replicated a file both ways. I saw this in the Diagnostic Test Folder within the replicated folder.

Second, I did a dfsrdiag syncnow both ways and still had no luck, even though the command reports success. I don't get it.

Within Event Viewer I receive event ID 5002 from DFSR, "Error: 5 (Access is denied)", as shown in the code.

My shares do have Authenticated Users with full control and standard NTFS permissions. I then upped the quota to 25000 MB just in case, because I did see an error about the quota being exceeded for the DfsrPrivate folder.
EVENTID:5002 Source:DFSR
The DFS Replication service encountered an error communicating with partner <computer> for replication group <fds path to folder>.

Partner DNS address: <dfs path to computer>

Optional data if available:
Partner WINS Address: <computer>
Partner IP Address: <ip address>

The service will retry the connection periodically.

Additional Information:
Error: 5 (Access is Denied)
Connection ID: <guid>
Replication Group ID: <guid2>.


ASKER CERTIFIED SOLUTION
sunnyc7
SOLUTION
arnold
snyderkv

ASKER

As far as network blocks go, which ports does DFS use besides RPC? I tested basic connectivity with net view \\distant server and saw the shares. What's weird is that I can also manually copy the files from one server to the other (it takes forever).
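On the port question: DFS Replication talks RPC over TCP, starting at the RPC endpoint mapper on TCP 135 and then moving to a dynamically assigned port, whereas net view only exercises file sharing. A minimal sketch of a reachability check for the endpoint mapper port follows. The host name in the usage comment is a placeholder, and an open port 135 only shows that the first step of the RPC path is reachable, not that DFSR itself can authenticate.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


# Example usage (hypothetical partner name, replace with the real server):
# tcp_port_open("server.example.com", 135)
```

This does not test the dynamic RPC port that DFSR is handed after the endpoint mapper lookup, so a firewall that allows 135 but blocks the high port range would still pass this check.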

I upped the quota to 25000 MB; the largest file is 2.5 GB, and I haven't seen 4202 errors since doing that.

I will temporarily disable our anti-virus (McAfee) before running the manual push again; however, nothing showed up as stopping replication from working.

Bandwidth usage on the replication settings within dfsmgmt.msc shows FULL. We have no QOS over the WAN.

Since I can copy a file manually, I doubt QoS or bandwidth issues are the problem.
I forgot to mention

The distant end has eventID: 5014 Error 9027 (A failure was reported by the remote partner)

Not sure if this was due to me restarting the service. It's possible so I'm not spending much time on it.

Not sure what else I can do to test things out.
OK, my backlog file count says 3816.

But I don't think the dfsrdiag backlog command says anything about why they haven't replicated. It just shows that it is aware of files that need to be replicated.

I have since stopped all McAfee services on both the receiving and sending members (A and B), restarted DFSR, and reran the dfsrdiag syncnow command. The operation succeeded, but files do not seem to replicate, and I still receive 5002 DFSR errors in the event log on the sending member.

Any ideas?
DCDiag passed on both servers, as did the DNS test you recommended. I also ran replmon showrepl and there are no replication or DNS problems. There are no physical connectivity problems either; I can manually copy the files over without issue.
Check DFSR logs here
 c:\windows\debug\DFSR*.log.

Please post back the log.

Also check this link which is specific to the event id's
http://social.technet.microsoft.com/Forums/en/winserverfiles/thread/3778427a-a594-4f1d-9c97-d8d1e6a56a83
You should not use Full if you do not have the bandwidth. If your max upload is 512, you should configure the bandwidth to 128 or 256, etc.
The bandwidth configuration applies to DFSR and nothing else. If you tell it to use Full bandwidth but the transfer is slower than it expects, this might be the cause of the error; i.e., a 100 MB file should take less than a second, but with your bandwidth limitation it takes a minute. It still might work, but with larger files it may hit a timeout after a minute or so.
My guess is that smaller files replicate, but large files are the ones that fail. Also check the configured replication partners to make sure you have Remote Differential Compression checked.
Yes RDC is checked on both.

I just switched my replication bandwidth to 128k. I'll give it 24 hours or so to burn in before spinning my head over it.

I uploaded a 300 KB file and it took 1 minute.

Which equals about 0.04 Mbps.

I think I have to step down to Kbps

5 KiloBytes per second = 40 KiloBits per second so I'm changing to 64kbps

Does that sound right?
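The conversion above can be checked with a short sketch. It assumes decimal units (1 KB = 1000 bytes, 1 Mbps = 1000 Kbps); the function name is made up for illustration.

```python
def throughput_from_copy(size_kilobytes: float, seconds: float) -> dict:
    """Convert an observed file copy into common throughput units."""
    kbytes_per_s = size_kilobytes / seconds   # kilobytes per second
    kbits_per_s = kbytes_per_s * 8            # 8 bits per byte
    mbits_per_s = kbits_per_s / 1000          # decimal megabits per second
    return {"KB/s": kbytes_per_s, "Kbps": kbits_per_s, "Mbps": mbits_per_s}


# The 300 KB file that took 60 seconds to copy:
result = throughput_from_copy(300, 60)
print(result)  # {'KB/s': 5.0, 'Kbps': 40.0, 'Mbps': 0.04}
```

So the observed copy rate is about 40 Kbps, which sits between the 16 Kbps and 64 Kbps throttle settings mentioned in this thread.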
I do not see why you would convert to Mbps.
300 KB / 60 seconds = 5 KB/s (about 0.005 MB/s, or 0.04 Mbps).

What is the WAN upload speed at each end?
Divide it by 8, set that as the limit, and see whether it will replicate.
If it is successful, raise the allocated bandwidth to one sixth and see if replication continues to work without maxing out the upload of the outside interface.
Going beyond one sixth of the upload bandwidth might be pushing it, but you can try a quarter, a third, or a half.
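The stepping-up approach described above can be laid out as a small table. This is only a sketch of the arithmetic: the 512 Kbps upload figure is the example value used earlier in the thread, not a measured one, and the function name is hypothetical. DFSR only offers fixed throttle steps, so the nearest available setting would have to be picked by hand.

```python
def throttle_candidates(upload_kbps: float, divisors=(8, 6, 4, 3, 2)) -> list:
    """Return (divisor, Kbps) pairs, from most to least conservative."""
    return [(d, upload_kbps / d) for d in divisors]


# Assumed example: a 512 Kbps WAN upload link.
for divisor, kbps in throttle_candidates(512):
    print(f"1/{divisor} of upload: {kbps:.0f} Kbps")
```

Starting at one eighth and moving up one step at a time makes it easier to see at which point replication begins to fail or the link saturates.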
I converted it to Mbps because I accidentally had 64 Mbps set instead of 64 Kbps. It is set at 64 Kbps currently.

The next option down is 16 Kbps. I'm going to keep it at 64 Kbps. It's close enough to the 5 KB/s (40 Kbps) I saw when copying a file.

Anyways I'm going to try and pre-stage my data instead but it looks like that can take days for just a few gigs.
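For a rough sense of scale, here is a sketch that estimates how long the full data set would take at the current throttle. It assumes decimal units (1 GB = 10^9 bytes, 1 Kbps = 1000 bits/s), a link that sustains the throttle rate the whole time, and no savings from RDC or compression, so it is an upper-bound style estimate rather than a prediction.

```python
def transfer_days(size_gb: float, rate_kbps: float) -> float:
    """Estimate days needed to move size_gb at a sustained rate_kbps."""
    total_bits = size_gb * 1e9 * 8          # decimal gigabytes to bits
    seconds = total_bits / (rate_kbps * 1000)
    return seconds / 86400                  # seconds per day


# The 20 GB replicated folder at the 64 Kbps throttle:
print(f"{transfer_days(20, 64):.1f} days")  # 28.9 days
```

At this rate the initial sync is measured in weeks, which is why pre-staging the data on the remote member is worth considering.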
Still no go. I set bandwidth to 64Kbps

It's slow, but I see some data copy over from time to time; then it loops into the same error shown in the logs. Hopefully someone can decipher what's really going on, whether it's bandwidth or file access permissions.

+      [Error:9027(0x2343) DownstreamTransport::EstablishConnection downstreamtransport.cpp:3543 2348 C622 A failure was reported by the remote partner]
+      [Error:5(0x5) DownstreamTransport::EstablishConnection downstreamtransport.cpp:3543 2348 W621 Access is denied.]
20100722 09:16:07.789 2348 DOWN  3097 DownstreamTransport::SetupBinding Setting authentication information for partner: DOMAIN\Server$
20100722 09:16:07.789 2348 DOWN  3131 DownstreamTransport::SetupBinding Setup connId:{95416C94-A5A6-472F-BBD9-21DDD69D851A} remoteAddress:Server.FQDN.com  stringBinding:[5bc111107-f111-4111-9111-1111cf9a111@1111n_ip_tcp:Server]
20100722 09:16:10.711 2348 DOWN  3575 [ERROR] DownstreamTransport::EstablishConnection EstablishConnection failed. connId:{95411111-A111-111F-1111-1111111D851A} Error:
+      [Error:9027(0x2343) DownstreamTransport::EstablishConnection downstreamtransport.cpp:3543 2348 C623 A failure was reported by the remote partner]
+      [Error:5(0x5) DownstreamTransport::EstablishConnection downstreamtransport.cpp:3543 2348 W622 Access is denied.]
20100722 09:16:10.711 2348 INCO  2265 [WARN] InConnection::ReConnectAsync Failed to connect, (attempts: 770) connId:{95416C94-A5A6-472F-BBD9-21DDD69D851A} Error:
+      [Error:9027(0x2343) DownstreamTransport::EstablishConnection downstreamtransport.cpp:3610 2348 C624 A failure was reported by the remote partner]
+      [Error:9027(0x2343) DownstreamTransport::EstablishConnection downstreamtransport.cpp:3543 2348 C623 A failure was reported by the remote partner]
+      [Error:5(0x5) DownstreamTransport::EstablishConnection downstreamtransport.cpp:3543 2348 W622 Access is denied.]
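To see at a glance which errors dominate a DFSR debug log like the excerpt above, the bracketed error lines can be tallied. This is a minimal sketch whose regular expressions were written against the lines pasted in this thread only; other DFSR log lines may be formatted differently, and the function name is made up.

```python
import re
from collections import Counter

# Matches e.g. "[Error:5(0x5) Func file.cpp:3543 2348 W622 Access is denied.]"
ERROR_RE = re.compile(
    r"\[Error:(\d+)\(0x[0-9A-Fa-f]+\)\s+(\S+)\s+\S+\s+\d+\s+\S+\s+(.*?)\]"
)
ATTEMPTS_RE = re.compile(r"\(attempts: (\d+)\)")


def summarize(log_text: str):
    """Count (error code, message) pairs and report the highest retry count."""
    errors = Counter()
    for code, _func, message in ERROR_RE.findall(log_text):
        errors[(int(code), message.rstrip("."))] += 1
    attempts = [int(n) for n in ATTEMPTS_RE.findall(log_text)]
    return errors, max(attempts, default=0)


# Three lines copied from the excerpt above.
SAMPLE = """\
20100722 09:16:10.711 2348 INCO  2265 [WARN] InConnection::ReConnectAsync Failed to connect, (attempts: 770) connId:{95416C94-A5A6-472F-BBD9-21DDD69D851A} Error:
+      [Error:9027(0x2343) DownstreamTransport::EstablishConnection downstreamtransport.cpp:3610 2348 C624 A failure was reported by the remote partner]
+      [Error:5(0x5) DownstreamTransport::EstablishConnection downstreamtransport.cpp:3543 2348 W622 Access is denied.]
"""

errors, max_attempts = summarize(SAMPLE)
for (code, message), count in errors.most_common():
    print(f"error {code}: {message} (seen {count}x)")
print(f"highest retry count: {max_attempts}")
```

In this excerpt every failure is the same pair, error 5 wrapped in error 9027, all raised while establishing the connection, and the retry counter has already reached 770.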
The sites are presumably connected via a VPN. Can each side initiate and establish the connection, or is it one-way (i.e., a dynamic IP on one side and a static IP on the other)?

Try to setup a ping on each side to make sure that the VPN tunnel remains up at all times.

Could you add a hosts entry in c:\windows\system32\drivers\etc\hosts
Remote_LAN_IP server.FQDN.com
just to be sure that the IP replication resolves is not the external/public IP?
One side shows a 600 ms average with 3% loss over 30 hops; the other side shows 0% loss at 600 ms.

I pinged the FQDN; there are no problems with name resolution, AD replication, etc. It's just DFS.

What can you guys tell from the dfsr logs?
Also, within the error log I had "EstablishConnection Failed. Try Flat Name".

That suggests RPC connection errors. Weird. Firewalls are disabled by default.

I'm going to disable McAfee temporarily and get rid of IPv6 if it's connected.

I'll report back in a few days. Gotta head out.
OK, it's actually replicating.

It's just taking forever because the pipe is so small. I checked the backlog after roughly two weeks and it went from 3800 objects to just a few. It apparently skipped some files; I guess I can reinitiate those.