snyderkv
asked on
DFSr not replicating
I have a simple setup.
Two servers in two sites replicating 20 gigs of data.
They never did replicate except for a few files before stopping.
To troubleshoot, I ran a dfsrdiag proptest and successfully replicated a file both ways. I saw it in the Diagnostic Test Folder within the replicated folder.
Second, I ran dfsrdiag syncnow both ways and still no luck, even though the command reports success. I don't get it.
In Event Viewer I receive event ID 5002 from DFSR with "Error: 5 (Access is denied)", as shown in the event text below.
My shares do have Authenticated Users with Full Control and standard NTFS permissions. I then raised the quota to 25000 MB just in case, because I did see an error about the quota being exceeded for the DfsrPrivate folder.
EVENTID:5002 Source:DFSR
The DFS Replication service encountered an error communicating with partner <computer> for replication group <dfs path to folder>.
Partner DNS address: <dfs path to computer>
Optional data if available:
Partner WINS Address: <computer>
Partner IP Address: <ip address>
The service will retry the connection periodically.
Additional Information:
Error: 5 (Access is Denied)
Connection ID: <guid>
Replication Group ID: <guid2>.
ASKER
I forgot to mention
The distant end has eventID: 5014 Error 9027 (A failure was reported by the remote partner)
Not sure if this was due to me restarting the service. It's possible so I'm not spending much time on it.
Not sure what else I can do to test things out.
ASKER
OK, my backlog file count says 3816.
But I don't think the dfsrdiag backlog command says anything about why they haven't replicated. It just shows that it is aware of files that need to be replicated.
I have since stopped all McAfee services on both the receiving and sending members (A and B), restarted DFSR, and reran dfsrdiag syncnow. The operation succeeded, but files do not seem to replicate; plus, I still receive 5002 DFSR errors in the event log on the sending member.
Any ideas?
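A minimal sketch for tracking that backlog number over time from saved dfsrdiag backlog output. It assumes the output contains a line of the form "Backlog File Count: N", which is a guess based on the count quoted above; the function name and sample text are made up for illustration.

```python
import re

def backlog_count(output: str):
    """Return the backlog file count found in the given text, or None if absent."""
    # Assumed label; adjust the pattern if the real output is worded differently.
    match = re.search(r"Backlog File Count:\s*(\d+)", output, re.IGNORECASE)
    return int(match.group(1)) if match else None

# Hypothetical sample line, not copied from a real run.
sample = "Member <B> Backlog File Count: 3816"
print(backlog_count(sample))
```

Logging the value once a day would show whether the backlog is shrinking slowly or not moving at all, which is the distinction the backlog command alone does not spell out.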
ASKER
DCDiag passed on both, as did the DNS test you recommended. I also did a replmon showrepl and there are no replication or DNS problems. There are no physical connectivity problems either. I can manually copy the files over with no problem.
Check the DFSR logs here:
c:\windows\debug\DFSR*.log
Please post back the log.
Also check this link, which is specific to these event IDs:
http://social.technet.microsoft.com/Forums/en/winserverfiles/thread/3778427a-a594-4f1d-9c97-d8d1e6a56a83
You should not use Full if you do not have the bandwidth. If your max upload is 512, you should configure the bandwidth to 128 or 256, etc.
The bandwidth configuration deals with DFSR and nothing else. If you tell it to use Full bandwidth but the transfer runs slower than it expects, this might be the cause of the error; i.e., a 100 MB file should take less than a second, but with your bandwidth limitation it takes minutes. It still might work, but with larger files, after a minute or so it may treat the delay as a timeout.
My guess is that smaller files replicate, but large files are the ones that fail. Also check the configured replication partners to make sure you have remote differential compression (RDC) checked.
ASKER
Yes RDC is checked on both.
I just switched my replication bandwidth to 128k. I'll give it 24 hours or so to burn in before spinning my head over it.
ASKER
I uploaded a 300 KB file and it took 1 minute.
Which equals 0.039 Mbps.
I think I have to step down to Kbps.
5 kilobytes per second = 40 kilobits per second, so I'm changing to 64 Kbps.
Does that sound right?
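The arithmetic above can be checked with a short sketch. It assumes 1 Mbps = 1024 Kbps, which is the convention that reproduces the 0.039 figure; with 1000 it would come out as 0.04. The function name is made up for illustration.

```python
def throughput(size_kb: float, seconds: float) -> dict:
    """Convert a timed file copy into KB/s, Kbps and Mbps."""
    kb_per_s = size_kb / seconds       # kilobytes per second
    kbit_per_s = kb_per_s * 8          # 8 bits per byte
    mbit_per_s = kbit_per_s / 1024     # assumed: 1024 Kbps per Mbps
    return {"KB/s": kb_per_s, "Kbps": kbit_per_s, "Mbps": mbit_per_s}

rates = throughput(300, 60)            # the 300 KB copy that took one minute
print(rates)
```

So the measured copy ran at about 40 Kbps, and a 64 Kbps throttle sits just above what the link actually delivered.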
I do not see why you would convert to Mbps.
300 KB / 60 seconds = 5 KB/s, or about 0.005 MB/s.
What is the WAN upload speed at each end?
Divide by 8 and try to see whether it will replicate.
If it is successful, adjust the allocated bandwidth to the upload speed divided by 6 and see if the replication continues to work without maxing out the upload of the outside interface.
Going beyond one sixth of the upload bandwidth might be pushing it, but you can try a quarter, a third, or a half.
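A rough sketch of that stepping-up approach, listing the throttle value for each fraction of the upload speed. The 512 Kbps input is only an example value and the function name is made up; plug in the real WAN upload figure for each end.

```python
def throttle_steps(upload_kbps: float):
    """Return (label, Kbps) pairs for 1/8, 1/6, 1/4, 1/3 and 1/2 of the upload speed."""
    divisors = [8, 6, 4, 3, 2]         # start conservative, then step up
    return [(f"1/{d}", upload_kbps / d) for d in divisors]

steps = throttle_steps(512)            # example upload speed in Kbps
for label, kbps in steps:
    print(f"{label} of upload: {kbps:.0f} Kbps")
```

The replication bandwidth setting only offers fixed values, so in practice you would pick the nearest option at or below each computed figure.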
ASKER
I converted it to Mbps because I accidentally had 64 Mbps instead of 64 Kbps. It is set at 64 Kbps currently.
The next option down is 16 Kbps. I'm going to keep it at 64 Kbps. It's close enough to the 5 KB/s (40 Kbps) I saw when copying a file.
Anyways I'm going to try and pre-stage my data instead but it looks like that can take days for just a few gigs.
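For a sense of scale, here is a back-of-the-envelope sketch of how long the full 20 GB would take at the 64 Kbps setting. It assumes decimal units (1 GB = 10^9 bytes, 1 Kbps = 1000 bits/s) and ignores RDC, compression and protocol overhead, so treat the result as an order of magnitude only. The function name is made up.

```python
def transfer_days(size_gb: float, rate_kbps: float) -> float:
    """Days needed to move size_gb gigabytes at rate_kbps kilobits per second."""
    bits = size_gb * 1e9 * 8               # assumed: decimal gigabytes
    seconds = bits / (rate_kbps * 1000)    # assumed: 1 Kbps = 1000 bits/s
    return seconds / 86400                 # 86400 seconds per day

print(round(transfer_days(20, 64), 1))     # 20 GB at 64 Kbps
```

That works out to roughly 29 days of continuous transfer, so "days for just a few gigs" is about what the numbers predict.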
ASKER
Still no go. I set bandwidth to 64Kbps
It's slow, but I see some data copy over from time to time, then it loops back into the same error shown in the logs. Hopefully someone can decipher what's really going on, whether it's bandwidth or file access permissions.
+ [Error:9027(0x2343) DownstreamTransport::EstablishConnection downstreamtransport.cpp:3543 2348 C622 A failure was reported by the remote partner]
+ [Error:5(0x5) DownstreamTransport::EstablishConnection downstreamtransport.cpp:3543 2348 W621 Access is denied.]
20100722 09:16:07.789 2348 DOWN 3097 DownstreamTransport::SetupBinding Setting authentication information for partner: DOMAIN\Server$
20100722 09:16:07.789 2348 DOWN 3131 DownstreamTransport::SetupBinding Setup connId:{95416C94-A5A6-472F-BBD9-21DDD69D851A} remoteAddress:Server.FQDN.com stringBinding:[5bc111107-f111-4111-9111-1111cf9a111@1111n_ip_tcp:Server]
20100722 09:16:10.711 2348 DOWN 3575 [ERROR] DownstreamTransport::EstablishConnection EstablishConnection failed. connId:{95411111-A111-111F-1111-1111111D851A} Error:
+ [Error:9027(0x2343) DownstreamTransport::EstablishConnection downstreamtransport.cpp:3543 2348 C623 A failure was reported by the remote partner]
+ [Error:5(0x5) DownstreamTransport::EstablishConnection downstreamtransport.cpp:3543 2348 W622 Access is denied.]
20100722 09:16:10.711 2348 INCO 2265 [WARN] InConnection::ReConnectAsync Failed to connect, (attempts: 770) connId:{95416C94-A5A6-472F-BBD9-21DDD69D851A} Error:
+ [Error:9027(0x2343) DownstreamTransport::EstablishConnection downstreamtransport.cpp:3610 2348 C624 A failure was reported by the remote partner]
+ [Error:9027(0x2343) DownstreamTransport::EstablishConnection downstreamtransport.cpp:3543 2348 C623 A failure was reported by the remote partner]
+ [Error:5(0x5) DownstreamTransport::EstablishConnection downstreamtransport.cpp:3543 2348 W622 Access is denied.]
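To see which errors dominate a long debug log, a small sketch like the following can tally the error codes and the highest reconnect attempt count. It relies only on the "[Error:NNNN(0x...)" and "(attempts: N)" patterns visible in the excerpt above; the sample lines are shortened copies of that excerpt, and the function name is made up.

```python
import re
from collections import Counter

ERROR_RE = re.compile(r"\[Error:(\d+)\(0x[0-9A-Fa-f]+\)")
ATTEMPTS_RE = re.compile(r"\(attempts:\s*(\d+)\)")

def summarize(lines):
    """Count error codes and track the largest reconnect attempt number seen."""
    codes = Counter()
    attempts = 0
    for line in lines:
        codes.update(int(code) for code in ERROR_RE.findall(line))
        match = ATTEMPTS_RE.search(line)
        if match:
            attempts = max(attempts, int(match.group(1)))
    return codes, attempts

# Shortened sample lines based on the log excerpt above.
sample = [
    "+ [Error:9027(0x2343) DownstreamTransport A failure was reported by the remote partner]",
    "+ [Error:5(0x5) DownstreamTransport Access is denied.]",
    "20100722 09:16:10.711 2348 INCO 2265 [WARN] Failed to connect, (attempts: 770)",
]
codes, attempts = summarize(sample)
print(dict(codes), attempts)
```

To run it against the real files, read the lines from c:\windows\debug\DFSR*.log and pass them to summarize.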
The sites are presumably connected via a VPN. Does each side have the ability to initiate and establish the connection, or is it one-way, i.e. you have a dynamic address on one and a static one on the other?
Try to set up a ping on each side to make sure that the VPN tunnel remains up at all times.
Could you add a host entry to c:\windows\system32\drivers\etc\hosts:
Remote_LAN_IP server.FQDN.com
just to be sure that the IP replication gets is not the external/public IP.
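One way to check what the partner name currently resolves to, sketched under the assumption that the remote LAN uses a private address range. The function names are made up and server.FQDN.com is the placeholder from this thread, not a real host.

```python
import ipaddress
import socket

def is_lan_address(ip: str) -> bool:
    """True if the address is in a private (non-public) range."""
    return ipaddress.ip_address(ip).is_private

def resolves_to_lan(hostname: str) -> bool:
    """Resolve the name the way the local machine would and test the result."""
    return is_lan_address(socket.gethostbyname(hostname))

print(is_lan_address("192.168.1.10"))   # example private address
print(is_lan_address("8.8.8.8"))        # example public address
```

Run resolves_to_lan("server.FQDN.com") on each member with the real partner name. Note that is_private also returns True for loopback and link-local addresses, so look at the resolved IP itself if the result is surprising.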
ASKER
600 ms average with 3% loss over 30 hops; the other side has 0% loss with 600 ms.
I pinged the FQDN; no problems with resolution, replication, etc. Just DFS only.
What can you guys tell from the dfsr logs?
ASKER
Also, within the error log I had "EstablishConnection Failed. Try Flat Name".
That suggests RPC connection errors. Weird. Firewalls are disabled by default.
I'm going to disable McAfee temporarily and get rid of IPv6 if it's connected.
I'll report back in a few days. Gotta head out.
ASKER
OK, it's actually replicating.
It's just taking forever because the pipe is so small. I checked the backlog after about two weeks and it went from 3800 objects to just a few. It skipped some files apparently; I guess I can reinitiate those.
ASKER
I upped the quota to 25000 MB, the largest file is 2.5 GB, and I haven't seen 4202 errors since doing that.
I will disable our antivirus (McAfee) temporarily before running the manual push again; however, nothing showed up as stopping it from working.
Bandwidth usage on the replication settings within dfsmgmt.msc shows FULL. We have no QoS over the WAN.
Since I can copy a file manually, I doubt QoS or bandwidth issues would be the problem.