Network drive dropped over VPN after MTU packet size change

I am randomly losing access to mapped network drives (error: "the specified network name is no longer available") after we changed the packet size between two ISA boxes on a VPN tunnel. Since I use a GRE tunnel on my end, the packet size is 1436 bytes, and there were a large number of dropped packets when the other side used 1500 bytes. Now that both sides use 1436 bytes, the drop rate has improved tremendously, but the network drives no longer stay up.
Any ideas why this happens and what I could do to improve it?
Thanks

lanboyo

Make sure there are no PMTU issues...

Do a series of ping commands with the Don't Fragment (DF) bit set:

ping -f -l PACKET_SIZE SERVER

Where PACKET_SIZE is the data size of the ping and SERVER is the IP of the server with the shares.

Look for a gap in packet sizes between the point where you stop getting ping replies and the point where you start getting the message:
Packet needs to be fragmented but DF set.

If there is a gap, you have a black hole: certain packet sizes cannot traverse the network AND the ICMP "cannot fragment" messages do not come back. If you see no problems from client to server, run the same test from server to client.
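The sweep can be scripted rather than typed by hand. The following is only a rough sketch in Python, assuming the Windows ping command and its English output strings; the function names are made up for illustration, and SERVER is a placeholder as above.

```python
import subprocess


def classify_ping_output(out):
    """Classify the text printed by one Windows ping attempt."""
    if "needs to be fragmented" in out:
        return "frag_needed"   # a device reported the MTU problem (good)
    if "TTL=" in out:
        return "reply"         # an echo reply came back
    return "silent"            # neither reply nor error: possible black hole


def probe_windows_ping(server, size, timeout_ms=2000):
    """Send one ping with DF set (-f) and `size` data bytes (-l)."""
    result = subprocess.run(
        ["ping", "-n", "1", "-f", "-l", str(size), "-w", str(timeout_ms), server],
        capture_output=True,
        text=True,
    )
    return classify_ping_output(result.stdout)


def find_black_hole(probe, sizes):
    """Return the data sizes that got neither a reply nor a DF error."""
    return [size for size in sizes if probe(size) == "silent"]
```

For example, find_black_hole(lambda s: probe_windows_ping("SERVER", s), range(1300, 1473)) would list the sizes that vanish silently. An empty list means every size either got a reply or the proper error. A single lost ping can also show up as "silent", so repeat any suspicious size before drawing conclusions.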

Next Problem: Active Directory Global Catalog.

From the workstation that has problems seeing the servers, do you have full network access to all of the AD servers that may be providing name and directory services? AD can place odd roles on unexpected servers. The VPN might not have access to the correct server.

Do you have any MTU black holes with those servers?

croplife

ASKER

Thanks for the answer. Yes, I have a gap: pinging out works with a maximum of 1372 bytes, but from the receiving server back only 1352 bytes.
The client workstation can ping all AD server without a problem across the VPN.

Both ISA servers have a HKLM\System\CurrentControlSet\Services\NdisWAN\Parameters\Protocols\0
TunnelMTU REG_DWORD entry of 0x00000578 (1400). Should these entries be changed? If so, to what packet size?
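For reference, the ping figures and the registry value can be compared directly, because the size given to ping -l is only the ICMP data; the packet on the wire also carries a 20-byte IPv4 header (assuming no IP options) and an 8-byte ICMP header. A small Python sketch of that arithmetic, with made-up names:

```python
IP_HEADER = 20    # IPv4 header without options (assumption)
ICMP_HEADER = 8   # ICMP echo header


def packet_size(ping_data_bytes):
    """Total IP packet size for a ping with the given data size."""
    return ping_data_bytes + IP_HEADER + ICMP_HEADER


tunnel_mtu = 0x00000578          # TunnelMTU value from the registry

print(tunnel_mtu)                # 1400
print(packet_size(1372))         # 1400 (largest ping out)
print(packet_size(1352))         # 1380 (largest ping back)
```

So the outbound limit lines up exactly with the 1400-byte TunnelMTU, while the return direction is limited to 1380, 20 bytes less.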

lanboyo

Sorry, I don't fully understand. Is there a packet size at which you get neither replies nor "Packet needs to be fragmented but DF set" messages when you ping?

Are the connections PPTP, L2TP, or IPsec?

To understand more fully: is the GRE tunnel an external feature of the network, or does it refer to the GRE used in the Microsoft tunneling protocol?

Am I to understand that the interface MTUs are 1436 and the tunnel MTUs are 1400? You might want to bump the tunnel MTU down to 1300, though I understand that trial and error is not much fun with a full reboot between each attempt.

croplife

ASKER

Thanks for your prompt reply. We have a PPTP tunnel between sites A and B.
Site A has a high-speed EFM connection to the internet with a backup DSL. The connections feed into the back of a Cisco 2600. GRE is running between the two ISP lines to negotiate IP changes if there is a problem on the primary line and we need to fail over. Because of this GRE issue, we were instructed to set the MTU size on the NIC, through the registry, to 1468.
We then attempted to establish the PPTP VPN tunnel between the two locations, but had random line outages, a very hard time reestablishing a dropped connection, and general slowness between the two sites. This went on for the better part of a year, and recently the slowness became too much to bear.
Some research was done and it was determined that we should set the MTU size in the registry for the PPTP protocol specifically. We did this at both locations and set the registry value to 1400.
However, we just noticed that Site A can ping Site B with 1372. 1373 fails with the error you specified. Site B can ping Site A with 1352. 1353 fails with the error you specified.
While the line outages seem to have disappeared and the speed has improved since the PPTP MTU change, users are now experiencing problems accessing file shares from Site A to Site B. The errors seem random (different users are affected at different times; sometimes it's down for hours, other times minutes, other times not at all).
I hope the extra information was helpful and thanks again for your reply.

lanboyo

For the most part, that error is what you want to see: a networking device recognizes that its MTU is too small for a packet that cannot be fragmented because the DF bit is set. It discards the packet and sends back to the sender the ICMP message that you see, "Packet needs to be fragmented but DF set".
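To make that decision concrete, here is a toy model in Python of what a device along the path does with each packet. It is only an illustration, not any vendor's actual implementation; the function name and the returned strings are made up.

```python
def handle_packet(length, df_set, next_hop_mtu):
    """Toy model of the forwarding decision for one IPv4 packet."""
    if length <= next_hop_mtu:
        return "forward"
    if df_set:
        # ICMP Destination Unreachable, type 3 code 4 ("fragmentation needed")
        return "drop and send ICMP fragmentation needed"
    return "fragment and forward"


print(handle_packet(1400, True, 1400))   # forward
print(handle_packet(1401, True, 1400))   # drop and send ICMP fragmentation needed
print(handle_packet(1401, False, 1400))  # fragment and forward
```

A black hole is the case where the second branch happens but the ICMP message is filtered or never sent, so the sender sees nothing at all.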

So site A=EFM Site B=DSL?

What is odd is that the MTU seems to be asymmetrical: a device on one of the networks thinks that something has an MTU 20 bytes smaller than the other side does. Since the pings get the correct error message, either MTU is good, though. Are there MTU configuration settings on the Cisco GRE tunnels? Do they match on both sides? Is the DSL a PPPoE connection by any chance?

Last question: does all this user traffic go through the Windows servers, which in turn go through the GRE tunnel, or does the user traffic go directly through the GRE?

I am tiresome, I know.

croplife

ASKER

Yes, site A is EFM and site B is DSL. I don't think it's a PPPoE connection - it's in Brussels and they're all sleeping now.
When site A installed its EFM connection, we were told that we'd have to set the MTU on the WAN side of the firewall to 1468, as 32 bytes are absorbed by the GRE tunnel.
To be clear:
Line A = EFM
Line B = ADSL
Line A + B feed into WAN side CISCO 2600 running GRE.
Cisco 2600 LAN side feeds into WAN side ISA server.
We changed MTU packet size on WAN facing NIC on ISA server to 1468.
All users go through the ISA server out to the Internet.

We later added a registry entry setting the PPP MTU size to 1400 on that same NIC, as described in documentation we found online.

Site B is also an ISA server, with the same PPP MTU change to 1400 on the NIC and the default MTU size for all other traffic. This is the site using a DSL line as its primary connection.

ASKER CERTIFIED SOLUTION

lanboyo

This solution is only available to members.

croplife

ASKER

Thanks again for your reply.
We'll test it out and let you know what we find.

croplife

ASKER

Thanks for your help - it turned out to be a bandwidth issue. Thanks again.