TCP issues with NetApp SnapMirror between sites

Hi,

We have a problem with SnapMirror operations on NetApp. We are replicating volumes from one filer to a second filer located at a different site. The connection goes through two firewalls and several routers/switches. The connection is established and data is transferred, but at some random point (it could be after 5 GB or 15 GB of data, it doesn't matter) the replication fails.

After consulting NetApp tech support, I ran a packet trace (pktt). From the results (viewed in Wireshark), NetApp concluded that the problem is with the MSS: something along the path is modifying the packets and lowering the MSS from 1460 to 1406. Here are the lines from the trace log:
trace log
X.X.X.X = Destination FW, the destination netapp filer to which the data is replicated is behind that FW.
172.16.0.13 = source filer located behind a different FW on a different site.

I would like to ask a network specialist here whether NetApp's claim makes sense: that the different MSS values seen in the trace log show the packets are being modified, and that this is what causes the failures.

Thank you.
iNc0g Asked:

giltjr Commented:
First, each host advertises its own MSS, which is typically derived from its MTU, which in turn is normally based on the maximum frame size.

What I see in your packet capture is that the MSS for one host, x.x.x.x, is 1406 and the MSS for 172.16.0.13 is 1460. That is perfectly fine.

I'm not saying that something is NOT changing the MSS. All I am saying is that the two values being different is not, in itself, a technical reason for a problem.

In order to verify that the MSS is being altered, you need to do a packet capture in front of and behind any/all firewalls, or just ask the firewall people to see if they alter the MSS.  Altering the MSS can be done in firewalls.
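When comparing captures taken on each side of a firewall, the value to look at is the MSS option in the SYN and SYN/ACK segments. As a rough sketch (not a replacement for Wireshark), the option can be pulled out of a raw TCP header like this. The `build_syn` helper and the port numbers are made up purely to produce test input.

```python
import struct

def mss_from_tcp_header(tcp_header: bytes):
    """Return the MSS option value from a raw TCP header, or None if absent."""
    data_offset = (tcp_header[12] >> 4) * 4  # header length in bytes
    options = tcp_header[20:data_offset]
    i = 0
    while i < len(options):
        kind = options[i]
        if kind == 0:            # End of Option List
            break
        if kind == 1:            # No-Operation, a single byte
            i += 1
            continue
        if i + 1 >= len(options):
            break
        length = options[i + 1]
        if length < 2:           # malformed option, stop parsing
            break
        if kind == 2 and length == 4:  # Maximum Segment Size
            return struct.unpack("!H", options[i + 2:i + 4])[0]
        i += length
    return None

def build_syn(mss: int) -> bytes:
    """Hypothetical helper: a minimal SYN header carrying only an MSS option."""
    offset_flags = (6 << 12) | 0x002  # 6 x 32-bit words = 24 bytes, SYN flag set
    fixed = struct.pack("!HHIIHHHH", 50000, 50001, 0, 0, offset_flags, 65535, 0, 0)
    return fixed + struct.pack("!BBH", 2, 4, mss)

before_fw = mss_from_tcp_header(build_syn(1460))
after_fw = mss_from_tcp_header(build_syn(1406))
print(before_fw, after_fw, "altered" if before_fw != after_fw else "unchanged")
```

If the same SYN shows one MSS in front of a firewall and a different one behind it, that device (or the tunnel it terminates) is rewriting the option.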

Normally MSS is MTU - 40 (20 bytes of IP header plus 20 bytes of TCP header). Since most MTUs are 1500, this means an MSS of 1460. Seeing 1406 means that something along the path may have an MTU of 1446, which is a weird number, but could be valid if there is a VPN tunnel that is configured not to fragment packets and is therefore lowering the MTU.
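The arithmetic is simple enough to check by hand, but here it is as a small sketch. It assumes IPv4 with no IP or TCP options in the data segments, so 20 bytes for each header.

```python
IP_HEADER = 20   # IPv4 header without options (assumption)
TCP_HEADER = 20  # TCP header without options (assumption)

def mss_for_mtu(mtu: int) -> int:
    """Largest TCP payload that fits in one packet of the given MTU."""
    return mtu - IP_HEADER - TCP_HEADER

def mtu_for_mss(mss: int) -> int:
    """MTU implied by an advertised MSS."""
    return mss + IP_HEADER + TCP_HEADER

print(mss_for_mtu(1500))  # 1460
print(mtu_for_mss(1406))  # 1446
```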

iNc0g (Author) Commented:
In our FortiGate firewalls there is no such option regarding packet fragmentation. I've also opened a case with Fortinet to see whether, and what, on the firewall side is causing the MSS change, but really, I don't see any reason the IPsec VPN tunnel should alter this.

I guess I'll wait for Fortinet to investigate this. NetApp claims that this MSS difference between the packets is what causes the SnapMirror transfer to fail.
giltjr Commented:
I have no clue why it should.  The MSS does not have to be the same for both hosts in a TCP connection.

The application passes a stream of data to TCP, and TCP should break up the stream based on the remote side's advertised/agreed-upon MSS.
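To illustrate why the two sides do not need matching values, here is a toy model of that segmentation step. It is only a sketch: a real stack also applies its own MTU, window sizes and so on, and the 4000-byte stream is an arbitrary example.

```python
def segment(stream: bytes, peer_mss: int) -> list:
    """Split an application byte stream into chunks no larger than the peer's MSS."""
    return [stream[i:i + peer_mss] for i in range(0, len(stream), peer_mss)]

data = bytes(4000)  # arbitrary example payload

# Toward the host that advertised 1406, and toward the one that advertised 1460.
print([len(s) for s in segment(data, 1406)])  # [1406, 1406, 1188]
print([len(s) for s in segment(data, 1460)])  # [1460, 1460, 1080]
```

Each direction is sized independently, and the receiver reassembles the same byte stream either way.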

If NetApp can't support an MSS smaller than 1460, then it has a major problem.

If you have a VPN tunnel, it is possible that MTU/MSS can be altered.  Since the traffic that flows inside the tunnel must fit inside normal TCP/IP packets/datagrams something has to be done.

Say you are sending 1460 bytes of data in a 1500-byte packet that must go over a VPN. That 1500-byte datagram becomes the payload of another packet, so once the tunnel's own headers are added it no longer fits in 1500 bytes and must somehow get smaller. One way is to fragment the packet, for example into 1460 + 40. The other way is to set the MTU on the VPN interface to something small enough that the VPN traffic does not need to be fragmented. Lowering the MTU on the VPN interface also lowers the MSS.

It looks as if the VPN interface may have the MTU set to 1446, a weird number.  I have seen 1440 and 1400.
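A quick sketch of that trade-off follows. The 54-byte tunnel overhead is purely an assumption, back-calculated from 1500 - 1446. Real IPsec overhead depends on the mode, ciphers and whether NAT traversal is in use, so treat the numbers as illustrative only.

```python
PATH_MTU = 1500          # MTU of the physical path (assumption)
IP_TCP_HEADERS = 40      # IPv4 + TCP headers without options (assumption)

def fits_without_fragmentation(inner_len: int, overhead: int) -> bool:
    """True if an inner packet plus tunnel headers fits in one outer packet."""
    return inner_len + overhead <= PATH_MTU

def inner_mtu(overhead: int) -> int:
    """Largest inner packet the tunnel can carry without fragmenting."""
    return PATH_MTU - overhead

overhead = 54  # hypothetical value that would explain an MTU of 1446
print(fits_without_fragmentation(1500, overhead))  # False
print(fits_without_fragmentation(1446, overhead))  # True
print(inner_mtu(overhead) - IP_TCP_HEADERS)        # 1406

# The other MTU values mentioned above would correspond to these overheads.
for oh in (60, 100):
    print(oh, inner_mtu(oh), inner_mtu(oh) - IP_TCP_HEADERS)
```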