piedthepiper

asked on

Increase transfer speed between Linux VMs across a 16ms LES Link

Hi Guys,

So we have a Linux VM and we are trying to send data across our 1Gb LES link from the UK to a VM in France.

It seems to max out at about 20% of the link, roughly 200 Mbit/sec.
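For context, rough numbers: at 1 Gbit/s over a 16 ms round trip the bandwidth-delay product is about

1,000,000,000 bit/s x 0.016 s / 8 = ~2 MB

so a single TCP stream needs roughly 2 MB in flight to fill the link; a plain 64 KB window would cap it at around 64 KB x 8 / 0.016 s ≈ 33 Mbit/s.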

We did have a similar issue with our SAN replication (Compellent), but when we enabled the TCP Immediate Data feature, this cured all our issues and replication started using the link properly.

Now my question is, is there a way to enable this feature on the Linux OS (CentOS 6.5)?

Linux isn't my strong point and I am just curious to see if it's possible. We did try adjusting the TX/RX ring buffers to 4096 with ethtool -G eth0 rx 4096 tx 4096, after reading about some troubleshooting in another thread, and it made no difference at all.
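For reference, checking and raising the rings looks roughly like this (a sketch, assuming the NIC is eth0 and the driver actually exposes maximums that large):

ethtool -g eth0                    # show current settings and the "Pre-set maximums"
ethtool -G eth0 rx 4096 tx 4096    # raise the rings, only up to those maximums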

I could be totally barking up the wrong tree here or whatever, but I was wondering if anyone had any further ideas.

The LES link is not throttled in any way whatsoever; I went through all that crap with Compellent and provided them proof that I could dump data down that link from various VMs and max it out easily, no issue.

After reading a few articles, here is what the sysctl.conf file looks like now:

# increase TCP max buffer size settable using setsockopt()
# allow testing with 256MB buffers
net.core.rmem_max = 268435456
net.core.wmem_max = 268435456
# increase Linux autotuning TCP buffer limits
# min, default, and max number of bytes to use
# allow auto-tuning up to 128MB buffers
net.ipv4.tcp_rmem = 4096 87380 134217728
net.ipv4.tcp_wmem = 4096 65536 134217728
# recommended to increase this for 10G NICS or higher
net.core.netdev_max_backlog = 250000
# don't cache ssthresh from previous connection
net.ipv4.tcp_no_metrics_save = 1
# Explicitly set htcp as the congestion control: cubic buggy in older 2.6 kernels
net.ipv4.tcp_congestion_control=htcp

#net.core.wmem_max=12582912
#net.core.rmem_max=12582912
#net.ipv4.tcp_rmem= 10240 87380 12582912
#net.ipv4.tcp_wmem= 10240 87380 12582912
#net.ipv4.tcp_window_scaling = 1
#net.ipv4.tcp_timestamps = 1
#net.ipv4.tcp_sack = 1
#net.ipv4.tcp_no_metrics_save = 1
#net.core.netdev_max_backlog = 5000

So as you can see we have tried to make adjustments, but they have not had any impact.
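For reference, reloading and checking the values would look like this (note that htcp only shows up as available if the tcp_htcp module is loaded):

sysctl -p                                           # reload /etc/sysctl.conf
sysctl net.ipv4.tcp_available_congestion_control    # should list htcp; if not, modprobe tcp_htcp
sysctl net.ipv4.tcp_congestion_control net.ipv4.tcp_rmem net.ipv4.tcp_wmem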

I am open to ideas!
giltjr

Is this the ONLY traffic on the link? Are the Linux settings you show the same on both hosts?

You may want to run a short packet trace, no more than 1 minute, to see if it identifies any obvious issues.

Issues like: packet size smaller than 1500 bytes, TCP window getting full (down to zero) with a long delay before it recovers, or long delays on packet ACKs.
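Something like this keeps the trace short and small (a sketch, assuming the interface is eth0; substitute the French VM's address for <remote-vm-ip>; -s 96 captures headers only so the file stays manageable):

timeout 60 tcpdump -i eth0 -s 96 -w /tmp/les-trace.pcap host <remote-vm-ip>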
piedthepiper

ASKER

I've thrown traffic from two Windows boxes across the same link a while ago when I was doing testing, and I could max it out.

These settings are on both VMs.

Any particular settings for running a trace? I've not really done a trace on Linux before.
I've done ping -M do -s 1472 remoteHost and it passes fine.

I did a tshark capture of everything during the data send; it came to 9GB haha. I have loaded it into Wireshark to have a look, but to be fair I am not sure what I am looking for!
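Standard display filters that should flag the things mentioned above (retransmissions, zero windows, duplicate ACKs) look like this, assuming the capture file is capture.pcap and a tshark new enough to take -Y (older builds use -R for the same filters):

tshark -r capture.pcap -Y "tcp.analysis.retransmission" | wc -l
tshark -r capture.pcap -Y "tcp.analysis.zero_window"
tshark -r capture.pcap -Y "tcp.analysis.duplicate_ack" | wc -l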
SOLUTION
giltjr

ASKER CERTIFIED SOLUTION
Great. The only other thing you could try, if you wanted to get a little more, is to see if the link supports jumbo frames and change the frame/packet size to at least 4000 bytes. The bigger the payload, the fewer the messages, the less the overhead, both in terms of network and CPU.

I don't know if it still holds true, but at one time the biggest gain was going from 1500 to 4000 bytes. Going any bigger did not really buy you a lot in increased throughput or decreased CPU utilization.
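A rough sketch of that change, assuming the interface is eth0 and that every hop in the path (the VM hosts, switches, and the LES link itself) actually carries the larger frames:

ip link set dev eth0 mtu 4000
ping -M do -s 3972 remoteHost    # 4000 minus 28 bytes of IP+ICMP header; must pass without fragmenting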
It took some more adjustment, but only after realizing through the captures that there was nothing showing as the issue on the LES link itself.