Solved

TCP connection troubles on Ubuntu after VMware conversion

Posted on 2013-01-29
Last Modified: 2013-03-11
Hi,

We had an old VMware Server installation on a 2008 server, hosting 3 virtual machines: one that I installed and two that were installed before I was here.
We converted these to ESXi 5.0 a few weeks ago.

Since then we have had issues on two of the three Ubuntu VMs (the one I installed works correctly).

The network is up, and I have a constant ping running both from and to machine .207 (one of those with issues). The issue is unstable TCP connections. When you try to SSH to it, the connection is simply dropped; a second attempt sometimes works. While working in an SSH session, you get disconnected randomly.

What I have done so far:
- upgraded all Ubuntu machines to the latest 12.04 LTS
- disabled IPv6
- checked routing on .207 -> it just has a default route to our firewall; routing is not the issue, as this happens locally
- checked ARP = ok
- rebooted the machines -> issue is the same

Now I have installed Wireshark, and this is what happens when it fails:
I see from my machine (.100) to .207 -> 335      19.989077000      192.168.0.100      192.168.0.207      TCP      54      59763 > ssh [ACK] Seq=1 Ack=1 Win=65536 Len=0
.207 answers back -> 335      19.989077000      192.168.0.100      192.168.0.207      TCP      54      59763 > ssh [ACK] Seq=1 Ack=1 Win=65536 Len=0
368      21.389081000      192.168.0.207      192.168.0.100      TCP      66      ssh > 59763 [SYN, ACK] Seq=0 Ack=1 Win=14600 Len=0 MSS=1460 SACK_PERM=1 WS=16
555      23.389142000      192.168.0.207      192.168.0.100      TCP      66      ssh > 59763 [SYN, ACK] Seq=0 Ack=1 Win=14600 Len=0 MSS=1460 SACK_PERM=1 WS=16
765      27.389213000      192.168.0.207      192.168.0.100      TCP      66      ssh > 59763 [SYN, ACK] Seq=0 Ack=1 Win=14600 Len=0 MSS=1460 SACK_PERM=1 WS=16
970      35.389339000      192.168.0.207      192.168.0.100      TCP      66      ssh > 59763 [SYN, ACK] Seq=0 Ack=1 Win=14600 Len=0 MSS=1460 SACK_PERM=1 WS=16
1307      51.389592000      192.168.0.207      192.168.0.100      TCP      66      ssh > 59763 [SYN, ACK] Seq=0 Ack=1 Win=14600 Len=0 MSS=1460 SACK_PERM=1 WS=16
1417      58.524460000      192.168.0.207      192.168.0.100      TCP      60      ssh > 58009 [ACK] Seq=1 Ack=1 Win=2532 Len=0
3777      133.529204000      192.168.0.207      192.168.0.100      TCP      60      [TCP Dup ACK 1417#1] ssh > 58009 [ACK] Seq=1 Ack=1 Win=2532 Len=0
8144      208.534869000      192.168.0.207      192.168.0.100      TCP      60      [TCP Dup ACK 1417#2] ssh > 58009 [ACK] Seq=1 Ack=1 Win=2532 Len=0
12679      283.539682000      192.168.0.207      192.168.0.100      TCP      60      ssh > 58009 [RST, ACK] Seq=2 Ack=1 Win=2532 Len=0


I have no idea why I get a reset back. I'm in the dark :(
iptables -L on .207 shows no rules.
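
One detail in the failed attempt can be quantified: the repeated [SYN, ACK] packets from .207 arrive at doubling intervals. That is the usual shape of TCP retransmission backoff, and it suggests that the host sending them never saw the final ACK of the handshake, even though .100 sent one. A minimal Python sketch, using only the timestamps from the trace above (the function name is illustrative, not from any tool):

```python
# Timestamps (seconds) of the repeated [SYN, ACK] packets from .207,
# copied from the capture above.
syn_ack_times = [21.389081, 23.389142, 27.389213, 35.389339, 51.389592]


def retransmit_gaps(times):
    """Return the gap in seconds between each pair of consecutive packets."""
    return [later - earlier for earlier, later in zip(times, times[1:])]


gaps = retransmit_gaps(syn_ack_times)
for gap in gaps:
    print(f"{gap:.3f} s")
```

The gaps come out at roughly 2, 4, 8 and 16 seconds. This does not identify the cause by itself; it only shows that the sender of the SYN, ACK kept waiting for an acknowledgement it apparently never received.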

This is a successful attempt, which then drops after a while:
48      2.635152000      192.168.0.207      192.168.0.100      TCP      66      ssh > 59999 [SYN, ACK] Seq=0 Ack=0 Win=14600 Len=0 MSS=1460 SACK_PERM=1 WS=16
76      3.911437000      192.168.0.100      192.168.0.207      TCP      66      60000 > ssh [SYN] Seq=0 Win=8192 Len=0 MSS=1460 WS=256 SACK_PERM=1
77      3.911732000      192.168.0.207      192.168.0.100      TCP      66      ssh > 60000 [SYN, ACK] Seq=0 Ack=1 Win=14600 Len=0 MSS=1460 SACK_PERM=1 WS=16
78      3.911834000      192.168.0.100      192.168.0.207      TCP      54      60000 > ssh [ACK] Seq=1 Ack=1 Win=65536 Len=0
Then I wait a while, I see some traffic, and then this happens:
2165      113.070976000      192.168.0.207      192.168.0.100      SSHv2      106      Encrypted response packet len=52
2166      113.125807000      192.168.0.100      192.168.0.207      TCP      54      60000 > ssh [ACK] Seq=1917 Ack=3640 Win=65536 Len=0
2168      113.126155000      192.168.0.207      192.168.0.100      TCP      60      ssh > 60000 [RST] Seq=3640 Win=0 Len=0
2169      113.288979000      192.168.0.207      192.168.0.100      SSHv2      106      [TCP Retransmission] Encrypted response packet len=52
repeated a few times
and ends with:
4660      225.774711000      192.168.0.207      192.168.0.100      SSHv2      106      [TCP Retransmission] Encrypted response packet len=52
Question by:PlusIT
18 Comments
 
LVL 10

Author Comment

by:PlusIT
ID: 38830837
I kept the monitoring open this time, and now this popped up in Wireshark:

174601      699.510273000      192.168.0.207      192.168.0.100      SSHv2      106      [TCP Retransmission] Encrypted response packet len=52
296330      819.832237000      192.168.0.207      192.168.0.100      SSHv2      106      [TCP Retransmission] Encrypted response packet len=52

If anything more is logged besides this, I'll update this post.
 
LVL 119
ID: 38830862
Are you using the VMXNET3 network interface?

If not, change it from the E1000 to the VMXNET3 network interface.
 
LVL 11

Expert Comment

by:un1x86
ID: 38830867
Hi

Can you try with iptables disabled?
Is the connection going through your firewall?

What does "cat /proc/sys/net/ipv4/tcp_sack" say?
And "cat /proc/sys/net/ipv4/tcp_window_scaling"?

 
LVL 10

Author Comment

by:PlusIT
ID: 38830877
@hanccocka: we're using the E1000s.
@un1x86:

- iptables -L is empty, so I guess it's disabled?
- it's a local connection, so no, it does not pass through a firewall (there is only a default gateway), and I'm testing from the same subnet
- tcp_sack = 1
- tcp_window_scaling = 1
 
LVL 11

Expert Comment

by:un1x86
ID: 38830903
Can you also run

mtr --report hostname


This will gather some latency numbers.

You can change the number of cycles with the -c option to gather performance data over a longer time period, like:

mtr -c 50 --report hostname

 
LVL 119
ID: 38830952
I would recommend switching to VMXNET3.
 
LVL 10

Author Comment

by:PlusIT
ID: 38830974
From machine .207 (hostname = sftp) I ran:

mtr --report sftp:
0% loss, all other values are 0
mtr --report sftp -c 50:
0% loss, all 0 except Wrst = 0.1
mtr --report lt-cvdw.DOMAINOBSCURED.local (lt-cvdw was .100 in the tests)

This gives just one line:
HOST: SFTP, with the columns like before but no data output
 
LVL 10

Author Comment

by:PlusIT
ID: 38831003
@hanccocka: the issue remains the same after switching the NIC to VMXNET3.
 
LVL 11

Expert Comment

by:un1x86
ID: 38831023
Hm. Can you do:

1. Disable iptables (service iptables stop) and try again.
2. With iptables enabled, disable SACK (echo 0 > /proc/sys/net/ipv4/tcp_sack).

Is this only happening with SSH?
Do you see anything in the log files?
 
LVL 10

Author Comment

by:PlusIT
ID: 38831061
No iptables service exists.
I tried echo 0 to tcp_sack and rebooted; the issue remains.
 
LVL 11

Expert Comment

by:un1x86
ID: 38831066
Yes, because it does not survive a reboot :D You need to disable SACK and then test without rebooting.

You can also just flush iptables with iptables -F, but I doubt you have iptables running.
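
The point about reboots can be illustrated with a short Python sketch. It assumes the standard Linux layout in which each tunable is a file under /proc/sys/net/ipv4; the helper names and the `root` parameter are hypothetical and exist only so the functions can be tried against a scratch directory. Writing to the real path needs root, and the change lasts only until the next boot unless a line such as `net.ipv4.tcp_sack = 0` is also added to /etc/sysctl.conf.

```python
from pathlib import Path

# Standard location of IPv4/TCP tunables on Linux.
PROC_IPV4 = Path("/proc/sys/net/ipv4")


def read_tunable(name, root=PROC_IPV4):
    """Return the current value of a tunable such as 'tcp_sack' as a string."""
    return (Path(root) / name).read_text().strip()


def write_tunable(name, value, root=PROC_IPV4):
    """Set a tunable at runtime (needs root on a real system; lost on reboot)."""
    (Path(root) / name).write_text(f"{value}\n")


if __name__ == "__main__":
    # Read-only check of the two settings discussed in this thread.
    for name in ("tcp_sack", "tcp_window_scaling"):
        try:
            print(f"{name} = {read_tunable(name)}")
        except OSError as exc:
            print(f"could not read {name}: {exc}")
```

Reading the value back immediately after writing it, without rebooting, is the way to confirm the setting was actually in effect during the test.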
 
LVL 10

Author Comment

by:PlusIT
ID: 38831126
The issue remains :(
 
LVL 11

Expert Comment

by:un1x86
ID: 38834255
Hi

You might want to check your SSH logs. One possibility is to run sshd in debug mode with
sshd -d

 
LVL 10

Author Comment

by:PlusIT
ID: 38834264
It's not SSH. FTP and telnet drop too. It must be something at the TCP level, as ping is constant and OK.
 
LVL 11

Expert Comment

by:un1x86
ID: 38834269
Hi

Can you attach your wireshark or tcpdump?
 
LVL 10

Author Comment

by:PlusIT
ID: 38839984
I'm very sorry, but I'm not allowed to share that kind of information on the internet. I can do what you ask of me, though, and provide obscured logs.
 
LVL 10

Accepted Solution

by:
PlusIT earned 0 total points
ID: 38960940
Well, the solution was simple: without my knowledge, the old VMs on the old server had been started up again, hence the strange behaviour. Thanks for your help, though.
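
For anyone hitting similar symptoms: if the revived VMs kept the same IP addresses as their converted copies (an assumption; the post only says they were started again), two hosts would have been answering for one address, which would be consistent with ping staying up while TCP sessions were reset. One rough way to spot this is to collect neighbour-table snapshots on another machine in the subnet and flag any IP seen with more than one MAC address. The Python sketch below assumes lines in the format printed by `ip neigh show`; the function name is illustrative and the MAC addresses are invented.

```python
from collections import defaultdict


def find_duplicate_ips(neigh_lines):
    """Return {ip: [macs]} for every IP observed with more than one MAC."""
    macs_by_ip = defaultdict(set)
    for line in neigh_lines:
        fields = line.split()
        # Expected shape: "<ip> dev <iface> lladdr <mac> <state>"
        if "lladdr" in fields:
            idx = fields.index("lladdr")
            if idx + 1 < len(fields):
                macs_by_ip[fields[0]].add(fields[idx + 1].lower())
    return {ip: sorted(macs) for ip, macs in macs_by_ip.items() if len(macs) > 1}


# Hypothetical snapshots taken a few minutes apart (MACs are made up).
snapshots = [
    "192.168.0.207 dev eth0 lladdr 00:0c:29:aa:aa:aa REACHABLE",
    "192.168.0.1 dev eth0 lladdr 00:11:22:33:44:55 STALE",
    "192.168.0.207 dev eth0 lladdr 00:0c:29:bb:bb:bb REACHABLE",
]
print(find_duplicate_ips(snapshots))
```

An address that flips between two MACs is a strong hint that a second machine is claiming it, though a legitimate NIC change (such as switching from E1000 to VMXNET3) can also produce a new MAC once.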
