Solved

TCP connection troubles on Ubuntu after VMware conversion

Posted on 2013-01-29
Last Modified: 2013-03-11
Hi,

We had an old VMware Server installation on a 2008 server, hosting 3 virtual machines: one I installed and two that were installed before I was here.
We converted these to ESXi 5.0 a few weeks ago.

Since then we seem to have some issues on two of the three Ubuntu VMs (the one I installed works correctly).

The network is up and I have a constant ping running from and to machine .207 (one that has issues). This is the issue: unstable TCP connections. When you try to SSH to it, it just drops the connection; a second attempt sometimes works. When working in an SSH session you get disconnected randomly.

What I have done so far:
- upgraded all Ubuntu VMs to the latest 12.04 LTS
- disabled IPv6
- checked routing on .207 -> it just has a default route to our firewall; routing is not the issue, as this happens locally
- checked ARP = OK
- rebooted the machines -> issue is the same

Now I installed Wireshark, and this is what happens when it fails:
I see from my machine (.100) to .207 -> 335      19.989077000      192.168.0.100      192.168.0.207      TCP      54      59763 > ssh [ACK] Seq=1 Ack=1 Win=65536 Len=0
.207 answers back -> 335      19.989077000      192.168.0.100      192.168.0.207      TCP      54      59763 > ssh [ACK] Seq=1 Ack=1 Win=65536 Len=0
368      21.389081000      192.168.0.207      192.168.0.100      TCP      66      ssh > 59763 [SYN, ACK] Seq=0 Ack=1 Win=14600 Len=0 MSS=1460 SACK_PERM=1 WS=16
555      23.389142000      192.168.0.207      192.168.0.100      TCP      66      ssh > 59763 [SYN, ACK] Seq=0 Ack=1 Win=14600 Len=0 MSS=1460 SACK_PERM=1 WS=16
765      27.389213000      192.168.0.207      192.168.0.100      TCP      66      ssh > 59763 [SYN, ACK] Seq=0 Ack=1 Win=14600 Len=0 MSS=1460 SACK_PERM=1 WS=16
970      35.389339000      192.168.0.207      192.168.0.100      TCP      66      ssh > 59763 [SYN, ACK] Seq=0 Ack=1 Win=14600 Len=0 MSS=1460 SACK_PERM=1 WS=16
1307      51.389592000      192.168.0.207      192.168.0.100      TCP      66      ssh > 59763 [SYN, ACK] Seq=0 Ack=1 Win=14600 Len=0 MSS=1460 SACK_PERM=1 WS=16
1417      58.524460000      192.168.0.207      192.168.0.100      TCP      60      ssh > 58009 [ACK] Seq=1 Ack=1 Win=2532 Len=0
3777      133.529204000      192.168.0.207      192.168.0.100      TCP      60      [TCP Dup ACK 1417#1] ssh > 58009 [ACK] Seq=1 Ack=1 Win=2532 Len=0
8144      208.534869000      192.168.0.207      192.168.0.100      TCP      60      [TCP Dup ACK 1417#2] ssh > 58009 [ACK] Seq=1 Ack=1 Win=2532 Len=0
12679      283.539682000      192.168.0.207      192.168.0.100      TCP      60      ssh > 58009 [RST, ACK] Seq=2 Ack=1 Win=2532 Len=0


I have no idea why I get a reset back.  I'm in the dark :(
iptables -L on .207 shows no rules.
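
One thing the failing trace above does show: the repeated [SYN, ACK] packets from .207 arrive at roughly doubling intervals. That pattern is consistent with ordinary TCP retransmission backoff, which would suggest .207 never received the client's final ACK. A minimal sketch that computes the gaps from the timestamps in the capture (the helper name syn_ack_gaps is made up for illustration):

```shell
# Print the gap in seconds between consecutive timestamps read from stdin.
# syn_ack_gaps is a hypothetical helper name, not part of any tool.
syn_ack_gaps() {
    awk 'NR > 1 { printf "%.1f\n", $1 - prev } { prev = $1 }'
}

# Timestamps of the five [SYN, ACK] packets from .207 in the failing capture.
printf '%s\n' 21.389081 23.389142 27.389213 35.389339 51.389592 | syn_ack_gaps
```

The gaps come out at about 2, 4, 8 and 16 seconds, so the server side appears to be retrying a handshake it considers unfinished rather than rejecting the connection outright.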

This is a successful attempt that then drops after a while:
48      2.635152000      192.168.0.207      192.168.0.100      TCP      66      ssh > 59999 [SYN, ACK] Seq=0 Ack=0 Win=14600 Len=0 MSS=1460 SACK_PERM=1 WS=16
76      3.911437000      192.168.0.100      192.168.0.207      TCP      66      60000 > ssh [SYN] Seq=0 Win=8192 Len=0 MSS=1460 WS=256 SACK_PERM=1
77      3.911732000      192.168.0.207      192.168.0.100      TCP      66      ssh > 60000 [SYN, ACK] Seq=0 Ack=1 Win=14600 Len=0 MSS=1460 SACK_PERM=1 WS=16
78      3.911834000      192.168.0.100      192.168.0.207      TCP      54      60000 > ssh [ACK] Seq=1 Ack=1 Win=65536 Len=0
Then I wait a while, I see some traffic, and then this happens:
2165      113.070976000      192.168.0.207      192.168.0.100      SSHv2      106      Encrypted response packet len=52
2166      113.125807000      192.168.0.100      192.168.0.207      TCP      54      60000 > ssh [ACK] Seq=1917 Ack=3640 Win=65536 Len=0
2168      113.126155000      192.168.0.207      192.168.0.100      TCP      60      ssh > 60000 [RST] Seq=3640 Win=0 Len=0
2169      113.288979000      192.168.0.207      192.168.0.100      SSHv2      106      [TCP Retransmission] Encrypted response packet len=52
This is repeated a few times and ends with:
4660      225.774711000      192.168.0.207      192.168.0.100      SSHv2      106      [TCP Retransmission] Encrypted response packet len=52
Question by:PlusIT
18 Comments
 
LVL 10

Author Comment

by:PlusIT
ID: 38830837
I kept the capture running this time, and now this popped up in Wireshark:

174601      699.510273000      192.168.0.207      192.168.0.100      SSHv2      106      [TCP Retransmission] Encrypted response packet len=52
296330      819.832237000      192.168.0.207      192.168.0.100      SSHv2      106      [TCP Retransmission] Encrypted response packet len=52

If anything else gets logged besides this, I'll update this post.
 
LVL 118
ID: 38830862
Are you using the VMXNET3 network interface?

If not, change it from the E1000 to the VMXNET3 network interface.
 
LVL 11

Expert Comment

by:un1x86
ID: 38830867
Hi

Can you try with iptables disabled?
Is the connection going through your firewall?

What does "cat /proc/sys/net/ipv4/tcp_sack" say?
And "cat /proc/sys/net/ipv4/tcp_window_scaling"?
 
LVL 10

Author Comment

by:PlusIT
ID: 38830877
@hanccocka: we're using the E1000s
@un1x86:

- iptables -L is empty, so I guess it's disabled?
- it's a local connection, so no, it does not pass through a firewall (only a default gateway), and I'm testing from the same subnet
- tcp_sack = 1
- tcp_window_scaling = 1
 
LVL 11

Expert Comment

by:un1x86
ID: 38830903
Can you also run

mtr --report hostname

This will gather some latency numbers.

You can change the number of cycles with the -c option to gather performance data over a longer time period, like:

mtr -c 50 --report hostname

 
LVL 118
ID: 38830952
I would recommend switching to VMXNET3.
 
LVL 10

Author Comment

by:PlusIT
ID: 38830974
From the machine .207 (hostname = sftp) I ran:

mtr --report sftp:
0% loss, all the rest are 0
mtr --report sftp -c 50:
0% loss, all 0 except Wrst = 0.1
mtr --report lt-cvdw.DOMAINOBSCURED.local (lt-cvdw was .100 in the tests)

This gives just one line:
HOST: SFTP, with the columns like before but no data output
 
LVL 10

Author Comment

by:PlusIT
ID: 38831003
@hanccocka: the issue remains the same after switching the NIC to VMXNET3
 
LVL 11

Expert Comment

by:un1x86
ID: 38831023
Hm.  Can you do:

1. Disable iptables (service iptables stop) and try again.
2. With iptables enabled, disable SACK (echo 0 > /proc/sys/net/ipv4/tcp_sack).

Is this only happening with SSH?
Do you see anything in the log files?
 
LVL 10

Author Comment

by:PlusIT
ID: 38831061
No iptables service exists.
Tried echo 0 to tcp_sack, rebooted, and the issue remains.
 
LVL 11

Expert Comment

by:un1x86
ID: 38831066
Yes, because the setting does not survive a reboot :D You need to disable SACK and then test it without rebooting.

You can also just flush iptables with iptables -F. But I doubt you have iptables running.
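
A small sketch of the runtime change being discussed, assuming a root shell: the value is written straight to the /proc entry (the redirect targets the file itself, with no cat in between), takes effect immediately, and is lost at the next reboot. The helper name set_tcp_knob and its optional third argument are made up for illustration; the third argument only exists so the function can be tried against a scratch directory.

```shell
# set_tcp_knob NAME VALUE [ROOT]
# Write VALUE to ROOT/NAME and print the value read back.
# ROOT defaults to /proc/sys/net/ipv4; set_tcp_knob is a hypothetical helper.
set_tcp_knob() {
    root=${3:-/proc/sys/net/ipv4}
    echo "$2" > "$root/$1" && echo "$1 = $(cat "$root/$1")"
}

# Example, to be run as root and then tested WITHOUT rebooting:
#   set_tcp_knob tcp_sack 0
```

To put the default back without a reboot, the same call with 1 as the value should do it.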
 
LVL 10

Author Comment

by:PlusIT
ID: 38831126
issue remains :(
 
LVL 11

Expert Comment

by:un1x86
ID: 38834255
Hi

You might want to check your SSH logs. One possibility is to run sshd in debug mode with:
sshd -d

 
LVL 10

Author Comment

by:PlusIT
ID: 38834264
It's not SSH.  FTP and telnet drop too.  It must be something at the TCP level, as ping is constant and OK.
 
LVL 11

Expert Comment

by:un1x86
ID: 38834269
Hi

Can you attach your Wireshark or tcpdump capture?
 
LVL 10

Author Comment

by:PlusIT
ID: 38839984
I'm very sorry, I'm not allowed to share that kind of information on the internet.  I can do what you ask of me, though, and provide obscured logs.
 
LVL 10

Accepted Solution

by:PlusIT
ID: 38960940
Well, the solution was simple: without my knowledge, the old VMs on the old server had been started up again, hence the strange behaviour.  Thanks for your help though.
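
The thread does not spell it out, but an old VM and its converted copy running at the same time would presumably both answer for the same IP address, each with its own MAC address. A minimal sketch for spotting that from a list of "IP MAC" observations; the helper name dup_ip_owners is hypothetical and the MAC addresses in the sample are invented:

```shell
# Read "IP MAC" pairs from stdin and print each IP seen with more than one MAC.
# dup_ip_owners is a hypothetical helper name; MACs are compared case-insensitively.
dup_ip_owners() {
    awk '{ print $1, tolower($2) }' | sort -u |
    awk '{ n[$1]++; macs[$1] = macs[$1] " " $2 }
         END { for (ip in n) if (n[ip] > 1) print ip ":" macs[ip] }'
}

# Sample observations; the MAC addresses are made up for illustration.
dup_ip_owners <<'EOF'
192.168.0.207 00:0c:29:aa:aa:aa
192.168.0.100 00:1b:21:cc:cc:cc
192.168.0.207 00:0c:29:bb:bb:bb
192.168.0.207 00:0c:29:aa:aa:aa
EOF
```

On a Linux client the pairs could be collected by sampling the neighbour table a few times, for example with ip neigh show | awk '$4 == "lladdr" { print $1, $5 }'. An IP that flips between two MACs there is a strong hint that two machines are claiming it.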