Solved

TCP connection troubles on Ubuntu after VMware conversion

Posted on 2013-01-29
Last Modified: 2013-03-11
Hi,

We had an old VMware Server running on a 2008 server with 3 virtual machines: one that I installed and two that were installed before I was here.
We converted these to ESXi 5.0 a few weeks ago.

Since then we have had issues on two of the three Ubuntu VMs (the one I installed works correctly).

The network is up, and I have a constant ping running from and to machine .207 (one that has issues). The issue is unstable TCP connections. When you try to ssh to it, the connection just drops; a second attempt sometimes works. While working in an ssh session you get disconnected randomly.

What I have done so far:
- upgraded all Ubuntu machines to the latest 12.04 LTS
- disabled IPv6
- checked routing on .207 -> it just has a default route to our firewall; routing is not the issue, as this happens locally
- checked ARP = ok
- rebooted the machines -> issue is the same

I then installed Wireshark, and this is what happens when it fails.
From my machine (.100) to .207:
335      19.989077000      192.168.0.100      192.168.0.207      TCP      54      59763 > ssh [ACK] Seq=1 Ack=1 Win=65536 Len=0
.207 answers back:
368      21.389081000      192.168.0.207      192.168.0.100      TCP      66      ssh > 59763 [SYN, ACK] Seq=0 Ack=1 Win=14600 Len=0 MSS=1460 SACK_PERM=1 WS=16
555      23.389142000      192.168.0.207      192.168.0.100      TCP      66      ssh > 59763 [SYN, ACK] Seq=0 Ack=1 Win=14600 Len=0 MSS=1460 SACK_PERM=1 WS=16
765      27.389213000      192.168.0.207      192.168.0.100      TCP      66      ssh > 59763 [SYN, ACK] Seq=0 Ack=1 Win=14600 Len=0 MSS=1460 SACK_PERM=1 WS=16
970      35.389339000      192.168.0.207      192.168.0.100      TCP      66      ssh > 59763 [SYN, ACK] Seq=0 Ack=1 Win=14600 Len=0 MSS=1460 SACK_PERM=1 WS=16
1307      51.389592000      192.168.0.207      192.168.0.100      TCP      66      ssh > 59763 [SYN, ACK] Seq=0 Ack=1 Win=14600 Len=0 MSS=1460 SACK_PERM=1 WS=16
1417      58.524460000      192.168.0.207      192.168.0.100      TCP      60      ssh > 58009 [ACK] Seq=1 Ack=1 Win=2532 Len=0
3777      133.529204000      192.168.0.207      192.168.0.100      TCP      60      [TCP Dup ACK 1417#1] ssh > 58009 [ACK] Seq=1 Ack=1 Win=2532 Len=0
8144      208.534869000      192.168.0.207      192.168.0.100      TCP      60      [TCP Dup ACK 1417#2] ssh > 58009 [ACK] Seq=1 Ack=1 Win=2532 Len=0
12679      283.539682000      192.168.0.207      192.168.0.100      TCP      60      ssh > 58009 [RST, ACK] Seq=2 Ack=1 Win=2532 Len=0


I have no idea why I get a reset back. I'm in the dark :(
iptables -L on .207 gives nothing.
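
One thing that can be read straight from the failing trace above is the spacing of the repeated [SYN, ACK] packets. A minimal sketch, assuming the packet list is pasted as plain text in the column order shown above (number, time, source, destination, ...); `synack_gaps` is a hypothetical helper name, and it does not separate connections by port:

```shell
# Print the gap in whole seconds between consecutive [SYN, ACK] lines
# read from standard input. Assumes column 2 is the capture timestamp.
synack_gaps() {
    awk '/\[SYN, ACK\]/ {
        if (seen) printf "%.0f\n", $2 - prev   # gap to the previous SYN-ACK
        prev = $2
        seen = 1
    }'
}
```

For the five [SYN, ACK] lines above this prints 2, 4, 8 and 16. The gap doubles each time, which is the usual TCP retransmission backoff and suggests that .207 keeps resending because it never sees the handshake completed from its side.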

This is a successful attempt, which then drops after a while:
48      2.635152000      192.168.0.207      192.168.0.100      TCP      66      ssh > 59999 [SYN, ACK] Seq=0 Ack=0 Win=14600 Len=0 MSS=1460 SACK_PERM=1 WS=16
76      3.911437000      192.168.0.100      192.168.0.207      TCP      66      60000 > ssh [SYN] Seq=0 Win=8192 Len=0 MSS=1460 WS=256 SACK_PERM=1
77      3.911732000      192.168.0.207      192.168.0.100      TCP      66      ssh > 60000 [SYN, ACK] Seq=0 Ack=1 Win=14600 Len=0 MSS=1460 SACK_PERM=1 WS=16
78      3.911834000      192.168.0.100      192.168.0.207      TCP      54      60000 > ssh [ACK] Seq=1 Ack=1 Win=65536 Len=0
Then I wait a while, I see some traffic, and then this happens:
2165      113.070976000      192.168.0.207      192.168.0.100      SSHv2      106      Encrypted response packet len=52
2166      113.125807000      192.168.0.100      192.168.0.207      TCP      54      60000 > ssh [ACK] Seq=1917 Ack=3640 Win=65536 Len=0
2168      113.126155000      192.168.0.207      192.168.0.100      TCP      60      ssh > 60000 [RST] Seq=3640 Win=0 Len=0
2169      113.288979000      192.168.0.207      192.168.0.100      SSHv2      106      [TCP Retransmission] Encrypted response packet len=52
This is repeated a few times and ends with:
4660      225.774711000      192.168.0.207      192.168.0.100      SSHv2      106      [TCP Retransmission] Encrypted response packet len=52
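
To pull just the resets out of a long packet list pasted as text, a small filter along these lines could help. It is only a sketch under the same assumption about column order as the traces above; `rst_packets` is a hypothetical name:

```shell
# List packets whose info column carries a reset flag ([RST] or [RST, ACK]).
# Prints: packet number, timestamp, source -> destination.
rst_packets() {
    awk '/\[RST/ { print $1, $2, $3, "->", $4 }'
}
```

If the capture itself is at hand, adding the Ethernet source address (the `eth.src` field in Wireshark) as a column next to these packets would show whether the resets and the retransmissions come from the same network card.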
Question by:PlusIT
18 Comments
 
LVL 10

Author Comment

by:PlusIT
ID: 38830837
I kept the monitoring open this time, and now this popped up in Wireshark:

174601      699.510273000      192.168.0.207      192.168.0.100      SSHv2      106      [TCP Retransmission] Encrypted response packet len=52
296330      819.832237000      192.168.0.207      192.168.0.100      SSHv2      106      [TCP Retransmission] Encrypted response packet len=52

If more gets logged besides this, I'll update this post.
 
LVL 121
ID: 38830862
Are you using the VMXNET3 network interface?

If not, change it from the E1000 to the VMXNET3 network interface.
 
LVL 11

Expert Comment

by:un1x86
ID: 38830867
Hi

Can you try with iptables disabled?
Is the connection going through your firewall?

What does "cat /proc/sys/net/ipv4/tcp_sack" say?
And "cat /proc/sys/net/ipv4/tcp_window_scaling"?
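
Both values can be read in one go. A minimal sketch; `show_tcp_settings` is a hypothetical helper, and the optional directory argument exists only so it can be pointed at a copy of the files instead of the live /proc tree:

```shell
# Print the two TCP settings asked about above as "name = value" lines.
# Defaults to the live kernel settings under /proc/sys/net/ipv4.
show_tcp_settings() {
    dir="${1:-/proc/sys/net/ipv4}"
    for name in tcp_sack tcp_window_scaling; do
        printf '%s = %s\n' "$name" "$(cat "$dir/$name")"
    done
}
```

Calling `show_tcp_settings` with no argument on the affected machine prints the current values; 1 means the feature is enabled.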
 
LVL 10

Author Comment

by:PlusIT
ID: 38830877
@hanccocka: we're using the E1000s.
@un1x86:

- iptables -L is empty, so I guess it's disabled?
- it's a local connection, so no, it does not pass a firewall (only a default gateway), and I'm testing from the same subnet
- tcp_sack = 1
- tcp_window_scaling = 1
 
LVL 11

Expert Comment

by:un1x86
ID: 38830903
Can you also run

mtr --report hostname



This will gather some latency numbers.

You can change the number of cycles with the -c option to gather performance data over a longer period, like:

mtr -c 50 --report hostname

 
LVL 121
ID: 38830952
I would recommend switching to VMXNET3.
 
LVL 10

Author Comment

by:PlusIT
ID: 38830974
From machine .207 (hostname = sftp) I ran:

mtr --report sftp:
0% loss, all the rest are 0
mtr --report sftp -c 50:
0% loss, all 0 except worst = 0.1
mtr --report lt-cvdw.DOMAINOBSCURED.local (lt-cvdw was .100 in the tests)

This gives just one line:
HOST: SFTP, with the columns like before but no data output
 
LVL 10

Author Comment

by:PlusIT
ID: 38831003
@hanccocka: the issue remains the same after switching the NIC to VMXNET3.
 
LVL 11

Expert Comment

by:un1x86
ID: 38831023
Hm.  Can you do:

1. Disable iptables (service iptables stop) and try again.
2. With iptables enabled, disable SACK (echo 0 > /proc/sys/net/ipv4/tcp_sack).

Is this only happening with ssh?
Do you see anything in the logfiles?
 
LVL 10

Author Comment

by:PlusIT
ID: 38831061
No iptables service exists.
I tried echo 0 to tcp_sack and rebooted; the issue remains.
 
LVL 11

Expert Comment

by:un1x86
ID: 38831066
Yes, because the setting does not survive a reboot :D You need to disable SACK and then test without rebooting.

You can also just flush iptables with iptables -F, but I doubt you have iptables running.
 
LVL 10

Author Comment

by:PlusIT
ID: 38831126
issue remains :(
 
LVL 11

Expert Comment

by:un1x86
ID: 38834255
Hi

You might want to check your ssh logs. One possibility is to run sshd in debug mode with
sshd -d
 
LVL 10

Author Comment

by:PlusIT
ID: 38834264
It's not ssh. FTP and telnet drop too. It must be something at the TCP level, as ping is constant and OK.
 
LVL 11

Expert Comment

by:un1x86
ID: 38834269
Hi

Can you attach your wireshark or tcpdump?
 
LVL 10

Author Comment

by:PlusIT
ID: 38839984
I'm very sorry, I'm not allowed to share that kind of information on the internet. I can do what you ask of me, though, and provide obscured logs.
 
LVL 10

Accepted Solution

by:PlusIT
ID: 38960940
Well, the solution was simple: without my knowledge, the old VMs on the old server had been started up again, hence the strange behaviour. Thanks for your help though.
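
For anyone hitting similar symptoms: a second machine answering on the same IP address is one plausible reading of this fix (an assumption; the post does not say which addresses the old VMs used). A rough check is to ARP-ping the address and see how many distinct MAC addresses reply. The sketch below assumes the iputils `arping` output format, where each reply line carries the MAC in square brackets; `distinct_macs` is a hypothetical helper and `eth0` a placeholder interface name:

```shell
# Read arping output on standard input and print each distinct MAC address
# found in it, one per line, upper-cased so the same card is not listed twice.
distinct_macs() {
    grep -oiE '([0-9a-f]{2}:){5}[0-9a-f]{2}' | tr 'a-f' 'A-F' | sort -u
}

# Possible usage from another host on the same subnet (needs root):
#   arping -c 3 -I eth0 192.168.0.207 | distinct_macs
```

More than one line of output would mean that more than one network card answers for that IP. The check cannot tell the machines apart if the old and converted VMs happen to share a MAC address as well.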