VM Suddenly has no network connectivity

I am running a 2-node VMware ESXi 5.5 cluster. The symptoms are that a VM becomes unreachable: there are no ping replies, and Wireshark shows no replies to any kind of connection to the VM.  You can log on locally to the VM through the Console connection and reboot it, after which the VM becomes responsive again, but within a few hours it loses network connectivity again.  The guest operating system, Windows Server 2008 R2, still reports that it has network connectivity while the problem is occurring.

If you reboot the VM, the problem temporarily goes away.

The problem also travels between ESXi hosts after a vMotion migration.  I have even shut the VM down, done a cold migration, and restarted it.  Within hours the problem returns.

If anyone has seen this before or has any ideas on how to resolve this permanently, I'd be really happy to hear from you.

Thanks.
techadmn Asked:
 
Brett Danney, IT Architect, Commented:
I have also seen this happen when a switch loses its VLAN config. To prevent the VM from connecting through a NIC that is attached to a problematic switch, I would suggest going to the properties of the vSwitch in VMware (under the Configuration tab, select Networking). On the NIC Teaming tab, Network Failover Detection is set to Link Status Only by default; change this to Beacon Probing. That should stop the vSwitch from moving a VM's traffic onto a NIC where the required VLAN is not available. If I were you, I would do this regardless of whether you are using VLANs and see how the VM behaves afterwards.

As a side note, I have seen this before when I had five VLANs trunked per physical cable. Each VLAN had its own vSwitch, tagged with its VLAN ID. On some hosts, certain VLANs had not been assigned to the port on the physical switch. When a VM migrated to a host with this issue, it lost its network connection because the VLAN was not present on the cable. Because every vSwitch had two NICs assigned, the issue could also appear without a migration, whenever the vSwitch directed traffic through a NIC whose switch port did not carry the full trunk.

If you are using VLANs, I would check the physical switch config and make sure that all VLANs are tagged and assigned to the right ports.
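
A minimal sketch of that check, assuming you have already copied the allowed VLANs for each uplink port out of the physical switch config by hand. The host names, NIC names, and VLAN IDs below are made up for illustration and are not taken from this environment:

```python
# Hypothetical data: the VLAN IDs that every uplink is supposed to carry.
REQUIRED_VLANS = {10, 20, 30, 40}

# Hypothetical data: VLANs actually allowed on each physical switch port,
# keyed by (ESXi host, uplink NIC) as transcribed from the switch config.
TRUNK_PORTS = {
    ("esx1", "vmnic0"): {10, 20, 30, 40},
    ("esx1", "vmnic1"): {10, 20, 40},      # VLAN 30 missing on this port
    ("esx2", "vmnic0"): {10, 20, 30, 40},
    ("esx2", "vmnic1"): {10, 20, 30, 40},
}


def missing_vlans(trunk_ports, required):
    """Return {(host, nic): sorted missing VLAN IDs} for incomplete trunks."""
    return {
        uplink: sorted(required - allowed)
        for uplink, allowed in trunk_ports.items()
        if required - allowed
    }


for (host, nic), vlans in missing_vlans(TRUNK_PORTS, REQUIRED_VLANS).items():
    print(f"{host} {nic}: missing VLANs {vlans}")
```

Any uplink it reports is a candidate for the intermittent loss described above, since a VM whose traffic lands on that NIC would have no path for its VLAN.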
 
strivoli Commented:
The host logs could help. Do they report anything useful?
 
Andrew Hancock (VMware vExpert / EE MVE^2), VMware and Virtualization Consultant, Commented:
Do you have any IP Address conflicts?

Are you using the VMXNET3 interface in the VM?

Can you still ping the host?
 
techadmn, Author, Commented:
The Host logs show nothing untoward.

The host still responds to pings on the management network, and all the other VMs on the same vSwitch still function as normal.  To add more detail, it's not always the same VM that has the issue, but there is always connectivity to the VMs that don't have the issue.

Whichever machine this problem ends up on, a reboot clears the fault, but it comes back later on.

Do you think this could be related to the physical switch?  The VMs are uplinked to a trunk port on the physical switch via the ESXi host.

Thanks.
 
strivoli Commented:
Did you consider the option of more VMs having the same MAC address?
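
A quick way to rule this out is to list every VM's MAC address and look for repeats. This is only a sketch: the VM names and MAC addresses are invented, and it assumes you have exported the list yourself (for example, by copying it from each VM's network adapter settings):

```python
from collections import defaultdict

# Hypothetical inventory of (VM name, MAC address) pairs.
VM_MACS = [
    ("vm-app01", "00:50:56:AA:BB:01"),
    ("vm-app02", "00:50:56:aa:bb:02"),
    ("vm-db01", "00-50-56-AA-BB-01"),  # same MAC as vm-app01, different notation
]


def normalize(mac):
    """Lower-case the MAC and use ':' separators so notations compare equal."""
    return mac.replace("-", ":").lower()


def duplicate_macs(vm_macs):
    """Return {mac: [vm names]} for every MAC used by more than one VM."""
    by_mac = defaultdict(list)
    for name, mac in vm_macs:
        by_mac[normalize(mac)].append(name)
    return {mac: names for mac, names in by_mac.items() if len(names) > 1}


for mac, names in duplicate_macs(VM_MACS).items():
    print(f"{mac} is used by: {', '.join(names)}")
```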
 
Andrew Hancock (VMware vExpert / EE MVE^2), VMware and Virtualization Consultant, Commented:
Yes, it could be, if the trunk config on your physical switch is wrong or your teaming policy does not match the trunk config on the physical switch.
 
techadmn, Author, Commented:
Hello SagiEdoc,

Thanks for your comments.  The VLAN configuration you described from your own experience with this problem is very similar to mine.  There are 4 VLANs trunked through a 2-port NIC in each of the two ESXi hosts.

2 UTP Cables per Host, a total of 4 Physical uplinks to the Switch.

Each VLAN has a vSwitch represented in the network configuration in ESXi.  Each VLAN is tagged with an ID.

I will certainly have a look and ensure that all switch ports are configured with the correct VLAN memberships. I will also take your advice on the NIC teaming properties, which, as you correctly pointed out, are currently set to Link Status Only rather than Beacon Probing. I will change this and observe the behaviour.

Thanks for your help thus far.
 
Brett Danney, IT Architect, Commented:
No problem. Take a look at the observed IP ranges on the NICs under the vSwitches. You should be able to pick up pretty quickly whether a VLAN is missing, based on the IP ranges you can see.
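
If there are a lot of NICs to compare, the same check can be roughed out in a few lines. This is a sketch under assumptions: the subnet-per-VLAN mapping, the NIC names, and the "start-end" text format of the observed ranges are all made up for illustration, so adjust them to what your client actually displays:

```python
import ipaddress

# Hypothetical mapping of VLAN ID to the subnet that lives on it.
EXPECTED_SUBNETS = {
    10: "192.168.10.0/24",
    20: "192.168.20.0/24",
    30: "192.168.30.0/24",
}

# Hypothetical observed IP ranges per uplink, typed in as "start-end" strings.
OBSERVED = {
    "vmnic0": [
        "192.168.10.1-192.168.10.254",
        "192.168.20.1-192.168.20.127",
        "192.168.30.1-192.168.30.254",
    ],
    "vmnic1": [
        "192.168.10.1-192.168.10.254",
        "192.168.20.1-192.168.20.254",
    ],
}


def overlaps(subnet, observed_range):
    """True if the observed 'start-end' range shares any address with subnet."""
    net = ipaddress.ip_network(subnet)
    start, end = (ipaddress.ip_address(p.strip()) for p in observed_range.split("-"))
    return start <= net.broadcast_address and end >= net.network_address


def unseen_vlans(expected, observed):
    """Return {nic: [VLAN IDs]} whose subnet matches no observed range."""
    result = {}
    for nic, ranges in observed.items():
        missing = [
            vlan
            for vlan, subnet in sorted(expected.items())
            if not any(overlaps(subnet, r) for r in ranges)
        ]
        if missing:
            result[nic] = missing
    return result


for nic, vlans in unseen_vlans(EXPECTED_SUBNETS, OBSERVED).items():
    print(f"{nic}: no traffic observed for VLANs {vlans}")
```

A NIC that shows nothing for a VLAN is only a hint, since a quiet VLAN can also look empty, but it tells you which switch port to check first.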
 
techadmn, Author, Commented:
I've requested that this question be closed as follows:

Accepted answer: 0 points for techadmn's comment #a39876508

for the following reason:

The expert response given resolved the issue and the problem has now gone away.  I would like to thank everyone who kindly replied. The solution was supplied by SagiEDoc, and I would like to extend particular thanks to this expert.
 
strivoli Commented:
Dear requester, you should not close the question. You should accept the expert's answer so that he is awarded the points.
 
Brett Danney, IT Architect, Commented:
:( No points? That sucks a bit.
 
techadmn, Author, Commented:
Sorry, I am trying to get it adjusted. I clicked on the wrong option. Apologies.
 
techadmn, Author, Commented:
Apologies, I don't often raise questions on Experts Exchange and have made an error in the question closure procedure.   I have pressed the Request Attention button and am awaiting a moderator's response.
 
techadmn, Author, Commented:
Thank you all for your help, and thanks to the moderators for allowing me to correct the points allocation.  Thanks again.
Question has a verified solution.
