Solved

Guest VM network connection lost at random

Posted on 2011-03-03
Last Modified: 2012-05-11
Our setup
 
We have two ESXi 4.1 (build 348481) servers, each connected to the LAN via four vSwitches:
- one vSwitch with a Service Console port group and 2 physical NICs
- one vSwitch with a VMotion port group and 2 physical NICs
- one vSwitch with an iSCSI port group and 2 physical NICs
- one vSwitch with several VM Network port groups, one for each VLAN (DMZ = VLAN 3, SERVERS = VLAN 10 and PG_APPS = VLAN 0, the default), with 2 physical NICs in an active/active configuration, connected to trunks Trk11 and Trk12
 
All physical NICs connect to an HP ProCurve 5406zl switch, on which all the linked ports have the VLANs in use set to tagged mode (except the default VLAN), i.e.:
 
- DEFAULT_VLAN (VLAN 1): Trk11 and Trk12 untagged
- VLAN_3: Trk11 and Trk12 tagged
- VLAN_10: Trk11 and Trk12 tagged
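For readers less familiar with tagged vs. untagged trunk ports: on a tagged port each frame carries a 4-byte IEEE 802.1Q tag (TPID 0x8100, then a 3-bit priority, a 1-bit DEI and a 12-bit VLAN ID) inserted after the source MAC. The minimal Python sketch below builds and parses that tag; the function names are illustrative only. It also models the ESX convention that a port group with VLAN ID 0 is not tagged by the vSwitch, so its frames leave untagged and end up in the trunk's untagged VLAN (DEFAULT_VLAN in the table above).

```python
import struct

TPID_8021Q = 0x8100  # EtherType value that marks an 802.1Q-tagged frame


def build_dot1q_tag(vlan_id: int, priority: int = 0, dei: int = 0) -> bytes:
    """Return the 4-byte 802.1Q tag (TPID + TCI) for a VLAN."""
    if not 1 <= vlan_id <= 4094:
        raise ValueError("VLAN ID must be in the range 1..4094")
    if not 0 <= priority <= 7:
        raise ValueError("priority (PCP) must be in the range 0..7")
    # TCI layout: 3-bit priority | 1-bit DEI | 12-bit VLAN ID
    tci = (priority << 13) | ((dei & 1) << 12) | vlan_id
    return struct.pack("!HH", TPID_8021Q, tci)


def parse_vlan_id(tag: bytes) -> int:
    """Extract the VLAN ID from a 4-byte 802.1Q tag."""
    tpid, tci = struct.unpack("!HH", tag)
    if tpid != TPID_8021Q:
        raise ValueError("not an 802.1Q tag")
    return tci & 0x0FFF


def uplink_tag_for_port_group(port_group_vlan: int) -> bytes:
    """Tag the vSwitch adds on the uplink for a given port group VLAN ID.

    VLAN 0 means the vSwitch does not tag; the frame leaves untagged and the
    physical switch assigns it to the port's untagged VLAN (DEFAULT_VLAN here).
    """
    if port_group_vlan == 0:
        return b""
    return build_dot1q_tag(port_group_vlan)


for name, vlan in [("DMZ", 3), ("SERVERS", 10), ("PG_APPS", 0)]:
    print(name, uplink_tag_for_port_group(vlan).hex() or "(untagged)")
```

Run as-is, it prints 81000003 for DMZ, 8100000a for SERVERS and (untagged) for PG_APPS.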
 
 
 
The vSwitches are set up as follows:
- Promiscuous Mode: Reject
- MAC Address Changes: Accept
- Forged Transmits: Accept
- Traffic shaping: disabled
- Load Balancing: Route based on IP Hash
- Network Failover Detection: Link status only
- Notify switches: Yes
- Failback: Yes
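With "Route based on IP Hash", one VM's traffic is spread across both uplinks per source/destination pair, which is why the physical switch has to treat those ports as a single static trunk. The Python sketch below uses a simplified model often used to explain the policy: XOR the source and destination IPv4 addresses and take the result modulo the number of active uplinks. The exact hash ESXi 4.1 computes may differ, and the addresses are hypothetical.

```python
import ipaddress


def ip_hash_uplink(src_ip: str, dst_ip: str, active_uplinks: int) -> int:
    """Simplified IP-hash model: XOR the IPv4 addresses, modulo uplink count."""
    if active_uplinks < 1:
        raise ValueError("need at least one active uplink")
    src = int(ipaddress.IPv4Address(src_ip))
    dst = int(ipaddress.IPv4Address(dst_ip))
    return (src ^ dst) % active_uplinks


vm = "10.0.10.21"  # hypothetical VM address on VLAN 10
for peer in ["10.0.10.1", "10.0.10.2", "10.0.10.3"]:
    print(vm, "->", peer, "uses uplink", ip_hash_uplink(vm, peer, 2))
```

In this model the same VM uses uplink 0 for one peer and uplink 1 for another, so its MAC address can be seen on either physical port.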
 
 
Our problem
 
Recently, VMs on the VLAN 10 network have started to lose network connectivity at random. Windows does not show the link as disconnected, but the VM cannot send or receive traffic to or from other systems, except guests on the same ESX server (which sort of makes sense, as that traffic never actually touches a physical adapter). The really weird bits are:
 
- A single VM on one ESX host may suddenly have this problem at any time, while the other VMs on the same host still work fine
- A single VM may have this problem on one of its NICs but not the other, or sometimes on both NICs at the same time
- Neither Windows nor VMware reports any issues/events/etc.
- Restarting the NIC within the VM (disabling and re-enabling it) reconnects it to the network.
 
Does anyone have experience with issues like these? Is this a known issue? (I could not find any info on this while searching through the discussions here.)
Any help would be greatly appreciated!
Question by:thevirtualdude
 
LVL 5

Expert Comment

by:ianmellor
ID: 35029333
Hi,

Just a quick one: is your trunk configured as type "trunk" and not "LACP"?

Cheers,
 

Author Comment

by:thevirtualdude
ID: 35029352
Yes, correct: Trk, not LACP.
 
LVL 16

Expert Comment

by:danm66
ID: 35032797
Is Spanning Tree Protocol enabled?
 

Author Comment

by:thevirtualdude
ID: 35032799
Yes it is.
 
LVL 16

Expert Comment

by:danm66
ID: 35033687
One thing you might do is check the physical switch(es) when this happens and see whether the ARP tables contain the MAC address of the affected VM. If that is the case, you might try changing your load balancing back to "Route based on originating virtual port ID" and see if the behavior stops.
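To make the suggested comparison concrete: under IP hash one VM's MAC address can show up on either uplink depending on the peer, whereas "Route based on originating virtual port ID" pins each virtual port to one uplink, so the switch always learns the VM's MAC on the same port. The Python sketch below uses simplified models of both policies (XOR-modulo for IP hash, port ID modulo uplink count for originating port); the real ESXi algorithms may differ, and the addresses and port ID are hypothetical.

```python
import ipaddress


def ip_hash_uplink(src_ip: str, dst_ip: str, active_uplinks: int) -> int:
    # Simplified model: XOR of the IPv4 addresses, modulo the uplink count
    src = int(ipaddress.IPv4Address(src_ip))
    dst = int(ipaddress.IPv4Address(dst_ip))
    return (src ^ dst) % active_uplinks


def port_id_uplink(virtual_port_id: int, active_uplinks: int) -> int:
    # Simplified model: each virtual port maps to one fixed uplink
    return virtual_port_id % active_uplinks


def uplinks_used(policy: str, vm_ip: str, vm_port_id: int,
                 peers: list, active_uplinks: int = 2) -> set:
    """Set of uplinks on which the physical switch would see this VM's MAC."""
    if policy == "ip_hash":
        return {ip_hash_uplink(vm_ip, peer, active_uplinks) for peer in peers}
    if policy == "port_id":
        return {port_id_uplink(vm_port_id, active_uplinks)}
    raise ValueError(f"unknown policy: {policy}")


peers = ["10.0.10.1", "10.0.10.2", "10.0.10.3"]  # hypothetical peers
print("ip_hash:", sorted(uplinks_used("ip_hash", "10.0.10.21", 5, peers)))
print("port_id:", sorted(uplinks_used("port_id", "10.0.10.21", 5, peers)))
```

With these inputs the IP-hash model uses uplinks [0, 1] for the VM, while the port-ID model uses only [1].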
 

Author Comment

by:thevirtualdude
ID: 35036317
@danm66 - I just checked, and Spanning Tree is disabled on that switch. Do you think that might be it?
 
LVL 16

Expert Comment

by:danm66
ID: 35038102
0
 

Author Comment

by:thevirtualdude
ID: 35062576
Do you know if there is a way to enable STP on just the ports that the ESX servers are connected to, rather than on the entire switch? If so, any thoughts on the command line?

I have a separate VLAN on the same switch dedicated to iSCSI traffic, and I think STP may not be good for iSCSI.

Thank you so far for the assistance!
 
LVL 16

Expert Comment

by:danm66
ID: 35062647
Yes, you should be able to enable it per switch port. Here's a configuration example for Cisco from http://kb.vmware.com/kb/1004074:

interface GigabitEthernet1/2
 ! Set to layer 2 switching
 switchport
 ! ESX only supports dot1q, not ISL
 switchport trunk encapsulation dot1q
 ! VLANs allowed to ESX; ensure the ESX VLANs are allowed
 switchport trunk allowed vlan 10-100
 ! Set to trunk mode
 switchport mode trunk
 ! DTP is not supported
 switchport nonegotiate
 no ip address
 ! ESX 3.5 supports CDP
 no cdp enable
 ! Enables the portfast feature (port goes straight to forwarding)
 spanning-tree portfast trunk
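One detail worth checking if this example is adapted: "allowed vlan 10-100" would not carry VLAN 3, the DMZ port group in the question. The Python sketch below expands a Cisco-style allowed-VLAN string (comma-separated IDs and ranges) and lists the tagged port-group VLANs it would leave out. VLAN 0 is skipped because the vSwitch does not tag it; the function names are hypothetical.

```python
def parse_vlan_list(spec: str) -> set:
    """Expand an allowed-VLAN list such as "3,10-100" into a set of IDs."""
    vlans = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            low, high = (int(x) for x in part.split("-", 1))
            vlans.update(range(low, high + 1))
        else:
            vlans.add(int(part))
    return vlans


def vlans_not_allowed(allowed_spec: str, port_group_vlans: list) -> list:
    """Tagged port-group VLANs (non-zero IDs) missing from the allowed list."""
    allowed = parse_vlan_list(allowed_spec)
    return sorted(v for v in port_group_vlans if v != 0 and v not in allowed)


# Port group VLANs from the question: DMZ=3, SERVERS=10, PG_APPS=0
print(vlans_not_allowed("10-100", [3, 10, 0]))
print(vlans_not_allowed("3,10-100", [3, 10, 0]))
```

The first call reports [3]; adding VLAN 3 to the list, as in the second call, leaves nothing missing.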
 

Author Comment

by:thevirtualdude
ID: 35071500
A reference to another KB within the one you mention above: http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1003804

...mentions that another way to resolve that issue would be to disable STP, which in my case is already true: mine is disabled and I still experience the issue.

I am a little nervous about enabling STP on the switch because I have an iSCSI VLAN on the same switch. I need to confirm that I can enable STP per port on an HP ProCurve switch.
 
LVL 16

Accepted Solution

by:danm66 (earned 250 total points)
ID: 35071961
See page 3 of http://www.hp.com/rnd/support/manuals/pdf/release_06628_07110/Bk2_Ch5_STP.pdf:

You can enable or disable STP on the following levels:
• Globally – Affects all ports on the device.
• Port-based VLAN – Affects all ports within the specified port-based VLAN. When you enable or disable STP within a port-based VLAN, the setting overrides the global setting. Thus, you can enable STP for the ports within a port-based VLAN even when STP is globally disabled, or disable the ports within a port-based VLAN when STP is globally enabled.

If you have a support contract with VMware, you may want to open a case with their Networking support team, just to make sure.
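The override rule quoted above can be modeled in a few lines: a port-based VLAN's explicit STP setting wins, otherwise the global setting applies. The Python sketch below assumes the switch behaves as that manual describes and uses a hypothetical ISCSI_VLAN name, since the thread does not give the iSCSI VLAN's ID. It shows STP enabled only for VLAN_10 while staying off globally, and therefore on the iSCSI VLAN.

```python
def effective_stp(global_enabled: bool, vlan_overrides: dict, vlans: list) -> dict:
    """Per-VLAN STP state: an explicit port-based VLAN setting overrides the global one."""
    return {vlan: vlan_overrides.get(vlan, global_enabled) for vlan in vlans}


vlans = ["DEFAULT_VLAN", "VLAN_3", "VLAN_10", "ISCSI_VLAN"]  # ISCSI_VLAN is hypothetical
state = effective_stp(global_enabled=False, vlan_overrides={"VLAN_10": True}, vlans=vlans)
for vlan, enabled in state.items():
    print(f"{vlan}: STP {'enabled' if enabled else 'disabled'}")
```

The reverse case works the same way: with STP globally enabled, an override of False for the iSCSI VLAN keeps it out of spanning tree.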
 
LVL 69

Expert Comment

by:Qlemo
ID: 35410399
This question has been classified as abandoned and is being closed as part of the Cleanup Program. See my comment at the end of the question for more details.