Guest VM network connection lost at random
Posted on 2011-03-03
We have two ESXi 4.1 (348481) servers, each connected to the LAN via four vSwitches:
- one vSwitch with a Service Console port group and 2 physical NICs
- one vSwitch with a VMotion port group and 2 physical NICs
- one vSwitch with an iSCSI port group and 2 physical NICs
- one vSwitch with several VMNetwork port groups, one for each VLAN (DMZ=VLAN 3, SERVERS=VLAN 10 and PG_APPS=VLAN 0 (default)), with 2 physical NICs in an active/active configuration on Trk11 and Trk12
All physical NICs connect to an HP ProCurve 5406zl switch, on which all the linked ports have the VLANs in use set in tagged mode, i.e.:
DEFAULT_VLAN=1 Trk11 and Trk12 are untagged
VLAN_3 Trk11 and Trk12 are tagged
VLAN_10 Trk11 and Trk12 are tagged
The vSwitches are set up as follows:
- Promiscuous Mode: Reject
- MAC Address Changes: Accept
- Forged Transmits: Accept
- Traffic shaping: disabled
- Load Balancing: Route based on IP Hash
- Network Failover Detection: Link status only
- Notify switches: Yes
- Failback: Yes
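For reference, this is roughly how I understand the "Route based on IP Hash" policy to choose an uplink for each source/destination pair. It is a simplified sketch based on my reading of the policy, not VMware's actual code; the function name and the IP addresses are made up for illustration.

```python
import ipaddress


def ip_hash_uplink(src_ip: str, dst_ip: str, uplink_count: int) -> int:
    """Return the index of the uplink a source/destination pair maps to.

    Simplified model (an assumption, not VMware's implementation):
    XOR the two IPv4 addresses as integers, then take the result
    modulo the number of active uplinks in the team.
    """
    src = int(ipaddress.IPv4Address(src_ip))
    dst = int(ipaddress.IPv4Address(dst_ip))
    return (src ^ dst) % uplink_count


# Hypothetical addresses on the VLAN 10 subnet, two uplinks in the team.
for dst in ("10.0.10.6", "10.0.10.7"):
    print("10.0.10.5 ->", dst, "uses uplink", ip_hash_uplink("10.0.10.5", dst, 2))
```

If this model is right, traffic from a single VM to different destinations can leave through different physical NICs, so the uplink used depends on the conversation, not on the VM alone.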
Very recently, VMs on the VLAN 10 network have started to lose their network connections at random. Windows does not show the link as disconnected, but the guest cannot get traffic in or out to other systems, except to guests on the same ESX server (which sort of makes sense, as this traffic never actually touches the physical adapter). The really weird bits are:
- A single VM on one ESX server may suddenly have this problem at any time, while the other VMs on the same ESX server still work fine
- A VM with two NICs may have this problem on one NIC but not the other, or sometimes on both at the same time
- Neither Windows nor VMware reports any issues, events, etc.
- Disabling and re-enabling the NIC within the VM reconnects it to the network.
Does anyone have experience with issues like these? Is this a known issue? (I could not find any information on it while searching through the discussions here.)
Any help would be greatly appreciated!