If anyone has experience with VMware HA, the textbook answer to this question does not work.

Problem: Our customer has a vCenter Server and ESX hosts in an HA cluster. Twice they have hit a problem where the switches dropped their links to the rest of the network while staying online (link status still up) but without routes. The ESX hosts entered host isolation mode even though all ESX servers were online, and the VMs were unable to communicate on the network for 15 minutes because of the resulting network-isolation split brain. The customer asked us to validate their new configuration and make recommendations for best practice.

Scenario 1 - The faulty configuration
vSwitch1
                Service Console - VLAN 1 - 10.1.1.1
                Uplinks - 2 Active uplinks on VLAN 1 connected to separate physical switches
                Switches run Spanning Tree

Scenario 2 - Alternative
vSwitch1
                Service Console - VLAN 1 - 10.1.1.1
                Service Console 2 - VLAN 2 - 10.1.2.1
                Uplinks - 2 Active on VLAN 1 and 2 Standby on VLAN 2
                Switches run Spanning Tree and PortFast has been enabled

Is Scenario 2 the best config?
John Myers (Consultant) asked:
larstr commented:
You should either disable STP or enable PortFast for the ESX ports.

As mentioned above, DNS is a very important component for HA to work correctly, and the hosts need to be able to resolve each other by both FQDN and short hostname.

Have you checked the logs of your DNS server in these time periods?

Lars
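One way to act on the DNS advice is to script the check. The Python sketch below is a hypothetical helper, not part of VMware HA: it verifies that each host's FQDN and short hostname both resolve to the management IP you expect. The host names and addresses are placeholders. `socket.gethostbyname` uses the resolver configuration (hosts file and DNS search domain) of the machine it runs on, so run it from a system configured the same way as the ESX hosts.

```python
import socket
from typing import Callable, Dict, List, Tuple

# Hypothetical inventory: short hostname -> (FQDN, expected management IP).
# These names and addresses are examples only; substitute your own ESX hosts.
HOSTS: Dict[str, Tuple[str, str]] = {
    "esx01": ("esx01.example.local", "10.1.1.11"),
    "esx02": ("esx02.example.local", "10.1.1.12"),
}


def check_resolution(
    hosts: Dict[str, Tuple[str, str]],
    resolve: Callable[[str], str] = socket.gethostbyname,
) -> List[str]:
    """Return a list of problems; an empty list means every name resolved as expected."""
    problems = []
    for short_name, (fqdn, expected_ip) in hosts.items():
        # HA needs both forms to work, so check the FQDN and the short name.
        for name in (fqdn, short_name):
            try:
                ip = resolve(name)
            except OSError as exc:  # socket.gaierror is a subclass of OSError
                problems.append(f"{name}: does not resolve ({exc})")
                continue
            if ip != expected_ip:
                problems.append(f"{name}: resolves to {ip}, expected {expected_ip}")
    return problems


if __name__ == "__main__":
    issues = check_resolution(HOSTS)
    for issue in issues:
        print(issue)
    print("OK" if not issues else f"{len(issues)} problem(s) found")
```

The resolver is passed in as a parameter so the logic can be exercised without a live DNS server.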
 
larstr commented:
Scenario 1 should be fine, but you should also enable portfast for reliable performance.

What kind of switches are you using? Are you sharing the SC pipe with any heavy-traffic connection such as vMotion or IP storage?

Lars
 
Paul Solovyovsky (Senior IT Advisor) commented:
Most of the time, isolation issues are due to DNS. How are the hosts added to vCenter? Are you using the IP address or the FQDN for the hosts? If you are using FQDNs, are the A records in the DNS zone?
 
ryder0707 commented:
Why did your switches drop the links? If this is the main issue, don't you think you should fix it first?
During this period, if you connect a PC to a switch port on a specific VLAN, can you ping the default gateway for each VLAN?
By the way, both scenarios are fine depending on the network design, and you won't need STP on the ports connected to the ESX hosts, so enable PortFast.
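The gateway test suggested above can be scripted. A minimal Python sketch, assuming Linux iputils `ping` flags (`-c` for the packet count, `-W` for the reply timeout in seconds; Windows and macOS differ) and placeholder gateway addresses. Run it from a machine patched into a port on each VLAN in turn while the fault is present.

```python
import subprocess
from typing import Callable, Dict

# Hypothetical gateway addresses, one per VLAN; substitute the real ones.
GATEWAYS: Dict[str, str] = {
    "VLAN 1": "10.1.1.254",
    "VLAN 2": "10.1.2.254",
}


def ping_once(ip: str) -> bool:
    """Send one ICMP echo request and report whether a reply came back.

    Assumes Linux iputils ping: -c 1 sends one packet, -W 1 waits one second.
    """
    result = subprocess.run(
        ["ping", "-c", "1", "-W", "1", ip],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def check_gateways(
    gateways: Dict[str, str],
    ping: Callable[[str], bool] = ping_once,
) -> Dict[str, bool]:
    """Map each VLAN label to whether its gateway answered."""
    return {vlan: ping(ip) for vlan, ip in gateways.items()}


if __name__ == "__main__":
    for vlan, ok in check_gateways(GATEWAYS).items():
        print(f"{vlan}: {'reachable' if ok else 'UNREACHABLE'}")
```

If the link lights stay green but the gateway stops answering, that matches the "link up, no routes" failure described in the question, which link-state uplink failover on its own will not detect.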
 
za_mkh commented:
Shouldn't STP be disabled for all ports going to ESX servers? 
Also what was the reason for the links being dropped?
If you use Scenario 2, you would need to configure das.isolationaddress, etc.
This is explained here: http://www.yellow-bricks.com/vmware-high-availability-deepdiv/
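To show why an extra isolation address matters for Scenario 2, here is a deliberately simplified Python model of the HA isolation check as described in the linked deep dive. A host that stops receiving heartbeats from its peers pings its isolation addresses (by default the Service Console default gateway, plus any das.isolationaddress entries) and declares itself isolated only if none of them answer. The class and function names and all addresses are hypothetical; real HA also involves detection timers and the configured isolation response, which this sketch ignores.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class HaSettings:
    # Loosely mirrors HA advanced options; simplified for illustration only.
    use_default_isolation_address: bool = True  # cf. das.usedefaultisolationaddress
    extra_isolation_addresses: List[str] = field(default_factory=list)  # cf. das.isolationaddressN


def isolation_addresses(settings: HaSettings, default_gateway: str) -> List[str]:
    """Addresses the host would ping once heartbeats stop."""
    addrs = [default_gateway] if settings.use_default_isolation_address else []
    addrs.extend(settings.extra_isolation_addresses)
    return addrs


def is_isolated(
    heartbeats_received: bool,
    settings: HaSettings,
    default_gateway: str,
    ping: Callable[[str], bool],
) -> bool:
    """Isolated only if no heartbeats arrive AND no isolation address answers."""
    if heartbeats_received:
        return False
    return not any(ping(addr) for addr in isolation_addresses(settings, default_gateway))


# Hypothetical outage: the VLAN 1 gateway is unreachable, VLAN 2 still routes.
reachable = {"10.1.2.254"}
ping = lambda addr: addr in reachable
scenario1 = HaSettings()
scenario2 = HaSettings(extra_isolation_addresses=["10.1.2.254"])
print(is_isolated(False, scenario1, "10.1.1.254", ping))  # True
print(is_isolated(False, scenario2, "10.1.1.254", ping))  # False
```

With only the default gateway configured, losing that one path is enough for the host to declare isolation; an isolation address on the second Service Console network gives it an independent path to test.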

 
John Myers (Consultant, author) commented:
Looks like DNS is the issue. Thank you very much for the quick reply and, most importantly, the resolution.
 
John Myers (Consultant, author) commented:
Great work