  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 538

If anyone has experience with VMware HA, the textbook answer to this question does not work.

Problem: Our customer has a vCenter Server and ESX hosts in an HA cluster. Twice, the physical switches dropped their links to the rest of the network while staying online themselves (link status positive) but without routes. The ESX hosts entered host isolation mode even though all ESX servers were online, and the VMs were unable to communicate on the network for 15 minutes because of the resulting split brain. The customer asked us to validate their new configuration and recommend best practice.

Scenario 1 - The faulty configuration
vSwitch1
                Service Console - VLAN 1 - 10.1.1.1
                Uplinks - 2 Active uplinks on VLAN 1 connected to separate physical switches
                Switches run Spanning Tree

Scenario 2 - Alternative
vSwitch1
                Service Console - VLAN 1 - 10.1.1.1
                Service Console 2 - VLAN 2 - 10.1.2.1
                Uplinks - 2 Active on VLAN 1 and 2 Standby on VLAN 2
                Switches run Spanning Tree and PortFast has been enabled

Is Scenario 2 the best config?
johnemyers Asked:
 
larstr Commented:
Scenario 1 should be fine, but you should also enable PortFast for reliable performance.

What kind of switches are you using? Are you sharing the SC pipe with any heavy-traffic connection such as VMotion or IP storage?

Lars
 
Paul Solovyovsky Commented:
Most isolation issues are due to DNS. How were the hosts added to vCenter? Are you using IP addresses or FQDNs for the hosts? If using FQDNs, are the A records in the DNS zone?
 
ryder0707 Commented:
Why did your switches drop the links? If this is the main issue, don't you think you should fix it first?
During this period, if you connect a PC to the switch or to a port on a specific VLAN, can you ping the default gateway for each VLAN?
Btw, both scenarios are fine depending on the network design, and you won't need STP on the ports connected to the ESX hosts, so enable PortFast.
za_mkh Commented:
Shouldn't STP be disabled for all ports going to ESX servers?
Also, what was the reason for the links being dropped?
If you use Scenario 2, you would need to configure das.isolationaddress, etc.
This is explained here: http://www.yellow-bricks.com/vmware-high-availability-deepdiv/
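
To make the Scenario 2 point concrete, here is a rough Python sketch that builds the HA advanced options for two isolation addresses, one per Service Console network. Treat it as an illustration only: the gateway addresses 10.1.1.254 and 10.1.2.254 are assumed (the thread never states them), the helper name `isolation_options` is made up, and the numbered option form `das.isolationaddress1`/`das.isolationaddress2` should be checked against the documentation for your vCenter version, since the naming has varied between releases.

```python
import ipaddress
from typing import Dict, List


def isolation_options(addresses: List[str]) -> Dict[str, str]:
    """Map a list of isolation addresses to numbered HA advanced options.

    The option naming (das.isolationaddress1, das.isolationaddress2, ...)
    is an assumption to verify for your version; the addresses are
    whatever pingable devices you pick on each Service Console network.
    """
    options: Dict[str, str] = {}
    for index, address in enumerate(addresses, start=1):
        # Raises ValueError if the entry is not a valid IP address.
        ipaddress.ip_address(address)
        options[f"das.isolationaddress{index}"] = address
    return options


# Assumed gateways for VLAN 1 (10.1.1.x) and VLAN 2 (10.1.2.x).
for key, value in isolation_options(["10.1.1.254", "10.1.2.254"]).items():
    print(f"{key} = {value}")
```

The printed key/value pairs are what you would enter by hand in the cluster's HA advanced options; the script itself does not talk to vCenter.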


 
larstr Commented:
You should either disable STP or enable PortFast for the ESX ports.

As mentioned above, DNS is a very important component for HA to work correctly, and the hosts need to be able to resolve both the FQDN and the short hostname of each other.
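
One quick way to test this is a small Python sketch like the one below, run from a machine that uses the same DNS servers as the hosts. The host names `esx01` and `esx02` and the domain `example.local` are placeholders, not names from this thread, and the helper names are made up for illustration. It only checks forward lookups of the short name and the FQDN; it says nothing about how HA itself resolves names.

```python
import socket
from typing import Callable, Dict, Iterable, List, Optional


def names_for_hosts(short_names: Iterable[str], domain: str) -> List[str]:
    """Return each short hostname followed by its FQDN."""
    names: List[str] = []
    for short in short_names:
        names.append(short)
        names.append(f"{short}.{domain}")
    return names


def check_names(
    names: Iterable[str],
    resolve: Callable[[str], str] = socket.gethostbyname,
) -> Dict[str, Optional[str]]:
    """Resolve each name; map it to its IPv4 address, or None on failure."""
    results: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            results[name] = resolve(name)
        except OSError:  # socket.gaierror is a subclass of OSError
            results[name] = None
    return results


# Example use (placeholder names, replace with your own hosts and domain):
#   for name, ip in check_names(names_for_hosts(["esx01", "esx02"], "example.local")).items():
#       print(name, "->", ip or "DOES NOT RESOLVE")
```

Any name that comes back as unresolved is worth fixing (A record or hosts file entry) before looking further at the HA configuration.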

Have you checked the logs of your DNS server in these time periods?

Lars
 
johnemyers (Author) Commented:
Looks like DNS is the issue. Thank you very much for the quick reply and, most importantly, the resolution.
 
johnemyers (Author) Commented:
Great work