High Availability and failover

I would like to know the time interval allowed between heartbeats sent by an ESX host before failover kicks in.

In other words, if the ESX hosts have not heard from one of the other ESX hosts in the cluster for a certain period of time, they can declare it down and start restarting the VMs that resided on the defunct host on other hosts.

I also want to know: if there is a network outage where one ESX host is located, or where two of the ESX hosts are located, would this initiate a reboot of the VMs on the other hosts?
I know this is very rare because there is switch redundancy, but it can happen.

Thanks
jskfan Asked:

Andrew Hancock (VMware vExpert / EE MVE^2), VMware and Virtualization Consultant, Commented:
I would like to know the time interval allowed between heartbeats sent by an ESX host before failover kicks in.

In other words, if the ESX hosts have not heard from one of the other ESX hosts in the cluster for a certain period of time, they can declare it down and start restarting the VMs that resided on the defunct host on other hosts.

The time interval is 10 seconds. These values can be changed, but VMware recommends the defaults.

I also want to know: if there is a network outage where one ESX host is located, or where two of the ESX hosts are located, would this initiate a reboot of the VMs on the other hosts?
I know this is very rare because there is switch redundancy, but it can happen.

Yes, this can happen, because VMware HA and the ESXi servers check each other and check that they can reach the default gateway.
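
As a rough illustration of the heartbeat timeout idea (not VMware's actual implementation), the sketch below flags hosts whose last heartbeat is older than a timeout. The 10-second value is the figure quoted in this thread; the function name, host names, and timestamps are made up for the example.

```python
from typing import Dict, List

# Timeout figure quoted in this thread; treated here as an assumption.
HEARTBEAT_TIMEOUT_S = 10.0


def silent_hosts(last_seen: Dict[str, float], now: float,
                 timeout: float = HEARTBEAT_TIMEOUT_S) -> List[str]:
    """Return the hosts whose last heartbeat is older than `timeout` seconds."""
    return sorted(host for host, seen in last_seen.items() if now - seen > timeout)


if __name__ == "__main__":
    # Hypothetical timestamps in seconds: esx3 has been silent for 12 s.
    heartbeats = {"esx1": 100.0, "esx2": 95.0, "esx3": 88.0}
    print(silent_hosts(heartbeats, now=100.0))
```

A host that is silent for longer than the timeout is only a candidate for being declared down; the other checks discussed in this thread still apply before any VMs are restarted.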
jskfan (Author) Commented:
10 seconds sounds too short. I believe this can cause VMs to reboot.
Andrew Hancock (VMware vExpert / EE MVE^2), VMware and Virtualization Consultant, Commented:
If your networking is that poor.

Are your networking and physical switches likely to be unavailable for 10 seconds?
jskfan (Author) Commented:
I got the following paragraph from VMware:

=========
Network Isolation Addresses
A network isolation address is an IP address that is pinged to determine whether a host is isolated from the network. This address is pinged only when a host has stopped receiving heartbeats from all other hosts in the cluster. If a host can ping its network isolation address, the host is not network isolated, and the other hosts in the cluster have failed. However, if the host cannot ping its isolation address, it is likely that the host has become isolated from the network and no failover action is taken.
By default, the network isolation address is the default gateway for the host. Only one default gateway is specified, regardless of how many management networks have been defined. You should use the das.isolationaddress[...] advanced attribute to add isolation addresses for additional networks. See vSphere HA Advanced Attributes.
=================

What I do not understand is what happens when the host can or cannot ping its default gateway.
I know most environments do not dedicate a network isolation address, since the default gateway address can be used.
Andrew Hancock (VMware vExpert / EE MVE^2), VMware and Virtualization Consultant, Commented:
If the gateway is not reachable, a worker process is started to determine whether to start an HA failover.

In other words, VMware HA starts the process of deciding whether to begin a failover.

For example, a workflow is started: it starts writing to datastores and checks whether all hosts in the cluster are contactable. It does not just say, "Oh, I cannot ping the gateway, therefore I must fail over now!"

Different isolation addresses can be used. Normally the default gateway is used, because it should always be available in your network.
jskfan (Author) Commented:
<<However, if the host cannot ping its isolation address, it is likely that the host has become isolated from the network and no failover action is taken.>>>
If you read the above excerpt from VMware, the way they state it, no failover action is taken when the host cannot ping the isolation address.
However, my understanding is that failover will indeed take place when the host cannot ping the isolation address (assuming we are using the default gateway only).
Andrew Hancock (VMware vExpert / EE MVE^2), VMware and Virtualization Consultant, Commented:
Many intervals and timings are used to determine if and when to initiate a VMware HA failover.

(And you have both host failure and VM failure and restart.)

The HA agents on the servers also check that they can reach the Master and Slave HA agents (FDM agents).
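
One possible reading of the VMware excerpt quoted earlier in this thread is that it describes the decision from the point of view of the host doing the ping. The sketch below encodes only what that excerpt states; the names `HostView` and `classify` are hypothetical, and the real HA agent uses more inputs (datastore checks, timings, master and slave agents) than this shows.

```python
from enum import Enum


class HostView(Enum):
    """What a host concludes about itself, per the quoted excerpt."""
    NORMAL = "heartbeats received; isolation address is not pinged"
    PEERS_FAILED = "not isolated; the other hosts are treated as failed"
    ISOLATED = "likely isolated; this host takes no failover action"


def classify(heartbeats_received: bool, isolation_ping_ok: bool) -> HostView:
    # The isolation address is pinged only once heartbeats from all
    # other hosts in the cluster have stopped.
    if heartbeats_received:
        return HostView.NORMAL
    # No heartbeats, but the isolation address answers: this host is
    # still on the network, so the problem is with the other hosts.
    if isolation_ping_ok:
        return HostView.PEERS_FAILED
    # No heartbeats and no ping reply: this host is probably the one cut off.
    return HostView.ISOLATED


if __name__ == "__main__":
    print(classify(heartbeats_received=False, isolation_ping_ok=False).value)
```

Under this reading, "no failover action is taken" refers to the isolated host itself: a host that cannot reach its isolation address does not try to restart other hosts' VMs.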
jskfan (Author) Commented:
Thank you