VMware failover question

mokkan
We are running 4 ESX hosts in our farm, with 200+ servers configured. All ESX hosts are clustered. Last night a few Linux boxes moved to other ESX hosts in the farm and rebooted automatically. My question is: when VMs move to other ESX hosts, they shouldn't reboot, right? Why did they reboot? Please advise.
VMware and Virtualization Consultant
Fellow 2018
Expert of the Year 2017
Commented:
That is correct: vMotion (DRS) should be seamless.

Check that the processors in the farm are all the same. Are you also using EVC mode?
Do you have DRS enabled in your cluster?

.Dave
Andrew Hancock (VMware vExpert / EE Fellow)VMware and Virtualization Consultant
Fellow 2018
Expert of the Year 2017
Commented:
Are you certain that the Linux boxes rebooted at exactly the same time as the DRS event?

It's easy to replicate, should you need to test: just manually vMotion the server back (at a quiet time, and schedule downtime, just in case the Linux server restarts again).

I suspect you may be using different processors in the farm. EVC mode should be used to ensure that a vMotion-compatible instruction set is used across the farm.


Author

Commented:
I'm new to this. How do I check whether EVC mode is enabled or not? Also, I forgot to provide some other info: all the VMs that were running on this specific ESX host got rebooted, and this specific ESX host failed.
Right-click your cluster --> Edit Settings, and then under VMware EVC you can see whether it is enabled or disabled.
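
If you would rather check this from a script than from the client, here is a minimal sketch in Python. It assumes you already have a cluster object retrieved through pyVmomi (the connection is not shown) and only reads three of its properties. `cluster_feature_summary` is a hypothetical helper name, and the stand-in object below just mimics the attribute layout so the snippet runs without a vCenter.

```python
from types import SimpleNamespace


def cluster_feature_summary(cluster):
    """Report whether EVC, HA and DRS are enabled on a cluster object.

    `cluster` is assumed to look like a pyVmomi vim.ClusterComputeResource:
    summary.currentEVCModeKey is empty or None when EVC is off, and
    configuration.dasConfig (HA) / configuration.drsConfig (DRS) each
    carry an `enabled` flag.
    """
    evc_key = cluster.summary.currentEVCModeKey
    return {
        "evc_mode": evc_key or "disabled",
        "ha_enabled": bool(cluster.configuration.dasConfig.enabled),
        "drs_enabled": bool(cluster.configuration.drsConfig.enabled),
    }


# Stand-in object for illustration only; a real one would come from vCenter.
example_cluster = SimpleNamespace(
    summary=SimpleNamespace(currentEVCModeKey="intel-nehalem"),
    configuration=SimpleNamespace(
        dasConfig=SimpleNamespace(enabled=True),
        drsConfig=SimpleNamespace(enabled=True),
    ),
)

print(cluster_feature_summary(example_cluster))
```

If `ha_enabled` is true, VMs on a failed host are expected to be powered on again elsewhere, whatever the EVC setting.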

Author

Commented:
Thank you, let me try. Could it have anything to do with a VMware Tools update?
Andrew Hancock (VMware vExpert / EE Fellow)VMware and Virtualization Consultant
Commented:
Okay, so the ESX host FAILED?

If the host failed, then the VMs would start-up in HA mode, so there would be a VM restart.

In HA, the VMs would start up on a new host in the farm. There will be an outage of 1-2 minutes whilst the VMs come online.

This is normal.
No, it shouldn't have anything to do with a VMware Tools update.

Can you confirm the question on DRS? If this isn't enabled with EVC, then your VMs will shut down when the ESX host fails and then boot up again on another ESX host.
Andrew Hancock (VMware vExpert / EE Fellow)VMware and Virtualization Consultant
Commented:
It's not VMware Tools, it's not vMotion, or EVC.

Your farm is operating normally.

You need to find out why your ESX host failed and restarted.
Andrew Hancock (VMware vExpert / EE Fellow)VMware and Virtualization Consultant
Commented:
The ESX host failed. Under HA, this is normal.

Author

Commented:
Yes, EVC is enabled, but if the ESX host failed, the VMs shouldn't reboot, right?
As per hanccocka's previous answer, the servers will reboot:

If the host failed, then the VMs would start-up in HA mode, so there would be a VM restart.
Andrew Hancock (VMware vExpert / EE Fellow)VMware and Virtualization Consultant
Commented:
Yes, that's correct. There is no way for ESX to know the server is about to fail while it has VMs running on it. So if the ESX host fails, all the VMs on that ESX host will also fail.

BUT they will be restarted on other hosts automatically by HA, with a few minutes of outage.

They don't reboot; the VMs have failed, they have crashed.

And then they will be restarted from cold.

Author

Commented:
Thanks a lot. Sorry for asking a stupid question. How do I know whether it failed because of a vMotion event or a DRS event?
Andrew Hancock (VMware vExpert / EE Fellow)VMware and Virtualization Consultant

Commented:
What failed? The ESX host or the VM?

VMs did not fail, they restarted due to ESX Host failure.

If you need advice on ESX host failure, that's off topic, and requires another question, to examine the various possibilities of why the ESX host failed.

Author

Commented:
The ESX host failed, and because of that the VMs failed over to another ESX host and were rebooted automatically once they moved. Is this a vMotion event or a DRS event? Is this clear?

Author

Commented:
Here is the line from your link.

• Automatic detection of server failures. Automate the monitoring
of physical server availability. HA detects server failures and
initiates the virtual machine restart without any human intervention.



Does this mean the virtual machine should restart? I'm really confused by the doc.
Top Expert 2010
Commented:
This would be an HA event when the ESX host fails, not a DRS event.

For HA, VMware restarts the VMs that were running on the failed host on other hosts in the cluster: it recognizes that the host has failed, then powers on the VMs on other hosts. It wasn't just a reboot of the VM, it was actually a fresh power-on of the VM.

For DRS VMware will move vms around amongst the various hosts in the cluster in order to balance the load carried by any given host and thus optimize operations. This is a seamless event from the perspective of any vm.

Your cluster has behaved as I would expect it to.

Hope this helps
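
To tell the two cases apart after the fact, you can look at which events were recorded for the VM. Below is a minimal sketch in Python that sorts event type names into the categories described above. The three class names come from the vSphere API event model; which of them your vCenter version actually logs is an assumption you should verify, and `classify_vm_event` is a hypothetical helper, not part of any VMware tool.

```python
# Event class names from the vSphere API event model (assumed to match
# what your vCenter records; check the Tasks & Events tab to confirm).
HA_RESTART_EVENTS = {"VmRestartedOnAlternateHostEvent"}
DRS_MIGRATION_EVENTS = {"DrsVmMigratedEvent"}
VMOTION_EVENTS = {"VmMigratedEvent"}


def classify_vm_event(event_type):
    """Map an event type name to the kind of move it represents."""
    if event_type in HA_RESTART_EVENTS:
        return "ha_restart"      # cold power-on after a host failure
    if event_type in DRS_MIGRATION_EVENTS:
        return "drs_migration"   # live migration started by DRS
    if event_type in VMOTION_EVENTS:
        return "vmotion"         # live migration started by an admin
    return "other"


for name in ("VmRestartedOnAlternateHostEvent", "DrsVmMigratedEvent"):
    print(name, "->", classify_vm_event(name))
```

Only the first category involves the guest OS booting again; the two migration categories should be invisible to the VM.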
Andrew Hancock (VMware vExpert / EE Fellow)VMware and Virtualization Consultant
Commented:
1. ESX host failed.
2. All VMs on the ESX host that failed also failed.
3. The HA function restarted the VMs on other ESX hosts.
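
The three steps above can be sketched as a toy simulation in Python. This is only an illustration of the sequence, not how HA is implemented: the host and VM names are made up, and the "pick the surviving host with the fewest VMs" rule is a simplifying assumption (real HA placement also considers admission control and available resources).

```python
def fail_host(placement, failed_host):
    """Simulate a host failure followed by an HA restart.

    placement maps host name -> list of VM names running on it.
    Returns (new_placement, restarted), where restarted lists
    (vm, new_host) pairs for every VM that was cold-started elsewhere.
    """
    # Steps 1 and 2: the host is gone, and its VMs went down with it.
    lost_vms = placement.get(failed_host, [])
    survivors = {h: list(vms) for h, vms in placement.items() if h != failed_host}
    if not survivors:
        raise RuntimeError("no surviving hosts to restart VMs on")

    # Step 3: power each lost VM on again on a surviving host.
    restarted = []
    for vm in lost_vms:
        target = min(survivors, key=lambda h: len(survivors[h]))
        survivors[target].append(vm)
        restarted.append((vm, target))
    return survivors, restarted


farm = {"esx1": ["web1", "db1"], "esx2": ["app1"], "esx3": [], "esx4": ["mail1"]}
new_farm, restarted = fail_host(farm, "esx1")
print(restarted)
```

Note that nothing is migrated: the VMs on the failed host are simply started again somewhere else, which is why the guests see a cold boot.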

Andrew Hancock (VMware vExpert / EE Fellow)VMware and Virtualization Consultant
Commented:
I would raise another question asking "Why did the ESX host fail?"

The software and operating system did what they were supposed to do.
Top Expert 2010
Commented:
It is normally impossible for VMware to seamlessly restart VMs on other hosts when an ESX host fails, because the memory image and all state information for the running VM are on the host that failed. The only thing that can be done is a fresh power-on of the VM on another host. That is better than nothing.

I say normally because there is another facility within VMware called Fault Tolerance (FT), where you can run a mirror image of a VM concurrently on another ESX(i) host, and if the main host fails then the mirror VM takes over seamlessly. This may be used with highly critical workloads that need 100% availability. However, there are restrictions, and this is a fairly "expensive" method in terms of resource requirements. An example restriction is that only single-virtual-CPU VMs are supported. It is expensive in that it doubles your memory and CPU requirements for the workload (but not the disk).

Let me know if you want more information on FT.
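
As a rough illustration of that cost, here is a small Python sketch of the arithmetic. It takes the restriction and the doubling exactly as described in this comment (one vCPU only, memory and CPU doubled, disk not), so treat the numbers as an assumption tied to that description rather than a sizing tool; `ft_footprint` is a hypothetical name.

```python
def ft_footprint(vcpus, memory_gb, disk_gb):
    """Estimate cluster resources used by a VM protected with FT.

    Follows the description above: the mirror VM needs its own CPU and
    memory on a second host, while the disk is not duplicated.
    """
    if vcpus != 1:
        # Restriction as stated in the comment above.
        raise ValueError("only single-vCPU VMs are supported in this sketch")
    return {
        "vcpus": 2 * vcpus,
        "memory_gb": 2 * memory_gb,
        "disk_gb": disk_gb,
    }


print(ft_footprint(vcpus=1, memory_gb=4, disk_gb=40))
```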

Author

Commented:
Thanks a lot for all of your explanation. I'm just clarifying your third line:

When you say "restarted the VMs on other ESX hosts", does it mean the VMs get rebooted on the other ESX host? I'm confused by the terms reboot and restart, please correct me.

3. The HA function restarted the VMs on other ESX hosts.
Top Expert 2010
Commented:
It is actually a power-on event that VMware uses to "restart the VMs on the other ESX hosts." It is not just a "reboot", as when you reboot a powered-on system, but more of a cold boot of a powered-off VM.

Author

Commented:
Thank you. Does that mean this event makes the VMs restart? The reason I ask is that my manager wants to know why our VMs got rebooted. Is it a normal event, or is there a problem?
Top Expert 2010
Commented:
It is a normal event, there is no problem with the way VMware is working. However there is some problem that caused the ESX(i) host to fail. You need to determine what the problem was there.
Andrew Hancock (VMware vExpert / EE Fellow)VMware and Virtualization Consultant

Commented:
It's NORMAL. This is what HA does.

Open a new question about why your ESX host failed.

Author

Commented:
Thanks a lot guys.
