Solved

ESXi VM moved to a different clustered host in a powered-off state: why?

Posted on 2013-11-28
Last Modified: 2013-12-16
This is the same 2-node ESXi 5.5 setup, configured for HA, as in my previously posted questions. 2 volumes were made available from iSCSI SAN storage, and both ESXi hosts can access both volumes simultaneously. 2 VMs are stored in volume 1, and 3 VMs are stored in volume 2.

Using vCenter, a cluster was formed to take care of the above 2 hosts and 5 VMs. I configured the 2 VMs stored in volume 1 to be hosted by ESXi host 1, and the other 3 VMs to be hosted by ESXi host 2.

Now, in order to test the cluster, I shut down ESXi host 1. I can see that the 2 VMs from volume 1 are now hosted by ESXi host 2, but both VMs are in the powered-off state. Can I configure it so that these VMs stay powered on while they migrate?

thanks in advance.
Question by:MichaelBalack
27 Comments
 
LVL 117
ID: 39683174
VMware HA restarts VMs on a host failure.

Otherwise, you will need to migrate the VMs off the host before shutting it down!

Shutting down a Host is not a host failure, because it's controlled.

Just pull out the power cable to simulate a Host Failure.
 
LVL 1

Author Comment

by:MichaelBalack
ID: 39683222
Okay, will try it tomorrow while onsite
 
LVL 5

Expert Comment

by:Steve M
ID: 39683904
In case you don't like pulling the power cord on your server, you can also pull the network cables, or disable the switch ports they are connected to, in order to initiate an HA failover (as long as host monitoring is enabled in HA).
 
LVL 1

Author Comment

by:MichaelBalack
ID: 39685115
Hi isk-ck,

I tried pulling out the power cable, and all VMs failed over with a system reboot. Is this behaviour normal?

There are 3 NICs teamed on the same vSwitch for all VMs. I pulled out all 3 cables and there was no failover. Why? The VMs were not powered off.
 
LVL 117
ID: 39685129
CORRECT - VMware HA, RESTARTS the VMs on other HOSTS!

Completely Normal for VMware HA

They are not rebooted, they have failed......because the host has failed, so they are restarted on new hosts!
 
LVL 1

Author Comment

by:MichaelBalack
ID: 39685140
Hi Hanccocka,

That means pulling out the cables doesn't trigger a failover?
 
LVL 117
ID: 39685174
We normally test by pulling out the power.

VMs should then restart on other hosts.
 
LVL 1

Author Comment

by:MichaelBalack
ID: 39685178
Hi Hanccocka,

How about if all the network ports/cables have  to take as well?
 
LVL 117
ID: 39685187
Not quite sure I understand "How about if all the network ports/cables have  to take as well? "
 
LVL 1

Author Comment

by:MichaelBalack
ID: 39685195
Hi Hanccocka,

I mean: if all the NICs for the VMs are detected as offline, can HA handle that and trigger a failover?
 
LVL 117

Accepted Solution

by:
Andrew Hancock (VMware vExpert / EE MVE) earned 300 total points
ID: 39685325
see here for testing

VMware KB: Simulating VMware High Availability failover

There are several tests, including a host failure and a network isolation type of failure.

What you simulated by pulling the network cables was a network isolation type of failure (e.g. a network fault), and if HA is working, the VMs should have been restarted on another server.
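To make the difference between the two test types concrete, here is a minimal Python sketch of how a host's state might be classified from the signals the HA agents look at. It is a simplified model for illustration only, not VMware's implementation: `HostSignals`, its field names, and `classify_host` are hypothetical, and the rules are a reduced version of the usual description (no network heartbeat, no datastore heartbeat, and no ping response means the host is treated as failed; a host that is still alive but cannot reach its isolation address is treated as isolated).

```python
from dataclasses import dataclass
from enum import Enum


class HostState(Enum):
    HEALTHY = "healthy"
    FAILED = "failed"            # host is down: its VMs are restarted elsewhere
    ISOLATED = "isolated"        # host is up but cut off from the management network
    PARTITIONED = "partitioned"  # host is up but cannot see the master


@dataclass(frozen=True)
class HostSignals:
    """Hypothetical snapshot of the signals used in this simplified model."""
    network_heartbeat: bool          # master still receives management-network heartbeats
    datastore_heartbeat: bool        # host still updates its heartbeat datastore
    responds_to_ping: bool           # host answers pings on its management IP
    reaches_isolation_address: bool  # host itself can ping the isolation address


def classify_host(s: HostSignals) -> HostState:
    """Classify a host using the reduced rule set described above."""
    if s.network_heartbeat:
        return HostState.HEALTHY
    if not s.datastore_heartbeat and not s.responds_to_ping:
        return HostState.FAILED
    if not s.reaches_isolation_address:
        return HostState.ISOLATED
    return HostState.PARTITIONED


# Test 1: power cable pulled, so every signal disappears.
power_pulled = HostSignals(False, False, False, False)
# Test 2: management cables pulled, but storage is still connected.
mgmt_cables_pulled = HostSignals(False, True, False, False)

print(classify_host(power_pulled).value)        # failed
print(classify_host(mgmt_cables_pulled).value)  # isolated
```

For an isolated host, what happens to its VMs depends on the isolation response configured on the cluster (leave powered on, power off, or shut down), so an isolation test only produces a restart elsewhere if the host actually declares itself isolated and the response is not "leave powered on".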
 
LVL 5

Assisted Solution

by:Steve M
Steve M earned 200 total points
ID: 39685375
HA should absolutely take care of a network failure (wouldn't be very highly available if it didn't) - if you have redundant nics it should fail over internally on the nics, but if all the nics went offline then it should bring the guests on that host up on another host.

Do you run your vcenter server as a guest on one of the hosts or is it a separate physical server?
 
LVL 1

Author Comment

by:MichaelBalack
ID: 39685500
Hi isk-ck,

vCenter runs as a VM on one of the hosts. I have already tested that it was able to fail over to another host on a power failure.
 
LVL 5

Expert Comment

by:Steve M
ID: 39685518
If I understand correctly, when vcenter is a vm on a host, if it is on the same host that you pulled the nic cables from, then it would likely still be able to communicate with the one host and vm's on that same vSwitch, so likely a failover would not be initiated. I've never actually tested that, but it would make sense.

Is your vcenter guest on the same host that you unplugged the nics?
 
LVL 1

Author Comment

by:MichaelBalack
ID: 39685527
Hi isk-ck,

I actually tried to pull cable of the host where vcenter wasn't located.
 
LVL 117
ID: 39685542
VMware HA is conducted by the HA Agents on the Host Servers. vCenter Server is only used to configure VMware HA.

e.g. if the host fails, and vCenter Server is a VM, HA Agents control restarting....
 
LVL 5

Expert Comment

by:Steve M
ID: 39685672
Ah thanks Hanccocka, I hoped it would be that way.

MichaelBalack; Using the vSphere WebClient, have you looked at the vSphere HA runtime information page?
(located by selecting your cluster, then monitor tab, then vSphere HA tab)

This page should show you if everything is configured - how many hosts are connected to the master, who is the master host, and what datastores are used for heartbeat, etc.

Maybe that will show something.
 
LVL 1

Author Comment

by:MichaelBalack
ID: 39688036
Hi both,

Thanks for showing all the guidelines.

Hanccocka made a very good point: vCenter is only used for configuring HA; it isn't needed to make HA work.

I suspect the problem lies in the vSwitches. vSwitch0 is configured for VM and Management traffic; vSwitch1 is configured for IP storage (iSCSI), storage heartbeat, and vMotion. I think I should configure vMotion on vSwitch0 instead of the current vSwitch1.

Here are the corrective steps I plan to take:

        1. Configure the correct IP in Software > DNS and Routing, as both hosts were
            added by IP address. Put the correct FQDNs in the internal DNS server
        2. On vSwitch0, no default gateway is configured. I will configure it to point to the switch
        3. Move the vMotion VMkernel port group to vSwitch0
        4. Review the cluster settings for Host Monitoring and VM Monitoring
   
As Isk-ck pointed out, there is no reason for the HA failover not to occur in the following tests:

        a. pull the power cable of the host
        b. disconnect the network cables
        c. disable all the NICs

I checked through the cluster summary and found no abnormality.
 
LVL 1

Assisted Solution

by:MichaelBalack
MichaelBalack earned 0 total points
ID: 39688046
Maybe I should share the current networking setup. There are 2 network segments: one for VMs (production) and Management (host), using 172.16.100.0/24, and a second one for storage, using 172.16.0.0/24. 2 vSwitches were created, each targeted at one segment.

    vSwitch0:    VMkernel port group for Management, no default gateway defined
                 VM port group for production
                 * 3 NICs bound

    vSwitch1:    VMkernel port group for storage (iSCSI 1)
                 VMkernel port group for storage (iSCSI 2)
                 VMkernel port group (Storage Heartbeat)
                 VMkernel port group for vMotion and Management, has a default gateway
                 * 2 NICs bound
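
One way to reason about this layout is to model it as data and ask whether the host keeps a management path when one vSwitch loses all its uplinks. The Python sketch below is only an illustration under stated assumptions: `VSwitch`, `has_management_path`, and `pull_cables` are hypothetical helpers, the layout is transcribed from the description above, and it assumes HA heartbeats travel over any VMkernel port that has Management traffic enabled.

```python
from dataclasses import dataclass, field, replace
from typing import List


@dataclass(frozen=True)
class VSwitch:
    name: str
    uplinks: int               # physical NICs that currently have link
    management_enabled: bool   # True if any VMkernel port here carries Management
    port_groups: List[str] = field(default_factory=list)


def has_management_path(vswitches: List[VSwitch]) -> bool:
    """True if at least one Management-enabled vSwitch still has a live uplink."""
    return any(v.management_enabled and v.uplinks > 0 for v in vswitches)


def pull_cables(vswitches: List[VSwitch], name: str) -> List[VSwitch]:
    """Return a copy of the layout with every uplink of `name` unplugged."""
    return [replace(v, uplinks=0) if v.name == name else v for v in vswitches]


# Layout as described in this post (assumption: link is up on every bound NIC).
layout = [
    VSwitch("vSwitch0", uplinks=3, management_enabled=True,
            port_groups=["Management", "VM production"]),
    VSwitch("vSwitch1", uplinks=2, management_enabled=True,
            port_groups=["iSCSI 1", "iSCSI 2", "Storage Heartbeat",
                         "vMotion and Management"]),
]

after_test = pull_cables(layout, "vSwitch0")
print(has_management_path(after_test))  # True
```

In this model the host still has a management path through vSwitch1 after all 3 cables on vSwitch0 are pulled, which would be one plausible reason why no isolation was declared and no failover happened in that test.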
 
LVL 117
ID: 39688108
Can you upload screenshots of networking?

Are your default gateways, or the isolation address, reachable?

Networking, DNS, and default gateways all have to be correct, e.g. DNS resolution and reverse DNS.
 
LVL 1

Author Comment

by:MichaelBalack
ID: 39688123
Hi hanccocka,

Please see the networking screenshots as attached.

On vSwitch0, the default gateway/isolation address is not defined, and hence not reachable.
On vSwitch1, the default gateway is defined and pingable.

DNS resolution and reverse DNS for the 2 hosts? Not defined.

Would these be the root cause?
Networkings.docx
 
LVL 117
ID: 39688135
The default gateway that matters is the one defined on your management interface.

E.g. the IP address and hostname of the host.
Can this be pinged from all hosts?
 
LVL 117
ID: 39688141
The isolation address does not need to be the default gateway, but it usually is. It can be any address that is reachable 24/7, but then it must be specified explicitly. Also, not having working DNS will not help.
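
As a rough way to verify the DNS side of this before re-testing HA, the following Python sketch checks that a host's name resolves to the expected IP and that the reverse lookup returns the same name. It is an illustration, not a VMware tool: `check_dns` is a hypothetical helper, and the host name `esxi1.example.local` and address `172.16.100.11` are made-up values on the management subnet mentioned earlier. It only uses `socket.gethostbyname` and `socket.gethostbyaddr` from the standard library.

```python
import socket
from typing import Callable, List


def check_dns(fqdn: str, expected_ip: str,
              forward: Callable = socket.gethostbyname,
              reverse: Callable = socket.gethostbyaddr) -> List[str]:
    """Return a list of problems found; an empty list means DNS looks consistent."""
    problems = []

    # Forward lookup: name -> IP
    try:
        ip = forward(fqdn)
    except OSError as exc:
        return [f"forward lookup of {fqdn} failed: {exc}"]
    if ip != expected_ip:
        problems.append(f"forward lookup of {fqdn} gave {ip}, expected {expected_ip}")

    # Reverse lookup: IP -> name
    try:
        name, _aliases, _addresses = reverse(expected_ip)
    except OSError as exc:
        problems.append(f"reverse lookup of {expected_ip} failed: {exc}")
    else:
        if name.lower().rstrip(".") != fqdn.lower().rstrip("."):
            problems.append(f"reverse lookup of {expected_ip} gave {name}, expected {fqdn}")

    return problems


# Stub resolvers so the example runs offline; omit them to query real DNS.
def stub_forward(name):
    return "172.16.100.11"


def stub_reverse(ip):
    raise socket.herror(1, "Unknown host")  # simulates a missing PTR record


for problem in check_dns("esxi1.example.local", "172.16.100.11",
                         forward=stub_forward, reverse=stub_reverse):
    print(problem)
```

Run against each host from a machine that uses the same internal DNS server, an empty result for every host suggests forward and reverse records are in place; it says nothing about whether the isolation address itself answers pings, which still needs to be checked separately.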
 
LVL 1

Author Comment

by:MichaelBalack
ID: 39688143
Hi hanccocka,

Okay, I will put those needed settings when i am onsite tomorrow. Will update you guys about the progress...
 
LVL 1

Author Comment

by:MichaelBalack
ID: 39711390
Hi hanccocka,

I made 3 main changes: first, I created a new VMkernel port for vMotion on vSwitch0; secondly, I configured the DNS host names and related IPs; and thirdly, I changed the default gateway.

Now, unplugging all the network cables triggers a host isolation, and a failover subsequently occurs.
 
LVL 117
ID: 39711397
Very good, it's often network configuration which causes HA to fail, and not failover!

Glad it's fixed.
 
LVL 1

Author Closing Comment

by:MichaelBalack
ID: 39721031
Thanks a lot to both experts, who provided the detailed info and leads that eventually got the problem resolved.