AllDaySentry

VMware HA Failover Fails with Insufficient Resources Error

I have two identical physical servers running VMware.  Both servers are in the same cluster and set up for high availability.  One server runs all the VMs and the other sits idle until a failover is initiated.

On a couple of failovers, I have received the error message "Insufficient resources to satisfy HA failover level...." and some of the VMs did not fail over.  I have tried a couple of different HA settings, such as:

Host failure cluster tolerates: 1
Specify a failover host: (I select the backup server)

I have received the error with both of the settings.  This seems pretty straightforward with two physical servers and one as a backup.  

What would cause some of the VMs to not fail over properly?
What is the preferred setting?  
Should I disable admission control since it doesn't really apply to my environment?  


Running ESXi 4.1 with a virtualized vCenter
Memory usage is currently at about 80% of capacity on the physical server
Andrew Hancock (VMware vExpert PRO / EE Fellow/British Beekeeper)

Do you have any reservations?

Reduce the resources used by the VMs, e.g. CPUs and memory.

Quite simply, your second host does not have enough resources for ALL your VMs.
Hi ALLDAYSENTRY,

Yes, you're correct, disable admission control.  In a two-server cluster such as yours, I always disable it and HA still works as it should.

I also manage a two-server setup and have it configured that way.  Just make sure that, if you ever need to run off a single host, you have sufficient memory and CPU to handle the load.  In my case I do, but I still got the error no matter what, so I disabled admission control.
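
A rough way to sanity-check the "sufficient memory and CPU" point is to add up what the VMs need and compare it with what is left once one host drops out. The sketch below is only an illustration: the `VM` and `Host` classes, the host names, and every number in it are made up, and it ignores virtualization overhead, reservations, and memory sharing.

```python
from dataclasses import dataclass


@dataclass
class VM:
    name: str
    cpu_mhz: int  # hypothetical CPU demand of the VM
    mem_mb: int   # hypothetical memory demand of the VM


@dataclass
class Host:
    name: str
    cpu_mhz: int  # total CPU capacity of the host
    mem_mb: int   # total memory capacity of the host


def fits_on_surviving_hosts(vms, hosts, failed):
    """Return True if all VMs fit on the hosts left after `failed` goes down."""
    survivors = [h for h in hosts if h.name != failed]
    cpu_capacity = sum(h.cpu_mhz for h in survivors)
    mem_capacity = sum(h.mem_mb for h in survivors)
    cpu_needed = sum(v.cpu_mhz for v in vms)
    mem_needed = sum(v.mem_mb for v in vms)
    return cpu_needed <= cpu_capacity and mem_needed <= mem_capacity


if __name__ == "__main__":
    # Two identical hosts, as in the question (the sizes are invented).
    hosts = [Host("esx1", 24000, 65536), Host("esx2", 24000, 65536)]
    # VMs using roughly 80% of one host's memory.
    vms = [VM("vm-a", 4000, 26000), VM("vm-b", 4000, 26000)]
    print(fits_on_surviving_hosts(vms, hosts, failed="esx1"))
```

If this kind of check passes but HA still reports insufficient resources, the limit being hit is more likely the admission control policy than raw capacity.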

Assuming that your virtualized vCenter is running in the same cluster you're trying to protect, we both have the same caveat.  vCenter HAS to be online to detect an HA event.

For example, if the ESX/ESXi host that is running vCenter goes down, it takes vCenter down with it and no HA activity will occur.  If that happens, you'll have to log in to the remaining ESX/ESXi host directly with the vSphere Client and add the machines that were running on the failed host to inventory (browse the shared datastore, right-click the VMX file, and add to inventory).

I hope this info helps.
@heeneemo vCenter is used to configure HA; it does not control it.  The HA agents on the hosts control HA.

For example, we run vCenter as a VM on all our clusters.  We can pull the power on ALL our clusters, and when the hosts are restarted, all the VMs are restarted too, including vCenter Server, even though vCenter Server was not online!

So, while HA, by design, will respond to failures without vCenter, HA relies on vCenter to be available to configure or monitor the cluster.

Source
http://www.yellow-bricks.com/vmware-high-availability-deepdiv/
AllDaySentry

ASKER

hanccocka,

I don't have any reservations set up.  The two servers are physically identical, so whatever is running on one should have no problem failing over to the second.

heeneemo,

Do you have it set up to fail over to the specific host or set up to tolerate 1 host failure?  I thought the HA agent running on the actual physical host could initiate the startup of the VMs if it lost the heartbeat to the other host and still had a connection to the gateway.  Otherwise, if the one host running vCenter died, how would any failover happen?
Thanks hanccocka.  I haven't tested HA in a while, but I remember that when I pulled the plug on the server that was running vCenter, nothing happened.  What you stated is true, I appreciate it, and I'll have to test again.
@heeneemo Then VMware HA was not set up correctly.  Anyway, back to the question.

Admission control will prevent the following if they encroach into the resources reserved for virtual machines restarted due to failure:

• The power-on of new virtual machines
• Changes of virtual machine memory or CPU reservations
• A vMotion of a virtual machine into the cluster from another cluster
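
To make the list above more concrete, here is a simplified model of the slot-based check behind the "Host failures cluster tolerates" policy, as VMware's availability documentation describes it. It is a sketch, not VMware's implementation: the function names are invented, the slot sizes and host sizes are hypothetical inputs, and real clusters derive slot size from the largest CPU and memory reservations plus overhead.

```python
def slots_on_host(host_cpu_mhz, host_mem_mb, slot_cpu_mhz, slot_mem_mb):
    """Number of whole slots a host can hold; the scarcer resource decides."""
    return min(host_cpu_mhz // slot_cpu_mhz, host_mem_mb // slot_mem_mb)


def current_failover_capacity(host_slots, powered_on_vms):
    """Count how many hosts can fail, largest first, while the remaining
    slots still cover every powered-on VM (one slot per VM)."""
    remaining = sorted(host_slots, reverse=True)
    tolerated = 0
    while remaining:
        remaining.pop(0)  # assume the largest remaining host fails
        if remaining and sum(remaining) >= powered_on_vms:
            tolerated += 1
        else:
            break
    return tolerated


def admit_power_on(host_slots, powered_on_vms, configured_failures):
    """Allow one more power-on only if the configured number of host
    failures could still be tolerated afterwards."""
    capacity = current_failover_capacity(host_slots, powered_on_vms + 1)
    return capacity >= configured_failures


if __name__ == "__main__":
    # Two identical hosts and an invented slot size of 256 MHz / 4096 MB.
    per_host = slots_on_host(24000, 65536, 256, 4096)
    cluster = [per_host, per_host]
    print(per_host)                          # slots per host
    print(admit_power_on(cluster, 15, 1))    # 16th VM
    print(admit_power_on(cluster, 16, 1))    # 17th VM
```

In this model a two-host cluster that tolerates one failure can only use one host's worth of slots, so a single large reservation (which inflates the slot size) can trigger the "insufficient resources" condition even when plain memory and CPU totals look fine.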

see also here

http://www.yellow-bricks.com/2012/12/04/insufficient-resources-to-satisfy-ha-failover-level-on-cluster/
It looks like hanccocka provided correct information regarding how HA is handled when vCenter is not around, so as long as HA is configured, vCenter does NOT have to be online for HA events to occur in the case of a host failure.

AllDaySentry,

Once you disable admission control, the policy settings (host failures to tolerate, etc.) become unavailable.  With admission control disabled, HA will power on VMs no matter what, or at least until all available CPU and memory resources have been exhausted.

I've attached a screenshot of my HA settings.
Thanks.  You are right.  I originally had it set to disabled which greys out everything.  When I had a problem with that, I switched it to enabled and tried the Host failures and Specify failover settings.

I am going to switch back to disabling admission control since it's not an issue with my two host configuration in active / standby.  

That's all I need to do, right?  I'm still not sure why everything did not fail over properly previously, so maybe I will need to do some testing one night and, if it happens again, pull the logs.
SOLUTION
Andrew Hancock (VMware vExpert PRO / EE Fellow/British Beekeeper)
ASKER CERTIFIED SOLUTION
Thank you both for the suggestions.  I disabled admission control and I'm going to make sure I put the DNS entries in the hosts file, since the AD/DNS server is also a VM.

I may consider doing a real-world test and unplugging the server to see everything fail over.
Cool deal!  You're welcome.