pzozulka

asked on

VMware HA: Advanced Options

I am reviewing the "Host Failures Cluster Tolerates Admission Control Policy", which says that it uses a formula to calculate the slot size of all the VMs on a host. Based on this calculation, HA can decide how many total hosts can fail, and still have enough resources available to support all other VMs.

It says the SLOT size is calculated using CPU and memory reservations. But, since we don't use CPU reservations it says it's going to use the default value of 32MHz, and you can change that value by using das.vmCpuMinMHz advanced attribute.

My Question:
If we leave the settings at the default value of 32MHz, will that result in a small SLOT size calculation, and result in all VMs being allowed to pass through the "Admission" policy?
coolsport00

Well, kinda. All it really means is that this value, along with the Memory Reservation value, is used to calculate how many slots a Host can hold. So, if the slot size for CPU is 32MHz and you have a Host with a single CPU (socket) with 4 cores at 2.5GHz, that means you have 10GHz total on that Host. Divide 10GHz (10,000MHz) by 32MHz & that'll give you roughly 312 slots for CPU. The same calculation is done for Memory.
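To make the slot arithmetic concrete, here is a minimal sketch in Python. It assumes a host with 4 cores at 2.5GHz and 32GB RAM, and it uses the host's raw capacity (real HA works from the resources actually available for VMs, which is somewhat less). The function name and the 100MB memory slot size are made up for illustration; per the Availability Guide, a host supports the smaller of its CPU slot count and its memory slot count.

```python
def slots_per_host(host_cpu_mhz, host_mem_mb, slot_cpu_mhz=32, slot_mem_mb=100):
    """Return (cpu_slots, mem_slots, usable_slots) for one host.

    slot_cpu_mhz=32 is the default CPU slot size discussed in this thread.
    slot_mem_mb=100 is an assumed figure (no reservation, overhead only).
    """
    cpu_slots = host_cpu_mhz // slot_cpu_mhz   # rounded down
    mem_slots = host_mem_mb // slot_mem_mb     # rounded down
    # The host can only hold as many slots as its scarcer resource allows.
    return cpu_slots, mem_slots, min(cpu_slots, mem_slots)


# 1 socket x 4 cores x 2500MHz = 10000MHz; 32GB = 32768MB
cpu_slots, mem_slots, usable = slots_per_host(4 * 2500, 32 * 1024)
print(cpu_slots, mem_slots, usable)
```

With slot sizes this small, the host ends up with hundreds of slots, which is why admission control rarely blocks anything when no reservations are set.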

The Availability Guide shares more about this, as well as Duncan Epping/Frank Denneman's book - Cluster Deepdive

http://pubs.vmware.com/vsphere-51/topic/com.vmware.ICbase/PDF/vsphere-esxi-vcenter-server-51-availability-guide.pdf

Regards.
~coolsport00
1 slot = 1 VM basically; the Hosts in your Cluster will need to reserve as many slots as needed to failover your VMs in the event of a Host/Hosts failure. So, if your Cluster doesn't meet the reservation for either CPU *OR* Memory at some point, then if you were to try to power on a new VM in that cluster, it wouldn't power on because it would be against what you have configured for admission control.

~coolsport00
pzozulka

ASKER

But essentially, since we don't use VM CPU or Memory reservations (we do use limits on some though), all slots will default to 32MHz, which basically will tell the host to allow all machines to be migrated in the event of a failure right?
Is this the recommended approach? We don't want to change the reservation settings on the VMs. Thus the default value will be 32MHz and all VMs will be migrated. That's fine unless we have 2 host failures leaving us with only 1 host.

Would it be wise to take the average of the VMs' current memory allocation? I have 37 VMs; some are currently using 9GB of memory, and some only 50MB. I took the average, and it comes out to 1650MB per VM as far as memory goes. We have 3 hosts. Each host has 32GB RAM. 32768MB/1650MB = 19.86 slots. So if one host fails we will still have two hosts, or the equivalent of 38 slots, which will accommodate all 37 VMs. However, that's taking the average.
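The averaged numbers in this question can be checked with a short Python sketch. It treats 1650MB as a hypothetical memory slot size (that is the asker's average, not what HA would pick with no reservations) and looks at memory only. The function name is invented for illustration; the worst-case ordering (largest hosts fail first) follows the Availability Guide's description of current failover capacity.

```python
def hosts_that_can_fail(host_slots, vms_needing_slots):
    """Count how many hosts can fail while the rest still hold every VM.

    host_slots: list of slot counts, one per host.
    Hosts are assumed to fail largest-first (worst case).
    """
    remaining = sorted(host_slots)      # ascending, largest host is last
    failures = 0
    while remaining:
        remaining.pop()                 # the largest remaining host fails
        if sum(remaining) < vms_needing_slots:
            break                       # survivors can no longer hold all VMs
        failures += 1
    return failures


slots_each = (32 * 1024) // 1650        # slots per 32GB host, rounded down
tolerated = hosts_that_can_fail([slots_each] * 3, 37)
print(slots_each, tolerated)
```

Each host rounds down to 19 slots, so two surviving hosts give 38 slots for 37 VMs, and a second host failure leaves only 19.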
Actually, unless there is some SLA-type need, or a VM that really needs resources is getting cheated out of them, reservations, & even more so limits, should never be used. The hypervisor is actually really good at scheduling its resources amongst all the VMs.

Anyway, if no reservations are set for CPU, the slot size is 32MHz; if none are set for RAM, the size is 0MB + the amt of memory overhead for the VM, based on the # of vCPUs & amt of RAM it's configured with (overhead is fairly minuscule, about 100-200MB; some sample values are shown in the Res Mgmt Guide). I think what you may be concerned about is that the slot size is the amt of resources a VM will get if failed over, no? It isn't. Again, a slot is nothing more than a logical value that HA uses to determine how many VMs can be failed over (i.e. restarted, not migrated) in the event of a Host(s) going down. Also, VMs don't get migrated with HA. DRS does VM migrations to equalize resource consumption across the Hosts in the Cluster. HA *restarts* VMs...just FYI.

Where the advanced setting to configure slot size comes into play is where you do have a VM or 2 with an ungodly amt of, say, RAM reservation (like 10-20GB) whereas all the rest of your VMs have like 1-2GB reserved (again, for those that use reservations). The slot size will then be distorted because HA uses the largest amt configured, not an avg., for reservation for its slot sizes. So, what happens is you wind up having fewer slots for VMs, thus having a diminished consolidation ratio of VMs per Host. Since you don't use reservations, you don't have any issue and need not be concerned.
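A rough Python sketch of that distortion, looking at memory only. The reservation lists, the flat 150MB overhead, and the cap_mb parameter are all assumptions for illustration; cap_mb simply stands in for whatever advanced option you use to put an upper bound on the slot size.

```python
def memory_slot_mb(reservations_mb, overhead_mb=150, cap_mb=None):
    """Memory slot size: largest reservation plus overhead, optionally capped.

    overhead_mb is a flat assumed value; real overhead varies per VM.
    """
    slot = max(reservations_mb) + overhead_mb   # largest value wins, not the avg
    if cap_mb is not None:
        slot = min(slot, cap_mb)                # hypothetical upper bound
    return slot


host_mem_mb = 32 * 1024
typical = [1024] * 10 + [2048] * 5              # VMs with 1-2GB reserved
with_outlier = typical + [20480]                # plus one VM with 20GB reserved

for label, reservations in (("typical", typical), ("with outlier", with_outlier)):
    slot = memory_slot_mb(reservations)
    print(label, slot, host_mem_mb // slot)     # slot size, slots per host
```

In this made-up cluster, a single 20GB reservation drops a 32GB host from 14 memory slots to 1, and capping the slot size at the typical value brings it back to 14.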

Since you have such a small environment, it's actually quite easy to determine the % you need to set to accommodate a failover. Just take the Host that is utilizing the most CPU & RAM and set that as a % for each (CPU & RAM) in HA Admission Control. You can click on a Host on the left, then look on the right (in the Summary tab > Resources section/box) and see the current Resources being consumed. Or, another quick & dirty way to do it is simply to select 33% for CPU & RAM each, because what you need to accommodate your VMs in the event of a failover is 33% of your Cluster resources (in other words, 1 Host lost out of 3). :)

Hope that helps. (clear as mud, right?) ha
~coolsport00
Clear as mud :)

Just to confirm, are you recommending we use "Percentage of cluster resources reserved as failover spare capacity = 33%" instead of "Host failures the cluster tolerates = #"
Yeah - that is actually the recommended option by VMware as it is more flexible.

~coolsport00
I just read the section of the article you provided that talks about the percent based admission, and it's a bit unclear.

Say we make ours = 33%.

If we focus on memory only, across 3 hosts we have 98GB of RAM. If we "reserved" 1650MB for each of the 37 VMs, that would consume 61GB of memory. Their formula is ((98GB-61GB)/98GB) = 38%.

Because the cluster's Configured Failover Capacity is set to 33%, only 5% (38% - 33%) of the cluster's total memory resources are still available to power on additional virtual machines?

What will happen with the rest of the 33%, unused?

I know that since we don't use reservations, all machines will be admitted, I'm just asking because I fail to see the difference between Host Failures Cluster Tolerates policy and Percentage of Cluster Resources Reserved policy when reservations aren't used -- because all machines are admitted one way or another (unless we choose to use the advanced options?)
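For reference, the percentage arithmetic from this post can be written out as a small Python sketch. It uses the 98GB and 61GB figures exactly as given above and looks at memory only; the function and variable names are made up for illustration.

```python
def current_failover_capacity(total, reserved):
    """Fraction of cluster resources not reserved by powered-on VMs."""
    return (total - reserved) / total


total_gb = 98          # cluster memory, as stated in the post
reserved_gb = 61       # 37 VMs x 1650MB, rounded to GB as in the post
configured = 0.33      # Configured Failover Capacity

current = current_failover_capacity(total_gb, reserved_gb)
headroom = current - configured    # what is left for powering on more VMs

print(f"current: {current:.1%}, headroom: {headroom:.1%}")
```

The current failover capacity comes out just under 38%, leaving a little under 5% of cluster memory that new reservations could still claim before the 33% floor is hit.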
ASKER CERTIFIED SOLUTION
coolsport00
So just to clarify:

But, that 33% can't be used for new or currently powered down VMs that you then decide to power on.

If I understand this correctly, the 33% CAN only be used for additional resources by currently powered ON machines, but is PRIMARILY there to accommodate the FAILED host's VMs?

If so, where does the admission policy draw the line that it can no longer "admit" new VMs (if we were using reservations of 2GB per VM, each of 3 hosts has 12 VMs, and we set it to 33%, and each host has 32GB RAM)?
Yes, you are correct. For new VMs, HA draws the line at 33%: once powering on another VM would drop the cluster's unreserved capacity below 33%, that VM can't be powered on. Resources for failover are guaranteed by HA.

~coolsport00
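As a rough illustration of where that line falls in the example asked about (3 hosts with 32GB each, 2GB reserved per VM, 33% failover capacity), here is a Python sketch. It is a simplification: it looks at memory only and ignores per-VM memory overhead, so the real limit would be reached somewhat earlier. The function name is invented for illustration.

```python
def can_power_on(total_mb, reserved_mb, new_vm_mb, failover_pct):
    """True if powering on the VM keeps unreserved capacity >= failover_pct."""
    reserved_after = reserved_mb + new_vm_mb
    spare_pct = (total_mb - reserved_after) * 100 / total_mb
    return spare_pct >= failover_pct


total_mb = 3 * 32 * 1024     # three 32GB hosts
vm_mb = 2 * 1024             # 2GB reservation per VM (overhead ignored)

powered_on = 0
while can_power_on(total_mb, powered_on * vm_mb, vm_mb, 33):
    powered_on += 1

print(powered_on)            # number of VMs admitted before HA says no
```

Under these assumptions admission stops at 32 VMs, so a cluster with 12 such VMs per host (36 total) would not get all of them powered on.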