vSphere introduces a new type of memory management to the mix: memory compression. The following is a quick breakdown of how an ESX/ESXi 4.1 host manages memory at various levels and how it reclaims memory when contention comes into play. Clusters, resource pools and other vCenter-level resource management techniques are beyond the scope of this article; I’ll save those for another day. For a full rundown, see VMware’s Understanding Memory Resource Management in VMware ESX4.
These memory management techniques exist to allow higher memory utilization and a higher VM-to-host consolidation ratio by enabling memory overcommitment, where the total memory configured for the running VMs exceeds the physical RAM in the host. All physical memory mapped to a VM is zeroed out prior to allocation, which prevents data leakage across VM boundaries.
Memory resource levels:
- Guest Virtual memory (guest OS): Memory presented to an application by the guest OS, which maps it to guest physical memory.
- Guest Physical memory (VM): Memory configured at the VM level, backed by host physical RAM.
- Host Physical memory: Physical RAM installed on the host.
ESX tracks the translation from guest physical to host physical memory in a per-VM data structure called a pmap. The guest OS page tables hold the mapping of guest virtual to guest physical memory, so ESX intercepts all VM instructions that manipulate those page tables or the translation lookaside buffer (TLB) and stores the changes in shadow page tables. The shadow page tables map guest virtual memory directly to host physical memory, and they are what the hardware TLB is actually updated from.
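To make the two layers of translation concrete, here is a minimal sketch of how a shadow mapping can be derived by composing a guest page table with a pmap. It is only an illustration: the dictionary layout and the names `compose_shadow` and `translate` are my own assumptions, not ESX internals.

```python
PAGE_SIZE = 4096  # 4KB small pages


def compose_shadow(guest_page_table, pmap):
    """Compose guest virtual -> guest physical with guest physical -> host
    physical, giving a guest virtual -> host physical (shadow) mapping."""
    shadow = {}
    for guest_virtual_page, guest_physical_page in guest_page_table.items():
        # Only guest physical pages currently backed by host RAM get an entry.
        if guest_physical_page in pmap:
            shadow[guest_virtual_page] = pmap[guest_physical_page]
    return shadow


def translate(shadow, guest_virtual_addr):
    """Translate a guest virtual address to a host physical address."""
    page, offset = divmod(guest_virtual_addr, PAGE_SIZE)
    return shadow[page] * PAGE_SIZE + offset


# Hypothetical page numbers, chosen only for the example.
guest_page_table = {0: 7, 1: 3}   # maintained by the guest OS
pmap = {7: 120, 3: 45}            # maintained by the hypervisor, per VM
shadow = compose_shadow(guest_page_table, pmap)
host_addr = translate(shadow, 1 * PAGE_SIZE + 0x10)
```

The point of keeping the composed table around is that the common case (a memory access) needs a single lookup, while the two-step composition only has to be redone when the guest changes its own page tables.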
ESX has four main methods of managing memory.
- Transparent Page Sharing (TPS): TPS uses a hash of each 4KB page to build a global hash table, which can be scanned for matches quickly. When a match is found, a bit-by-bit comparison is performed to rule out a hash collision. If the pages are indeed a match, a single copy of the page is stored and all VMs mapped to the page are re-mapped to the shared copy. If a VM attempts to modify the shared memory page, a copy-on-write operation is initiated and that VM is remapped to its own modified copy of the page. The sharing and copy-on-write operations are not visible to the VM’s guest OS and cannot leak data between VMs. The time between memory scans and the amount of resources allocated to scanning are modifiable using the following advanced settings: Mem.ShareScanTime, Mem.ShareScanGHz and Mem.ShareRateMax.
When hardware-assisted memory virtualization is available (Intel EPT & AMD RVI), ESX will use large pages (2MB) of host memory to back guest physical memory instead of 4KB pages. Doing so can increase performance, but will disable TPS due to the low likelihood of any two 2MB pages being identical and the overhead associated with a bit-by-bit comparison of two large pages should a hash match occur. The exception is when hypervisor swapping (discussed below) is used. In that case the large pages are broken up into small pages and TPS is used to minimize the amount of data being swapped to the hypervisor swap file.
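The TPS steps described above (hash lookup, full comparison, sharing, copy-on-write) can be sketched roughly as follows. This is a toy model under stated assumptions: the class `SharedPageStore`, its methods, and the use of SHA-1 as the page hash are all stand-ins I picked for illustration, and real TPS scans pages periodically rather than sharing at mapping time.

```python
import hashlib

PAGE_SIZE = 4096


class SharedPageStore:
    """Toy model of content-based page sharing with copy-on-write."""

    def __init__(self):
        self.frames = {}    # frame id -> page contents (bytes)
        self.refcount = {}  # frame id -> number of guest pages mapped to it
        self.by_hash = {}   # content hash -> frame id (the "global hash table")
        self.mapping = {}   # (vm, guest page number) -> frame id
        self._next_id = 0

    def _new_frame(self, data):
        frame_id = self._next_id
        self._next_id += 1
        self.frames[frame_id] = data
        self.refcount[frame_id] = 0
        return frame_id

    def _release(self, key):
        """Drop a guest page's mapping and free the frame if unused."""
        old = self.mapping.pop(key, None)
        if old is None:
            return
        self.refcount[old] -= 1
        if self.refcount[old] == 0:
            digest = hashlib.sha1(self.frames[old]).digest()
            if self.by_hash.get(digest) == old:
                del self.by_hash[digest]
            del self.frames[old]
            del self.refcount[old]

    def map_page(self, vm, page_number, data):
        """Back a guest page, sharing an existing frame when contents match."""
        key = (vm, page_number)
        self._release(key)
        digest = hashlib.sha1(data).digest()
        frame_id = self.by_hash.get(digest)
        # A hash match is only a hint; the full comparison rules out collisions.
        if frame_id is None or self.frames[frame_id] != data:
            frame_id = self._new_frame(data)
            self.by_hash.setdefault(digest, frame_id)
        self.mapping[key] = frame_id
        self.refcount[frame_id] += 1

    def read(self, vm, page_number):
        return self.frames[self.mapping[(vm, page_number)]]

    def write(self, vm, page_number, offset, payload):
        """Copy-on-write: the writer gets a private copy of the page."""
        key = (vm, page_number)
        data = bytearray(self.frames[self.mapping[key]])
        data[offset:offset + len(payload)] = payload
        self._release(key)
        frame_id = self._new_frame(bytes(data))
        self.mapping[key] = frame_id
        self.refcount[frame_id] = 1


store = SharedPageStore()
zero_page = bytes(PAGE_SIZE)
store.map_page("vm1", 0, zero_page)
store.map_page("vm2", 0, zero_page)      # identical content: one frame, two mappings
frames_while_shared = len(store.frames)
store.write("vm2", 0, 0, b"\x01")        # vm2 gets its own copy, vm1 is untouched
```

Note that the private copy created by `write` is not added to the hash table here; in this sketch it would only become shareable again if it were re-mapped through `map_page`, which loosely mirrors a later scan pass.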
- Hypervisor Swapping: When a VM is powered on, ESX creates a swap file for the VM equal to the size of the memory allocated (less any memory reservation set on the VM). This swap file is stored with the VM’s files by default, but that is configurable on a per-host basis. During periods of memory contention, ESX has the ability to swap out VM memory to the VM swap file as a way of freeing physical memory to be used by other VMs. This is a last resort method of freeing memory on an ESX host, as the latency involved when accessing pages that reside in the swap file is much greater than that of pages stored within physical RAM, and the VM’s performance can be degraded substantially as a result. The hypervisor has no way of knowing which guest physical pages should or should not be swapped, so ESX will randomly select pages to swap in an attempt to mitigate the overall performance hit to the VM.
- Ballooning: The balloon driver is included in the VMware Tools package, which can (should) be installed within VMs, and gives the hypervisor a way to reclaim memory directly from a VM. The balloon driver polls the hypervisor through a private channel, and when needed it will ‘inflate’, which is to say it will consume guest physical memory, mark it as allocated and prevent it from being paged out by the guest OS. This process allows the memory pages consumed by the balloon driver to be re-allocated by the hypervisor to other VMs and serves to relieve memory pressure on the ESX host.
- Memory compression: With ESX/ESXi 4.1, memory compression can be used instead of hypervisor swapping in some cases. A per-VM compression cache is created and accounted for by the VM’s guest memory usage (i.e. it counts against the VM’s allocated memory). When a page is identified as a candidate for swapping, ESX will attempt to compress it and place the compressed page in the VM’s compression cache, which resides in memory rather than on disk. Should that page later be accessed by the VM, which would otherwise cause a swapped page to be retrieved from disk, ESX can simply decompress the page and place it back into the VM’s memory very quickly. Since memory compression is only used when a page is a candidate for swapping, it does not take effect when there is no memory contention on the host and no swapping is required.
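The compress-or-swap decision in the memory compression item can be sketched as below. Treat it as a rough illustration: the class `VmMemory`, its method names, the use of zlib, and the half-page threshold `MAX_COMPRESSED` are assumptions of mine for the example, and a dictionary stands in for the on-disk swap file.

```python
import os
import zlib

PAGE_SIZE = 4096
MAX_COMPRESSED = PAGE_SIZE // 2  # assumed threshold: keep only pages that shrink to half or less


class VmMemory:
    """Toy model of a per-VM compression cache in front of a swap file."""

    def __init__(self):
        self.compression_cache = {}  # page number -> compressed bytes (held in RAM)
        self.swap_file = {}          # page number -> raw bytes (stand-in for disk)

    def reclaim(self, page_number, data):
        """Called for a page already chosen as a swap candidate."""
        packed = zlib.compress(data)
        if len(packed) <= MAX_COMPRESSED:
            self.compression_cache[page_number] = packed
            return "compressed"
        # Poorly compressible pages fall through to ordinary swapping.
        self.swap_file[page_number] = data
        return "swapped"

    def fault_in(self, page_number):
        """Bring a reclaimed page back when the VM touches it."""
        if page_number in self.compression_cache:
            return zlib.decompress(self.compression_cache.pop(page_number))
        return self.swap_file.pop(page_number)


vm = VmMemory()
zero_page = bytes(PAGE_SIZE)          # highly compressible
random_page = os.urandom(PAGE_SIZE)   # effectively incompressible
first = vm.reclaim(0, zero_page)
second = vm.reclaim(1, random_page)
```

In this model `reclaim` is only ever called for swap candidates, which matches the point above that compression does nothing when the host is not under memory contention.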
That is a quick run-through of vSphere host-level memory management basics. Please comment if you would like to add something or make a correction.