The following article comprises the pearls we have garnered deploying virtualization since Virtual Server 2005 and Hyper-V since the 2008 RTM and beyond, both standalone and clustered.
We've been building standalone virtualization solutions since Virtual Server 2005 and Hyper-V solutions since Longhorn. We built out our first cluster not long after Windows Server 2008 RTMed, though it took about six months to figure the whole setup out!
Here are some points to consider when looking to build a virtualization solution on Hyper-V, whether standalone or clustered.
- Server Management: Always install an RMM, iLO Advanced, or iDRAC Enterprise
- Out of band KVM over IP can save time in the event of an emergency
- Keep a bootable USB flash drive plugged in, kept up to date with OS install files
- This lets you rebuild the host OS and settings without leaving the shop
- CPU: GHz over Cores
- Memory: Largest, fastest the CPU supports; prefer one DIMM per channel, with the same size/speed on all channels
- BIOS: Enable all Intel/AMD Virtualization Settings
- BIOS: Disable C3/C6 States
- BIOS: Enable High Performance Profile
- Improves both server performance and fan performance
- Disk subsystem: Hardware RAID, 1GB Cache, Non-Volatile or Battery backed
- Disk subsystem: SAS only, 10K spindles to start, and 8 or more preferable
- Go smaller sizes with higher quantities of disks
- RAID 6 with a 90GB logical disk for the host OS and the balance for VHDX and ISO files
- Networking: Intel only, 2x Dual-Port NICs at the minimum
- We always install at least two Intel i350-T4 Gigabit NICs
- In cluster settings at least one x540-T2 for 10GbE Live Migration
- Two node clusters can have direct connect thus eliminating the expense of a 10GbE switch
- Networking: Teaming
- Team Port 0 on both NICs for management
- Team Port 1 on both NICs for vSwitch (not shared with host OS)
- OPTION: Team Port 0 for Management and bind one vSwitch per port to team _within_ VM OS
- Port 2: Live Migration Standalone
- Port 3: Live Migration Standalone
- Networking: Broadcom NICs Disable VMQ for ALL physical ports
- Hyper-V: Server Core has a reduced attack surface plus a lower update count, thus requiring fewer reboots
- Hyper-V: Fixed VHDX files preferred unless dedicated LUN/Partition
- We set a cut-off of about 12 VMs before we look to deploy one or two LUNs/Partitions for VHDX files
- We deploy one 75GB fixed VHDX for guest OS
- We deploy one 150GB+ dynamic VHDX for guest Data with a dedicated LUN/partition
- Hyper-V: Max vCPUs to Assign = # Physical Cores on ONE CPU - 1
- Hyper-V: Leave ~1.5GB physical RAM to the host OS
- Hyper-V: Set a 4192MB static page file for the host OS on C:
- Hyper-V: For standalone hosts, we prefer to keep the host in a Workgroup
- Use RMM, iLO Advanced, iDRAC Enterprise, or if needed RDP to manage
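The static page file entry in the list above can be scripted rather than clicked through System Properties. A minimal PowerShell sketch, run elevated; the 4192MB figure is the recommendation above, and a page file setting for C: is assumed to already exist (a reboot is required for the change to take effect):

```powershell
# Turn off system-managed page files so the static size sticks.
Get-CimInstance -ClassName Win32_ComputerSystem |
    Set-CimInstance -Property @{ AutomaticManagedPagefile = $false }

# Set a static 4192MB page file on C: (sizes are in MB).
Get-CimInstance -ClassName Win32_PageFileSetting -Filter "Name='C:\\pagefile.sys'" |
    Set-CimInstance -Property @{ InitialSize = 4192; MaximumSize = 4192 }
```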
The C3/C6 states can actually impact Live Migration performance, storage performance, and more, so it is best to disable them from the get-go.
It is a good idea to enable the High Performance profile for the server. Doing so enables a number of settings that improve data flow throughout the system, as well as cooling profiles that help keep system temperatures down.
Our preference is for Intel NICs since they tend to run much more stably than Broadcom NICs do. Witness the issues with Broadcom firmware, drivers, and VMQ. If Broadcom NICs are in place, then make sure to disable VMQ to improve network access performance to the VMs.
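Disabling VMQ across all physical Broadcom ports can be scripted instead of toggled per-port in Device Manager. A minimal sketch, assuming the Broadcom ports are identifiable by their interface description:

```powershell
# Find every physical Broadcom port and turn VMQ off.
Get-NetAdapter -Physical |
    Where-Object { $_.InterfaceDescription -like "*Broadcom*" } |
    Disable-NetAdapterVmq

# Confirm: Enabled should read False on each of those ports.
Get-NetAdapterVmq
```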
A minimum of two NICs should be in place. A pair of teams, one for management and one for the vSwitch, utilizing one port on each NIC of a dual-port setup, is best to protect against NIC failure. If using quad-port NICs, then team port 0 on both NICs for management and ports 1-3 on both NICs for the vSwitch. It is preferable to _never_ dedicate one NIC port to a VM. This defeats the redundancy virtualization brings to the table.
As an option, team port 0 on the NIC pair for management of the host server, then bind a vSwitch to each remaining physical port on the NICs and team the vNICs from within the guest OS. A guest OS of Windows Server 2012 or later is required for vNIC teaming.
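The dual-port team layout described above can be built with the in-box LBFO cmdlets. A sketch, with hypothetical adapter names (`NIC1-Port0` and so on; substitute whatever `Get-NetAdapter` reports on your hardware). Note the `Dynamic` load-balancing algorithm requires Windows Server 2012 R2 or later:

```powershell
# Management team: port 0 on both physical NICs.
New-NetLbfoTeam -Name "MGMT-Team" -TeamMembers "NIC1-Port0","NIC2-Port0" `
    -TeamingMode SwitchIndependent -LoadBalancingAlgorithm Dynamic

# vSwitch team: port 1 on both physical NICs.
New-NetLbfoTeam -Name "VSW-Team" -TeamMembers "NIC1-Port1","NIC2-Port1" `
    -TeamingMode SwitchIndependent -LoadBalancingAlgorithm HyperVPort

# Bind the vSwitch to the team's interface; not shared with the host OS.
New-VMSwitch -Name "vSwitch0" -NetAdapterName "VSW-Team" -AllowManagementOS $false
```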
If there are onboard Broadcom NICs, they could be used for management, but it is preferable to disable them in the BIOS.
Storage and CPU
We utilize fixed VHDX files. This gives us one contiguous file for each VM, which eliminates the fragmentation that dynamic VHDX files would accumulate over time. This point may be somewhat moot for a SAN or DAS with 60 spindles, but for smaller setups under 24 spindles one should keep in mind the need to extract as much performance as possible.
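The 75GB fixed OS disk and 150GB+ dynamic data disk from the list above can be created as follows; paths and file names here are placeholders:

```powershell
# Fixed VHDX for the guest OS: space allocated up front, one contiguous file.
New-VHD -Path "D:\VHDX\GUEST01-OS.vhdx" -SizeBytes 75GB -Fixed

# Dynamic VHDX for guest data, placed on its own dedicated LUN/partition.
New-VHD -Path "E:\VHDX\GUEST01-Data.vhdx" -SizeBytes 150GB -Dynamic
```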
Without getting into the technical details of the CPU pipeline setup, suffice it to say that one vCPU = one thread to the physical CPU. The Hyper-V vCPU threads _must_ be processed simultaneously. So, if we assign more vCPUs than there are physical cores on one CPU in a multi-processor setting (Hyper-Threading is not a consideration here), the CPU pipeline must juggle the extra threads to get them processed on the second or other CPU.
Juggling memory and NUMA is another consideration. vRAM assigned to a VM may get spread across RAM attached to different CPUs if we assign a large amount. The juggling of vRAM between RAM owned by different CPUs costs cycles, which in effect costs the VM performance.
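The vCPU rule above (physical cores on one CPU minus one) can be applied directly in PowerShell; a sketch with a hypothetical VM name:

```powershell
# Cores on one physical socket; the rule above caps vCPUs at cores - 1.
$cores = (Get-CimInstance -ClassName Win32_Processor |
    Select-Object -First 1).NumberOfCores

# The VM must be shut down before its processor count can be changed.
Set-VMProcessor -VMName "GUEST01" -Count ($cores - 1)
```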
Finally, it is our preference to run several configuration tests for a VM setup on a host we have just built. We run an assortment of tests to verify a setup prior to sending it out to a client. In fact, it is our policy to build the server configuration in-house, burn it in, and then test it with several VM setups before ever selling that configuration to a client.
I just published a SAS Connectivity Guide on our blog. It includes pictures of how we cable up two nodes and one JBOD, with directions for adding further nodes and JBODs.
We don't do it as a rule. However, make sure the network fabric is at least 10GbE with Jumbo Frames enabled on both the switches (two required at the minimum) and on the 10GbE NIC ports.
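Jumbo Frames on the NIC ports can be enabled through the standard `*JumboPacket` advanced property. A sketch with hypothetical Live Migration port names; the accepted value (often 9014) varies by driver and must match the MTU configured on the switches:

```powershell
# Enable Jumbo Frames on the 10GbE Live Migration ports.
Set-NetAdapterAdvancedProperty -Name "LM1","LM2" `
    -RegistryKeyword "*JumboPacket" -RegistryValue 9014
```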
Virtualization and Time
We have a number of time related posts on our blog that are important to note when setting up a virtualization platform or cluster:
Time is absolutely critical on any Windows domain. When time goes out of whack, the whole network or the workloads running on it can go offline.
In a virtualization setting the operating system environment (OSE) has no physical point of reference for time like the CMOS clock. In a standalone or even a clustered setting one must disable time sync between the host and guests. This is critical since on a Windows domain there should only be one time source: The PDCe.
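Disabling host-to-guest time sync is a per-VM integration service setting; a sketch that turns it off for every VM on a host:

```powershell
# Turn off the Time Synchronization integration service for all VMs.
Get-VM | Disable-VMIntegrationService -Name "Time Synchronization"

# Verify: Enabled should read False for the service on each VM.
Get-VM | Get-VMIntegrationService -Name "Time Synchronization"
```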
In a standalone setting we tend to set up the Hyper-V host as the time source for the guest PDCe. In a cluster setting we _always_ deploy a physical DC that holds all FSMO Roles and is the domain time authority.
Microsoft Cluster MVP