We've been building standalone virtualization solutions since Microsoft Virtual Server 2005.
We've been building Hyper-V virtualization solutions since Longhorn (Server 2008 pre-release bits). We built out our first cluster not long after Server 2008 RTM'd, on the Intel Modular Server platform, though it took about six to nine months to figure the whole setup out!
Here are some points to consider when looking to build a virtualization solution whether standalone or clustered on Hyper-V.
The CPU C3/C6 sleep states can actually impact Live Migration performance, storage performance, and more, so it is best to disable them from the get-go.
It is a good idea to enable High Performance mode for the server. Doing so enables a number of settings that improve data flow throughout the system, as well as cooling profiles that help keep system temperatures down.
In our testing, Hyper-Threading has not made much of a difference. Think of it this way: the 401/Interstate with 8 lanes (physical cores on the CPU) spreads out to 16 lanes (8 physical cores + 8 virtual cores/Hyper-Threads = 16). At some point those extra eight lanes of traffic need to merge back in!
Our preference is for Intel NICs, since they tend to be a lot more stable than the Broadcom NICs. Witness the issues with Broadcom Gigabit firmware and drivers and VMQ. If Broadcom NICs are in place, then make sure to disable VMQ to improve network access performance for the VMs.
NOTE: For Broadcom Gigabit drivers, verify that VMQ is still set to DISABLED after any driver update.
A minimum of two NICs should be in place. A pair of teams, one for management and one for the vSwitch, utilizing one port each in a dual-port NIC setup, is best to protect against NIC failure. If using quad-port NICs, then team port 0 on both NICs for management and team ports 1-3 on both NICs for the vSwitch. It is preferable to _never_ dedicate one NIC port to a VM. This defeats the redundancy virtualization brings to the table.
As an option team port 0 on the NIC pair for management of the host server. Then bind a vSwitch for each physical port on the NICs to utilize vNIC teaming from within the guest OS. A guest OS of Windows Server 2012 and up is required for vNIC teaming.
If there are onboard Broadcom NICs, they could be used for management, but it is preferable to disable them in the BIOS.
We utilize both dynamic and fixed VHDX files. What file configuration we use depends on how our storage is set up.
Rule of Thumb: 80GB Fixed VHDX File for Guest OS Install
Rule of Thumb: 50GB-500GB Fixed VHDX for the guest's second partition (apps/data)
Rule of Thumb: The largest virtual disk gets a dynamic VHDX file (gigabytes into terabytes into petabytes)
We tend to set up our storage with the smaller VHDX files getting fixed VHDX, with the last VHDX created being the huge dynamic VHDX file attached to the file services or other LoB VM.
We do this to save on fragmentation. By creating our fixed VHDX files first we have a set of contiguous files on our storage that won't get fragmented across their lifetime. By leaving the big one dynamic we don't have to worry about moving a huge fixed VHDX file around if we're in a migration scenario or a disaster recovery scenario.
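The provisioning order described above can be sketched as a small planning helper. This is a hypothetical illustration only (the disk names and sizes are made up, and nothing here calls an actual Hyper-V API); it simply encodes the rule that every VHDX is fixed except the single largest, and that the fixed files are created first so they stay contiguous:

```python
def plan_vhdx_creation(disks):
    """Given a {name: size_gb} mapping, mark every VHDX as fixed except
    the single largest (which stays dynamic), and order creation so the
    fixed files are laid down first to keep them contiguous on disk."""
    largest = max(disks, key=disks.get)  # the huge file-share/LoB disk
    plan = [(name, size, "fixed")
            for name, size in disks.items() if name != largest]
    plan.sort(key=lambda d: d[1])  # smaller fixed files first
    plan.append((largest, disks[largest], "dynamic"))  # big dynamic file last
    return plan

# Example layout: OS disk, apps/data disk, and a large file-share disk.
plan = plan_vhdx_creation({"os": 80, "apps": 250, "file-share": 4096})
# -> [("os", 80, "fixed"), ("apps", 250, "fixed"), ("file-share", 4096, "dynamic")]
```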
Important NOTE: The Volume Shadow Copy Service (VSS) has a 64TB volume limit! Keep this in mind when planning out a large storage repository.
The best image for how the CPU pipeline works is an Interstate with 1 Lane = 1 Physical Core. We don't count Hyper-Threading, as mentioned above, because that's like spreading 8 lanes out to 16 but having to merge all that traffic back down to 8 at some point. There is no real performance gain to be had there.
Now, there is a concrete barrier between the eight lanes in one CPU and the eight lanes in the other CPU in a dual-CPU system. That path between the two sets of lanes is called the InterConnect on Intel CPUs.
To the physical CPU, one (1) virtual CPU = 1 thread = 1 core.
Rule of thumb: The physical CPU pipeline must process all vCPU Threads in parallel.
So, a VM with two vCPUs = 2 Threads side-by-side.
So, a VM with four vCPUs = 4 Threads side-by-side.
So, a VM with six vCPUs = 6 Threads side-by-side.
If we assign 9 vCPUs to a VM in a dual eight-core system, the physical CPU pipeline ends up having to juggle the extra thread across the InterConnect path in order to run them all in parallel. This costs big time.
Rule of thumb: Adding more vCPUs does not = more performance!
Our Rule of Thumb: Maximum # vCPUs for 1 VM = # Physical Cores - 1.
So, in the case of a dual eight core server we'd assign a maximum of 7 vCPUs to a VM.
Rule of Thumb: The more vCPUs = Wider parallel thread count = Harder to get into the CPU pipeline!
That 7 vCPU VM leaves 1 core for either an OS thread or a single vCPU VM thread.
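The sizing rule above can be written down as a one-liner. A minimal sketch, assuming the rule is applied per socket as in the dual eight-core example (the function name is our own, not anything from Hyper-V):

```python
def max_vcpus_per_vm(cores_per_socket):
    """Rule of thumb: keep a VM's vCPU count below the physical core
    count of one socket so its parallel threads never have to span the
    InterConnect, leaving one core free for an OS thread or a
    single-vCPU VM thread."""
    return cores_per_socket - 1

max_vcpus_per_vm(8)  # -> 7, the dual eight-core example above
```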
Here’s a simple way to look at NUMA: each processor has at least one memory controller. One memory controller = one NUMA node. Higher-end processors have more than one memory controller built into the CPU.
So, how do we look at it?
If a dual processor system with one memory controller per processor has 256GB of RAM then 128GB belongs to each NUMA node.
If a dual processor system with two memory controllers per processor has 256GB of RAM then 64GB belongs to each NUMA node.
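The two examples above follow from a simple division. A quick sketch of that arithmetic (the function name is ours, for illustration only):

```python
def ram_per_numa_node(total_ram_gb, sockets, controllers_per_socket=1):
    """One memory controller = one NUMA node, so the node count is
    sockets * controllers per socket, and physical RAM divides evenly
    among the nodes."""
    nodes = sockets * controllers_per_socket
    return total_ram_gb / nodes

ram_per_numa_node(256, 2)     # -> 128.0 GB per node (one controller each)
ram_per_numa_node(256, 2, 2)  # -> 64.0 GB per node (two controllers each)
```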
An analogy: One NUMA node = One Room in a house.
We can only fit so much in the one room.
A VM with no NUMA awareness must have its vRAM fit within a single room: there must be enough free space in that one room to hold all of its vRAM.
A VM with NUMA awareness can have more vRAM assigned to it than is available in one NUMA node.
That VM can have its stuff spread across two or more rooms (NUMA Nodes) depending on the amount of vRAM assigned to it.
Now one catch: having more vRAM assigned to the VM than is available in one room means having to juggle things between rooms, and there is a performance hit for that. Moving stuff between rooms costs CPU cycles. To the CPU, that means moving memory bits across the bus between CPUs and memory controllers.
Now another catch: VMs with large amounts of vRAM assigned to them can hit “out of memory” errors at startup when the physical server is not able to juggle enough free space in the rooms to allow the VM to start. There may indeed be more than enough “free” physical RAM in the box, but the VM won’t start because of the way that free RAM is distributed across the NUMA nodes.
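That second catch can be shown with a toy placement check. This is a sketch only (real host behavior also depends on settings such as NUMA spanning), but it captures why 80GB of "free" RAM split across two nodes cannot start a NUMA-unaware 64GB VM:

```python
def can_start(vm_vram_gb, free_per_node_gb, numa_aware):
    """A NUMA-unaware VM needs its whole vRAM free in one node (room);
    a NUMA-aware VM only needs enough free RAM across all nodes."""
    if numa_aware:
        return sum(free_per_node_gb) >= vm_vram_gb
    return max(free_per_node_gb) >= vm_vram_gb

free = [40, 40]  # 80 GB free in total, but split across two rooms
can_start(64, free, numa_aware=True)   # True: vRAM can span both rooms
can_start(64, free, numa_aware=False)  # False, despite 80 GB "free"
```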
Finally, it is our preference to run several configuration tests for a VM setup on a host we have just built. We run an assortment of tests to verify a setup prior to sending it out to a client. In fact, it is our policy to build the server configuration in-house, burn it in, and then test it with several VM setups before ever selling that configuration to a client.
I just published a SAS Connectivity Guide on our blog. It includes pictures of how we cable up two nodes and one JBOD with directions for adding further nodes and JBODs.
We don't do it as a rule. However, make sure the network fabric is at least 10GbE with Jumbo Frames enabled on both the switches (two required at the minimum) and on the 10GbE NIC ports.
We have a number of time related posts on our blog that are important to note when setting up a virtualization platform or cluster:
Time is absolutely critical on any Windows domain. When time goes out of whack the whole network or workloads running on the network can go offline.
In a virtualization setting, the operating system environment (OSE) has no physical point of reference for time like the CMOS clock. In a standalone or even a clustered setting, one must disable time sync between the host and guests. This is critical, since on a Windows domain there should be only one time source: the PDC emulator (PDCe).
In a standalone setting we tend to set up the Hyper-V host as the time source for the guest PDCe. In a cluster setting we _always_ deploy a physical DC that holds all FSMO Roles and is the domain time authority.
Microsoft Cluster MVP