Practical Hyper-V Performance Expectations

Philip Elder, Senior Technical Architect - HA/Compute/Storage
Philip is a Senior Technical Architect specializing in high availability solutions for SMB/SME businesses and hosting companies.
Herein one will find an aggregate of some of my experience building and deploying virtualization stacks: standalone Hyper-V, clustered Hyper-V, clustered Hyper-V with a Scale-Out File Server (SOFS) backend, and Storage Spaces Direct (S2D).

My EE article Some Hyper-V Hardware & Software Best Practices has a lot of great information on performance and on how we gear our systems for certain types of performance.


This article is intended to augment the above by describing how we would configure a VM or particular workload in a production setting and how that VM or workload would be expected to behave in a given setup. In other words, some of our real-world experience is to be found here.


Note that this is a living article. It will get updated as I go along.


Some Defaults

All standalone servers get hardware RAID with 10K SAS drives.


As a rule, our default VM deployment is set to 2 vCPUs for most average workloads.


Our default storage stack is set up with a 256KB block size, or a 256KB Interleave on Storage Spaces virtual disks.
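As a point of reference, here is a minimal sketch of carving out a Storage Spaces virtual disk with our default 256KB Interleave. The pool, virtual disk, and volume names are hypothetical, and sizes would be adjusted to the hardware at hand:

  # Pool all eligible disks.
  $Disks = Get-PhysicalDisk -CanPool $true
  New-StoragePool -FriendlyName "Pool01" -StorageSubSystemFriendlyName "Windows Storage*" -PhysicalDisks $Disks

  # 256KB Interleave is our default; drop to 64KB for IOPS-heavy workloads.
  New-VirtualDisk -StoragePoolFriendlyName "Pool01" -FriendlyName "VDisk01" -ResiliencySettingName Parity -Interleave 256KB -UseMaximumSize

  # Initialize and format with a 64KB NTFS allocation unit size for Hyper-V.
  Get-VirtualDisk -FriendlyName "VDisk01" | Get-Disk | Initialize-Disk -PartitionStyle GPT -PassThru | New-Partition -UseMaximumSize -AssignDriveLetter | Format-Volume -FileSystem NTFS -AllocationUnitSize 64KB -NewFileSystemLabel "Hyper-V"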


We use RoCE (RDMA over Converged Ethernet) for our RDMA deployments for Scale-Out File Server (SOFS) to Hyper-V Compute clusters and for Storage Spaces Direct (S2D). As a rule, we deploy on Mellanox NICs and switches.


NOTE: Deploying RoCE is not for the faint of heart! It is a rather complex setup but once complete is very rewarding.
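For reference, here is a hedged sketch of the host-side DCB/PFC configuration for RoCE. The NIC names and the priority value (3) are assumptions for illustration; the Mellanox switch side (PFC/ETS) must be configured to match and is not shown here:

  # Install Data Center Bridging and tag SMB Direct traffic with priority 3.
  Install-WindowsFeature -Name Data-Center-Bridging
  New-NetQosPolicy "SMB" -NetDirectPortMatchCondition 445 -PriorityValue8021Action 3

  # Enable Priority Flow Control for the SMB priority only.
  Disable-NetQosFlowControl -Priority 0,1,2,4,5,6,7
  Enable-NetQosFlowControl -Priority 3

  # Apply QoS to the RDMA-capable adapters and reserve bandwidth for SMB.
  Enable-NetAdapterQos -Name "SLOT 2 Port 1","SLOT 2 Port 2"
  New-NetQosTrafficClass "SMB" -Priority 3 -BandwidthPercentage 50 -Algorithm ETS

  # Ignore DCBX from the switch; the host settings above are authoritative.
  Set-NetQosDcbxSetting -Willing $false -Confirm:$false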


Standalone Hyper-V

Our default setup for small to medium-large settings with a standalone virtualization host is one of the following setups.

  • 2-4 VMs ~2TB Data Max
  • 4-8 VMs ~8TB Data Max
    • Intel Server System R2208WFTZS or Dell PE R740
      • Dual Intel Xeon Gold or Silver
      • 192GB ECC DDR 2400 (6 channels x 2 CPUs = 12x 16GB Minimum)
      • Dual Intel Gigabit Quad Port NICs
        • Server has two 10GbE ports on board
      • 8x 10K SAS RAID 6
      • Dual Power Supply
      • Intel RMM, Dell iDRAC Enterprise
  • 8+ VMs Data Max ?
    • Intel Server System R2224WFTZS or Dell PE R740xd
      • Dual Intel Xeon Gold or Silver
      • 384GB ECC DDR 2400 (6 channels x 2 CPUs = 12x 32GB Minimum)
      • Dual Intel 10GbE Dual Port NICs
        • Server has two 10GbE ports on board
      • 16 to 24 10K SAS RAID 6
      • Dual Power Supply
      • Intel RMM, Dell iDRAC Enterprise

By default we deploy 2.5" drives with one RAID 6 array. Two logical disks get set up on the array: one for the host OS at 75GB and the balance for Hyper-V.


If we need more IOPS, we set up an all-flash solution based on the Intel DC S3520 series SSD. NVMe is an option when there is a need for it within the workloads, such as a scratch disk, a high-IOPS TempDB, or rendering.


VM Examples

For an RDS Broker/Gateway/Web VM we'd do 2 vCPUs with 2GB to 3GB of vRAM, depending on service loads.


An RDS Session Host would get 2 vCPUs and 8GB vRAM in a small setting. For more demanding user counts and apps we'd deploy more than one Session Host with at least 3-4 vCPUs and 4GB vRAM per host, plus 512MB per user added in to start.


A Domain Controller would get 1 or 2 vCPUs and 2GB vRAM, as it is not doing much.


File and Print VMs get 2 vCPUs and 2GB to 4GB vRAM.
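To illustrate the defaults above, here is a minimal sketch of standing up one of these VMs. The VM name, paths, and switch name are placeholders; the vRAM values follow the per-role guidance above:

  # A File and Print VM at our default 2 vCPUs with 2GB-4GB Dynamic Memory.
  New-VM -Name "EX-FS01" -Generation 2 -MemoryStartupBytes 2GB -NewVHDPath "D:\Hyper-V\EX-FS01\EX-FS01.vhdx" -NewVHDSizeBytes 120GB -SwitchName "vSwitch"
  Set-VMProcessor -VMName "EX-FS01" -Count 2
  Set-VMMemory -VMName "EX-FS01" -DynamicMemoryEnabled $true -MinimumBytes 2GB -StartupBytes 2GB -MaximumBytes 4GB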


Server Specific Workloads        

Server service workloads can vary from memory-hungry, such as Exchange, to IOPS-intense, such as SQL online transactional databases.


Exchange On-Premises or "Cloud"

For Exchange 2016 on Server 2016 there were some issues that I'm not sure have been resolved yet (as of this writing, 2018-01-05).


For Exchange 2013 running on 2012 R2 we could start at 2 vCPUs and 8GB vRAM for anything up to around 25 mailboxes. 


  • For 25-50 mailboxes we'd bump that up to 10GB vRAM leaving 2 vCPUs
  • For 75-100 mailboxes we'd go to 12GB to 16GB vRAM, depending on mailbox sizes and total mailbox volume (GB/TB)


Exchange 2013, 2016, and any newer editions are very memory-hungry. The Exchange team rewrote the code for 2013 to allow Exchange to run hundreds of mailboxes on a single SATA spinning disk (a quote from the first Microsoft Exchange Conference in Orlando, which I attended).


As a result, one needs some experience running Exchange in a virtualized environment so as to have a good idea of how many vCPUs and how much vRAM to assign to the VM or VMs if clustered.
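Keep in mind that Exchange is not supported with Dynamic Memory, so the vRAM gets assigned statically. A minimal sketch using the 25-50 mailbox sizing above and a hypothetical VM name:

  # Exchange VM: static vRAM only (Dynamic Memory is not supported for Exchange).
  Set-VMProcessor -VMName "EX-EXCH01" -Count 2
  Set-VMMemory -VMName "EX-EXCH01" -DynamicMemoryEnabled $false -StartupBytes 10GB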


Remote Desktop Services (RDS)

There are two "styles" of Virtual Desktop Infrastructure (VDI) that ride in an RDS solution set.

  1. Virtualized desktop operating system environments (OSEs)
  2. Virtualized desktop sessions provided by RDS Session Host

VDI for desktop OS is a topic all to itself. Suffice it to say one needs to be mindful of the amount of IOPS required by each desktop OS VM, the apps running in-guest, and the user's role(s) with the company.


As far as RDS itself, there are a number of roles to deploy:

  1. RD Gateway
  2. RD Broker
  3. RD Web
  4. RD Session Host

In the smallest of settings we would set up a single VM with all of the roles installed and configured, though best practice recommends breaking things up a bit. EX-??? is the VM naming pattern.

  • VM: EX-RDS - 2 vCPUs & 4GB-8GB vRAM 
    • 4-6 Session Host Desktops

For all others we would deploy two VMs, which is the best practice (a deployment sketch follows the list below):

  1. VM0: EX-RDGBW - 2 vCPUs & 2GB-4GB vRAM
    • RD Gateway, Broker, Web
  2. VM1: EX-RDSH01+ - 2 vCPUs & 8GB+ vRAM
    • Collection 1: Session Hosts 0 & 1 @ 3 vCPUs & 16GB vRAM
      • User's dedicated desktop environment
    • Collection 2: Session Hosts 2 & 3 @ 2 vCPUs & 8GB vRAM
      • Resource hungry LoB in its own Collection
      • Delivered to Collection 1 via RSS Group Policy

The general rule of thumb is 15-20 users per RD Session Host depending on applications being used.


For single-silo applications that are not RemoteApp friendly, and yes they do exist, as we have been working with one for over five years now as of this writing (2018-01-09), all of the RD Session Host rules are broken, meaning we put all users and the LoB on _one_ RD Session Host. In this case we need to be mindful of not only the LoB's needs but also all of the other applications that would be running in the user's desktop session.


Browsers are resource hogs. Keep that in mind. Some offer performance tuning abilities while others just take what they can get and sometimes don't release the resources after they are closed.


For a Hyper-V host that will have a RDS Farm set up on it we'd look to the following setup:

  • 5-15 Users
    • 8x 10K SAS in RAID 6 on the host at a minimum
    • A 4-core E3-1200 series Xeon is okay
    • 64GB ECC Minimum
  • 15-25 Users
    • OPTION: Can run with E3-1270v6, 64GB ECC for no-growth settings
    • 8x 10K Hybrid SAS in RAID 6 (SSD & HDD)
    • Single Intel Xeon Scalable Silver or Gold
    • 96GB ECC (6x 16GB ECC)
  • 25-50 Users
    • 8x SATA SSD such as Intel DC S3520 Series in RAID 6
      • If the GB/TB volume is such that 8 disks won't do, move to 16 or 24
    • Dual Intel Xeon Scalable Silver or Gold
    • 192GB ECC (12x 16GB ECC)
  • 50+ Users
    • 8x or 24x SAS SSD such as HGST SS200 3DWPD (Drive Writes Per Day) in RAID 6
      • U.2 NVMe is an option with certain 2U platforms
    • Dual Intel Xeon Scalable Gold
    • 384GB ECC (12x 32GB ECC)


If users are complaining about slowness, log on to the Session Host(s) and keep an eye on in-guest disk latency via ResMon (Resource Monitor). Anything in the three-digit millisecond (ms) range is bad, with high two-digit numbers showing I/O strain. Anything above 15ms to 25ms would mean performance issues for database-driven back ends like SQL Server, MySQL, PostgreSQL, and so on.

At the host level, keep an eye on the Disk Queue Length for the partition hosting the RDS VM's VHDX file(s) as well as for the volume hosting the User Profile Disk (UPD) VHDX files. The disk subsystem is usually the first culprit in performance problems, so one should not skimp on the disk setup.
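The same counters can be watched from PowerShell rather than ResMon or Performance Monitor. Run this in-guest for latency and on the host for queue length; the sampling values are illustrative:

  # Latency is reported in seconds, so 0.025 = 25ms.
  $Counters = "\LogicalDisk(*)\Avg. Disk sec/Read",
              "\LogicalDisk(*)\Avg. Disk sec/Write",
              "\PhysicalDisk(*)\Current Disk Queue Length"
  Get-Counter -Counter $Counters -SampleInterval 2 -MaxSamples 30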


SQL Server/Database Services
SQL and any database server service are very I/O dependent as well as CPU bound. We'd baseline the setup in its existing environment before putting together a virtualized version.


There is a huge caveat with virtualizing database workloads: A virtualization stack will almost never perform anywhere near as well as a physical server.


Rule of Thumb:

  • A smaller storage stack block size and Interleave in Storage Spaces yields more IOPS
  • Our default starting place is 64KB in size
  • For online transactional databases (high IOPS with small write sizes), block and Interleave sizes can be smaller (see the sketch below)
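As a sketch of that rule of thumb, a SQL-oriented data volume on Storage Spaces might be carved out with a 64KB Interleave and a matching 64KB NTFS allocation unit size. The pool, disk names, and size are placeholders:

  New-VirtualDisk -StoragePoolFriendlyName "Pool01" -FriendlyName "SQLData" -ResiliencySettingName Mirror -Interleave 64KB -Size 500GB
  Get-VirtualDisk -FriendlyName "SQLData" | Get-Disk | Initialize-Disk -PartitionStyle GPT -PassThru | New-Partition -UseMaximumSize -AssignDriveLetter | Format-Volume -FileSystem NTFS -AllocationUnitSize 64KB -NewFileSystemLabel "SQLData"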


Storage Specific Notes

Storage cannot be emphasized enough. Knowing the workload's needs as far as IOPS, throughput, and latency is critical to providing a solution that will meet a client's/customer's needs today and five years down the road near the end of the solution's life.


The storage question does _not_ go away just because a given workload is being "put in the cloud". In fact, the storage question becomes all the more relevant given the stories of on-premises workloads moved to the cloud and then pulled back on-premises because there was just no way the cloud could meet the performance requirements.


Our testing started many years ago under the guidance of a fellow MVP of the time (Tim Barrett). At the time, and to this day, we use IOmeter to thrash a proposed disk and storage system. The process we use discovers the numerous sweet spots in IOPS and throughput, which have an inverse relationship to each other.
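IOmeter is GUI-driven; for a scripted equivalent, Microsoft's DiskSpd can sweep block sizes from the command line to find those sweet spots. A sketch only, with the test file path, duration, and queue depth as assumptions:

  $Target = "D:\iotest.dat"
  foreach ($Block in "8K","64K","256K","1M") {
      Write-Host "=== Block size $Block ==="
      # 20GB test file, 60 seconds, random I/O, 30% writes, 8 threads, QD16, no caching, latency stats.
      .\diskspd.exe -c20G -d60 -r -w30 -t8 -o16 "-b$Block" -Sh -L $Target
  }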


The 512n, 512e, and 4Kn drive setups underpinning the storage being used for the VMs and their workloads, whether that storage is SAN, DAS, or hyper-converged, must be a known commodity.
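A quick way to confirm what the drives report (a 512e drive shows a 512-byte logical and 4,096-byte physical sector size):

  Get-PhysicalDisk | Select-Object FriendlyName, MediaType, LogicalSectorSize, PhysicalSectorSize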


Rule of Thumb:

  • 256KB Block/Interleave will provide decent IOPS and/or Throughput (our default setting)
  • 64KB Block/Interleave will provide great IOPS and low Throughput
  • 1024KB Block/Interleave will provide low IOPS with great Throughput


Using the default setting above, an eight-drive 2.5" 10K SAS RAID 6 array will perform at about 250-300 IOPS per disk and 800MB/second mean throughput.



The above was run on a Simple Storage Spaces setup (the equivalent of RAID 0) configured with SAS SSDs, utilizing a block and Interleave size of less than 64KB. It turned out that the node we ran the tests on saturated the two 6Gbps SAS cables. There were two SAS HBAs per node, with one cable per HBA connected to an Intel JBOD2224S2DP JBOD.


A 12Gbps SAS setup will run an easy 700K IOPS via one node. A three-node SOFS cluster would easily run 2.1M IOPS. Microsoft has publicly published performance documentation that backs up our numbers. :)


Conclusion

Please take note: overcommitting the setup as far as virtual CPU goes is not as important as knowing the disk subsystem load. That has been and will always be the primary bottleneck in any virtualization setup until solid-state drives (SSDs) and/or NVMe (and interconnect fabrics) become the norm.


I do hope this article has helped provide a practical frame of reference. It doesn't matter if we are architecting for on-premises or for the cloud. Knowing our workload's performance needs across all systems, subsystems, and interconnect fabrics is critical to providing a solution that keeps our clients/customers happy for years to come.


Resources

We've published a number of PowerShell- and CMD-based guides, with more to come.

 


Please check them out as they should be quite helpful.


Philip Elder

Microsoft High Availability MVP


Comments

Timothy Alexander, Control System Engineer III

Commented:
What are some good heuristics when considering the ratio of total guest vCPUs to host physical cores (ignoring HT/SMT); 2:1, 3:1?
The more involved followup might be how to measure core contention on a Hyper-V Host

Also, when does core count matter more than frequency? I guess the answer is workload dependent.
20 cores @ 2.5GHz (50 GHz total)
16 cores @ 3.7GHz (59.2 GHz total)
Philip Elder, Senior Technical Architect - HA/Compute/Storage (Author)

Commented:
Microsoft used to have a set of values for VM counts depending on processor core count and GHz. Those recommendations have essentially been sidelined due to the fact that CPU is pretty much not the bottleneck anymore.

That being said, we go for GHz before core count as a rule because it's just that much faster to get the smaller vCPU count VM's threads through the CPU pipeline.

And finally, a wide core count would indeed be needed for multi-threaded applications like SQL to get through the pipeline as soon and as wide as possible.

