My EE article "Some Hyper-V Hardware & Software Best Practices" has a lot of information on performance and on how we configure our systems for specific types of performance.
This article builds on that one. It covers how we would configure a VM or a particular workload in a production setting, and how we would expect that VM or workload to behave in a given setup. In other words, this is where you will find some of our real-world experience.
Note that this is a living article. It will get updated as I go along.
All standalone servers get hardware RAID with 10K SAS drives.
As a rule, our default VM deployment is set to 2 vCPUs for most average workloads.
Our default storage stack is set up with a 256KB block size, or a 256KB Interleave on Storage Spaces virtual disks.
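The block size or Interleave is the unit written to each disk (column) before the stack moves to the next one. A full stripe therefore holds the Interleave multiplied by the number of data columns. Below is a minimal Python sketch of that arithmetic. The helper name `full_stripe_bytes` and the 4-column example are hypothetical, and parity columns are assumed to be excluded from `data_columns`.

```python
KB = 1024


def full_stripe_bytes(interleave_bytes: int, data_columns: int) -> int:
    """Bytes of user data in one full stripe.

    interleave_bytes: stripe unit (RAID block size or Storage Spaces Interleave).
    data_columns: disks/columns holding data; parity columns are NOT counted.
    """
    if interleave_bytes <= 0 or data_columns <= 0:
        raise ValueError("interleave and column count must be positive")
    return interleave_bytes * data_columns


# Hypothetical example: 256KB Interleave across 4 data columns.
stripe = full_stripe_bytes(256 * KB, 4)
print(stripe // KB, "KB per full stripe")  # 1024 KB per full stripe
```

Writes that cover a full stripe avoid partial-stripe updates, which is one reason the Interleave choice matters when setting up the stack.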
We use RoCE (RDMA over Converged Ethernet) for our RDMA deployments for Scale-Out File Server (SOFS) to Hyper-V Compute clusters and for Storage Spaces Direct (S2D). As a rule, we deploy on Mellanox NICs and switches.
NOTE: Deploying RoCE is not for the faint of heart! It is a rather complex setup, but once complete it is very rewarding.
For small to medium-large settings with a standalone virtualization host, our default is one of the following:
By default we deploy 2.5" drives in one RAID 6 array. Two logical disks are set up on the array: one for the host OS at 75GB and the balance for Hyper-V.
If we need more IOPS, we set up an all-flash solution based on the Intel DC S3520 series SSD. NVMe is an option when a workload needs it, such as a scratch disk, a high-IOPS TempDB, or rendering.
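For the default RAID 6 layout described above, a quick capacity check shows what remains for Hyper-V once the 75GB host OS logical disk is carved out. This is a minimal Python sketch. The eight 1,200GB drives are a hypothetical example, and the sketch ignores controller metadata and the difference between vendor (decimal) and formatted (binary) capacity.

```python
def raid6_usable_gb(drive_count: int, drive_gb: float) -> float:
    """Usable capacity of a RAID 6 array (two drives' worth goes to dual parity)."""
    if drive_count < 4:
        raise ValueError("RAID 6 needs at least 4 drives")
    return (drive_count - 2) * drive_gb


def split_logical_disks(usable_gb: float, host_os_gb: float = 75.0) -> dict:
    """One logical disk for the host OS, the balance for Hyper-V."""
    if usable_gb <= host_os_gb:
        raise ValueError("array too small for the host OS logical disk")
    return {"host_os_gb": host_os_gb, "hyper_v_gb": usable_gb - host_os_gb}


# Hypothetical array: eight 1,200GB 2.5" 10K SAS drives.
usable = raid6_usable_gb(8, 1200)
print(split_logical_disks(usable))  # {'host_os_gb': 75.0, 'hyper_v_gb': 7125.0}
```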
An RDS Broker/Gateway/Web VM gets 2 vCPUs with 2GB to 3GB of vRAM, depending on service load.
An RDS Session Host gets 2 vCPUs and 8GB vRAM in a small setting. For higher user counts and more demanding apps, we deploy more than one Session Host, each starting with at least 3-4 vCPUs and 4GB vRAM for the Session Host itself plus 512MB per user.
A Domain Controller gets 1 or 2 vCPUs and 2GB vRAM, as DCs are not doing much.
File and Print VMs get 2 vCPUs and 2GB to 4GB vRAM.
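The Session Host memory starting point above (4GB for the Session Host plus 512MB per user) is simple arithmetic. Below is a minimal Python sketch, assuming the hypothetical helper name `session_host_vram_gb` and treating 1GB as 1024MB.

```python
def session_host_vram_gb(users: int, base_gb: float = 4.0, per_user_mb: int = 512) -> float:
    """Starting vRAM for an RD Session Host: base allocation plus a per-user amount."""
    if users < 0:
        raise ValueError("user count cannot be negative")
    return base_gb + users * per_user_mb / 1024


print(session_host_vram_gb(20))  # 14.0
```

This is only a starting figure. The apps in each user's session (browsers in particular) can push it well past that.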
Server service workloads range from memory hungry, such as Exchange, to IOPS intensive, such as SQL online transaction processing (OLTP) databases.
For Exchange 2016 on Server 2016 there were some issues that I am not sure have been resolved yet (as of this writing, 2018-01-05).
For Exchange 2013 running on Server 2012 R2, we could start at 2 vCPUs and 8GB vRAM for anything up to around 25 mailboxes.
Exchange 2013, 2016, and any newer versions are very memory hungry. The Exchange team rewrote the code for 2013 so that Exchange could run hundreds of mailboxes on a single spinning SATA disk (a quote from the first Microsoft Exchange Conference I attended, in Orlando). In effect, disk I/O was traded for memory.
As a result, one needs some experience running Exchange in a virtualized environment to have a good idea of how many vCPUs and how much vRAM to assign to the VM (or VMs, if clustered).
There are two "styles" of Virtual Desktop Infrastructure (VDI) that ride in an RDS solution set.
VDI for desktop OS is a topic unto itself. Suffice it to say that one needs to be mindful of the IOPS required by each desktop OS VM, the apps running in-guest, and the user's role(s) within the company.
As far as RDS itself, there are a number of roles to deploy:
In the smallest settings we set up a single VM with all of the roles installed and configured, though best practice recommends breaking things up a bit. EX-??? is the VM's name.
For all other settings we deploy two VMs, which is the best practice:
The general rule of thumb is 15-20 users per RD Session Host, depending on the applications being used.
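The 15-20 users per RD Session Host rule of thumb translates directly into a host count range. Below is a minimal Python sketch. The helper name `session_host_range` is hypothetical, and it simply rounds up at each end of the rule.

```python
import math


def session_host_range(users: int, low: int = 15, high: int = 20) -> tuple[int, int]:
    """Return (fewest, most) RD Session Hosts for a given user count.

    'high' users per host (lighter apps) gives the fewest hosts;
    'low' users per host (heavier apps) gives the most hosts.
    """
    if users <= 0:
        raise ValueError("user count must be positive")
    return math.ceil(users / high), math.ceil(users / low)


print(session_host_range(70))  # (4, 5)
```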
For single-silo applications that are not RemoteApp friendly (and yes, they do exist: as of this writing, 2018-01-09, we have been working with one for over five years), all of the RD Session Host rules are broken. We put all users and the line-of-business (LoB) application on _one_ RD Session Host. In this case we need to be mindful not only of the LoB's needs but also of all the other applications running in each user's desktop session.
Browsers are resource hogs. Keep that in mind. Some offer performance tuning options, while others take what they can get and sometimes do not release resources after they are closed.
For a Hyper-V host that will have an RDS Farm set up on it, we would look to the following setup:
If users complain about slowness, log on to the Session Host(s) and watch in-guest disk latency via ResMon (Resource Monitor). Anything in the three-digit milliseconds (ms) is bad, while high two-digit numbers indicate I/O strain.
At the host level, keep an eye on the Disk Queue Length for the partition hosting the RDS VM's VHDX file(s) as well as for the VHDX hosting the User Profile Disks (UPDs). The disk subsystem is usually the first culprit in performance problems, so one should not skimp on the disk setup.
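The latency rule above can be expressed as a simple classifier for readings taken from ResMon. Below is a minimal Python sketch. The 100ms "bad" threshold follows the three-digit rule. Where "high two-digit" begins is not stated, so the 50ms strain threshold is our assumption. The function name `classify_disk_latency` is hypothetical.

```python
def classify_disk_latency(ms: float, strain_ms: float = 50.0, bad_ms: float = 100.0) -> str:
    """Classify an in-guest disk latency reading.

    bad_ms: three-digit milliseconds (100ms and up) is bad.
    strain_ms: ASSUMED start of "high two-digit" latency (50ms here).
    """
    if ms < 0:
        raise ValueError("latency cannot be negative")
    if ms >= bad_ms:
        return "bad"
    if ms >= strain_ms:
        return "strained"
    return "ok"


for reading in (8, 65, 140):
    print(reading, classify_disk_latency(reading))
# 8 ok
# 65 strained
# 140 bad
```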
SQL Server/Database Services
SQL Server, like any database service, is very I/O dependent as well as CPU bound. We baseline the workload in its existing environment before putting together a virtualized version.
There is a huge caveat with virtualizing database workloads: A virtualization stack will almost never perform anywhere near as well as a physical server.
Rule of Thumb:
Storage cannot be emphasized enough. Knowing the workload's needs as far as IOPS, throughput, and latency is critical to providing a solution that will meet a client's/customer's needs today and five years down the road near the end of the solution's life.
The storage question does _not_ go away just because a given workload is being "put in the cloud". In fact, the storage question becomes all the more relevant given the stories of on-premises workloads that were moved to the cloud and then pulled back on-premises because the cloud simply could not meet their performance requirements.
Our testing started many years ago under the guidance of a fellow MVP at the time (Tim Barrett). Then, as now, we use IOmeter to thrash a proposed disk and storage system. Our process finds the numerous sweet spots in IOPS and throughput, which have an inverse relationship to each other.
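The inverse relationship follows from throughput = IOPS × I/O size: at a fixed bandwidth ceiling, larger I/Os mean fewer of them per second. Below is a minimal Python sketch using a hypothetical 800MB/s ceiling. It is an idealized, bandwidth-limited model. In practice small I/Os usually hit a device IOPS limit first, which is why real sweet spots have to be found by testing with a tool like IOmeter.

```python
def iops_at_throughput(throughput_mb_s: float, io_size_kb: float) -> float:
    """IOPS needed to saturate a bandwidth ceiling at a given I/O size.

    throughput = IOPS x I/O size, so IOPS = throughput / I/O size.
    """
    if io_size_kb <= 0:
        raise ValueError("I/O size must be positive")
    return throughput_mb_s * 1024 / io_size_kb


for size_kb in (4, 64, 256):
    print(f"{size_kb}KB: {iops_at_throughput(800, size_kb):,.0f} IOPS")
# 4KB: 204,800 IOPS
# 64KB: 12,800 IOPS
# 256KB: 3,200 IOPS
```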
The sector format (512n, 512e, or 4Kn) of the drives underpinning the storage used by the VMs and their workloads must be a known commodity, whether that storage is SAN, DAS, or hyper-converged.
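One reason the sector format matters: a 512e drive presents 512-byte logical sectors but writes 4096-byte physical sectors. A write that does not start and end on a 4096-byte boundary forces a read-modify-write of the physical sector. Below is a minimal Python sketch of an alignment check. The function name `is_aligned` is hypothetical, and the offsets shown are illustrative examples.

```python
def is_aligned(offset_bytes: int, length_bytes: int, physical_sector: int = 4096) -> bool:
    """True if an I/O starts and ends on physical sector boundaries.

    On 512e drives (512-byte logical, 4096-byte physical sectors), an
    unaligned write makes the drive read, modify, and rewrite the sector.
    """
    return offset_bytes % physical_sector == 0 and length_bytes % physical_sector == 0


print(is_aligned(1_048_576, 65_536))  # True  (1MiB partition offset, 64KB I/O)
print(is_aligned(32_256, 4_096))      # False (legacy 63-sector partition offset)
```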
Rule of Thumb:
Using the default settings above, eight 2.5" 10K SAS drives set up in a RAID 6 array will perform at about 250-300 IOPS per disk and 800MB/second mean throughput.
The above was run on a SAS SSD Simple Storage Spaces setup, which is the equivalent of RAID 0, using block and Interleave sizes smaller than 64KB. It turned out that the node we ran the tests on saturated its two 6Gbps SAS cables. There were two SAS HBAs per node, with one cable per HBA connected to an Intel JBOD2224S2DP JBOD.
A 12Gbps SAS setup will easily run 700K IOPS through one node, so a three-node SOFS cluster would easily run 2.1M IOPS. Microsoft has published performance documentation that backs up our numbers. :)
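The three-node figure assumes each SOFS node contributes its own full share, so cluster IOPS scale linearly with node count. Below is a minimal Python sketch. The helper name `cluster_iops` and the `scaling_efficiency` knob are hypothetical. The knob is there to model less-than-linear scaling if testing shows it.

```python
def cluster_iops(per_node_iops: int, nodes: int, scaling_efficiency: float = 1.0) -> int:
    """Estimate cluster IOPS from per-node IOPS.

    scaling_efficiency = 1.0 assumes perfectly linear scale-out;
    values below 1.0 model hypothetical losses from contention.
    """
    if nodes <= 0 or not 0.0 < scaling_efficiency <= 1.0:
        raise ValueError("need at least one node and efficiency in (0, 1]")
    return round(per_node_iops * nodes * scaling_efficiency)


print(f"{cluster_iops(700_000, 3):,}")  # 2,100,000
```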
Please note that vCPU overcommitment is not as important as knowing the disk subsystem load. The disk subsystem has been, and will remain, the primary bottleneck in any virtualization setup until solid-state drives (SSDs) and/or NVMe (and their interconnect fabrics) become the norm.
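Knowing the disk subsystem load includes knowing what the RAID level costs on writes. Below is a minimal Python sketch that applies the common RAID 6 write-penalty planning model (6 back-end I/Os per small random write: read data, P, and Q, then write all three) to the eight-drive, 250-300 IOPS-per-disk rule of thumb above. It assumes those per-disk figures are raw back-end IOPS. The function name `raid6_iops` and the 30% write mix are hypothetical.

```python
RAID6_WRITE_PENALTY = 6  # planning figure: 3 reads + 3 writes per small random write


def raid6_iops(drives: int, iops_per_disk: float, write_fraction: float) -> float:
    """Estimate host-visible random IOPS for a RAID 6 array.

    raw       = drives * iops_per_disk
    effective = raw / ((1 - w) + w * penalty), where w is the write fraction
    """
    if not 0.0 <= write_fraction <= 1.0:
        raise ValueError("write_fraction must be between 0 and 1")
    raw = drives * iops_per_disk
    return raw / ((1 - write_fraction) + write_fraction * RAID6_WRITE_PENALTY)


for w in (0.0, 0.3):
    low, high = raid6_iops(8, 250, w), raid6_iops(8, 300, w)
    print(f"write {w:.0%}: {low:,.0f}-{high:,.0f} IOPS")
# write 0%: 2,000-2,400 IOPS
# write 30%: 800-960 IOPS
```

Even a modest write mix cuts usable IOPS sharply. That is why baselining the workload's read/write profile matters more than the vCPU count.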
I do hope this article has helped provide a practical frame of reference. It doesn't matter if we are architecting for on-premises or for the cloud. Knowing our workload's performance needs across all systems, subsystems, and interconnect fabrics is critical to providing a solution that keeps our clients/customers happy for years to come.
We've published a number of PowerShell and CMD-based guides, with more to come.
Please check them out, as they should be quite helpful.