Multi-vCPU VM performance benchmarks: is scaling linear?

Hi All,

Can anyone here please share some information or benchmark results for VMware VMs showing that adding vCPUs does not always give linear performance gains?

I need to understand why building one bigger VM with 6 or more vCPUs is not always faster than three 2-vCPU VMs.

Note: this is for a SharePoint web server deployment consideration.

Senior IT System Engineer (Senior Systems Engineer) asked:

Andrew Hancock (VMware vExpert / EE Fellow), VMware and Virtualization Consultant, commented:
vSMP (virtual SMP) can hurt virtual machine performance when you add too many vCPUs to VMs that cannot use them effectively. Examples of servers that can use vSMP correctly: SQL Server, Exchange Server.

This is true: many VMware administrators think adding lots of processors will increase performance - wrong! (And because they can, they just go silly!) Sometimes there is confusion between cores and processors, but what we are adding is additional processors in the virtual machine.

So 4 vCPUs makes the VM a 4-way SMP (quad-processor server). If you have an Enterprise Plus licence you can add 8 (and only if you have the correct OS licence will the OS recognise them all).

If the applications can take advantage, e.g. Exchange, SQL, then adding additional processors can/may increase performance.

So the usual rule of thumb is: try 1 vCPU, then try 2 vCPUs, and knock back to 1 vCPU if performance is affected. Only use vSMP if the VM can take advantage of it.

Example: a VM with 4 vCPUs allocated.

My simple layman's explanation of the scheduler:

As you have assigned 4 vCPUs to this VM, the VMware scheduler has to wait until 4 cores are free and available. To do this, it has to pause the first cores until the 4th is available, and during this timeframe the paused cores are not available for other processes. This is my simplistic view, but the bottom line is that adding more vCPUs to a VM may not give you the performance benefits you expect, unless the VM and its applications are optimised for additional vCPUs.
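That waiting penalty can be sketched as a toy probability model. This is illustration only, not the real ESXi algorithm (recent releases use relaxed co-scheduling rather than strict gang scheduling), but it shows why requiring many simultaneously free cores hurts:

```python
# Toy model of strict co-scheduling (illustration only): a VM with v
# vCPUs can only be dispatched when v physical cores are free at the
# same instant. If each core is free independently with probability p,
# the dispatch chance falls exponentially as vCPUs are added.
def dispatch_chance(v_cpus, p_core_free):
    return p_core_free ** v_cpus

for v in (1, 2, 4, 8):
    print(v, dispatch_chance(v, 0.5))
# 1 -> 0.5, 2 -> 0.25, 4 -> 0.0625, 8 -> 0.00390625
```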

See here

see here

Also, there is a document here about the CPU scheduler.

I'll also find another good reference, which includes hyperthreading...
Andrew Hancock (VMware vExpert / EE Fellow), VMware and Virtualization Consultant, commented:
Okay, here are the articles, which I've listed in a different post.

These VMware blogs discuss overcommitting vCPUs in VMs on host servers with too few physical cores, and also the case when hyperthreading is enabled on the host.

The conclusions state "approximately linear scaling", but performance does not double, e.g. 4 vCPUs is not double 2 vCPUs, as you can see from the graphs.

CPU scheduling, wait types, and over-subscription of vCPUs versus physical cores all have an impact. So do not assume 6 vCPUs will yield 3x the performance of 2 vCPUs; it could even be slower.
Multiple CPUs will never give linear performance:
1) Amdahl's law - there is some serialised code in every algorithm.
2) CPU turbo mode - with all 16 cores loaded, turbo will not engage; the 1st core alone will, and frequency is reduced gradually as more cores become active.
3) VMware adds some noise to the environment too, especially if you build a beefy machine that tries to take over all host resources.
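Point 1 can be put in numbers. A minimal sketch of Amdahl's law, assuming (purely for illustration) a 10% serial fraction:

```python
# Amdahl's law: overall speedup is capped by the serial fraction s,
# no matter how many vCPUs you add.
def amdahl_speedup(n_cpus, serial_fraction):
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_cpus)

for n in (1, 2, 4, 8, 16):
    print(n, round(amdahl_speedup(n, 0.10), 2))
# 1 -> 1.0, 2 -> 1.82, 4 -> 3.08, 8 -> 4.71, 16 -> 6.4
```

Even with only 10% serial code, 16 vCPUs give barely 6.4x, nowhere near 16x.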

Senior IT System Engineer (Senior Systems Engineer), Author, commented:
Thanks guys. So, regarding the [b]CPU sockets vs. CPU cores ratio[/b] or comparison?

as per this article: 

4 sockets 1 core
2 sockets 2 cores
1 socket 4 cores

is it always recommended to start from [b][i]multiple sockets, single core[/i][/b] as the default VM deployment standard?
Or, in the example above, 4 sockets 1 core.
Senior IT System Engineer (Senior Systems Engineer), Author, commented:
See the image screenshot below:

Does that mean that as long as one big VM is not assigned more than 16 cores in total, it is still within the vNUMA limit?

Correct me if I'm wrong.
You can hot-plug 1-core-per-socket vCPUs until you reach 16.
After that, the only next working combination is 2 sockets with 16 cores each.

vNUMA is enabled in guests that have multicore virtual sockets, and those guests waste time on hierarchical scheduling when they actually have a flat topology.

You can have up to 64 vCPUs per CPU core, but you will start to notice degraded performance once much more than 2 CPU-demanding vCPUs are fighting for a single core.
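As a rough sketch of that contention point (toy model, numbers assumed): if several busy vCPUs land on the same physical core, each gets only a fraction of it, and CPU-ready time grows accordingly:

```python
# Toy model: v CPU-demanding vCPUs time-slicing one physical core each
# receive roughly 1/v of the core's throughput.
def share_of_core(busy_vcpus_on_core):
    return 1.0 / busy_vcpus_on_core

print(share_of_core(2))  # 0.5  - roughly the 2:1 point mentioned above
print(share_of_core(4))  # 0.25 - each vCPU waits 75% of the time
```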
Senior IT System Engineer (Senior Systems Engineer), Author, commented:
Flat as in multiple sockets, single core?
That is the only way you can hot-add CPUs to a running machine without enabling an unnecessary NUMA hierarchy in the guest.
Opteron 6378: 16 cores, 2.4 GHz base, 3.3 GHz turbo, 16 MB cache, 115 W, socket G34

All 16 cores together will have the performance of roughly
16 x 2.4 / 3.3 ≈ 11.64 "1st cores".

I personally doubt you will be able to add a 2nd such powerful CPU without that 2x16 VM frying at full speed.
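The arithmetic above, worked through with the Opteron 6378 figures quoted in this thread (2.4 GHz base, 3.3 GHz maximum turbo):

```python
# Under full 16-core load the chip runs at base clock, so its total
# throughput expressed in "single turbo core" units is:
base_ghz, turbo_ghz, cores = 2.4, 3.3, 16
effective_cores = cores * base_ghz / turbo_ghz
print(round(effective_cores, 2))  # 11.64 - not 16 "1st cores"
```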
Andrew Hancock (VMware vExpert / EE Fellow), VMware and Virtualization Consultant, commented:
Always assign sockets, not cores.

And if you are building a super monster VM, do not use more vCPUs than the physical cores you have.

E.g. ignore hyperthreading, which doubles the count, as per the documents; the assumption is that your monster VM will be heavily loaded.
And inside a super monster VM one must partition the task so that each part exactly fits in a NUMA node. The interconnect between NUMA nodes is 2-10x slower than CPU-local memory.
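A minimal sketch of that sizing check, with assumed per-node figures (8 cores and 64 GB per NUMA node here; substitute your host's real topology):

```python
# Does a VM's vCPU/RAM request fit inside one NUMA node? If not, some
# memory accesses must cross the 2-10x slower inter-node interconnect.
def fits_in_numa_node(vm_vcpus, vm_ram_gb, node_cores=8, node_ram_gb=64):
    return vm_vcpus <= node_cores and vm_ram_gb <= node_ram_gb

print(fits_in_numa_node(6, 48))   # True  - stays node-local
print(fits_in_numa_node(12, 96))  # False - spans nodes
```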
If the machine will be using nearly all of those monster CPUs for a long time, you are better off running the guest OS on bare hardware to reclaim the ~5% of performance VMware takes away.
Your EVC mode masks out SSSE3, SSE4.1, AES, AVX, XSAVE, XOP, FMA4, FMA, TBM, BMI1, and F16C.
Please raise it to get the extra instruction sets and extra performance.
You need ESXi and vCenter 5.1 for that.

Senior IT System Engineer (Senior Systems Engineer), Author, commented:
Hi All,

Can high CPU-ready time be resolved or reduced by moving or grouping the 4-vCPU and 8-vCPU VMs into their own DRS cluster, so that no 1- or 2-vCPU VMs are mixed in with them?
How does that relate to the initial problem statement?
Andrew Hancock (VMware vExpert / EE Fellow), VMware and Virtualization Consultant, commented:
Could you expand on that statement?
Senior IT System Engineer (Senior Systems Engineer), Author, commented:
I'll post another thread.