Disabling VMQ on Hyper-V Parent Partition with 1 Gbit Broadcom NICs - Any Major Concerns?


I have a client running a Hyper-V 2008 R2 SP1 environment with around 15 VMs (including Exchange 2010, Lync 2010, and SQL Server 2008 R2) on a single Dell R720xd server that is not currently clustered.  Please note that I inherited this environment and had no say in how it was set up, so I am not asking about HA, shared storage, etc., as they are currently not in the client's budget.

However, I noticed that copying large files (e.g. a 30 GB database file) between two VMs running on this same Hyper-V parent partition runs at only around 7 MB/s.  This seems very slow to me, and while researching slow network speeds between Hyper-V VMs running on top of Broadcom NICs, I discovered that it can be beneficial to disable VMQ on the Broadcom NICs (it is enabled by default).  I am not one who likes to disable default settings without a logical reason, and I realize that VMQ has some major performance benefits when implemented correctly.  However, my server has only 1 Gbit NICs in it, and the network traffic to even the most used VMs is very light in general.
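For a sense of scale, the figures above can be put side by side with the nominal speed of the link. This is only a rough sketch: it assumes decimal units (1 MB = 10^6 bytes) and ignores all protocol and disk overhead, and the function names are made up for illustration.

```python
# Back-of-envelope comparison of the observed copy speed with the nominal
# 1 Gbit/s link speed. Decimal units are assumed (1 MB = 10**6 bytes) and
# protocol/disk overhead is ignored, so the line rate is an upper bound.

LINK_GBIT_PER_S = 1.0      # nominal NIC speed from the post
OBSERVED_MB_PER_S = 7.0    # observed copy speed from the post
FILE_GB = 30.0             # example file size from the post


def line_rate_mb_per_s(gbit_per_s: float) -> float:
    """Convert a nominal link speed in Gbit/s to MB/s (8 bits per byte)."""
    return gbit_per_s * 1000.0 / 8.0


def copy_minutes(size_gb: float, mb_per_s: float) -> float:
    """Minutes needed to move size_gb at a sustained mb_per_s."""
    return size_gb * 1000.0 / mb_per_s / 60.0


line_rate = line_rate_mb_per_s(LINK_GBIT_PER_S)
utilization = OBSERVED_MB_PER_S / line_rate

print(f"Nominal line rate: {line_rate:.0f} MB/s")
print(f"Observed rate is {utilization:.1%} of the nominal line rate")
print(f"{FILE_GB:.0f} GB at observed rate: {copy_minutes(FILE_GB, OBSERVED_MB_PER_S):.1f} min")
print(f"{FILE_GB:.0f} GB at line rate:     {copy_minutes(FILE_GB, line_rate):.1f} min")
```

The line-rate figure is a ceiling that no real copy reaches, so the gap it shows is an upper bound on what could be gained.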

My questions are as follows:

1.  Do I lose anything major by disabling VMQ on all Broadcom NICs in the parent partition running Hyper-V 2008 R2 at 1 Gbit speeds? My client has a mix of Server 2003, 2003 R2, and Server 2008 R2 servers running on this Hyper-V server, some of which cannot even make use of VMQ if I am not mistaken.  Also, the parent partition is not currently set up via the registry to use VMQ on either 10 Gbit or 1 Gbit NICs.

2.  When I disable this setting on the NICs of the parent partition, will I need to reboot the underlying Hyper-V server for the change to take effect?  This is critical for me since the client's entire infrastructure currently must be shut down for the parent partition to be rebooted (again, in my defense- I inherited this setup!! ;-)

Thanks in advance for any guidance you may be able to provide.

Cliff Galiher Commented:
You can disable VMQ. It only starts making a performance difference at 4-5 Gb/s on modern hardware. So you won't be losing anything.

With that said, you won't be gaining anything either.  Even if it is turned on via the NIC control panel or registry, Server 2012 does not utilize it on 1Gb NICs. The only time it matters at all is if you are using the old buggy Broadcom driver, but that has long been fixed. Furthermore, VMQ is a hardware improvement that speeds up how fast a physical NIC can hand stuff off. In a VM to VM copy on the same virtual switch, VMQ never comes into play: all traffic goes through the virtual switch and therefore never hits the part of the networking stack where VMQ runs.

To the final part of your question, yes, a reboot will be required.

KPI1 (Author) Commented:
Hi Cliff,

Ok.  But to clarify my situation-

The two virtual machines are on different NICs that are connected to separate virtual switches (there is no teaming- each switch is attached to a single physical NIC), so the network traffic will have to pass out of one physical NIC through the switch and back into the other physical NIC.

Also, I am running Windows Server 2008 R2 Hyper-V and not 2012 or later.

I am not sure what else would explain such miserable network performance across the NICs in the parent partition.  There was so much talk about VMQ and Broadcom bugs out there that I thought it might be related.  

Not sure what else to look at.  The drivers are almost the latest versions released by Broadcom (late 2013) for the Broadcom NIC being used.  I would update these to the latest, but I have to shut everything down, which will require weekend maintenance.

The server itself was installed in March of this year and is brand new.  Do you have any other thoughts of what might cause this issue in my situation?


Cliff Galiher Commented:
Well, as I said, it won't hurt (as long as you do it in a maintenance window.) And yes, if you have a buggy driver, it *could* cause a problem. But I wanted to set expectations.

Now continuing on that idea of setting expectations, you are definitely running an odd topology. Your performance is not that out of line with what I'd expect. You have disks that have to read data during a copy. It has to go through one vswitch, using CPU cycles. Then it has to traverse the PCI bus to your NIC, adding latency. Then out and through a physical switch...adding latency (and switch performance matters!!!)...then back to another NIC, and along the *same* PCI bus, adding more latency, through *another* vswitch, consuming CPU cycles (which is compounded because it can't all be handled and prioritized by one process)...and back to another VM which, if the VHD is on the same set of disks as the first, will cause I/O writes and thus impact throughput.

Now, by far your biggest bottleneck will be disk I/O. Especially if those VHDs are on the same set of spindles. Very fast disks can read, if configured optimally,  40MB/s. But that's in a perfect world with a stellar RAID card and high quality SAS disks. Writing is usually less. Now, if two VHDs are on the same disk *and* you meet the above, your theoretical throughput just got cut in half. Let's be kind and say 15MB/s.

Now, let's add the overhead for SMB2 (2008 R2 comes into play here, 2012 would be better with SMB3.) There can't be a bypass because of your chosen topology. You are now down to 10-12MB/s. That's *just* SMB2 overhead at the OS level.

Now let's take into account network latency. Every time a NIC has to request data, there is a pause, and that'll pause disk I/O. Like waiting at a stop light on a 10 mile-an-hour road. Sure, when you are moving, you are going at 10 miles an hour. But when the light changes, all traffic that direction pauses. So NIC/layer-2/misc latency does decrease throughput. Take that 10-12 and you are looking at 7-10MB/s.

Now I used average numbers for a middling system with suboptimal configurations. If you are running multiple RAID 10 arrays with a big battery cache, and if you put each VM on a different array, and if you have 8 cores, Cat6, and a 10G Cisco switch, sure, you should be expecting more. Probably a lot more.

But for perspective, when that Broadcom bug rears its head, 1MB/s is not uncommon. So 7 tells me that isn't likely the issue.
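The chain of rough estimates above can be written down as a small table and checked against the observed figure. This is just a sketch of the same fuzzy math; the stage labels and helper names are invented for illustration, and the numbers are the ballpark values from this post, not measurements.

```python
# Rough model of the estimate chain: each stage is (label, low MB/s, high MB/s).
# All numbers are the ballpark figures quoted in the post, not measurements.
from typing import List, Tuple

Stage = Tuple[str, float, float]

STAGES: List[Stage] = [
    ("best-case disk read", 40.0, 40.0),
    ("both VHDs on the same disks", 15.0, 15.0),
    ("plus SMB2 overhead", 10.0, 12.0),
    ("plus NIC/switch latency", 7.0, 10.0),
]


def final_range(stages: List[Stage] = STAGES) -> Tuple[float, float]:
    """Return the (low, high) throughput estimate after the last stage."""
    _, low, high = stages[-1]
    return low, high


def is_consistent(observed_mb_per_s: float, stages: List[Stage] = STAGES) -> bool:
    """True if the observed rate falls inside the final estimated range."""
    low, high = final_range(stages)
    return low <= observed_mb_per_s <= high


for label, low, high in STAGES:
    rng = f"{low:.0f}" if low == high else f"{low:.0f}-{high:.0f}"
    print(f"{label:<30} {rng} MB/s")

# 7 MB/s is the observed copy speed; 1 MB/s is the rate quoted for the driver bug.
for observed in (7.0, 1.0):
    verdict = "within" if is_consistent(observed) else "outside"
    print(f"{observed:.0f} MB/s is {verdict} the estimated range")
```

Under these assumed numbers, 7 MB/s sits at the bottom of the expected range, while a rate near 1 MB/s would fall well below it.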

I'm not trying to discourage you from trying. I clearly made a lot of assumptions and fuzzy math. But I just don't want you assuming something *must* be broke and that VMQ is a magic bullet. This may just be what you can expect performance-wise.

KPI1 (Author) Commented:
Excellent! Thank you for your detailed analysis.  I agree that this configuration is an odd one, and as I said, I did not set it up.

I forgot about the SMB2 overhead, and the fact that both virtual disks are on the same RAID 5 array is not helping matters.  I just ran a test copy between VMs on the same virtual switch and I am seeing somewhere in the neighborhood of 18 MB/s, so going through the physical switch is adding some serious latency.

I agree with you that the VMQ is most likely not the issue and won't be a magic bullet.  I just wanted to do my due diligence in investigating any possible misconfigurations that are easily corrected.

Update: I just re-tested copying a 9 GB file to the VM on a separate virtual switch (as originally described), and now I am seeing around 15 MB/s throughput, which puts me right in line with what you were talking about concerning latency and overhead.  This is roughly twice as fast as my original tests, so I am satisfied for now.
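For anyone who wants to repeat this kind of test with a number that does not depend on the copy dialog, a small timing script is one option. The sketch below is not what was used in this thread; it assumes Python is available on the machine doing the copy, and the function name and the temporary demo file are made up for illustration. In a real test the source and destination would be a large file and a path on the other VM's share, since small files mostly measure the OS cache.

```python
# Minimal sketch: time a single file copy and report sustained MB/s.
# Decimal units are assumed (1 MB = 10**6 bytes).
import os
import shutil
import tempfile
import time


def measure_copy_mb_per_s(src: str, dst: str) -> float:
    """Copy src to dst and return the average throughput in MB/s."""
    size_bytes = os.path.getsize(src)
    start = time.perf_counter()
    shutil.copyfile(src, dst)
    # Guard against a zero interval on very small files.
    elapsed = max(time.perf_counter() - start, 1e-9)
    return size_bytes / 1_000_000 / elapsed


if __name__ == "__main__":
    # Demo with a small throwaway file; replace both paths for a real test.
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "source.bin")
        dst = os.path.join(tmp, "copy.bin")
        with open(src, "wb") as f:
            f.write(os.urandom(4 * 1_000_000))  # 4 MB of random data
        rate = measure_copy_mb_per_s(src, dst)
        print(f"Copied {os.path.getsize(dst)} bytes at {rate:.1f} MB/s")
```

Running it a few times and comparing the results helps separate a one-off slow copy from a consistent bottleneck.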

Thanks again for your insight!
