Solved

Hyper-V vNIC not responding

Posted on 2015-01-08
Last Modified: 2015-01-13
Hello guys, I have a problem that I've been pulling my hair out over for quite some time.

We have a Windows Failover Cluster set up across a number of blade servers. Each server runs Windows Server 2012 R2 with the Hyper-V and Failover Clustering roles installed. Each server has four physical network interfaces: two HP NC373i Integrated and two HP NC373m Mezzanine. Only two of the four physical NICs are connected to our switch at this time; they are teamed through Windows in switch-independent mode. On top of this we have eight virtual NICs connected to a Hyper-V virtual switch:

Access/Management, Cluster network, Migration network, Replica network, and four SMB data transfer networks (for accessing VHDs on a storage server)

We have each virtual NIC on a separate VLAN and they all have statically assigned IP addresses. Occasionally, one of the vNICs will stop working and we will lose Live Migration on that blade, or it may lose communication with the cluster depending on which virtual interfaces have failed.
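For readers who want to reproduce a comparable layout, here is a minimal sketch of how a converged setup like the one described is commonly built. It is not the exact configuration from this thread: the team, adapter, and switch names ("Team1", "NIC1", "NIC2", "ConvergedSwitch"), the VLAN IDs, and the Dynamic load-balancing algorithm are all assumptions for illustration. Only four of the eight vNICs are shown; the SMB vNICs would follow the same pattern.

```powershell
# Hypothetical names and VLAN IDs throughout; substitute your own.

# Switch-independent team over the two connected physical NICs.
New-NetLbfoTeam -Name "Team1" -TeamMembers "NIC1","NIC2" -TeamingMode SwitchIndependent -LoadBalancingAlgorithm Dynamic -Confirm:$false

# Virtual switch bound to the team. The minimum-bandwidth mode
# can only be chosen when the switch is created.
New-VMSwitch -Name "ConvergedSwitch" -NetAdapterName "Team1" -MinimumBandwidthMode Weight -AllowManagementOS $false

# One host vNIC per role, each tagged with its own access VLAN.
$vnics = [ordered]@{ Management = 10; Cluster = 20; Migration = 30; Replica = 40 }
foreach ($name in $vnics.Keys) {
    Add-VMNetworkAdapter -ManagementOS -Name $name -SwitchName "ConvergedSwitch"
    Set-VMNetworkAdapterVlan -ManagementOS -VMNetworkAdapterName $name -Access -VlanId $vnics[$name]
}
```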

I've tried updating the drivers on both sets of physical NICs, reflashing the firmware, and turning certain features such as VMQ and RSC on and off, but nothing has solved this. Interestingly, toggling VMQ on or off sometimes causes the affected NICs to start responding again, but only for a limited time. I should mention that nowhere in Network Connections does it state that these NICs are malfunctioning or disconnected; Event Viewer does, however, log it as a clustering failure.

Edit: when I say the NIC is not responding, I mean the other hosts cannot ping it even though they should be able to. Yes, the firewall is off.
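A quick way to see which vNICs have dropped off is to ping each one from another cluster node. The sketch below assumes placeholder addresses (from the 192.0.2.0/24 documentation range) and hypothetical role names; replace them with the static IPs actually assigned to the affected blade.

```powershell
# Placeholder addresses: substitute the static IPs of the vNICs on the affected blade.
$vnicAddresses = @{
    Management = "192.0.2.11"
    Cluster    = "192.0.2.21"
    Migration  = "192.0.2.31"
}

foreach ($entry in $vnicAddresses.GetEnumerator()) {
    # -Quiet returns $true/$false instead of reply objects.
    $ok = Test-Connection -ComputerName $entry.Value -Count 2 -Quiet
    $state = if ($ok) { "reachable" } else { "NO REPLY" }
    "{0,-12} {1,-15} {2}" -f $entry.Key, $entry.Value, $state
}
```

Running this on a schedule and logging the output can help correlate the failures with load or Live Migration activity.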
Question by:Lumenix
 
LVL 56

Assisted Solution

by:Cliff Galiher
Cliff Galiher earned 250 total points
ID: 40538613
1)  Disable VMQ. There is no use for it on gigabit adapters.

2) Grab updated Broadcom drivers. HP has been sadly very slow on updating drivers, and Broadcom is (also sadly) regularly subpar on driver quality. So combine the two, and you have a bad Broadcom driver that they have (probably) fixed but that HP hasn't rebranded and re-released yet.  Personally, I'd go with Intel, but depending on the blades you chose, that may not be an option.

3) Make sure you've implemented reasonable QoS settings on your various vNICs. Otherwise the virtual switch won't prioritize packets and you can eventually end up with a vNIC being starved, even after load has returned to "normal" low levels. That's the nature of dynamic teaming and running a converged network. QoS is mandatory in such a setup to ensure no single vNIC can crash the others.
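As an illustration of point 3, here is a sketch of weight-based minimum bandwidth on host vNICs. It assumes the virtual switch was created with -MinimumBandwidthMode Weight (this cannot be changed afterwards), and the switch name, vNIC names, and weight values are hypothetical examples rather than recommendations.

```powershell
# Assumes a switch created with -MinimumBandwidthMode Weight.
# Switch name, vNIC names, and weights are illustrative only.
Set-VMNetworkAdapter -ManagementOS -Name "Management" -MinimumBandwidthWeight 5
Set-VMNetworkAdapter -ManagementOS -Name "Cluster"    -MinimumBandwidthWeight 10
Set-VMNetworkAdapter -ManagementOS -Name "Migration"  -MinimumBandwidthWeight 20
Set-VMNetworkAdapter -ManagementOS -Name "Replica"    -MinimumBandwidthWeight 10

# Share reserved for traffic from adapters that have no explicit weight (for example, VMs).
Set-VMSwitch -Name "ConvergedSwitch" -DefaultFlowMinimumBandwidthWeight 40

# Review the resulting relative shares.
Get-VMNetworkAdapter -ManagementOS | Format-Table Name, BandwidthPercentage
```

Weights are relative, so a vNIC is only held to its share when the link is congested; idle bandwidth remains available to the others.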

-Cliff
 

Author Comment

by:Lumenix
ID: 40538907
Thanks for the suggestion, Cliff. I've tried turning off VMQ on my NICs using the command:

Get-NetAdapterVmq | Disable-NetAdapterVmq

This hasn't provided a permanent fix, unfortunately. I've also removed all vNICs and recreated them; the same ones still do not reply to pings afterwards. I checked and updated the Broadcom drivers as well, reflashed the firmware on all four NICs, and of course rebooted. Still the same problem.
 
LVL 38

Expert Comment

by:Philip Elder
ID: 40538968
Agreed with Cliff. Broadcom requires VMQ to be disabled on _all_ physical NIC ports that run at Gigabit speeds.

Check to see if there is a firmware update for the NICs as well.

In this scenario we would:
 Team 1: Port 0 on each: Management (VLAN for services if required)
 Team 2: Port 1 on each: vSwitch (not shared with OS) (VLAN for VMs via Hyper-V vNIC Properties)

Philip
 

Author Comment

by:Lumenix
ID: 40544359
Thanks for the suggestions. I have checked, and it seems VMQ is not enabled on the NIC team, but the problem persists. Is there some other way to disable it besides PowerShell?

I've managed to fix the problem by using a single NIC instead of a teamed one; however, we then lose fault-tolerant networking to the blade. I'm going to try using the Broadcom utility (BACS) to configure a NIC team and see if the problem persists there. If any of you have further suggestions, please let me know!
 
LVL 38

Expert Comment

by:Philip Elder
ID: 40544707
Not the team. The ports.

Click Start --> ncpa.cpl --> pNIC Properties --> Advanced --> Virtual Machine Queues (VMQ) --> Set DISABLED.

Do that for all physical NICs.
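For reference, a rough PowerShell counterpart to the GUI steps above, targeting the physical ports rather than the team interface. This is a sketch under the assumption that the driver exposes VMQ to Windows at all; on adapters that do not support it, Disable-NetAdapterVmq may simply report an error for that port.

```powershell
# Enumerate only the physical adapters (team and vNIC interfaces are excluded).
$physical = Get-NetAdapter -Physical

foreach ($nic in $physical) {
    # May raise an error on ports whose driver does not expose VMQ.
    Disable-NetAdapterVmq -Name $nic.Name
}

# Confirm the resulting state per physical port.
Get-NetAdapterVmq -Name $physical.Name | Format-Table Name, InterfaceDescription, Enabled
```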

 

Author Comment

by:Lumenix
ID: 40545346
Interestingly enough, that option isn't there. The NIC models are HP NC373i and NC373m. On our HP G6 blades, which have Broadcom BCM57711e 10GbE adapters, the option is there, but I have not had the VMQ issue on those blades yet.
 
LVL 38

Expert Comment

by:Philip Elder
ID: 40545359
10GbE works fine with VMQ. It is on 1Gb connections that things get munged.
 

Author Comment

by:Lumenix
ID: 40547004
Alright, I've done a fresh OS install and configured the vNICs using a BACS team instead of Windows software teaming. Everything works fine for now; I'll maybe update later on for those who stumble across this post in the future. I do have one more thing to ask, however. It looks like the physical NICs I'm using do not support VMQ anyway, since there is no option to turn it on or off. However, when I run Get-NetAdapterVmq, it shows the NIC team (BACS) as using VMQ...

I try to disable it in PowerShell and it tells me it cannot set the property to disabled. Any ideas?
 
LVL 38

Accepted Solution

by:Philip Elder
Philip Elder earned 250 total points
ID: 40547065
The Broadcom management software may expose those settings.

If the actual physical NIC port does not show them then perhaps they are not supported at all as you say. If that is the case then the OS setting should be meaningless anyway.
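One way to check whether a driver exposes a VMQ setting at all is to look for the standardized "*VMQ" advanced-property keyword and compare it with what the VMQ cmdlet reports. This is only a diagnostic sketch; the output will differ per driver, and a vendor teaming interface may report values that the underlying ports do not honor.

```powershell
# List every adapter whose driver exposes the standardized "*VMQ" keyword.
# Adapters without the keyword are skipped silently.
Get-NetAdapterAdvancedProperty -Name "*" -RegistryKeyword "*VMQ" -ErrorAction SilentlyContinue |
    Format-Table Name, DisplayName, DisplayValue, RegistryValue

# Compare with the state reported by the VMQ cmdlet for all adapters.
Get-NetAdapterVmq | Format-Table Name, InterfaceDescription, Enabled
```

If a physical port is missing from the first listing, that is consistent with the driver not offering VMQ on that port.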
 

Author Closing Comment

by:Lumenix
ID: 40547388
Wasn't the actual solution, but it was very valuable info for this issue. Thanks, guys.