BSOD on Hyper-V 2012 R2 Cluster Node related to VMQ

Posted on 2014-11-07
Last Modified: 2014-11-08
OS: Windows Server 2012 R2
NICs:
- Model: I350-T2 x 2 (4 x 1 Gbit)
- Driver provider: Intel
- Driver version: 12.11.97.1 (13-08-2014)
- Teamed in one team (Switch Independent/Dynamic)
Storage: HBAs to fibre SAN
Faulting process: Clussvc.exe
Bugcheck analysis: DRIVER_IRQL_NOT_LESS_OR_EQUAL (d1)
Probably caused by: MsLbfoProvider.sys (MsLbfoProvider!vmqcGetFirstMappedMNic+f)

This looks very similar to this thread, in that:

1. The dump characteristics look very similar.
2. The BSOD happens during Live Migration.
3. VMQ is enabled (when it is turned off, there is no BSOD).

What is not similar to the situation described in that link is that KB2887595 is not installed on my system. My MsLbfoProvider.sys version (6.3.9600.17326) is higher than the one introduced with the KB2887595 update (6.3.9600.16423 / 08-Oct-2013).

Dump analysis
DRIVER_IRQL_NOT_LESS_OR_EQUAL (d1)
An attempt was made to access a pageable (or completely invalid) address at an
interrupt request level (IRQL) that is too high.  This is usually
caused by drivers using improper addresses.
If kernel debugger is available get stack backtrace.
Arguments:
Arg1: 0000000000000028, memory referenced
Arg2: 0000000000000002, IRQL
Arg3: 0000000000000000, value 0 = read operation, 1 = write operation
Arg4: fffff800eb184773, address which referenced memory

Debugging Details:
------------------


READ_ADDRESS: GetPointerFromAddress: unable to read from fffff800fc1d5138
unable to get nt!MmNonPagedPoolStart
unable to get nt!MmSizeOfNonPagedPoolInBytes
 0000000000000028

CURRENT_IRQL:  2

FAULTING_IP:
MsLbfoProvider!vmqcGetFirstMappedMNic+f
fffff800`eb184773 66394228        cmp     word ptr [rdx+28h],ax

CUSTOMER_CRASH_COUNT:  1

DEFAULT_BUCKET_ID:  WIN8_DRIVER_FAULT_SERVER

BUGCHECK_STR:  AV

PROCESS_NAME:  clussvc.exe

ANALYSIS_VERSION: 6.3.9600.17237 (debuggers(dbg).140716-0327) amd64fre

TRAP_FRAME:  ffffd00188197ea0 -- (.trap 0xffffd00188197ea0)
NOTE: The trap frame does not contain all registers.
Some register values may be zeroed or incorrect.
rax=0000000000000000 rbx=0000000000000000 rcx=ffffe0003bddb250
rdx=0000000000000000 rsi=0000000000000000 rdi=0000000000000000
rip=fffff800eb184773 rsp=ffffd00188198038 rbp=ffffe0003b197580
 r8=ffffd00188198090  r9=ffffd001881980a0 r10=ffffe8000c3016d0
r11=0000000000000000 r12=0000000000000000 r13=0000000000000000
r14=0000000000000000 r15=0000000000000000
iopl=0         nv up ei pl zr na po nc
MsLbfoProvider!vmqcGetFirstMappedMNic+0xf:
fffff800`eb184773 66394228        cmp     word ptr [rdx+28h],ax ds:00000000`00000028=????
Resetting default scope

LAST_CONTROL_TRANSFER:  from fffff800fbfcfce9 to fffff800fbfc41a0

STACK_TEXT:  
ffffd001`88197d58 fffff800`fbfcfce9 : 00000000`0000000a 00000000`00000028 00000000`00000002 00000000`00000000 : nt!KeBugCheckEx
ffffd001`88197d60 fffff800`fbfce53a : 00000000`00000003 ffffe000`3a93b350 ffffe000`3b0b1900 ffffd001`88197ea0 : nt!KiBugCheckDispatch+0x69
ffffd001`88197ea0 fffff800`eb184773 : fffff800`eb183e42 ffffe000`3bdddc70 00060000`2c000000 000007d8`00000000 : nt!KiPageFault+0x23a
ffffd001`88198038 fffff800`eb183e42 : ffffe000`3bdddc70 00060000`2c000000 000007d8`00000000 ffffe000`00000000 : MsLbfoProvider!vmqcGetFirstMappedMNic+0xf
ffffd001`88198040 fffff800`e96e5d50 : 00000000`00000000 ffffe800`0c3016d0 ffffe800`00000000 00000000`00000002 : MsLbfoProvider!LbfoReceiveNetBufferListsComplete+0x92
ffffd001`88198090 fffff800`e96e5c2d : ffffe000`00000001 ffffe000`3b15d040 ffffe000`00000018 ffffd001`028af79c : NDIS!ndisInvokeNextReceiveCompleteHandler+0x30
ffffd001`88198120 fffff800`e96e5cf2 : ffffe000`3abea1a0 00000000`00000002 ffffe800`0c3016d0 ffffe000`3b0e5330 : NDIS!ndisReturnNetBufferListsInternal+0x12d
ffffd001`88198180 fffff800`eaa83a01 : 00000000`00000000 00000000`00000000 ffffe000`3d0ff4b0 00000000`00000000 : NDIS!NdisReturnNetBufferLists+0x72
ffffd001`881981e0 fffff800`eaa8336b : ffffe000`3c566000 ffffd001`88190001 ffffe800`0c3016d0 fffff800`00000001 : vmswitch!VmsPtNicPvtPacketRouted+0x1b1
ffffd001`88198270 fffff800`eaa8303e : 00000000`00000000 00000000`00000000 ffffe800`0c3016d0 ffffd001`88198290 : vmswitch!VmsExtIoPacketRouted+0x28b
ffffd001`88198320 fffff800`e96ec245 : ffffd001`881983d9 00000000`00000000 00000000`00000000 ffffe800`0b5187c0 : vmswitch!VmsExtPtSendNetBufferListsComplete+0x9e
ffffd001`88198370 fffff800`e96ed617 : ffffe000`3ab331a0 ffffe800`0c3016d0 006e0017`00000004 035c0034`00360074 : NDIS!ndisMSendCompleteNetBufferListsInternal+0x135
ffffd001`88198440 fffff800`eaa833bd : 00000000`00000001 ffffe800`0c3016d0 ffffe800`0dc6cd70 fffff800`ea3512c5 : NDIS!NdisFSendNetBufferListsComplete+0x2b7
ffffd001`88198570 fffff800`e96eceb3 : ffffe000`00000000 00000000`00000000 ffffd001`881985c8 00000000`00000000 : vmswitch!VmsExtIoPacketRouted+0x2dd
ffffd001`88198620 fffff800`eaa833a5 : ffffe000`3ab331a0 ffffe800`0c3016d0 ffffe000`3aaed000 ffffe800`0b5187c0 : NDIS!NdisMSendNetBufferListsComplete+0x703
ffffd001`88198790 fffff800`eaa82f62 : 00000000`00000000 00000000`00000000 ffffd001`881987e8 00000000`00000000 : vmswitch!VmsExtIoPacketRouted+0x2c5
ffffd001`88198840 fffff800`e96e5d50 : ffffe000`3ab331a0 ffffe800`0c3016d0 00000000`00000000 ffffe800`0b5187c0 : vmswitch!VmsExtMpReturnNetBufferLists+0x42
ffffd001`88198870 fffff800`e96f58c3 : ffffe118`a67806b0 ffffe000`3b18e002 ffffe000`3d149c70 ffffe000`3b145200 : NDIS!ndisInvokeNextReceiveCompleteHandler+0x30
ffffd001`88198900 fffff800`eaa833d5 : ffffe000`3afed670 00000000`00000000 ffffd001`88199830 fffff800`fbfcf9b3 : NDIS!NdisFReturnNetBufferLists+0x282
ffffd001`88198980 fffff800`e96e5d50 : ffffe000`00000000 00000000`00000000 ffffd001`881989d8 00000000`00000000 : vmswitch!VmsExtIoPacketRouted+0x2f5
ffffd001`88198a30 fffff800`e96e5c2d : 00001f80`0010000f 0053002b`002b0010 00000246`0018002b 00000000`00000018 : NDIS!ndisInvokeNextReceiveCompleteHandler+0x30
ffffd001`88198ac0 fffff800`e96e5cf2 : ffffe000`3ab331a0 00000000`00000004 ffffe800`0c3016d0 ffffe000`3affcc10 : NDIS!ndisReturnNetBufferListsInternal+0x12d
ffffd001`88198b20 fffff800`eaa833ed : ffffd001`88198be0 ffffd001`88198bc9 ffffd001`88198bc0 0000001c`f3d88257 : NDIS!NdisReturnNetBufferLists+0x72
ffffd001`88198b80 fffff800`eaa82da6 : ffffd001`88198eb0 00000000`00000000 ffffd001`88198bd8 00000000`00000000 : vmswitch!VmsExtIoPacketRouted+0x30d
ffffd001`88198c30 fffff800`eaa82ce9 : 00000000`00000000 ffffe000`3ab5f1a0 ffffe000`3afd4000 00000000`00000286 : vmswitch!VmsNblHelperRefCountDecrementMany+0x46
ffffd001`88198c70 fffff800`e96e5d50 : ffffe000`3a93b350 ffffe000`3ab5f1a0 00000000`00000000 00000000`00000000 : vmswitch!VmsMpNicReturnNetBufferLists+0x19
ffffd001`88198ca0 fffff800`e96e5c2d : 00000000`00000000 00000000`0000001a ffffe800`0d16f801 fffff800`fbe0b58f : NDIS!ndisInvokeNextReceiveCompleteHandler+0x30
ffffd001`88198d30 fffff800`e96e5cf2 : ffffe000`3ab5f1a0 00000000`00000000 ffffe800`0c3016d0 ffffe000`3b15ac10 : NDIS!ndisReturnNetBufferListsInternal+0x12d
ffffd001`88198d90 fffff800`ea2e08f6 : ffffe800`0c301830 00000000`00000000 00000000`00000000 ffffff04`04040000 : NDIS!NdisReturnNetBufferLists+0x72
ffffd001`88198df0 fffff800`e98a58c1 : ffffe800`0c3016d0 00000000`00000001 00000000`00000000 fffff800`e98be7b0 : tcpip!FlpReturnNetBufferListChain+0xab
ffffd001`88198e40 fffff800`ea3e4333 : ffff2a6f`c59bde1d ffffe000`3a99f670 00000000`00000001 ffffe000`3a89bb00 : NETIO!NetioDereferenceNetBufferList+0xc1
ffffd001`88198ec0 fffff800`ea3e4431 : ffffe000`3a99f670 ffffe800`0a843180 ffffe800`0d131900 fffff800`e9c0756a : tcpip!UdpEndMessageIndication+0x1f
ffffd001`88198ef0 fffff800`e9c02752 : 00000000`00000000 00000000`00000001 ffffe800`0ae99070 ffffd001`88199108 : tcpip!UdpTlProviderReleaseIndicationList+0xd
ffffd001`88198f20 fffff800`e9c179dd : ffffd001`88199b00 00000000`00000000 00000000`00000000 fffff800`fbe0b69a : afd!AfdTLReleaseIndications+0x32
ffffd001`88198f70 fffff800`eafeb266 : ffffe000`3c642010 ffffe800`0d131900 ffffd001`8888d180 00000000`00000000 : afd!WskProAPIReleaseD+0x45
ffffd001`88198fa0 fffff800`eafe19c5 : 00000000`00000000 ffffe000`3d9412b0 00000000`00000000 ffffe000`3c642010 : netft!TaReturnPacket+0x46
ffffd001`88198fe0 fffff800`e96e5d50 : ffffe000`3d9412b0 00000000`00000000 00000000`00000000 ffffe000`3c642010 : netft!NetftMiniportReturnNetBufferLists+0x131
ffffd001`88199050 fffff800`e96e5c2d : ffffd001`88199158 00000000`00000001 ffffe800`0ae99070 00000000`00000000 : NDIS!ndisInvokeNextReceiveCompleteHandler+0x30
ffffd001`881990e0 fffff800`e96e5cf2 : ffffe000`3ab5b1a0 00000000`00000000 ffffe000`3d9412b0 ffffe000`3b171890 : NDIS!ndisReturnNetBufferListsInternal+0x12d
ffffd001`88199140 fffff800`ea2e08f6 : ffffe000`3d941410 00000000`00000000 00000000`00000000 00000000`00000000 : NDIS!NdisReturnNetBufferLists+0x72
ffffd001`881991a0 fffff800`e98a51f3 : 00000000`00000000 ffffe000`00000001 00000000`00000001 00000000`00000000 : tcpip!FlpReturnNetBufferListChain+0xab
ffffd001`881991f0 fffff800`ea2d649c : 00000000`00000000 00000000`00000100 ffffe000`3b1a5730 ffffe000`3d9412b0 : NETIO!NetioDereferenceNetBufferListChain+0xd3
ffffd001`88199290 fffff800`e9c02752 : ffffe000`3a812340 ffffe000`3d9412b0 00000000`00000128 ffffd001`88199908 : tcpip!TcpTlProviderReleaseIndicationList+0x84
ffffd001`881992c0 fffff800`e9c4346a : 00000000`00000000 ffffd001`88199908 00000000`ffffffff 00000000`00000000 : afd!AfdTLReleaseIndications+0x32
ffffd001`88199310 fffff800`e9c49ebe : 0000003d`8650f028 ffffe000`3a911790 00000000`00000128 0000003d`85775358 : afd!AfdReturnBuffer+0x10a
ffffd001`88199340 fffff800`e9c2d827 : 00000000`00000000 00000000`00000000 00000000`0000001b ffffd001`881995e0 : afd!AfdFastConnectionReceive+0x4de
ffffd001`88199510 fffff800`fc238c5c : 00000000`00000000 00000000`00000000 ffffe800`0d131900 00000000`00000004 : afd!AfdFastIoDeviceControl+0x817
ffffd001`88199880 fffff800`fc23aa76 : 00000000`00000001 0000003d`86c3e2f0 00000000`00000000 0000003d`8650f048 : nt!IopXxxControlFile+0x54c
ffffd001`88199a20 fffff800`fbfcf9b3 : ffffe800`0b786080 0000003d`8650ed08 ffffd001`88199aa8 fffff800`fc264366 : nt!NtDeviceIoControlFile+0x56
ffffd001`88199a90 00007fff`3f8716ea : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiSystemServiceCopyEnd+0x13
0000003d`8650ede8 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : 0x00007fff`3f8716ea


STACK_COMMAND:  kb

FOLLOWUP_IP:
MsLbfoProvider!vmqcGetFirstMappedMNic+f
fffff800`eb184773 66394228        cmp     word ptr [rdx+28h],ax

SYMBOL_STACK_INDEX:  3

SYMBOL_NAME:  MsLbfoProvider!vmqcGetFirstMappedMNic+f

FOLLOWUP_NAME:  MachineOwner

MODULE_NAME: MsLbfoProvider

IMAGE_NAME:  MsLbfoProvider.sys

DEBUG_FLR_IMAGE_TIMESTAMP:  53fbefc0

IMAGE_VERSION:  6.3.9600.17326

BUCKET_ID_FUNC_OFFSET:  f

FAILURE_BUCKET_ID:  AV_MsLbfoProvider!vmqcGetFirstMappedMNic

BUCKET_ID:  AV_MsLbfoProvider!vmqcGetFirstMappedMNic

ANALYSIS_SOURCE:  KM

FAILURE_ID_HASH_STRING:  km:av_mslbfoprovider!vmqcgetfirstmappedmnic

FAILURE_ID_HASH:  {3deba9b0-acee-a14b-f78d-79292ecc34f5}

.........................................................
Lines excluded
.........................................................
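The key numbers in the dump can be cross-checked by hand. The faulting instruction `cmp word ptr [rdx+28h],ax` reads from `rdx + 0x28`, and the trap frame shows `rdx=0`, so the address read is 0x28, which is exactly Arg1 ("memory referenced"). The trap frame warns that some registers may be zeroed, but the `ds:00000000`00000028=????` annotation on the faulting line agrees. The sketch below only redoes that arithmetic with values copied from the dump; `DISPATCH_LEVEL = 2` is the standard x64 value. Reading this as "a NULL structure pointer dereferenced at field offset 0x28 while at DISPATCH_LEVEL" is an inference: the dump does not say which structure `rdx` was supposed to point to.

```python
# Values copied from the bugcheck arguments and trap frame above.
RDX = 0x0000000000000000        # rdx at the faulting instruction
DISPLACEMENT = 0x28             # from: cmp word ptr [rdx+28h],ax
ARG1_MEMORY_REFERENCED = 0x28   # Arg1 of bugcheck 0xD1
ARG2_IRQL = 2                   # Arg2 of bugcheck 0xD1
DISPATCH_LEVEL = 2              # x64 IRQL at which page faults cannot be serviced


def effective_address(base: int, displacement: int) -> int:
    """Address read by a [base+disp] operand, with 64-bit wraparound."""
    return (base + displacement) & 0xFFFFFFFFFFFFFFFF


addr = effective_address(RDX, DISPLACEMENT)
print(f"effective address: {addr:#x}")
# The computed address should match the bugcheck's Arg1.
print(f"matches Arg1: {addr == ARG1_MEMORY_REFERENCED}")
# An address inside the first page (0x0-0xFFF) almost always means a NULL
# pointer was dereferenced, here at offset 0x28 into some structure.
print(f"NULL-page access: {addr < 0x1000}")
print(f"at or above DISPATCH_LEVEL: {ARG2_IRQL >= DISPATCH_LEVEL}")
```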
Question by:jmateknik

Expert Comment

by:Cliff Galiher
ID: 40429281
Turn off VMQ. Windows will not use VMQ on 1 Gb NICs even when it is enabled; internally it simply turns it off anyway. Neither RSS nor VMQ offers better performance until around the 3 Gb/s mark (depending on CPU), so the Windows team made the decision to handle it internally.

For some drivers, the way Windows turns off VMQ causes a memory conflict, hence the crash. This is the first time I've heard of it with Intel, but it is a common problem with Broadcom NetXtreme (but not NetXtreme II) NICs.

So disabling VMQ via the GUI sidesteps the bug *and* has no negative impact whatsoever. Easy solution.
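The behaviour described above can be summarised as a small decision rule. This is a toy model of what the thread describes, not Microsoft's actual implementation: the 10 Gb/s cut-off is inferred from the name of the `BelowTenGigVmqEnabled` value the author mentions later, and the adapter names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Nic:
    name: str          # hypothetical adapter name
    link_gbps: float   # negotiated link speed
    vmq_enabled: bool  # per-adapter VMQ setting (GUI or Set-NetAdapterVmq)


def windows_uses_vmq(nic: Nic, below_ten_gig_vmq_enabled: bool = False) -> bool:
    """Simplified model of the behaviour described in this thread.

    VMQ has to be enabled on the adapter, and on links slower than 10 Gb/s
    Windows ignores that setting unless BelowTenGigVmqEnabled is set to 1.
    """
    if not nic.vmq_enabled:
        return False
    if nic.link_gbps < 10:
        return below_ten_gig_vmq_enabled
    return True


port = Nic("I350 port 1", 1.0, vmq_enabled=True)
print(windows_uses_vmq(port))                                  # False
print(windows_uses_vmq(port, below_ten_gig_vmq_enabled=True))  # True
print(windows_uses_vmq(Nic("10 GbE port", 10.0, True)))        # True
print(windows_uses_vmq(Nic("I350 port 2", 1.0, False), True))  # False
```

Note that the author reports the crash while experimenting with `BelowTenGigVmqEnabled` set to 1, i.e. the second case above.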

Author Comment

by:jmateknik
ID: 40430003
Thanks again Cliff, glad you're still offering your help after the last thread :)

So, from your response to my last question, I understood that VMQ is disabled internally on systems with 1 Gbit adapters.
It's just that I have 2 more NICs that could be thrown into the mix if need be (I should have mentioned that). In that case we're over the 3.5 Gbit limit and VMQ suddenly makes sense. Or 10 Gbit adapters may be put in the nodes, and I would then have to face these challenges on a production system (it will be a production system within two weeks, and I will disable VMQ).

Until then, I am experimenting with BelowTenGigVmqEnabled set to 1 and CPU mapping with Set-NetAdapterVmq, and am facing this problem.
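For the CPU-mapping experiment, one commonly cited layout for a Switch Independent/Dynamic team is to give each team member its own, non-overlapping block of processors, skip core 0, and use only even-numbered logical processors when hyperthreading is on. The sketch below plans such blocks and renders them as `Set-NetAdapterVmq` command lines using its `-Name`, `-BaseProcessorNumber` and `-MaxProcessors` parameters. The team member names, the 32-processor host and the 3-processors-per-NIC figure are hypothetical, and how `MaxProcessors` is counted on hyperthreaded hosts should be checked against Microsoft's VMQ guidance. It is a planning aid only and does not address the driver crash itself.

```python
def plan_vmq_ranges(nics, logical_processors, procs_per_nic,
                    hyperthreading=True, reserve_first_core=True):
    """Assign each team member its own, non-overlapping block of processors.

    With hyperthreading, only even-numbered logical processors are used
    (stride 2), so a block of n processors spans base .. base + 2*(n-1).
    """
    stride = 2 if hyperthreading else 1
    base = stride if reserve_first_core else 0  # leave core 0 to the host
    plan = []
    for name in nics:
        last = base + (procs_per_nic - 1) * stride
        if last >= logical_processors:
            raise ValueError(f"not enough processors left for {name!r}")
        plan.append((name, base, procs_per_nic))
        base = last + stride  # next block starts right after this one
    return plan


def to_commands(plan):
    """Render the plan as Set-NetAdapterVmq command lines (not executed here)."""
    return [
        f'Set-NetAdapterVmq -Name "{name}" -BaseProcessorNumber {base} '
        f"-MaxProcessors {count}"
        for name, base, count in plan
    ]


team = ["NIC1", "NIC2", "NIC3", "NIC4"]  # hypothetical team member names
for line in to_commands(plan_vmq_ranges(team, logical_processors=32, procs_per_nic=3)):
    print(line)
```

With these inputs the four blocks start at processors 2, 8, 14 and 20, so no two adapters share a core.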

Accepted Solution

by:Cliff Galiher
ID: 40430031
VMQ is, by definition, per adapter. It basically maps packets to different cores. Three adapters *without* VMQ (even in a team) can already map to three different cores, so VMQ offers no benefit. VMQ alleviates a bottleneck when a single NIC can receive more data than a single core can process, which with modern CPUs is *about* 3 Gb/s (fatter cores can handle even more).

So no, even if you had twenty 1 Gb NICs all in a team, you wouldn't need VMQ. Disable it.
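The per-adapter argument can be sketched as follows, under the thread's simplifying model that, without VMQ, each adapter's receive processing lands on one core. The 3 Gb/s per-core figure is the rough estimate given above and varies with the CPU. The point the code makes is that the team's aggregate bandwidth never enters the check; only each individual link rate does.

```python
from typing import List, Sequence

PER_CORE_GBPS = 3.0  # rough estimate from this thread; varies with CPU


def nics_needing_vmq(link_speeds_gbps: Sequence[float],
                     per_core_gbps: float = PER_CORE_GBPS) -> List[int]:
    """Return indices of adapters whose line rate exceeds one core's capacity.

    Under the model above, only such adapters gain anything from spreading
    their receive traffic across several cores with VMQ.
    """
    return [i for i, speed in enumerate(link_speeds_gbps) if speed > per_core_gbps]


team_of_twenty = [1.0] * 20
print(sum(team_of_twenty), nics_needing_vmq(team_of_twenty))  # 20.0 []
print(nics_needing_vmq([10.0, 10.0]))                          # [0, 1]
```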

As far as putting 10 Gb NICs in a node goes: if you did that, you'd be using different drivers, since it is different hardware. As I said, this is a known issue with some drivers that don't properly handle the OS's internal disabling of VMQ, so the way they map traffic to multiple cores breaks down. With a 10 Gb NIC, the OS won't attempt to disable VMQ internally, so the driver never gets the chance to mismanage it and the error never occurs.

So really, unless you are doing it for academic reasons, there is no compelling reason to try to make VMQ work in your current environment, and no reason to believe it would fail in a changed environment where you'd actually want it.

Author Closing Comment

by:jmateknik
ID: 40430496
Could you come and work with me in my company?

:-)

VMQ is disabled and will stay that way. I appreciate your explanation of the "What if we get 10 Gbit" scenario. Great help, much appreciated!

Anders