Windows 2008 R2 Hyper-V cluster node bluescreen assistance - BAD_POOL_HEADER

Hello,

I have a 4-node Server 2008 R2 failover cluster running Hyper-V servers. Over the weekend we had a bluescreen out of nowhere; until now the cluster has been relatively solid. This is the first time I have seen this, and I was hoping to be pointed in the right direction. I did Google it and found a TechNet forum thread that seems to match the issue, but I wanted to get some other eyes on this. This cluster is an important part of our data center, so I want to be extra careful before hotfixing a node and regretting it later. Thank you for any assistance you can provide.

hardware: ProLiant BL480c G1

This is the forum thread I have found so far:
http://social.technet.microsoft.com/Forums/en-US/7dab2216-f22a-433f-bf61-a7eeb2aa4adf/blue-screen-badpoolheader?forum=winservergen

-------------------------

Below is the minidump info:

*******************************************************************************
* *
* Bugcheck Analysis *
* *
*******************************************************************************

BAD_POOL_HEADER (19)
The pool is already corrupt at the time of the current request.
This may or may not be due to the caller.
The internal pool links must be walked to figure out a possible cause of
the problem, and then special pool applied to the suspect tags or the driver
verifier to a suspect driver.
Arguments:
Arg1: 0000000000000020, a pool block header size is corrupt.
Arg2: fffffa8023161000, The pool entry we were looking for within the page.
Arg3: fffffa8023161340, The next pool entry.
Arg4: 000000000c340000, (reserved)

Debugging Details:
------------------

BUGCHECK_STR: 0x19_20

POOL_ADDRESS: fffffa8023161000

CUSTOMER_CRASH_COUNT: 1

DEFAULT_BUCKET_ID: DRIVER_FAULT_SERVER_MINIDUMP

PROCESS_NAME: WmiApSrv.exe

CURRENT_IRQL: 1

IRP_ADDRESS: fffffa8023160fc8

LAST_CONTROL_TRANSFER: from fffff800019ec6d3 to fffff800018b9880

STACK_TEXT:
fffff880`08e62648 fffff800`019ec6d3 : 00000000`00000019 00000000`00000020 fffffa80`23161000 fffffa80`23161340 : nt!KeBugCheckEx
fffff880`08e62650 fffff800`018d8cce : 00000000`a0000003 fffffa80`236bb230 00000000`20206f49 00000000`00000000 : nt!ExFreePoolWithTag+0x18b4
fffff880`08e62700 fffff800`018bc276 : fffffa80`23161040 00000000`00000000 00000000`00000001 fffff8a0`023dc810 : nt!IopCompleteRequest+0x5ce
fffff880`08e627d0 fffff800`01b4622a : fffffa80`1f3146a0 fffff800`019ed400 fffffa80`1e937dc0 00000000`00000000 : nt!IopfCompleteRequest+0x6f6
fffff880`08e628c0 fffff800`01bd08f7 : fffffa80`1f3146a0 fffff880`08e62ca0 fffff880`08e62ca0 fffffa80`236bb230 : nt!WmipIoControl+0xd6
fffff880`08e62a10 fffff800`01bd1156 : 00000000`ffffff01 00000000`000001b8 00000000`00000000 00000000`00000000 : nt!IopXxxControlFile+0x607
fffff880`08e62b40 fffff800`018b8ad3 : fffffa80`1bfc2d80 00000000`00000000 00000000`00000000 fffff800`018b5487 : nt!NtDeviceIoControlFile+0x56
fffff880`08e62bb0 00000000`77bdf72a : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiSystemServiceCopyEnd+0x13
00000000`00d8eee8 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : 0x77bdf72a

STACK_COMMAND: kb

PROCESS_OBJECT: fffffa801ce56060

FOLLOWUP_NAME: MachineOwner

MODULE_NAME: WmiApSrv

IMAGE_NAME: WmiApSrv.exe

DEBUG_FLR_IMAGE_TIMESTAMP: 0

FAILURE_BUCKET_ID: X64_0x19_20_IMAGE_WmiApSrv.exe

BUCKET_ID: X64_0x19_20_IMAGE_WmiApSrv.exe

Followup: MachineOwner
---------
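For anyone reading the arguments above, here is a minimal Python sketch that just does the address arithmetic on Arg2 and Arg3. The 4 KB page size is an assumption (the usual value on x64 Windows), and the function names are made up for illustration. It only restates the numbers from the dump; it does not identify which driver corrupted the pool.

```python
import re

# Argument lines copied from the bugcheck output above.
BUGCHECK_ARGS = """\
Arg1: 0000000000000020, a pool block header size is corrupt.
Arg2: fffffa8023161000, The pool entry we were looking for within the page.
Arg3: fffffa8023161340, The next pool entry.
Arg4: 000000000c340000, (reserved)
"""

PAGE_SIZE = 0x1000  # assumption: standard 4 KB pages on x64 Windows


def parse_args(text):
    """Return {arg_number: integer_value} for each 'ArgN: <hex>' line."""
    pattern = re.compile(r"^Arg(\d): ([0-9a-fA-F]+)", re.MULTILINE)
    return {int(n): int(value, 16) for n, value in pattern.findall(text)}


def summarize(args):
    """Describe where the pool entry (Arg2) and next entry (Arg3) sit."""
    entry, next_entry = args[2], args[3]
    return {
        "entry_page_offset": entry % PAGE_SIZE,
        "next_entry_page_offset": next_entry % PAGE_SIZE,
        "distance_bytes": next_entry - entry,
        "same_page": entry // PAGE_SIZE == next_entry // PAGE_SIZE,
    }


if __name__ == "__main__":
    for key, value in summarize(parse_args(BUGCHECK_ARGS)).items():
        print(f"{key}: {value}")
```

With the values from this dump, the pool entry starts exactly on a page boundary and the next entry is 0x340 (832) bytes further along in the same page.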

ASKER CERTIFIED SOLUTION

Philip Elder

TNCIT

ASKER

Hi Philip!

Thank you for your response. The answers to your questions are as follows:

It's pretty old. It used to be a 3-blade cluster with identical hardware. A newer-generation blade was added as the 4th node later on.

I will admit, the firmware on the chassis and the drivers are all pretty old. For something as critical as this cluster, we kind of have an 'if it's not broke, don't fix it' mentality. This did cross my mind, though; I just didn't want to go chasing a white rabbit that might lead me down another hole entirely. I wanted a little more evidence before using the shotgun-blast approach of updating firmware/drivers across the board with no direction, if that makes sense.

We do have the nodes at the same service pack and update levels across the cluster.

The nodes do not run anything else except our Hyper-V cluster.

SOLUTION

Philip Elder

TNCIT

ASKER

My apologies for leaving this open for so long; I was out of work for several weeks on a personal matter.

I think we are going to observe and see if this happens again. It has been up for a month since this occurred, with no other bluescreens. We are slowly going to migrate VMs back and see what happens.

Philip Elder

Thank you for the points.

I've seen other mentions of this. It may actually be a bad update, but I am not sure which one yet...

Philip