Solved

Windows 2008 R2 Hyper-V cluster node bluescreen assistance - BAD_POOL_HEADER

Posted on 2014-02-18
Last Modified: 2014-11-12
Hello,

I have a 4-node Server 2008 R2 failover cluster running Hyper-V servers.  Over the weekend we had a bluescreen out of nowhere; the cluster has otherwise been relatively solid.  This is the first time I have seen this, and I was hoping to be pointed in the right direction.  I did Google it and found a TechNet forum thread that seems to match the issue, but I wanted to get some other eyes on this.  This cluster is an important part of our data center, so I want to be extra careful before hotfixing a node and regretting it later.  Thank you for any assistance you can provide.

hardware:  ProLiant BL480c G1

This is the forum thread I have found so far:
http://social.technet.microsoft.com/Forums/en-US/7dab2216-f22a-433f-bf61-a7eeb2aa4adf/blue-screen-badpoolheader?forum=winservergen

-------------------------

Below is the minidump info:


*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

BAD_POOL_HEADER (19)
The pool is already corrupt at the time of the current request.
This may or may not be due to the caller.
The internal pool links must be walked to figure out a possible cause of
the problem, and then special pool applied to the suspect tags or the driver
verifier to a suspect driver.
Arguments:
Arg1: 0000000000000020, a pool block header size is corrupt.
Arg2: fffffa8023161000, The pool entry we were looking for within the page.
Arg3: fffffa8023161340, The next pool entry.
Arg4: 000000000c340000, (reserved)

Debugging Details:
------------------


BUGCHECK_STR:  0x19_20

POOL_ADDRESS:  fffffa8023161000

CUSTOMER_CRASH_COUNT:  1

DEFAULT_BUCKET_ID:  DRIVER_FAULT_SERVER_MINIDUMP

PROCESS_NAME:  WmiApSrv.exe

CURRENT_IRQL:  1

IRP_ADDRESS:  fffffa8023160fc8

LAST_CONTROL_TRANSFER:  from fffff800019ec6d3 to fffff800018b9880

STACK_TEXT:  
fffff880`08e62648 fffff800`019ec6d3 : 00000000`00000019 00000000`00000020 fffffa80`23161000 fffffa80`23161340 : nt!KeBugCheckEx
fffff880`08e62650 fffff800`018d8cce : 00000000`a0000003 fffffa80`236bb230 00000000`20206f49 00000000`00000000 : nt!ExFreePoolWithTag+0x18b4
fffff880`08e62700 fffff800`018bc276 : fffffa80`23161040 00000000`00000000 00000000`00000001 fffff8a0`023dc810 : nt!IopCompleteRequest+0x5ce
fffff880`08e627d0 fffff800`01b4622a : fffffa80`1f3146a0 fffff800`019ed400 fffffa80`1e937dc0 00000000`00000000 : nt!IopfCompleteRequest+0x6f6
fffff880`08e628c0 fffff800`01bd08f7 : fffffa80`1f3146a0 fffff880`08e62ca0 fffff880`08e62ca0 fffffa80`236bb230 : nt!WmipIoControl+0xd6
fffff880`08e62a10 fffff800`01bd1156 : 00000000`ffffff01 00000000`000001b8 00000000`00000000 00000000`00000000 : nt!IopXxxControlFile+0x607
fffff880`08e62b40 fffff800`018b8ad3 : fffffa80`1bfc2d80 00000000`00000000 00000000`00000000 fffff800`018b5487 : nt!NtDeviceIoControlFile+0x56
fffff880`08e62bb0 00000000`77bdf72a : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiSystemServiceCopyEnd+0x13
00000000`00d8eee8 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : 0x77bdf72a


STACK_COMMAND:  kb

PROCESS_OBJECT: fffffa801ce56060

FOLLOWUP_NAME:  MachineOwner

MODULE_NAME: WmiApSrv

IMAGE_NAME:  WmiApSrv.exe

DEBUG_FLR_IMAGE_TIMESTAMP:  0

FAILURE_BUCKET_ID:  X64_0x19_20_IMAGE_WmiApSrv.exe

BUCKET_ID:  X64_0x19_20_IMAGE_WmiApSrv.exe

Followup: MachineOwner
---------
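
For anyone reading the arguments above, here is a minimal Python sketch of how the four bugcheck arguments could be pulled out of the analysis text and compared. It only does arithmetic on the values already shown in the dump; the 4 KiB page size and the helper names (`parse_bugcheck_args`, `summarize`) are assumptions made for illustration and are not part of any debugger tooling.

```python
import re

# Argument lines copied from the bugcheck analysis above.
BUGCHECK_TEXT = """\
Arg1: 0000000000000020, a pool block header size is corrupt.
Arg2: fffffa8023161000, The pool entry we were looking for within the page.
Arg3: fffffa8023161340, The next pool entry.
Arg4: 000000000c340000, (reserved)
"""

PAGE_SIZE = 0x1000  # assumption: 4 KiB pages


def parse_bugcheck_args(text):
    """Return {arg_number: value} parsed from 'ArgN: <hex>, ...' lines."""
    args = {}
    for match in re.finditer(r"^Arg(\d): ([0-9a-fA-F]+)", text, re.MULTILINE):
        args[int(match.group(1))] = int(match.group(2), 16)
    return args


def summarize(args):
    """Relate the pool entry (Arg2) to the next pool entry (Arg3)."""
    entry, next_entry = args[2], args[3]
    return {
        "entry_page_offset": entry % PAGE_SIZE,
        "distance_to_next_entry": next_entry - entry,
        "same_page": entry // PAGE_SIZE == next_entry // PAGE_SIZE,
    }


args = parse_bugcheck_args(BUGCHECK_TEXT)
summary = summarize(args)
print(hex(args[1]), summary)
```

With the values from this dump, the pool entry sits at the start of a page and the next entry is 0x340 bytes further into the same page. That narrows down where the damaged header is, but it does not by itself identify which driver wrote over it.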
Question by:TNCIT
5 Comments
 
LVL 38

Accepted Solution

by:
Philip Elder earned 500 total points
Out of curiosity, how old is this setup?

Are all nodes running at the same Service Pack and patch level as well as driver updates?

Are all of the nodes and the chassis running the most current firmware?

Yours is the second I've seen in a week where none were seen prior to that. :S

Philip
 
LVL 1

Author Comment

by:TNCIT
Hi Philip!

Thank you for your response.  The answers to your questions are as follows:

It's pretty old.  It used to be a 3-blade cluster with identical hardware.  A newer-generation blade was added as the 4th node later on.

I will admit, the firmware on the chassis and the drivers are all pretty old.  For something as critical as this cluster, we kind of have an 'if it's not broke, don't fix it' mentality.  This did cross my mind, though; I just didn't want to go chasing a white rabbit that might lead me down another hole entirely.  I wanted a little more evidence before using the shotgun-blast approach of updating firmware and drivers across the board with no direction, if that makes sense.

We do have the nodes at the same service pack and update levels across the cluster.
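
One way to double-check that claim is to diff the installed-update lists of the nodes. The sketch below assumes the hotfix IDs for each node have already been exported to plain lists (for example from `Get-HotFix` output); the node names and KB numbers are placeholders, not real data from this cluster, and `hotfix_drift` is a hypothetical helper name.

```python
def hotfix_drift(hotfixes_by_node):
    """Map each node to the hotfix IDs found elsewhere in the cluster but missing on it."""
    all_ids = set().union(*hotfixes_by_node.values())
    return {
        node: sorted(all_ids - set(ids))
        for node, ids in hotfixes_by_node.items()
    }


# Placeholder inventory: node names and KB IDs are made up for illustration.
inventory = {
    "node1": ["KB0000001", "KB0000002", "KB0000003"],
    "node2": ["KB0000001", "KB0000002", "KB0000003"],
    "node3": ["KB0000001", "KB0000003"],
    "node4": ["KB0000001", "KB0000002"],
}

drift = hotfix_drift(inventory)
for node, missing in sorted(drift.items()):
    print(node, "missing:", ", ".join(missing) if missing else "nothing")
```

An empty list for every node would support the statement that the nodes are at the same update level. Any non-empty list points at a specific update to look into before considering a blanket firmware and driver refresh.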

The nodes do not run anything else except our Hyper-V cluster.
 
LVL 38

Assisted Solution

by:Philip Elder
Another option would be to evict the node, flatten and re-install it, and bring it back into the cluster.

That may be a bit less extreme relatively speaking.

Philip
 
LVL 1

Author Comment

by:TNCIT
My apologies for leaving this open for so long; I was out of work for several weeks on a personal matter.

I think we are going to observe and see if this happens again.  The node has been up for a month since this occurred, with no other bluescreens.  We are slowly going to migrate VMs back and see what happens.
 
LVL 38

Expert Comment

by:Philip Elder
Thank you for the points.

I've seen other mentions of this. It may actually be a bad update, but I am not sure which one yet...

Philip