What caused this BSOD?

dee_nz
Can someone please help me figure out what caused this server to crash?
This is a Dell PowerEdge 2950 running Windows Server 2008 R2, SP1, x64
Crash was caused by intelppm.sys? What does this do? Is it something to do with processor power management?

Author

Commented:
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

Use !analyze -v to get detailed debugging information.

BugCheck A, {fffff08005788858, e, 0, fffff800016fe102}

*** ERROR: Module load completed but symbols could not be loaded for intelppm.sys
Probably caused by : intelppm.sys ( intelppm+39c2 )

Followup: MachineOwner
---------

3: kd> !analyze -v
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

IRQL_NOT_LESS_OR_EQUAL (a)
An attempt was made to access a pageable (or completely invalid) address at an
interrupt request level (IRQL) that is too high.  This is usually
caused by drivers using improper addresses.
If a kernel debugger is available get the stack backtrace.
Arguments:
Arg1: fffff08005788858, memory referenced
Arg2: 000000000000000e, IRQL
Arg3: 0000000000000000, bitfield :
      bit 0 : value 0 = read operation, 1 = write operation
      bit 3 : value 0 = not an execute operation, 1 = execute operation (only on chips which support this level of status)
Arg4: fffff800016fe102, address which referenced memory

Debugging Details:
------------------


READ_ADDRESS:  fffff08005788858

CURRENT_IRQL:  e

FAULTING_IP:
nt!KiIpiProcessRequests+b2
fffff800`016fe102 488b0411        mov     rax,qword ptr [rcx+rdx]

DEFAULT_BUCKET_ID:  VISTA_DRIVER_FAULT

BUGCHECK_STR:  0xA

PROCESS_NAME:  System

TRAP_FRAME:  fffff880020f6890 -- (.trap 0xfffff880020f6890)
NOTE: The trap frame does not contain all registers.
Some register values may be zeroed or incorrect.
rax=fffff880020f6a40 rbx=0000000000000000 rcx=fffff80003691e18
rdx=fffff880020f6a40 rsi=0000000000000000 rdi=0000000000000000
rip=fffff800016fe102 rsp=fffff880020f6a20 rbp=fffff880009bf180
 r8=fffff880020d90c0  r9=000000a918593fb2 r10=fffff880020d2e40
r11=000000000000005d r12=0000000000000000 r13=0000000000000000
r14=0000000000000000 r15=0000000000000000
iopl=0         nv up ei ng nz na po nc
nt!KiIpiProcessRequests+0xb2:
fffff800`016fe102 488b0411        mov     rax,qword ptr [rcx+rdx] ds:17c8:8858=????????????????
Resetting default scope

LAST_CONTROL_TRANSFER:  from fffff800016de769 to fffff800016df1c0

STACK_TEXT:  
fffff880`020f6748 fffff800`016de769 : 00000000`0000000a fffff080`05788858 00000000`0000000e 00000000`00000000 : nt!KeBugCheckEx
fffff880`020f6750 fffff800`016dd3e0 : 00000000`00000000 fffffa80`06f5c700 fffff880`020ce180 f8800578`88580105 : nt!KiBugCheckDispatch+0x69
fffff880`020f6890 fffff800`016fe102 : fffffa80`06f5c790 fffffa80`097fdce0 fffffa80`06f80010 fffff880`01b80c27 : nt!KiPageFault+0x260
fffff880`020f6a20 fffff800`016e908a : 00000000`00000000 fffff880`020f6b80 00000000`00000001 fffffa80`0712f310 : nt!KiIpiProcessRequests+0xb2
fffff880`020f6b00 fffff880`034ab9c2 : fffff800`016e7cf9 00000000`002c97f0 fffffa80`07111698 00000000`00000000 : nt!KiIpiInterrupt+0x12a
fffff880`020f6c98 fffff800`016e7cf9 : 00000000`002c97f0 fffffa80`07111698 00000000`00000000 00000000`00000000 : intelppm+0x39c2
fffff880`020f6ca0 fffff800`016d6e9c : fffff880`020ce180 00000000`00000000 00000000`00000000 fffff800`0178db20 : nt!PoIdle+0x52a
fffff880`020f6d80 00000000`00000000 : fffff880`020f7000 fffff880`020f1000 fffff880`020f6d40 00000000`00000000 : nt!KiIdleLoop+0x2c


STACK_COMMAND:  kb

FOLLOWUP_IP:
intelppm+39c2
fffff880`034ab9c2 c3              ret

SYMBOL_STACK_INDEX:  5

SYMBOL_NAME:  intelppm+39c2

FOLLOWUP_NAME:  MachineOwner

MODULE_NAME: intelppm

IMAGE_NAME:  intelppm.sys

DEBUG_FLR_IMAGE_TIMESTAMP:  4a5bc0fd

FAILURE_BUCKET_ID:  X64_0xA_intelppm+39c2

BUCKET_ID:  X64_0xA_intelppm+39c2

Followup: MachineOwner
---------
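One thing that can be checked directly from the dump above is whether the faulting instruction and the trap frame agree with Arg1. This is only a rough sketch: the register values are copied from the TRAP_FRAME (which itself warns that some registers may be zeroed or incorrect), and `decode_arg3` is a hypothetical helper name that follows the bit descriptions printed under Arg3.

```python
MASK64 = (1 << 64) - 1  # x64 address arithmetic wraps at 64 bits

# Values copied from the TRAP_FRAME and Arguments sections of the dump.
rcx = 0xfffff80003691e18
rdx = 0xfffff880020f6a40
arg1_memory_referenced = 0xfffff08005788858
arg3_bitfield = 0x0

# FAULTING_IP is: mov rax, qword ptr [rcx+rdx]
effective_address = (rcx + rdx) & MASK64


def decode_arg3(bitfield: int) -> dict:
    """Decode the Arg3 bitfield as described in the !analyze -v output."""
    return {
        "write": bool(bitfield & 0x1),    # bit 0: 0 = read, 1 = write
        "execute": bool(bitfield & 0x8),  # bit 3: 1 = execute operation
    }


print(hex(effective_address))
print(effective_address == arg1_memory_referenced)
print(decode_arg3(arg3_bitfield))
```

If the trap frame values are right, rcx + rdx wraps around to exactly the address in Arg1, and both rcx and rdx look like kernel pointers rather than a pointer plus a small offset. That would fit the "improper addresses" wording in the bugcheck description, but it does not by itself say whether a driver or the hardware produced the bad value.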
Distinguished Expert 2017

Commented:
Is this a DC in an AD domain? Check whether the network interface card is configured to be put to sleep.

See if NirSoft's BlueScreenView helps.

Author

Commented:
Thanks arnold
This is just a member server in a domain, not a DC.
The server crashed again, but there was no memory dump this time.
The NIC did have power management on, so I have turned this off and installed the latest NIC and video card drivers.
I had a look at NirSoft's BlueScreenView but it didn't give me any more info.
Let me know if you can think of anything else.
Distinguished Expert 2017

Commented:
Does the server's power management put the system to sleep or into hibernation?

Author

Commented:
I haven't configured any power management settings. I just installed Server 2008 R2 and left the default settings.
In Control Panel the power plan is set to Balanced, so I guess power management is enabled?
Should I set this to High performance to disable power management and see if this stops the server from crashing?
Distinguished Expert 2017

Commented:
Check the individual components under the plan to see whether it tries to stop any drive, etc.
See whether changing the setting to High performance negates this.
Check OpenManage Server Administrator to make sure it was not a hardware-related issue, such as memory, that led to the recent reboot.

Author

Commented:
The server has crashed again.
There is a "CPU 1 machine check detected" error in the DRAC log, but OpenManage shows no problems. I've attached a zip file with the event logs, DRAC log and minidumps.
I booted from a USB stick and ran Dell hardware diagnostics on the server for about 12 hours with no errors.
The CPU power management settings are the same between the Balanced and High performance power plans. I don't think this is it anyway, as the server has crashed again since I changed the power plan to High performance.
Help!!
Logs.zip
Distinguished Expert 2017
Commented:
Are there any hardware-related log entries in the OpenManage log?

http://en.community.dell.com/support-forums/servers/f/956/p/18371787/20051061.aspx

Author

Commented:
I've run every hardware diagnostic I can think of on this server and can't find anything wrong with the hardware.
Booted from USB and ran Dell diagnostics.
Ran Memtest86+.
Checked the hard drives for bad sectors.
Server has the latest BIOS and firmware updates installed.

I think this is a software problem. How do I track it down?
The server has Windows Server 2008 + Windows updates, Dell drivers and ShadowProtect backup software installed on it.
Have you uninstalled and reinstalled the latest chipset drivers from Intel?  The intelppm.sys is a processor driver.
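On the intelppm.sys question, the STACK_TEXT in the dump posted earlier can be walked to see which frame actually faulted and which frame the analyzer named. The following is a minimal sketch, assuming the analyzer simply names the first frame that is not in nt (the dump's SYMBOL_STACK_INDEX of 5 is consistent with that, but it is an assumption); `parse_frames` and `first_non_kernel_frame` are hypothetical helper names.

```python
import re

# STACK_TEXT copied from the !analyze -v output in this thread.
STACK_TEXT = """
fffff880`020f6748 fffff800`016de769 : 00000000`0000000a fffff080`05788858 00000000`0000000e 00000000`00000000 : nt!KeBugCheckEx
fffff880`020f6750 fffff800`016dd3e0 : 00000000`00000000 fffffa80`06f5c700 fffff880`020ce180 f8800578`88580105 : nt!KiBugCheckDispatch+0x69
fffff880`020f6890 fffff800`016fe102 : fffffa80`06f5c790 fffffa80`097fdce0 fffffa80`06f80010 fffff880`01b80c27 : nt!KiPageFault+0x260
fffff880`020f6a20 fffff800`016e908a : 00000000`00000000 fffff880`020f6b80 00000000`00000001 fffffa80`0712f310 : nt!KiIpiProcessRequests+0xb2
fffff880`020f6b00 fffff880`034ab9c2 : fffff800`016e7cf9 00000000`002c97f0 fffffa80`07111698 00000000`00000000 : nt!KiIpiInterrupt+0x12a
fffff880`020f6c98 fffff800`016e7cf9 : 00000000`002c97f0 fffffa80`07111698 00000000`00000000 00000000`00000000 : intelppm+0x39c2
fffff880`020f6ca0 fffff800`016d6e9c : fffff880`020ce180 00000000`00000000 00000000`00000000 fffff800`0178db20 : nt!PoIdle+0x52a
fffff880`020f6d80 00000000`00000000 : fffff880`020f7000 fffff880`020f1000 fffff880`020f6d40 00000000`00000000 : nt!KiIdleLoop+0x2c
"""


def parse_frames(stack_text: str) -> list:
    """Return (module, symbol) pairs, innermost frame first."""
    frames = []
    for line in stack_text.strip().splitlines():
        symbol = line.rsplit(" : ", 1)[1].strip()
        # The module name is everything before the first '!' or '+'.
        module = re.split(r"[!+]", symbol, maxsplit=1)[0]
        frames.append((module, symbol))
    return frames


def first_non_kernel_frame(frames: list, kernel: str = "nt"):
    """Index and symbol of the first frame outside the kernel image."""
    for index, (module, symbol) in enumerate(frames):
        if module != kernel:
            return index, symbol
    return None


frames = parse_frames(STACK_TEXT)
print(first_non_kernel_frame(frames))
print(frames[3][1])
```

Read this way, the read that faulted was in nt!KiIpiProcessRequests, and intelppm is only the frame that was running (called from nt!PoIdle) when the inter-processor interrupt arrived. So the "Probably caused by" line may be pointing at an interrupted idle routine rather than at the code that used the bad address.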

Author

Commented:
The server has all the latest drivers from Dell installed. I think it is a software problem, so I am running Driver Verifier to see if I can find out which driver is causing the problem.
Commented:
I couldn't get this server to run reliably without crashing. It would crash when I tried to copy a large number of files to disk over the network. I tried Windows Server 2003 and a different network card, but the server would still crash with the latest Dell firmware and drivers installed. In the end I was able to return it and exchange it for an IBM server. The IBM server runs fine.

Author

Commented:
Problem not solved but awarding points to say thanks for taking the time to try and help me out - Thank you!
