Need some help interpreting some MiniDumps

Posted on 2009-12-16
Last Modified: 2013-12-12
I have a server running Windows 2003 SP2 that is crashing randomly. It started suddenly and seems to crash while loading the OS; the system runs fine in safe mode. The minidumps look like they point to an IRQ conflict, but I am not sure where to start, since I did not add any new hardware to the system or update any drivers, not even Windows updates. A virus scan came up empty, and so did a malware scan. Any other pointers or ideas? Below are three dumps, but I have a few more. They all point to different things, with the exception of the IRQ-related one, which has come up a few times.
{first dump}
Use !analyze -v to get detailed debugging information.

BugCheck 50, {8b4a4fac, 0, 804f40f7, 0}

*** WARNING: Unable to verify timestamp for Ntfs.sys

Could not read faulting driver name
*** WARNING: Unable to verify timestamp for srv.sys
Probably caused by : srv.sys ( srv!SrvCloseSearch+29 )

Followup: MachineOwner

1: kd> !analyze -v
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *

Invalid system memory was referenced.  This cannot be protected by try-except,
it must be protected by a Probe.  Typically the address is just plain bad or it
is pointing at freed memory.
Arg1: 8b4a4fac, memory referenced.
Arg2: 00000000, value 0 = read operation, 1 = write operation.
Arg3: 804f40f7, If non-zero, the instruction address which referenced the bad memory
Arg4: 00000000, (reserved)

Debugging Details:

Could not read faulting driver name

READ_ADDRESS:  8b4a4fac

804f40f7 ??              ???






LAST_CONTROL_TRANSFER:  from 80529bbb to 80536da0

f78d2b18 80529bbb 00000050 8b4a4fac 00000000 nt!MmProbeAndLockPages+0x78d
f78d2b68 80504fd7 00000000 8b4a4fac 00000000 nt!ExpGetProcessInformation+0x74
f78d2b80 f7218d06 00000000 f78d2bf4 88d84008 nt!sqrt+0xa5
f78d2bf8 88f1b5b0 b07eec13 88f1b5b0 f78d2c64 Ntfs!NtfsReserveMftRecord+0x37
WARNING: Frame IP not in any known module. Following frames may be wrong.
f78d2c08 b07f0d68 89af1790 8960f538 f78d2c2c 0x88f1b5b0
f78d2c64 b07f0bdf 88f1b458 88f1b5b0 89987830 srv!SrvCloseSearch+0x29
f78d2ca4 b07f116e 88f1b568 89987830 8050037f srv!SrvStartWaitForOplockBreak+0xaf
f78d2cc4 b07f11ad f78d2cec 8050037f b07e7a88 srv!SrvFreeTransaction+0x74
f78d2ccc 8050037f b07e7a88 00000000 00000000 srv!SrvReferenceSecurityContext+0x15
f78d2cd4 00000000 00000000 00000000 e1b2a868 nt!ZwSaveKey+0xf


b07f0d68 ??              ???


SYMBOL_NAME:  srv!SrvCloseSearch+29

FOLLOWUP_NAME:  MachineOwner


IMAGE_NAME:  srv.sys


FAILURE_BUCKET_ID:  0x50_srv!SrvCloseSearch+29

BUCKET_ID:  0x50_srv!SrvCloseSearch+29

Followup: MachineOwner
{end first Dump}

{Start second dump}
The current thread is making a bad pool request.  Typically this is at a bad IRQL level or double freeing the same allocation, etc.
Arg1: 00000007, Attempt to free pool which was already freed
Arg2: 00001153, (reserved)
Arg3: 00000000, Memory contents of the pool block
Arg4: 886c4318, Address of the block of pool being deallocated

Debugging Details:

GetUlongFromAddress: unable to read from 8058fa98

POOL_ADDRESS:  886c4318





LAST_CONTROL_TRANSFER:  from 8055baa1 to 80536da0

f793aba0 8055baa1 000000c2 00000007 00001153 nt!MmProbeAndLockPages+0x78d
f793ac00 8055b416 886c4318 00000000 f71f403b nt!MiMakeOutswappedPageResident+0x3ed
f793ac58 8050c0af 89a21020 895f6008 886c4318 nt!MiAttemptPageFileReduction+0xa1
f793ac88 f72abf2e f72ab627 8a4ff030 895f6008 nt!CcFlushCache+0x206
f793acb4 8050c0af 00000000 8957c678 8957c810 CLASSPNP!ClassInterpretSenseInfo+0x932
f793ace4 f72e3aeb 896c9940 8a00d80c 8a00d2e4 nt!CcFlushCache+0x206
WARNING: Stack unwind information not available. Following frames may be wrong.
f793acf8 8a55f0e8 896c9940 8a00db04 f72e51e5 iaStor+0x24aeb
f793acfc 896c9940 8a00db04 f72e51e5 896c9940 0x8a55f0e8
f793ad00 8a00db04 f72e51e5 896c9940 f72e1504 0x896c9940
f793ad04 f72e51e5 896c9940 f72e1504 8a00d80c 0x8a00db04
f793ad08 896c9940 f72e1504 8a00d80c 896c9940 iaStor+0x261e5
f793ad0c f72e1504 8a00d80c 896c9940 8a2dc1f0 0x896c9940
f793ad10 8a00d80c 896c9940 8a2dc1f0 f72e057b iaStor+0x22504
f793ad14 896c9940 8a2dc1f0 f72e057b 896c9940 0x8a00d80c
f793ad18 8a2dc1f0 f72e057b 896c9940 f72f19af 0x896c9940
f793ad1c f72e057b 896c9940 f72f19af 8a00db04 0x8a2dc1f0
f793ad20 896c9940 f72f19af 8a00db04 896c9940 iaStor+0x2157b
f793ad24 f72f19af 8a00db04 896c9940 8a153828 0x896c9940
f793ad28 8a00db04 896c9940 8a153828 8a2dc1f0 iaStor+0x329af
f793ad2c 896c9940 8a153828 8a2dc1f0 00000000 0x8a00db04
f793ad30 8a153828 8a2dc1f0 00000000 f72f1abd 0x896c9940
f793ad34 8a2dc1f0 00000000 f72f1abd 8a2dc1f0 0x8a153828
f793ad38 00000000 f72f1abd 8a2dc1f0 f72ea8b2 0x8a2dc1f0


f72abf2e ??              ???


SYMBOL_NAME:  CLASSPNP!ClassInterpretSenseInfo+932

FOLLOWUP_NAME:  MachineOwner




FAILURE_BUCKET_ID:  0xc2_7_CLASSPNP!ClassInterpretSenseInfo+932

BUCKET_ID:  0xc2_7_CLASSPNP!ClassInterpretSenseInfo+932

Followup: MachineOwner
{end second dump}
{third dump}
0: kd> !analyze -v
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *

This is a very common bugcheck.  Usually the exception address pinpoints
the driver/function that caused the problem.  Always note this address
as well as the link date of the driver/image that contains this address.
Some common problems are exception code 0x80000003.  This means a hard
coded breakpoint or assertion was hit, but this system was booted
/NODEBUG.  This is not supposed to happen as developers should never have
hardcoded breakpoints in retail code, but ...
If this happens, make sure a debugger gets connected, and the
system is booted /DEBUG.  This will let us see why this breakpoint is
Arg1: c0000005, The exception code that was not handled
Arg2: 805ab3c8, The address that the exception occurred at
Arg3: af86e96c, Trap Frame
Arg4: 00000000

Debugging Details:

EXCEPTION_CODE: (NTSTATUS) 0xc0000005 - The instruction at "0x%08lx" referenced memory at "0x%08lx". The memory could not be "%s".

805ab3c8 8b4048          mov     eax,dword ptr [eax+48h]

TRAP_FRAME:  af86e96c -- (.trap 0xffffffffaf86e96c)
ErrCode = 00000000
eax=0a54f400 ebx=00000000 ecx=fdffffff edx=8962c921 esi=8962c920 edi=89a0bda0
eip=805ab3c8 esp=af86e9e0 ebp=af86ed48 iopl=0         nv up ei ng nz na po nc
cs=0008  ss=0010  ds=0023  es=0023  fs=0030  gs=0000             efl=00010282
805ab3c8 8b4048          mov     eax,dword ptr [eax+48h] ds:0023:0a54f448=????????
Resetting default scope





LAST_CONTROL_TRANSFER:  from 80502149 to 805ab3c8

af86ed48 80502149 00000001 01b1fe34 00000001 nt!RtlQueryRegistryValues+0x181
af86ed48 7ffe0304 00000001 01b1fe34 00000001 nt!KiTrap00+0xae
01b1fe90 00000000 00000000 00000000 00000000 SharedUserData!SystemCallStub+0x4


805ab3c8 8b4048          mov     eax,dword ptr [eax+48h]


SYMBOL_NAME:  nt!RtlQueryRegistryValues+181

FOLLOWUP_NAME:  MachineOwner


IMAGE_NAME:  ntoskrnl.exe


FAILURE_BUCKET_ID:  0x8E_nt!RtlQueryRegistryValues+181

BUCKET_ID:  0x8E_nt!RtlQueryRegistryValues+181

Followup: MachineOwner
{end third dump}
Question by:tsaico
    LVL 87

    Accepted Solution

    If a server crashes with different minidumps, I'd try to test the hardware. First clean out all dust and make sure the fans all run freely. Remove the heatsinks from the CPUs and clean both surfaces very thoroughly, then add a very small drop of fresh thermal transfer paste to the CPUs' surfaces and firmly reattach the heatsinks.

    Test the server's RAM using memtest86+, and if that test finds errors, test each module separately to find the bad module. Also test each RAM slot in the server separately, as sometimes a slot can go bad. Replace any bad RAM module.

    Test the HDs using their manufacturer's diagnostic utility. Often with RAIDed drives you'll have to test each disk on a PC that doesn't have RAID, as the diagnostic utility often can't access the drive through the RAID controller. Make sure you don't boot the server while a drive is missing, and make sure you put each drive back into the same slot it came from. Also look at the drives' SMART status (most RAID controllers should show you any SMART alerts).

    If all that looks good, run sfc /scannow from a cmd prompt on the server to check the integrity of its system files and replace them if necessary.

    You'll find the memory testing and HD diagnostic tools on the UBCD:
    LVL 12

    Assisted Solution

    by:John Griffith

    Hi -
    I would like to see additional driver information given the sudden onset of the
    BSODs.  I agree that various bugchecks do tend to point to hardware.  However, the
    fact that SAFEMODE produces no BSODs warrants a closer look at the drivers.

    0x50 = invalid memory referenced; names srv.sys
     w/ timestamp = 44f80560 =  Fri Sep 01 06:03:12 2006

    0xc2 (0x7,,,) = attempt made to free the pool, which was already freed
     - named classpnp.sys, t/s = 3e800766 = Tue Mar 25 03:38:14 2003

    0x8e (0xc0000005,,,) = kernel mode exception - memory access violation
     - NT itself named, t/s 45e7eae6 =  Fri Mar 02 04:14:14 2007
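
    The hex timestamps quoted above are driver link dates stored as seconds since 1970-01-01. A minimal Python sketch of decoding them follows; the helper name link_time_utc is made up for illustration, and the result is printed in UTC, so it differs by a few hours from the local times shown in the list above.

```python
from datetime import datetime, timezone

def link_time_utc(hex_stamp: str) -> datetime:
    """Interpret a hex driver timestamp (seconds since 1970-01-01) as a UTC datetime."""
    return datetime.fromtimestamp(int(hex_stamp, 16), tz=timezone.utc)

# Timestamps quoted in this post for the three modules named by the dumps.
STAMPS = [
    ("srv.sys", "44f80560"),
    ("classpnp.sys", "3e800766"),
    ("ntoskrnl.exe", "45e7eae6"),
]

for name, stamp in STAMPS:
    when = link_time_utc(stamp)
    print(f"{name:14s} {stamp}  {when:%a %b %d %H:%M:%S %Y} UTC")
```

    A driver whose link date is much newer than the rest of the system files is one place to start looking when a crash appears suddenly.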

    In my experience, these all could be attributable to 3rd party drivers clashing with
    the Microsoft drivers - named in each BSOD as the probable cause
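
    With more than a handful of dumps, it can help to tally which bugcheck code and module each analysis names. This is only a rough Python sketch: it assumes the !analyze -v output was saved as text, and that the module is the last underscore-separated token before the "!" in each FAILURE_BUCKET_ID line. The function names are hypothetical.

```python
from collections import Counter

def parse_bucket(line: str) -> tuple[str, str]:
    """Split e.g. 'FAILURE_BUCKET_ID:  0xc2_7_CLASSPNP!Func+932' into (code, module)."""
    value = line.split(":", 1)[1].strip()
    prefix, _, _symbol = value.partition("!")
    # Assumption: first token is the bugcheck code, last token is the module name.
    code, *_, module = prefix.split("_")
    return code.lower(), module.lower()

def summarize(text: str) -> Counter:
    """Count (bugcheck code, module) pairs across saved analysis output."""
    return Counter(
        parse_bucket(line)
        for line in text.splitlines()
        if line.startswith("FAILURE_BUCKET_ID:")
    )

# The three FAILURE_BUCKET_ID lines from the dumps posted in the question.
SAMPLE = """\
FAILURE_BUCKET_ID:  0x50_srv!SrvCloseSearch+29
FAILURE_BUCKET_ID:  0xc2_7_CLASSPNP!ClassInterpretSenseInfo+932
FAILURE_BUCKET_ID:  0x8E_nt!RtlQueryRegistryValues+181
"""

for (code, module), count in summarize(SAMPLE).items():
    print(f"{code:6s} {module:10s} x{count}")
```

    If every dump names a different Microsoft module, as these three do, that is consistent with either failing hardware or a third-party driver corrupting memory that other drivers then trip over.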

    I would like to see loaded driver listings; however if you don't mind uploading the
    dumps, it would make things easier to check out.

    Otherwise, please issue the additional windbg commands:
    lmnt; lmntsm

    Happy Holidays!

    LVL 12

    Expert Comment

    by:John Griffith
    Any news on this?
    LVL 9

    Author Closing Comment

    Sorry, forgot to update. It turned out it was some sort of sound card driver update. For some reason, it was killing the server when trying to boot. An associate took a look at the dump and said almost the identical thing you guys have said.
    Once we backed out the sound card drivers, the server was booting fine. It also doesn't matter too much, as I just found out the server is slated for retirement at the end of Jan '10. So thanks all!
