Dell PowerEdge BSOD

markflexman
Hi

I support a client who has a five-year-old Dell PowerEdge R520 running Windows Server 2008 R2, on which we've recently been seeing some BSODs: one this month, one last month, and one back in October. Dumpcheck says it could be caused by memory corruption, but I'm no expert at analysing such files (see below). I know it also mentions a possible problem with the disk subsystem, but can anyone suggest a more detailed explanation? In other words, does this look like a hardware issue of some sort?

Thanks


Microsoft (R) Windows Debugger Version 10.0.14321.1024 AMD64
Copyright (c) Microsoft Corporation. All rights reserved.


Loading Dump File [D:\Temp\MEMORY.DMP]
Kernel Summary Dump File: Kernel address space is available, User address space may not be available.


************* Symbol Path validation summary **************
Response                         Time (ms)     Location
Deferred                                       SRV*C:\Windows\symbol_cache*http://msdl.microsoft.com/download/symbols
Symbol search path is: SRV*C:\Windows\symbol_cache*http://msdl.microsoft.com/download/symbols
Executable search path is:
Windows 7 Kernel Version 7601 (Service Pack 1) MP (24 procs) Free x64
Product: LanManNt, suite: TerminalServer
Built by: 7601.24117.amd64fre.win7sp1_ldr_escrow.180422-1430
Machine Name:
Kernel base = 0xfffff800`0265e000 PsLoadedModuleList = 0xfffff800`0289dc90
Debug session time: Wed Jun  6 18:09:43.744 2018 (UTC + 1:00)
System Uptime: 24 days 6:16:38.431
Loading Kernel Symbols
...............................................................
................................................................
.........................................
Loading User Symbols

Loading unloaded module list
...........................
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

Use !analyze -v to get detailed debugging information.

BugCheck 7A, {fffff6fc40034b58, ffffffffc000003c, 60b8e9be0, fffff8800696b000}

Probably caused by : memory_corruption ( nt!MiWaitForInPageComplete+6c5 )

Followup:     MachineOwner
---------

10: kd> !analyze -v
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

KERNEL_DATA_INPAGE_ERROR (7a)
The requested page of kernel data could not be read in.  Typically caused by
a bad block in the paging file or disk controller error. Also see
KERNEL_STACK_INPAGE_ERROR.
If the error status is 0xC000000E, 0xC000009C, 0xC000009D or 0xC0000185,
it means the disk subsystem has experienced a failure.
If the error status is 0xC000009A, then it means the request failed because
a filesystem failed to make forward progress.
Arguments:
Arg1: fffff6fc40034b58, lock type that was held (value 1,2,3, or PTE address)
Arg2: ffffffffc000003c, error status (normally i/o status code)
Arg3: 000000060b8e9be0, current process (virtual address for lock type 3, or PTE)
Arg4: fffff8800696b000, virtual address that could not be in-paged (or PTE contents if arg1 is a PTE address)

Debugging Details:
------------------


DUMP_CLASS: 1

DUMP_QUALIFIER: 401

BUILD_VERSION_STRING:  7601.24117.amd64fre.win7sp1_ldr_escrow.180422-1430

SYSTEM_MANUFACTURER:  Dell Inc.

SYSTEM_PRODUCT_NAME:  PowerEdge R520

SYSTEM_SKU:  SKU=NotProvided;ModelName=PowerEdge R520

BIOS_VENDOR:  Dell Inc.

BIOS_VERSION:  2.1.2

BIOS_DATE:  01/20/2014

BASEBOARD_MANUFACTURER:  Dell Inc.

BASEBOARD_PRODUCT:  051XDX

BASEBOARD_VERSION:  A00

DUMP_TYPE:  1

BUGCHECK_P1: fffff6fc40034b58

BUGCHECK_P2: ffffffffc000003c

BUGCHECK_P3: 60b8e9be0

BUGCHECK_P4: fffff8800696b000

ERROR_CODE: (NTSTATUS) 0xc000003c - {Data Overrun}  A data overrun error occurred.

BUGCHECK_STR:  0x7a_c000003c

CPU_COUNT: 18

CPU_MHZ: 898

CPU_VENDOR:  GenuineIntel

CPU_FAMILY: 6

CPU_MODEL: 2d

CPU_STEPPING: 7

DEFAULT_BUCKET_ID:  WIN7_DRIVER_FAULT

PROCESS_NAME:  System

CURRENT_IRQL:  0

ANALYSIS_SESSION_HOST:  LAPMAX

ANALYSIS_SESSION_TIME:  06-06-2018 19:00:58.0037

ANALYSIS_VERSION: 10.0.14321.1024 amd64fre

LAST_CONTROL_TRANSFER:  from fffff800027c8065 to fffff800027024a0

STACK_TEXT:  
fffff880`02d8d768 fffff800`027c8065 : 00000000`0000007a fffff6fc`40034b58 ffffffff`c000003c 00000006`0b8e9be0 : nt!KeBugCheckEx
fffff880`02d8d770 fffff800`027d310f : fffffa80`1d092010 fffff880`02d8d890 fffff800`02904540 ffffffff`ffffffff : nt!MiWaitForInPageComplete+0x6c5
fffff880`02d8d840 fffff800`027d9f84 : ffffffff`ffffffff 02000000`00000001 00000000`c0033333 ffffffff`ffffffff : nt!MiIssueHardFault+0x4bf
fffff880`02d8d8d0 fffff800`026d3db5 : 02000000`00000001 fffff880`0696b000 fffff800`0265e000 fffff6fc`40034b30 : nt!MmAccessFault+0x4784
fffff880`02d8da20 fffff800`026d3b60 : fffffa80`00000001 fffff880`02d8db50 fffffa80`38721060 fffff800`026cabb3 : nt!MiInPageSingleKernelStack+0x225
fffff880`02d8db30 fffff800`026d2407 : fffffa80`38721060 00000000`00000080 fffffa80`18b3a040 fffffa80`3873d100 : nt!MmInPageKernelStack+0x40
fffff880`02d8db90 fffff800`026d23e0 : 00000000`00000000 00000000`00000000 fffffa80`18b3a000 fffffa80`18b3a000 : nt!KiInSwapKernelStacks+0x1f
fffff880`02d8dbc0 fffff800`029a6b5c : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KeSwapProcessOrStack+0x84
fffff880`02d8dc00 fffff800`02708916 : fffff880`0271e180 fffffa80`18b773e0 fffff880`0272d640 00000000`00000000 : nt!PspSystemThreadStartup+0x140
fffff880`02d8dc40 00000000`00000000 : fffff880`02d8e000 fffff880`02d88000 fffff880`02d8d4c0 00000000`00000000 : nt!KiStartSystemThread+0x16


STACK_COMMAND:  kb

THREAD_SHA1_HASH_MOD_FUNC:  540b92aed471c5d1c8687a625de5c23c059449fd

THREAD_SHA1_HASH_MOD_FUNC_OFFSET:  9c10981b0d9c40adfffe0f7c5c31e470bbbbd777

THREAD_SHA1_HASH_MOD:  bc100a5647b828107ac4e18055e00abcbe1ec406

FOLLOWUP_IP:
nt!MiWaitForInPageComplete+6c5
fffff800`027c8065 cc              int     3

FAULT_INSTR_CODE:  5a8a80cc

SYMBOL_STACK_INDEX:  1

SYMBOL_NAME:  nt!MiWaitForInPageComplete+6c5

FOLLOWUP_NAME:  MachineOwner

MODULE_NAME: nt

DEBUG_FLR_IMAGE_TIMESTAMP:  5add19ea

IMAGE_VERSION:  6.1.7601.24117

IMAGE_NAME:  memory_corruption

FAILURE_BUCKET_ID:  X64_0x7a_c000003c_nt!MiWaitForInPageComplete+6c5

BUCKET_ID:  X64_0x7a_c000003c_nt!MiWaitForInPageComplete+6c5

PRIMARY_PROBLEM_CLASS:  X64_0x7a_c000003c_nt!MiWaitForInPageComplete+6c5

TARGET_TIME:  2018-06-06T17:09:43.000Z

OSBUILD:  7601

OSSERVICEPACK:  1000

SERVICEPACK_NUMBER: 0

OS_REVISION: 0

SUITE_MASK:  16

PRODUCT_TYPE:  2

OSPLATFORM_TYPE:  x64

OSNAME:  Windows 7

OSEDITION:  Windows 7 LanManNt (Service Pack 1) TerminalServer

OS_LOCALE:  

USER_LCID:  0

OSBUILD_TIMESTAMP:  2018-04-23 00:25:30

BUILDDATESTAMP_STR:  180422-1430

BUILDLAB_STR:  win7sp1_ldr_escrow

BUILDOSVER_STR:  6.1.7601.24117.amd64fre.win7sp1_ldr_escrow.180422-1430

ANALYSIS_SESSION_ELAPSED_TIME: f13

ANALYSIS_SOURCE:  KM

FAILURE_ID_HASH_STRING:  km:x64_0x7a_c000003c_nt!miwaitforinpagecomplete+6c5

FAILURE_ID_HASH:  {dc41fff2-03bf-16a7-5755-19e974c53aef}

Followup:     MachineOwner
---------
Top Expert 2016

Commented:
A minidump would help greatly.

Author

Commented:
Hi David, it didn't seem to contain much as far as I could see, but here you go.
060618-34055-01.dmp

Author

Commented:
Machine Name:
Kernel base = 0xfffff800`0265e000 PsLoadedModuleList = 0xfffff800`0289dc90
Debug session time: Wed Jun  6 18:09:43.744 2018 (UTC + 1:00)
System Uptime: 24 days 6:16:38.431
Loading Kernel Symbols
..

Press ctrl-c (cdb, kd, ntsd) or ctrl-break (windbg) to abort symbol loads that take too long.
Run !sym noisy before .reload to track down problems loading symbols.

.............................................................
................................................................
.........................................
Loading User Symbols
Loading unloaded module list
...........................
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

Use !analyze -v to get detailed debugging information.

BugCheck 7A, {fffff6fc40034b58, ffffffffc000003c, 60b8e9be0, fffff8800696b000}

Probably caused by : memory_corruption ( nt!MiWaitForInPageComplete+6c5 )

Followup:     MachineOwner
---------

10: kd> !analyze -v
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

KERNEL_DATA_INPAGE_ERROR (7a)
The requested page of kernel data could not be read in.  Typically caused by
a bad block in the paging file or disk controller error. Also see
KERNEL_STACK_INPAGE_ERROR.
If the error status is 0xC000000E, 0xC000009C, 0xC000009D or 0xC0000185,
it means the disk subsystem has experienced a failure.
If the error status is 0xC000009A, then it means the request failed because
a filesystem failed to make forward progress.
Arguments:
Arg1: fffff6fc40034b58, lock type that was held (value 1,2,3, or PTE address)
Arg2: ffffffffc000003c, error status (normally i/o status code)
Arg3: 000000060b8e9be0, current process (virtual address for lock type 3, or PTE)
Arg4: fffff8800696b000, virtual address that could not be in-paged (or PTE contents if arg1 is a PTE address)

Debugging Details:
------------------


DUMP_CLASS: 1

DUMP_QUALIFIER: 400

BUILD_VERSION_STRING:  7601.24117.amd64fre.win7sp1_ldr_escrow.180422-1430

SYSTEM_MANUFACTURER:  Dell Inc.

SYSTEM_PRODUCT_NAME:  PowerEdge R520

SYSTEM_SKU:  SKU=NotProvided;ModelName=PowerEdge R520

BIOS_VENDOR:  Dell Inc.

BIOS_VERSION:  2.1.2

BIOS_DATE:  01/20/2014

BASEBOARD_MANUFACTURER:  Dell Inc.

BASEBOARD_PRODUCT:  051XDX

BASEBOARD_VERSION:  A00

DUMP_TYPE:  2

BUGCHECK_P1: fffff6fc40034b58

BUGCHECK_P2: ffffffffc000003c

BUGCHECK_P3: 60b8e9be0

BUGCHECK_P4: fffff8800696b000

ERROR_CODE: (NTSTATUS) 0xc000003c - {Data Overrun}  A data overrun error occurred.

BUGCHECK_STR:  0x7a_c000003c

CPU_COUNT: 18

CPU_MHZ: 898

CPU_VENDOR:  GenuineIntel

CPU_FAMILY: 6

CPU_MODEL: 2d

CPU_STEPPING: 7

CUSTOMER_CRASH_COUNT:  1

DEFAULT_BUCKET_ID:  WIN7_DRIVER_FAULT_SERVER

PROCESS_NAME:  System

CURRENT_IRQL:  0

ANALYSIS_SESSION_HOST:  LAPMAX

ANALYSIS_SESSION_TIME:  06-06-2018 21:54:35.0763

ANALYSIS_VERSION: 10.0.14321.1024 amd64fre

LAST_CONTROL_TRANSFER:  from fffff800027c8065 to fffff800027024a0

STACK_TEXT:  
fffff880`02d8d768 fffff800`027c8065 : 00000000`0000007a fffff6fc`40034b58 ffffffff`c000003c 00000006`0b8e9be0 : nt!KeBugCheckEx
fffff880`02d8d770 fffff800`027d310f : fffffa80`1d092010 fffff880`02d8d890 fffff800`02904540 ffffffff`ffffffff : nt!MiWaitForInPageComplete+0x6c5
fffff880`02d8d840 fffff800`027d9f84 : ffffffff`ffffffff 02000000`00000001 00000000`c0033333 ffffffff`ffffffff : nt!MiIssueHardFault+0x4bf
fffff880`02d8d8d0 fffff800`026d3db5 : 02000000`00000001 fffff880`0696b000 fffff800`0265e000 fffff6fc`40034b30 : nt!MmAccessFault+0x4784
fffff880`02d8da20 fffff800`026d3b60 : fffffa80`00000001 fffff880`02d8db50 fffffa80`38721060 fffff800`026cabb3 : nt!MiInPageSingleKernelStack+0x225
fffff880`02d8db30 fffff800`026d2407 : fffffa80`38721060 00000000`00000080 fffffa80`18b3a040 fffffa80`3873d100 : nt!MmInPageKernelStack+0x40
fffff880`02d8db90 fffff800`026d23e0 : 00000000`00000000 00000000`00000000 fffffa80`18b3a000 fffffa80`18b3a000 : nt!KiInSwapKernelStacks+0x1f
fffff880`02d8dbc0 fffff800`029a6b5c : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KeSwapProcessOrStack+0x84
fffff880`02d8dc00 fffff800`02708916 : fffff880`0271e180 fffffa80`18b773e0 fffff880`0272d640 00000000`00000000 : nt!PspSystemThreadStartup+0x140
fffff880`02d8dc40 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiStartSystemThread+0x16


STACK_COMMAND:  kb

THREAD_SHA1_HASH_MOD_FUNC:  540b92aed471c5d1c8687a625de5c23c059449fd

THREAD_SHA1_HASH_MOD_FUNC_OFFSET:  9c10981b0d9c40adfffe0f7c5c31e470bbbbd777

THREAD_SHA1_HASH_MOD:  bc100a5647b828107ac4e18055e00abcbe1ec406

FOLLOWUP_IP:
nt!MiWaitForInPageComplete+6c5
fffff800`027c8065 cc              int     3

FAULT_INSTR_CODE:  5a8a80cc

SYMBOL_STACK_INDEX:  1

SYMBOL_NAME:  nt!MiWaitForInPageComplete+6c5

FOLLOWUP_NAME:  MachineOwner

MODULE_NAME: nt

DEBUG_FLR_IMAGE_TIMESTAMP:  5add19ea

IMAGE_VERSION:  6.1.7601.24117

IMAGE_NAME:  memory_corruption

FAILURE_BUCKET_ID:  X64_0x7a_c000003c_nt!MiWaitForInPageComplete+6c5

BUCKET_ID:  X64_0x7a_c000003c_nt!MiWaitForInPageComplete+6c5

PRIMARY_PROBLEM_CLASS:  X64_0x7a_c000003c_nt!MiWaitForInPageComplete+6c5

TARGET_TIME:  2018-06-06T17:09:43.000Z

OSBUILD:  7601

OSSERVICEPACK:  1000

SERVICEPACK_NUMBER: 0

OS_REVISION: 0

SUITE_MASK:  16

PRODUCT_TYPE:  2

OSPLATFORM_TYPE:  x64

OSNAME:  Windows 7

OSEDITION:  Windows 7 LanManNt (Service Pack 1) TerminalServer

OS_LOCALE:  

USER_LCID:  0

OSBUILD_TIMESTAMP:  2018-04-23 00:25:30

BUILDDATESTAMP_STR:  180422-1430

BUILDLAB_STR:  win7sp1_ldr_escrow

BUILDOSVER_STR:  6.1.7601.24117.amd64fre.win7sp1_ldr_escrow.180422-1430

ANALYSIS_SESSION_ELAPSED_TIME: 956

ANALYSIS_SOURCE:  KM

FAILURE_ID_HASH_STRING:  km:x64_0x7a_c000003c_nt!miwaitforinpagecomplete+6c5

FAILURE_ID_HASH:  {dc41fff2-03bf-16a7-5755-19e974c53aef}

Followup:     MachineOwner

Sudeep Sharma, Technical Designer

Commented:
From your logs in the post:
IMAGE_NAME:  memory_corruption

This looks like a memory issue. Run a memory check on the server to verify whether memory is the cause.

Also, your BIOS is quite old (2014). The latest version is linked below; please update it before you ask Dell for support.

Latest BIOS Update:
http://www.dell.com/support/home/in/en/indhs1/drivers/driversdetails?driverId=30M7X

Thanks,
Sudeep
Bill Bach, President and Btrieve Guru
Commented:
This error code does not indicate a memory issue.  An InPage I/O Error can occur when part of the OS has been pushed out of memory (or to the swap file) and it must be read back in in order to continue running.  In this case, the disk subsystem was unable to fulfill the request in a timely manner (or at all).  Because the OS cannot continue, the BSOD prevents "other bad things" from occurring.

Check your disk subsystem from start to finish.  Check drivers, BIOS versions of the controller, BIOS versions of the hard disks, etc.  Just update everything!  If this is a mission-critical server, then it may just be due for replacement, as it is 5 years old and likely running out of warranty anyway.
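
For anyone who wants to read the bugcheck parameters without the debugger open, here is a minimal Python sketch. It assumes the `BugCheck 7A, {...}` summary line looks like the one pasted above, and it only knows the status codes that the `!analyze -v` text in this thread mentions. The function names are made up for illustration. Arg2 is sign-extended to 64 bits in the dump, so the sketch masks it down to the 32-bit NTSTATUS first.

```python
import re

# Status codes and wording taken from the !analyze -v text quoted in this thread.
DISK_SUBSYSTEM_FAILURE = {0xC000000E, 0xC000009C, 0xC000009D, 0xC0000185}
FILESYSTEM_NO_PROGRESS = 0xC000009A
DATA_OVERRUN = 0xC000003C

# Matches the one-line summary, e.g. "BugCheck 7A, {a, b, c, d}".
BUGCHECK_RE = re.compile(r"BugCheck\s+([0-9A-Fa-f]+),\s*\{([^}]*)\}")


def parse_bugcheck(line):
    """Return (bugcheck_code, [arg1, arg2, arg3, arg4]) as integers."""
    match = BUGCHECK_RE.search(line)
    if match is None:
        raise ValueError("no BugCheck summary found")
    code = int(match.group(1), 16)
    args = [int(part.strip(), 16) for part in match.group(2).split(",")]
    return code, args


def describe_inpage_status(arg2):
    """Describe Arg2 of a 0x7A bugcheck using only the hints quoted above."""
    status = arg2 & 0xFFFFFFFF  # drop the 64-bit sign extension
    if status in DISK_SUBSYSTEM_FAILURE:
        text = "disk subsystem has experienced a failure"
    elif status == FILESYSTEM_NO_PROGRESS:
        text = "a filesystem failed to make forward progress"
    elif status == DATA_OVERRUN:
        text = "{Data Overrun} a data overrun error occurred"
    else:
        text = "not covered by the debugger text quoted in this thread"
    return "0x%08X: %s" % (status, text)


if __name__ == "__main__":
    summary = ("BugCheck 7A, {fffff6fc40034b58, ffffffffc000003c, "
               "60b8e9be0, fffff8800696b000}")
    code, args = parse_bugcheck(summary)
    print(hex(code), describe_inpage_status(args[1]))
```

The status in this dump (0xC000003C) is not one of the four codes the debugger text explicitly ties to a disk subsystem failure, so the sketch reports it separately rather than guessing at a cause.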

Author

Commented:
Thanks Bill. I suspect you are right that it now needs replacing; it is out of warranty. Considering that it has been happily ticking along for 5 years with the same firmware, drivers, BIOS, etc., could one assume that a problem like this, which has developed over time, would probably not be solved by updating anything? (Not that I won't, of course, try all that as a last-ditch attempt!)
Bill Bach, President and Btrieve Guru

Commented:
Check Windows (system) logs for other details on the errors, too.  You might see some other information about impending failures.  Don't forget to check the Dell hardware-specific logs, too, from Dell Server Manager, or whatever solution you might have installed.
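
One possible way to pull disk-related entries out of the System log is sketched below in Python, shelling out to the built-in `wevtutil` tool. The provider name `disk` and the event IDs 7 (bad block), 11 (controller error) and 51 (error during a paging operation) are assumptions on my part, not something taken from this thread, so adjust them to whatever your server actually logs. The helper names are hypothetical, and `recent_disk_events` only works on Windows with a reasonably recent Python 3.

```python
import subprocess

# Assumed event IDs for the "disk" provider (not taken from this thread):
# 7 = bad block, 11 = controller error, 51 = error during a paging operation.
DISK_EVENT_IDS = (7, 11, 51)


def build_query(provider="disk", event_ids=DISK_EVENT_IDS):
    """Build an XPath filter for the System event log."""
    ids = " or ".join("EventID=%d" % event_id for event_id in event_ids)
    return "*[System[Provider[@Name='%s'] and (%s)]]" % (provider, ids)


def build_command(max_events=50):
    """Build the wevtutil command line: newest events first, plain text."""
    return [
        "wevtutil", "qe", "System",
        "/q:" + build_query(),
        "/f:text",
        "/c:%d" % max_events,
        "/rd:true",
    ]


def recent_disk_events(max_events=50):
    """Run the query on the local machine (Windows only) and return the text."""
    result = subprocess.run(
        build_command(max_events),
        capture_output=True, text=True, check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Print the command rather than running it, so this is safe anywhere.
    print(" ".join(build_command()))
```

Entries that cluster around the time of a BSOD would support the disk-subsystem theory; an empty result proves little, since the controller may log under a different provider.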

Author

Commented:
Haven't had a BSOD for a while now, and the customer is talking about replacing this server, so I will probably leave it alone unless it gets critical, in which case I'll go for the BIOS and firmware updates.
