Solved

Help analyzing Windows minidump files to troubleshoot Blue Screens

Posted on 2008-10-09
Last Modified: 2013-12-01
One of the Dell computers at our company has been experiencing BSODs for quite some time. However, they are few and far between (about once per month).

The most recent message states:

The problem seems to be caused by the following file: Ntfs.sys
PAGE_FAULT_IN_NONPAGED_AREA
Stop: 0x00000050
NTFS.sys -Address F730663F base at F7304000, DATESTAMP 45XX56A7
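
The address and base reported in the Stop message can be turned into an offset inside the driver image, which is the same kind of `module+offset` value that WinDbg prints. A minimal Python sketch of that arithmetic follows; the function name is illustrative only, and the two hex values are the ones quoted in the message above.

```python
def module_offset(fault_address: int, module_base: int) -> int:
    """Return how far into the loaded module the faulting address lies."""
    if fault_address < module_base:
        raise ValueError("faulting address is below the module base")
    return fault_address - module_base


# Values copied from the Stop message: "Address F730663F base at F7304000".
offset = module_offset(0xF730663F, 0xF7304000)
print(f"Ntfs.sys+0x{offset:X}")
```

The offset on its own does not say why the fault happened, but it makes it easy to check whether different crashes land at the same place in the same driver.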

I am not able to find out what the previous ones said, but by analyzing the minidumps in WinDbg, I can see that the errors are not the same.

Attached are the 9 most recent minidump files. I have tried opening some of these in WinDbg; a few of them say Memory Corruption, while others reference problems with other files, such as NTFS.sys and ntoskrnl. My initial impression is that the ones referencing specific files are probably caused by the underlying memory problem.

I ran the Dell diagnostic utility included in the boot menu for this computer, and found no errors. I know that bad memory can still pass this test, but I wanted to make sure that there aren't other problems and that memory is the only problem.

Can we tell if those other errors are caused by bad memory as well? I am new to using WinDbg. EE will not let me attach a zip file containing the dmp files; I will try hosting it and updating this question.
0
Question by:bradl3y
11 Comments
 
LVL 6

Author Comment

by:bradl3y
ID: 22679926
0
 
LVL 6

Author Comment

by:bradl3y
ID: 22686299
We ran 20 loops of the Dell memory test, and all passed. Any idea what else could be wrong?
0
 
LVL 27

Expert Comment

by:Jonvee
ID: 22689877
I have used WinDbg to open four of the minidumps and I'm getting random errors too.
Examples>
IMAGE_NAME:  win32k.sys

FAILURE_BUCKET_ID:  0xD1_CODE_AV_BAD_IP_win32k!GreAcquireSemaphore+18

FAILURE_BUCKET_ID:  0x8E_win32k!EXLATEOBJ::bInitXlateObj+66

FAILURE_BUCKET_ID:  0x8E_SiSGRV+77b0

This is indicative of memory problems as you suspected.  Suspect RAM even though you had no failures.
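
One way to make "random errors" concrete is to tally which stop codes and modules the dumps blame. The Python sketch below parses the three FAILURE_BUCKET_ID strings quoted above. The parsing rule (stop code before the first underscore, module name before `!` or `+` in the last underscore-separated field) is an assumption inferred from these samples, not a documented format, so it may need adjusting for other bucket IDs.

```python
import re
from collections import Counter


def parse_bucket(bucket_id: str) -> tuple[str, str]:
    """Split a FAILURE_BUCKET_ID into (stop code, module name).

    Assumes the layout seen in the samples: "<code>_..._<module>!<symbol>+<offset>"
    or "<code>_<module>+<offset>".
    """
    stop_code = bucket_id.split("_", 1)[0]
    location = bucket_id.rsplit("_", 1)[1]
    module = re.split(r"[!+]", location, maxsplit=1)[0]
    return stop_code, module


# Bucket IDs copied from the WinDbg output quoted in this comment.
buckets = [
    "0xD1_CODE_AV_BAD_IP_win32k!GreAcquireSemaphore+18",
    "0x8E_win32k!EXLATEOBJ::bInitXlateObj+66",
    "0x8E_SiSGRV+77b0",
]

parsed = [parse_bucket(b) for b in buckets]
codes = Counter(code for code, _ in parsed)
modules = Counter(module for _, module in parsed)

print("Stop codes:", dict(codes))
print("Modules:   ", dict(modules))
```

If every dump pointed at the same driver and the same stop code, a software fault in that driver would be the more natural suspect; a spread across unrelated modules is what points toward hardware.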

Recommend you try the excellent memtest86+  v1.7
http://www.memtest.org/
0
 
LVL 27

Expert Comment

by:Jonvee
ID: 22689988
I cannot be absolutely sure that faulty RAM is your *only* problem, but changing it seems the next logical step.
If you change it, we could take a look at more minidumps and look for some consistency in the errors.

Also suggest you check RAM socket(s) condition, and if you have more than one RAM stick, you could try removing all but one, then retest.

I presume case and CPU cooling are OK? Any higher-than-normal temperatures or dust?
I am also assuming that you're using the correct RAM type.
0
 
LVL 6

Author Comment

by:bradl3y
ID: 22690001
Thanks, I will give Memtest86+ a try. I am afraid Dell is still going to be very stubborn about providing a replacement, as it will not be their utility that is reporting an error. I am considering just telling them that their test failed. The Dell representative did not agree that memory can still be defective even if it passes 10 loops of their memory test. I know that it can, and I know that no memory test will be able to detect an error 100% of the time.

That is the whole reason you get to choose the number of loops: to increase the chance that the error will occur and be detected, the keyword there being chance. Dell does not seem to want to accept that and was basically ignoring my WinDbg results, simply stating "blue screens and minidumps can be caused by a long list of problems, such as OS corruption, hard drive errors, driver errors, or memory errors. If it passes 10 loops of memory testing, you will need to reinstall the OS". That is what WinDbg is for, to narrow down that long list!
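
The point about loops can be put in numbers. As a rough sketch only: assume each loop independently catches an intermittent fault with some fixed probability (both the independence and the 1% figure below are hypothetical, chosen for illustration, not measured on this machine). The chance of at least one detection after n loops is then 1 - (1 - p)^n.

```python
def detection_probability(per_loop_chance: float, loops: int) -> float:
    """Chance that at least one of `loops` independent passes catches the fault."""
    if not 0.0 <= per_loop_chance <= 1.0:
        raise ValueError("per_loop_chance must be between 0 and 1")
    if loops < 0:
        raise ValueError("loops must not be negative")
    return 1.0 - (1.0 - per_loop_chance) ** loops


# Hypothetical fault that a single loop catches only 1% of the time.
for loops in (1, 10, 20):
    chance = detection_probability(0.01, loops)
    print(f"{loops:>3} loops -> {chance:.1%} chance of seeing an error")
```

Under those assumptions, more loops always help but never reach certainty, and a fault that the test pattern cannot trigger at all (p = 0) stays invisible no matter how many loops are run.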

Sorry, end of rant.

I will give Memtest86+ a try tonight to see if it reproduces errors. I just wanted to make sure that I was correct, as I am not very experienced using WinDbg.
0
 
LVL 27

Expert Comment

by:Jonvee
ID: 22690116
> I know that no memory test will be able to detect an error 100% of the time <
I absolutely agree with you, although if RAM *is* faulty, Memtest will almost certainly indicate that this is so. And I understand your rant!

If you'd like some assistance in analysing your own dump file, this should help>
"How to read the small memory dump files that Windows creates for debugging":
http://support.microsoft.com/kb/315263

The !analyze -v command will probably be your most used command.
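
With several minidumps to go through, the same command can be run over all of them from a script. The Python sketch below assumes the command-line kernel debugger `kd` from Debugging Tools for Windows is on the PATH and uses its `-z` (open dump file) and `-c` (run commands at startup) options. The helper names are made up for illustration, and symbols are assumed to be configured separately (for example through the `_NT_SYMBOL_PATH` environment variable).

```python
import shutil
import subprocess
from pathlib import Path


def build_kd_command(dump_path, debugger="kd"):
    """Build the argument list that opens one dump, runs !analyze -v, and quits."""
    return [debugger, "-z", str(dump_path), "-c", "!analyze -v; q"]


def analyze_dumps(dump_dir, debugger="kd"):
    """Run !analyze -v on every .dmp file in dump_dir; return {file name: output}."""
    if shutil.which(debugger) is None:
        print(f"{debugger} not found on PATH; nothing was analyzed")
        return {}
    reports = {}
    for dump in sorted(Path(dump_dir).glob("*.dmp")):
        result = subprocess.run(
            build_kd_command(dump, debugger), capture_output=True, text=True
        )
        reports[dump.name] = result.stdout
    return reports


# Show the command that would be run for one (hypothetical) dump file name.
print(" ".join(build_kd_command("example.dmp")))
```

Calling `analyze_dumps` with the folder that holds the minidumps returns the raw debugger output per file, which can then be searched for lines such as IMAGE_NAME or FAILURE_BUCKET_ID.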

You can download WinDbg from this Microsoft website.
http://www.microsoft.com/whdc/devtools/debugging/default.mspx

Perhaps the best article of all>
"Windows system crashes":
http://www.networkworld.com/news/2005/041105-windows-crash.html

Note that even with WinDbg, there's only about a 50% chance of a good result .. but we've made a start!
0
 
LVL 27

Accepted Solution

by:
Jonvee earned 350 total points
ID: 22692957
From your most recent Stop error, please view the "aumha" link below:
0x00000050: PAGE_FAULT_IN_NONPAGED_AREA

Here, suspect memory (including main memory, L2 RAM cache, video RAM) is named as a possible cause, as is incompatible software, including remote control and antivirus software. Other hardware problems can also be responsible.
http://aumha.org/a/stop.htm

Notes: With only about one Stop occurrence a month, I have to admit that it does seem a little unusual if RAM is the cause. I would have expected the Stop error to appear more frequently.

From WinDbg you may notice that half of the time the failing module is shown as win32k.sys or ntoskrnl.exe, but it's unlikely these are the culprit(s).

Analysed 4 more of your Minidumps but with similar results, as expected.

Finally, you may wish to look at your antivirus software on that one machine, & perhaps consider an AV uninstall/reinstall.
0
 
LVL 87

Assisted Solution

by:rindi
rindi earned 150 total points
ID: 22693679
Also test your HD with the HD manufacturer's diagnostic tool. You'll find it on the UBCD.

http://ultimatebootcd.com
0
 
LVL 6

Author Comment

by:bradl3y
ID: 22701923
Thanks for your help so far. I let Memtest86+ run all weekend; it passed 195 loops with no errors, so it seems the RAM isn't the cause. Video RAM is shared, I believe. Would a CPU stress test find problems with the L2 cache, or is there another way to test it?

I am running the Dell hard drive diagnostics from their utility partition right now. Is there any point in running this for more than one loop, or should one be enough? I've got the UBCD, so I will try the manufacturer's test next.

If everything passes with the hard drive, I will check on the issue of AV software.
0
 
LVL 27

Expert Comment

by:Jonvee
ID: 22701940
I was just about to post this>>

Reviewing our past comments ...
Referring to an earlier statement of mine that Memtest will almost certainly indicate that RAM is suspect if indeed it is: I should have said that Memtest will *not necessarily* indicate that RAM is faulty when it is.

Assuming by now that Memtest also gave the green light to your RAM, and you've considered our other proposals, perhaps a faulty driver is overwriting memory.
You could therefore try checking for driver updates.
0
 
LVL 87

Expert Comment

by:rindi
ID: 22702049
I wouldn't run the Dell HD test at all; they don't build disks.
0
