Review of method to diagnose bluescreens?

Hi Experts,

I work at an IT repair shop and, as you can imagine, we constantly run across bluescreen issues. My normal process for diagnosing these goes like this:

1. Test RAM with Windows Memory Diagnostic
2. Analyze dump files with BlueScreenView
3. Test hard drive for bad sectors
4. Scan for viruses
 If all the above come out good/clean and the .dmp files don't point to anything definitive, I continue with a few more steps:
5. Update all drivers from the manufacturer's website
6. Uninstall any unnecessary apps
7. Uninstall/reinstall antivirus software
8. Test run for 12 hrs to see if a bluescreen occurs. If there are no bluescreens, we give it back to the customer. If bluescreens occur again for the customer, or one occurs during the 12 hr run, I continue with a few more steps:
9. Hook up a new power supply rated at least 100 watts higher than the one currently installed and let it run for 12 hrs. If there are no bluescreens, we give it back to the customer. If it still bluescreens, we replace the RAM and give it back to the customer.
10. If we still see bluescreens after replacing the RAM, we back up the data and wipe the system.
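
For the 12 hr test runs in steps 8 and 9, a small script can report whether a new dump appeared during the run. This is only a sketch: it assumes minidumps are enabled and written to the default `C:\Windows\Minidump` folder, and the helper name `dumps_since` is made up for illustration.

```python
from datetime import datetime, timedelta
from pathlib import Path

# Default minidump folder on most Windows installs (assumption); adjust if
# the machine is configured to write dumps somewhere else.
DEFAULT_DUMP_DIR = Path(r"C:\Windows\Minidump")


def dumps_since(start, dump_dir=DEFAULT_DUMP_DIR):
    """Return .dmp files modified at or after `start`, newest first."""
    dump_dir = Path(dump_dir)
    if not dump_dir.is_dir():
        return []  # no folder usually means no minidump was ever written
    recent = [
        p for p in dump_dir.glob("*.dmp")
        if datetime.fromtimestamp(p.stat().st_mtime) >= start
    ]
    return sorted(recent, key=lambda p: p.stat().st_mtime, reverse=True)


if __name__ == "__main__":
    # Hypothetical usage: the test run started 12 hours ago.
    test_start = datetime.now() - timedelta(hours=12)
    new_dumps = dumps_since(test_start)
    if new_dumps:
        print("Bluescreen(s) during the test run:")
        for dump in new_dumps:
            print(" ", dump.name)
    else:
        print("No new dumps in the last 12 hours.")
```

An empty result only means no dump was written, so it is worth confirming that the machine is actually set to create minidumps before trusting it.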

What do you think of what I'm doing here, and can anyone provide input on a better, more thorough way to read dump files? I seem to constantly run into ntfs.sys, hal.dll, and ntoskrnl.exe bluescreens, and from what I've read, these don't point to anything definitive...
thecomputerplace Asked:

Netman66 Commented:
Sounds pretty thorough to me.

This article describes much of why this happens.

http://support.microsoft.com/kb/314477

ded9 Commented:
If you need help with a particular issue, we can help: upload the .dmp file. You can use the Microsoft debugging tools for proper analysis of dump files.


http://msdn.microsoft.com/en-us/windows/hardware/gg463009
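
As a rough sketch of how the debugging tools can be scripted in a shop setting: they include a console debugger, `cdb.exe`, which opens a dump with `-z` and runs the commands passed with `-c`, and `!analyze -v` is the usual verbose analysis command. The install path below is an assumption (it differs between versions of the tools), and `build_cdb_command` and `analyze_dump` are illustrative helper names, not part of any Microsoft tool.

```python
import subprocess

# Assumed install location; check where the Debugging Tools actually live
# on your bench machine.
CDB_PATH = r"C:\Program Files (x86)\Windows Kits\10\Debuggers\x64\cdb.exe"


def build_cdb_command(dump_path, cdb_path=CDB_PATH):
    """Build the command that opens a dump, runs !analyze -v, then quits."""
    return [str(cdb_path), "-z", str(dump_path), "-c", "!analyze -v; q"]


def analyze_dump(dump_path, cdb_path=CDB_PATH):
    """Run cdb on one dump file and return whatever text it printed."""
    result = subprocess.run(
        build_cdb_command(dump_path, cdb_path),
        capture_output=True,
        text=True,
        check=False,  # a non-zero exit should not hide the partial output
    )
    return result.stdout
```

Calling `analyze_dump` on each file in the minidump folder gives one text report per crash. The reports tend to be much more useful when the debugger has symbols available.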



Ded9

rindi Commented:
I'd do the bluescreen analysis first, and examine at least the 3 most recent of the dmp's.

Sometimes it may also be better to install an older driver.

Remove any 3rd party hardware (including printer drivers etc).

Clean the PC's insides. Check that all fans run smoothly, clean dust and old thermal transfer paste or pads off the CPU surface and heatsink, add a very small drop of fresh thermal transfer paste, and firmly reattach the heatsink.

When testing the HD, start with the HD manufacturer's diagnostic utility, and only run chkdsk after that if the first test didn't find any errors.

Check BIOS settings (disable overclocking, or use "Fail-Safe" settings if that is available).

Visually examine the mainboard for bad caps, replace the CMOS battery.

BCipollone Commented:
Seems good. I don't like all that turnaround though. This is what I do:

1. Ask the user if anything on the system changed recently (software or hardware)
  If yes, fix the drivers for whatever is new. If not, assume hardware, a Windows update, or system corruption.
2. Use bluescreenview (look for bad driver)
3. Do Hardware Diagnostics
4. Wipe and reload system
5. Done

If there's bad hardware, replace it. If not, a wipe and reload always fixes the problem and prevents returns. (System file corruption is common.)

Overall Time to fix computer - 1 day if it's a wipe and reload case. (XP Updates are a itch)

BCipollone Commented:
Side note: Replacing hardware seems like a bigger cost than wiping the system. Not sure why you would try replacing the power supply or RAM before a wipe, especially if you ran hardware diagnostics and the system was running fine previously.

Also, if you cannot easily reproduce the blue screen, then troubleshooting it within the OS with drivers is going to be hard and time-consuming. A wipe and reload will take less time and resolve any software issues 99% of the time.

thecomputerplace Author Commented:
Thank you all for the info, that about clears things up for me.
Here are the 3 most recent bluescreen dumps from the machine that pushed me to ask this question.
I ended up replacing the RAM on this one even though it tested good with Windows Memory Diagnostic.
I couldn't find anything definitive that the bluescreens pointed to. Do you see anything different in those?

 032811-16567-01.dmp 032811-28594-01.dmp 032411-14274-01.dmp

ded9 Commented:
Thanks Working on it


Ded9

ded9 Commented:
The first dump points to the hardware module and X64_0x124_GenuineIntel_PROCESSOR_BUS, so something to do with the processor bus (processor).


The second dump points to ntfs.sys.

The third dump points to ntfs.sys but gives more info: Error 0xc0000005 - The instruction at 0x%08lx referenced memory at 0x%08lx. The memory could not be %s


The first dump clearly points to hardware; the rest point to ntfs.sys. I think a full dump might help to debug these issues.

But first check the computer hardware, especially the processor.


Ded9
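
To make the codes in the analysis above easier to look up, here is a minimal sketch that pulls the first hex code out of a bucket or error string and names it. The lookup table only covers the two codes mentioned in this thread (0x124 is the WHEA_UNCORRECTABLE_ERROR bug check, 0xC0000005 is STATUS_ACCESS_VIOLATION); `describe_code` is a made-up helper name, and the table is not a complete list.

```python
import re

# Only the two codes seen in this thread; extend from the official
# bug check reference as needed.
KNOWN_CODES = {
    0x124: "WHEA_UNCORRECTABLE_ERROR (fatal hardware error)",
    0xC0000005: "STATUS_ACCESS_VIOLATION (invalid memory access)",
}


def describe_code(text):
    """Return (code, description) for the first 0x... value in `text`.

    Returns None when the text contains no hex code at all.
    """
    match = re.search(r"0x([0-9a-fA-F]+)", text)
    if match is None:
        return None
    code = int(match.group(1), 16)
    return code, KNOWN_CODES.get(code, "unknown, look it up")


if __name__ == "__main__":
    for line in ("X64_0x124_GenuineIntel_PROCESSOR_BUS", "Error 0xc0000005"):
        print(line, "->", describe_code(line))
```

A hardware-class code like 0x124 is a reason to go to hardware tests first, while an access violation inside a driver could be either bad memory or a bad driver, so it narrows things down less.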

BCipollone Commented:
Could just be the chipset drivers

BCipollone Commented:
Wish I could edit. Check your BIOS version as well.

thecomputerplace Author Commented:
ded9 - On that system we replaced the RAM with known-good and it's been 3 days now with no bluescreens. They were previously getting at least 1 a day. Hopefully that resolved it, but if not, I've never run across a way to test processors. Do you recommend something?

BCipollone - I updated the chipset drivers and BIOS and we still had a bluescreen after... thanks for the input.

ded9 Commented:
Crash dump accuracy is not 100%, but the first dump points to hardware. So better to run a hardware test to find out whether it's a hardware issue.

ntfs.sys in a dump basically indicates the dump was not captured properly.

I know about Dell computers: Dell has a diagnostic tool which can run a test on the processor.

I think UBCD has tools to find out whether it's a hardware issue.
www.ultimatebootcd.com/ 



Ded9

thecomputerplace Author Commented:
Thank you all for the input, very good advice on how to approach bluescreens.
Glad to have posted.
Windows Vista

Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.