Hardware diagnostics apps for memory / processor

I am having intermittent bluescreens on my computer and I'm trying to get some apps that I can use to diagnose exactly where the fault is.  I've used the MS memory tester, which showed no errors and I've also used "Hot CPU Tester" which showed an error in the "Complex Matrix" test (Error:CPU 0: Checksums do not match).  Ideally, I want some software which will actually tell me what caused the error (Processor, memory etc) so that I can give this information in as proof for a warranty claim.

I don't mind paying for the software, and I'd prefer it if the software ran outside of windows so that the hardware company can't come back to me and say it's a driver / windows error rather than a hardware problem.

Thanks in advance.
PsychotextAsked:

brokegeniusCommented:
This is just a partial answer:
http://www.memtest86.com/

memtest86 will OFTEN find errors that other programs won't and, just as you asked, it is NOT Windows based

for general information, because retailers often sell ram that is not up to advertised specs:
http://www.cpuid.com/

Cpu stuff:
http://www.benchmarkhq.ru/english.html?/be_cpu.html

Good luck!
0
kode99Commented:
Here's another memory tester,
  http://www.simmtester.com/

Like memtest86 this boots and runs on its own.  I would try both and see if the results are the same.  If the memory fails these tests it is most likely going to be isolated to a memory problem.

Also if you have more than one stick of memory try using the system with one at a time - or even better if you could test them in another machine to isolate your problem.

If the memory does not show any problem, something else you could try would be to back up your system and do a fresh install.  If it is any type of software problem, that would clear it.  If it does not, you then have a pretty compelling claim: if a clean install does not work, there is obviously a problem with some part of the hardware.  This assumes a basic system with no funky stuff going on, so no overclocking or non-typical hardware etc.
0
PsychotextAuthor Commented:
Just got through two full passes of memtest with zero errors whatsoever.  The reason I think it's hardware related is that the system was running stable for about 18 months and then started bluescreening on all sorts of different things with no apparent reasons.  Windows error reporting has told me that the cpu has reported a hardware error, the memory address for the app was corrupted and that it suspects faulty drivers (All over a number of different bluescreens).  Have clean installed and still getting bluescreens (Even got them just after installing on one of the attempts, before I had even added the driver packs).  System is not overclocked, very well cooled and uses top of the line hardware.

I'm still suspecting a motherboard / cpu problem as I just can't get the memory to fail.
0
Paul SDesktop Support Manager / Network AdministratorCommented:
i use this cd for hardware tests.

http://www.ultimatebootcd.com/

everything you need on 1 disc.

you should list all the stop errors you've got so we can tell you whats wrong.

if the cpu is bad it is hard to test. i would just put it into another system to see what happens. sounds like your mobo or the cpu is bad.
0
nedvisCommented:
Here is a list of the most frequently used and most popular hardware diagnostic utilities.
It's a Google list of "most wanted" apps:
http://directory.google.com/Top/Computers/Software/Shareware/Diagnostics
   
 good luck
nedvis
0
PsychotextAuthor Commented:
The_Computer_Guru_777: Are these stop errors stored anywhere?  I have a ghosted backup of the system prior to rebuild that I can get them out of.  Nothing in my event log after the rebuild as I cleared it this morning.
0
PsychotextAuthor Commented:
I'm now 99% sure it's not a memory issue.  Haven't been able to get a single failure on memtest, MS memory test or Docmemory.

Ultimate Boot CD is pretty useful, thanks, although there's not much in the way of cpu related testing on it.  Funny really, you'd think that Intel / AMD would have diagnostic utilities for their chips.  I'm going to have to go through the google / benchmarkhq lists to see if any of those apps are more helpful.
0
brokegeniusCommented:
#1, in the future, please always post your Operating System/any upgrades with any hardware questions, just helps us help you.

memtest is THE #1 application for memory testing. While it never hurts to try multiple apps, rest assured that memtest is one tool to keep and bookmark forever.

honestly, one problem could be a BAD CD... a scratched or defective CD will sometimes install, but you won't know that it was the installation itself that caused the problem

#2 one thing you could check, but not always helpful is (hoping you're on xp)
1)open up MY COMPUTER
2)right click in the UPPER LEFT corner, directly on the Computer icon
3)choose MANAGE
4)on the left hand side click the PLUS sign next to EVENT VIEWER....there are 3 different categories, you might be able to track it down via those error messages
0
PsychotextAuthor Commented:
Ok, well I'm running XP SP2 (With all patches as of 27/02/2005).  Hardware is as follows:

Asus A7N8X Rev 1.04 (Tried latest standard and beta bios),
AMD XP 2800+ (Not overclocked),
1GB Corsair Twinx XMS3200 Dual Channel DDR-SDRAM (Cas 2, Ras to Cas Delay 2, Ras Precharge 2, Cycle Time 6),
ATI Radeon 9800+,
Western Digital Raptor 36gb HDD.

Actually considered the bad cd and tried another, but with the same results after the install (I've spent a lot of time on this!).  Get no errors in the event logs, other than when I get the bluescreen (But I don't have any right now as I cleared the logs this morning to make them easier to look through).
0
PsychotextAuthor Commented:
(That's 27th Feb 2005 for those confused by UK date style!)
0
sciwriterCommented:
<< The reason I think it's hardware related is that the system was running stable for about 18 months and then started bluescreening on all sorts of different things with no apparent reasons. >>

Funny, that would make me suspect that it is a windows problem, typical symptoms of windows going bad.

SP2 is a potential problem, an uninstall is often needed.  Another thing that can totally hang a system is a bad CD/R or disks in the CD/DVD drive that are bad and cannot be "seeked" by windows explorer.  Short of that, you might be looking at some significant windows debugging, not hardware.

Have you checked the CPU temp. with the ASUS monitor?  If it goes over 55 degrees, especially over 60, start getting worried.  Have you checked the PS -- temporarily swapped in a different one?
0
PsychotextAuthor Commented:
I'd have thought the same thing, but even after the clean rebuilds it would bluescreen, both before / after sp2 and before / after installing the driver packs.

I used the "Mersenne prime" application on the Ultimate Boot CD last night.    After five hours and one minute it failed with "FATAL ERROR: Rounding was 0.499..... expected less than 0.4.  Hardware failure detected, consult stress.txt".  Haven't looked up the exact meaning of that yet.
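For context on that message: as far as I understand it, the Mersenne prime test multiplies very large numbers with floating-point FFTs, and every coefficient of the result should land almost exactly on an integer. The "rounding" figure is the distance from the nearest integer, so a value near 0.5 means the hardware produced a number that cannot be trusted. The sketch below only illustrates that idea with a plain radix-2 FFT on base-10 digits; it is not the algorithm or code the real application uses, and all function names are made up for the example.

```python
import cmath


def fft(values, invert=False):
    """Recursive radix-2 FFT; len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return list(values)
    even = fft(values[0::2], invert)
    odd = fft(values[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def square_with_roundoff(digits):
    """Square a number given as little-endian base-10 digits.

    Returns (integer coefficients before carrying, worst rounding error).
    """
    size = 1
    while size < 2 * len(digits):
        size *= 2
    padded = [complex(d) for d in digits] + [0j] * (size - len(digits))
    spectrum = fft(padded)
    squared = fft([s * s for s in spectrum], invert=True)
    coeffs, worst = [], 0.0
    for c in squared:
        real = c.real / size          # inverse FFT needs the 1/size scaling
        nearest = round(real)
        worst = max(worst, abs(real - nearest))
        coeffs.append(nearest)
    return coeffs, worst


def coeffs_to_int(coeffs, base=10):
    """Recombine un-carried coefficients into a single integer."""
    return sum(c * base**i for i, c in enumerate(coeffs))


if __name__ == "__main__":
    n = 123456789
    digits = [int(ch) for ch in reversed(str(n))]
    coeffs, worst = square_with_roundoff(digits)
    print("product correct:", coeffs_to_int(coeffs) == n * n)
    print("worst rounding error:", worst)
    print("would pass a 0.4 threshold:", worst < 0.4)
```

On healthy hardware the worst rounding error for an input this small should be tiny, which is why a reading near 0.5 after hours of load points at the CPU, memory or motherboard rather than at the software.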

CPU never gets above 55 Celsius at full load (Case temp never higher than 25 Celsius).  Voltages are as follows (Low / High / Average / Max Percent Outside Target Voltage):

Core 0: 1.60v / 1.68v / 1.65v / 3.1%
Core 1: 1.60v / 1.68v / 1.65v / 3.1%
+3.3: 3.25v / 3.30v / 3.27v / 1.5%
+5.0: 4.78v / 4.81v / 4.79v / 4.6%
+12.00: 11.55v / 11.61v / 11.58v / 3.8%
-12.00: -12.13v / -12.07v / -12.08v / 1.0%
-5.00: -5.09v / -5.06v / -5.07v / 1.8%

All voltages are within 4.6% of target (or better), but then I'm not sure how tight they should be.
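One way to sanity check those readings is to recompute the worst deviation per rail and compare it with a tolerance. The sketch below assumes the commonly quoted ATX figures of ±5% on the positive rails and ±10% on the negative rails; those limits are an assumption, not something stated in this thread, so check them against the power supply's own spec. The percentages are recomputed from the low / high columns above, so they may differ slightly from the monitoring tool's own percentage column.

```python
# Rail name -> (nominal, lowest reading, highest reading), all in volts.
# Readings are copied from the list above.
RAILS = {
    "+3.3": (3.3, 3.25, 3.30),
    "+5": (5.0, 4.78, 4.81),
    "+12": (12.0, 11.55, 11.61),
    "-12": (-12.0, -12.13, -12.07),
    "-5": (-5.0, -5.09, -5.06),
}


def worst_deviation_pct(nominal, low, high):
    """Largest distance from nominal, as a percentage of nominal."""
    return max(abs(low - nominal), abs(high - nominal)) / abs(nominal) * 100


def check_rails(rails, pos_tol=5.0, neg_tol=10.0):
    """Return {rail: (deviation %, within tolerance?)}.

    pos_tol / neg_tol are ASSUMED tolerances (percent), not values
    taken from the thread.
    """
    report = {}
    for name, (nominal, low, high) in rails.items():
        dev = worst_deviation_pct(nominal, low, high)
        tol = pos_tol if nominal > 0 else neg_tol
        report[name] = (dev, dev <= tol)
    return report


if __name__ == "__main__":
    for rail, (dev, ok) in check_rails(RAILS).items():
        status = "ok" if ok else "OUT OF TOLERANCE"
        print(f"{rail:>5}: {dev:.1f}% off nominal ({status})")
```

Under those assumed limits every rail passes, although the +5 rail sits fairly close to the edge.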

XP 2800+ (Barton) Specs:
Nominal Voltage: 1.65v
Max Die Temp: 85 Celsius
0
cpc2004Commented:
Upload 3 to 4 minidumps to any webspace. I will process the dumps and find which hardware component is faulty. You can find the minidumps in the folder \windows\minidump
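If it helps with picking which dumps to upload, a few lines of Python can list what is in that folder, newest first. This is only a convenience sketch: the default path assumes Windows is installed on C:, and the function name is made up for the example.

```python
from pathlib import Path


def list_minidumps(folder=r"C:\Windows\Minidump"):
    """Return the .dmp files in `folder`, newest first.

    The default folder is an assumption (Windows on drive C:).
    Returns an empty list if the folder does not exist.
    """
    root = Path(folder)
    if not root.is_dir():
        return []
    dumps = [p for p in root.iterdir() if p.suffix.lower() == ".dmp"]
    return sorted(dumps, key=lambda p: p.stat().st_mtime, reverse=True)


if __name__ == "__main__":
    for dump in list_minidumps():
        print(dump.name, dump.stat().st_size, "bytes")
```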
0
PsychotextAuthor Commented:
cpc2004: www.tacticaladvantage.co.uk/minidump.zip

The dump files in that zip are all the ones from the most recent clean build of XP (The one I'm working on now).  Thanks.
0
cpc2004Commented:
Three minidumps report memory corruption: 2 minidumps with 4 errors each and 1 minidump with 16 errors. The memory corruption is caused by a faulty motherboard.

Mini022605-01.dmp D1 (9ed6162c, 00000002, 00000000, f7b10831)

CHKIMG_EXTENSION: !chkimg -lo 50 -db !usbohci
4 errors : !usbohci (f7b10822-f7b1083a)
f7b10820  c1  f6 *16  02  74  13  f6  40  0c  01 *8b  0d  8b  45  14  80 ....t..@.....E..
f7b10830  4f  02 *0c  83  48  10  01  eb  03  8b *82  14  80  67  02  1f O...H........g..

MODULE_NAME:  memory_corruption
IMAGE_NAME:  memory_corruption
FOLLOWUP_NAME:  memory_corruption
DEBUG_FLR_IMAGE_TIMESTAMP:  0
MEMORY_CORRUPTOR:  STRIDE
STACK_COMMAND:  kb
FAILURE_BUCKET_ID:  MEMORY_CORRUPTION_STRIDE
BUCKET_ID:  MEMORY_CORRUPTION_STRIDE

Mini022405-02.dmp D1 (9b84e474, 00000002, 00000000, f7ae8831)
CHKIMG_EXTENSION: !chkimg -lo 50 -db !usbohci
4 errors : !usbohci (f7ae8822-f7ae883a)
f7ae8820  c1  f6 *16  02  74  13  f6  40  0c  01 *8b  0d  8b  45  14  80 ....t..@.....E..
f7ae8830  4f  02 *0c  83  48  10  01  eb  03  8b *82  14  80  67  02  1f O...H........g..

MODULE_NAME:  memory_corruption
IMAGE_NAME:  memory_corruption
FOLLOWUP_NAME:  memory_corruption
DEBUG_FLR_IMAGE_TIMESTAMP:  0
MEMORY_CORRUPTOR:  STRIDE
STACK_COMMAND:  kb
FAILURE_BUCKET_ID:  MEMORY_CORRUPTION_STRIDE
BUCKET_ID:  MEMORY_CORRUPTION_STRIDE

Mini022505-03.dmp 0A (f104e9b8, 00000002, 00000001, 804e2b65)

CHKIMG_EXTENSION: !chkimg -lo 50 -d !nt
    804e2d64-804e2d67  4 bytes - nt!KiServiceTable+44
      [ 77 87 56 80:30 4b 3e f1 ]
    804e2df4-804e2df7  4 bytes - nt!KiServiceTable+d4 (+0x90)
      [ 62 f2 57 80:f0 46 3e f1 ]
    804e2ed0-804e2ed3  4 bytes - nt!KiServiceTable+1b0 (+0xdc)
      [ 04 3c 57 80:70 44 3e f1 ]
    804e2f44-804e2f47  4 bytes - nt!KiServiceTable+224 (+0x74)
      [ 4d 49 57 80:50 4c 3e f1 ]
16 errors : !nt (804e2d64-804e2f47)

MODULE_NAME:  memory_corruption
IMAGE_NAME:  memory_corruption
FOLLOWUP_NAME:  memory_corruption
DEBUG_FLR_IMAGE_TIMESTAMP:  0
MEMORY_CORRUPTOR:  LARGE
STACK_COMMAND:  kb
FAILURE_BUCKET_ID:  MEMORY_CORRUPTION_LARGE
BUCKET_ID:  MEMORY_CORRUPTION_LARGE
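
To read the !chkimg -db rows above: each row starts with a base address followed by 16 bytes, and the bytes marked with * are the ones that do not match the original image. The sketch below (function names are made up, and it assumes exactly that row layout) turns the starred bytes into addresses and shows the spacing between them, which seems to be what the STRIDE label refers to: corruption repeating at a regular interval.

```python
import re

# A row looks like: "f7b10820  c1  f6 *16  02 ...  80 ....t..@.....E.."
ROW = re.compile(r"^([0-9a-f]{8})\s+(.*)$", re.IGNORECASE)


def corrupted_addresses(dump_lines):
    """Return the addresses of bytes flagged with '*' in chkimg -db rows."""
    addresses = []
    for line in dump_lines:
        match = ROW.match(line.strip())
        if not match:
            continue
        base = int(match.group(1), 16)
        # Only the first 16 tokens are bytes; the rest is the ASCII column.
        byte_tokens = match.group(2).split()[:16]
        for offset, token in enumerate(byte_tokens):
            if token.startswith("*"):
                addresses.append(base + offset)
    return addresses


def strides(addresses):
    """Distances between consecutive corrupted addresses."""
    return [b - a for a, b in zip(addresses, addresses[1:])]


if __name__ == "__main__":
    rows = [
        "f7b10820  c1  f6 *16  02  74  13  f6  40  0c  01 *8b  0d  8b  45  14  80 ....t..@.....E..",
        "f7b10830  4f  02 *0c  83  48  10  01  eb  03  8b *82  14  80  67  02  1f O...H........g..",
    ]
    found = corrupted_addresses(rows)
    print([hex(a) for a in found])
    print("strides:", strides(found))
```

For the first dump this gives four addresses from f7b10822 to f7b1083a, matching the "4 errors" range reported, each 8 bytes apart.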

0

PsychotextAuthor Commented:
Interesting, thanks for that.
0
sciwriterCommented:
Still, I would at least install something basic like win98se -- maybe even DOS?? -- and run it for a day.  Wouldn't you be surprised if 98 never froze, but XP did, right?
0
PsychotextAuthor Commented:
I've tried running the DOS mode (and linux mode) diagnostic apps and so far I haven't made it past 8 hours of testing.
0
cpc2004Commented:
It is a hardware error. There is no shareware that can diagnose which hardware is faulty. Windows uses special coding to make use of cache memory, which is different from DOS mode programs.  Hence most faulty CPUs and m/bs can pass the memtest utility. Only PC manufacturers, such as IBM with its notebooks, have built-in utilities to diagnose hardware. Your PC crashes due to a hardware problem, and the minidumps hold a snapshot of the moment the hardware error occurs. It is a cache memory problem in either the m/b or the CPU.  In my experience, 70% of the time it is a faulty m/b and 30% a faulty CPU.
0
PsychotextAuthor Commented:
It would have to happen in the only part that's not in warranty!  Ok, thanks all. I've got to work out a fair way of awarding the points on this one which I'll try to do later today.
0
cpc2004Commented:
The analysis report from Microsoft WinDbg reports hardware errors such as memory corruption. It is the best tool and you don't have to spend money on the software.

Sample hardware messages from WinDbg:
MEMORY_CORRUPTION_STRIDE
MEMORY_CORRUPTION_LARGE
MEMORY_CORRUPTION_ONE_BIT
MEMORY_CORRUPTION_ONE_BYTE
TWO_BIT_CPU_CALL_ERROR
INTEL_CPU_MICROCODE_ZERO
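
When several dumps have been analysed, it can be easier to see a pattern by counting how often each bucket shows up. The sketch below assumes the analysis output has been saved as text and contains FAILURE_BUCKET_ID lines like the ones posted earlier in this thread; the function name is made up for the example.

```python
import re
from collections import Counter

# Matches lines such as "FAILURE_BUCKET_ID:  MEMORY_CORRUPTION_STRIDE".
# Plain "BUCKET_ID:" lines are skipped so each dump is counted once.
BUCKET = re.compile(r"^FAILURE_BUCKET_ID:\s*(\S+)", re.MULTILINE)


def tally_buckets(analysis_text):
    """Count how many times each failure bucket appears in the text."""
    return Counter(BUCKET.findall(analysis_text))


if __name__ == "__main__":
    sample = """\
FAILURE_BUCKET_ID:  MEMORY_CORRUPTION_STRIDE
BUCKET_ID:  MEMORY_CORRUPTION_STRIDE
FAILURE_BUCKET_ID:  MEMORY_CORRUPTION_STRIDE
BUCKET_ID:  MEMORY_CORRUPTION_STRIDE
FAILURE_BUCKET_ID:  MEMORY_CORRUPTION_LARGE
BUCKET_ID:  MEMORY_CORRUPTION_LARGE
"""
    for bucket, count in tally_buckets(sample).most_common():
        print(count, bucket)
```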
0
PsychotextAuthor Commented:
Ok... I think that's about as fair as I could make it.  Ideally I would have opened up a new 500 point question for CPC for going above and beyond the call of duty; but apparently that's not allowed. Sorry.

Good answers, thank you very much.
0
brokegeniusCommented:
no, you can and should open a NEW question when wandering from the original post. This happens all of the time :)

I have low points, but do this too:)
Cheers
0
Question has a verified solution.
