aphuk

Different error at each reboot - Blue Screen and sometimes NO SCREEN AT ALL

PC is a DELL GX270 (service tag 856M11J)
It connects via its Serial port to a PLC (Programmable Logic Controller) which is controlling a desalination process.

512 MB RAM
80 GB Maxtor IDE HD
P4 2.8 GHz
XP Pro 2002 SP1

It has been working fine for at least 10 months and since it was installed it has not been changed in any way (no additional software).

PC is not on a LAN

PC is not connected to the Internet at any time

It is a standalone PC running a single application.

In the last few weeks the PC has been freezing (the mouse and keyboard will not work/respond).
A hard reboot (power off) is the only way out.
The frequency of the freezes has been increasing to the point where they occur three or four times a day, and each time a hard reboot is required.

The PC is now disconnected from the PLC and is exhibiting the following behaviour.

Each time you reboot the PC you get a different error message (pick from any of the following)

...PAGE_FAULT_IN_NON_PAGED_AREA...

...Local DLLs needed for HAL...

...IRQL_NOT_LESS_OR_EQUAL...

...classpnp.sys...

etc, etc.

There is absolutely no consistent error message. I have tried putting the PC in SAFE mode and DEBUGGING mode, but in either case Windows starts and you get a desktop, and as soon as you try to do anything the PC shuts down without warning (blackscreen).
*********************************
This morning I got the DELL XP CD out and booted from the CD. I started off the OS install and everything was going fine and then the PC halted and went to a BLACKSCREEN with no error message whatsoever. I tried this twice more whilst monitoring the installation messages to see if it was always halting at the same place but each time it was at a different point.
What I call BLACKSCREEN is the equivalent of the power going off (although that is definitely not the case). During the OS re-install, at the point where it stops you actually hear the CD stop spinning!!
*********************************
Tests....
I have carried out hardware checks which all report that the Hard Drive is fine.
I have attached the hard drive to another PC and I can see all the data on the drive without any problem.

*********************************
Finally, all through this text I have spoken about 1 (one) PC. In actual fact there are another 3, all displaying very similar behaviour. In all respects they are individual machines and are not physically connected in any way whatsoever (inc WiFi).

Since June 2, when the first PC had problems, the other three have also started freezing. The PCs are known as A, B, C & D and at this point in time A & D cannot be started at all. If you switch the power on they get as far as the ENTER BIOS screen and as soon as that clears the screen goes blank.

Does anyone have any idea what may be happening here?




nobus

I would start by testing the RAM with memtest86+ from: http://www.memtest.org/

Another thing to try is swapping the power supply.

It certainly looks like a hardware problem, since you got the failure while booting from a CD too.

You can also diagnose by disconnecting devices, or by disabling them in the BIOS.



I suppose it's some kind of hardware failure, too.

Test the memory as 'nobus' proposed. If the program does not show memory failures, clean the dust out of the computers; they may be overheating. Check especially under the front cover. I have some Optiplex computers here, and after a year there is a cloud of dust under the front cover.

If none of that solves the problem: are the workstations still under warranty?
Are the hard disks connected via SCSI? If so, please check the SCSI controller.
aphuk

ASKER

>> JBlond
Machines are very clean as they are in a conditioned environment (IP65). Temperature is controlled and there is minimum dust.

>>1stITMAN  asked <Are the hard disk connected via scsi>
No, HD controller is IDE

>>nobus
Will download and try the memtest utility and let you know.
Will also try LINUX via knoppix

to All
Forgot to mention...
When I tried to take a Ghost image from one of the PC's I got the message
< Encountered NTFS volume with a logfile that has not been flushed (536)
NTFS problem detected (1969) >
Well, do the memtest anyway (good practice); then, if you can still boot, scan the drive for disk errors.
Try a disk check: boot from a Win2K CD, start the Recovery Console, and then type chkdsk /r
As you said that the problem appeared on four independent computers on the same date...
Did something special happen on that day? A voltage peak maybe...?

If the computers are still within the period of the guarantee, I would contact Dell and let them check at least one PC. I suppose there's something wrong with the mainboard, the CPU or the PSU.

I don't think that the RAM is faulty because then the PCs should show a different behaviour and not all the same.

Maybe a capacitor in the PSU or on the mainboard was damaged by a voltage peak or for some other reason, but it keeps functioning for some time. That would explain why the problem has been getting worse since it first happened, because capacitors can lose function stealthily (I hope this is the right word)...
When Windows crashes with a blue screen, it writes system event 1001 and a minidump to the folder \windows\minidump. System event 1001 holds the content of the blue screen.

Control Panel -> Administrative Tools -> Event Viewer -> System -> Event 1001. Copy the content and paste it back here. Zip 5 to 6 minidumps and upload the zip file to any webspace. I will study the dumps and find the culprit.
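
For anyone who wants to script the "zip 5 to 6 minidumps" step, here is a minimal Python sketch. It assumes the dumps carry a .dmp extension and sit in one folder (for example a copy of \windows\minidump taken from the failing machine). The function name zip_recent_minidumps and the archive name are made up for illustration and do not come from any tool mentioned in this thread.

```python
import zipfile
from pathlib import Path


def zip_recent_minidumps(dump_dir, archive_path, limit=6):
    """Zip the `limit` most recently modified .dmp files found in dump_dir.

    Returns the names of the files that were added, newest first.
    """
    # Sort by modification time so the newest crashes come first.
    dumps = sorted(
        Path(dump_dir).glob("*.dmp"),
        key=lambda p: p.stat().st_mtime,
        reverse=True,
    )[:limit]
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for dump in dumps:
            # Store only the file name, not the full path, inside the archive.
            archive.write(dump, arcname=dump.name)
    return [dump.name for dump in dumps]
```

A call such as zip_recent_minidumps(r"C:\windows\minidump", "minidumps.zip") would then produce a single file to upload. The limit of 6 simply mirrors the number asked for above.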

ASKER

>>1stITMAN
Have used the DELL utility partition diagnostics and also the BIOS utility, and both report the HD is fine. Have also put the hard drive into another working PC and it appears intact (all the folders are there and they are all browsable and copiable).
I do not have a 2K CD and I do not know what Recovery Console is or how to start it.

>>JBlond
The problems occurred around the same period but not on the same day. PC "A" was the first to have a problem, on June 2nd, and it was the first to ramp up its failure rate to the point where it will now not boot at all. The other three (B, C & D) all followed within a week, and each one is failing at its own 'sweet' rate, such that yesterday "C" gave up the ghost, so I am now down to 2 PCs!! HELP! The town is going to end up with no fresh water....

>>cpc2004
Both A & C are inaccessible (can't get them to boot) but I did take a copy of the contents of the ROOT (C:\) and all the folders. I can search for the files which contain windows events. What name will it be under?


You can find the minidumps in the folder \windows\minidump.

ASKER

I have put six files at:

http://uk.f1.pg.briefcase.yahoo.com/bc/aph_home/lst?.dir=/Public&.order=&.view=l&.src=bc&.done=http%3a//uk.f1.pg.briefcase.yahoo.com/

All the files have had their extensions changed from '.dmp' to '.jpg' to fool Yahoo that they are pictures so that they are visible for downloading.

You can right click on any file and select 'Save Target as'.

Thanks and good luck
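
Anyone downloading those files needs to undo the extension trick before a debugger will accept them. A small Python sketch of that step follows; it assumes the download folder holds only the disguised dumps (it will rename any .jpg it finds), and the function name restore_dump_extensions is hypothetical.

```python
from pathlib import Path


def restore_dump_extensions(folder):
    """Rename every *.jpg in `folder` back to *.dmp and return the new names.

    Assumption: the folder contains only minidumps that were renamed to .jpg
    for upload; genuine pictures in the same folder would be renamed too.
    """
    renamed = []
    for disguised in sorted(Path(folder).glob("*.jpg")):
        restored = disguised.with_suffix(".dmp")
        if restored.exists():
            # Never overwrite a dump that is already present.
            continue
        disguised.rename(restored)
        renamed.append(restored.name)
    return renamed
```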
One minidump is inaccessible, which is a symptom of a hardware error. One minidump shows a crash with one-bit memory corruption. One minidump shows a crash with CPU_CALL_ERROR. I believe the cause is a faulty motherboard.

Mini060205-01.dmp
POSSIBLE_INVALID_CONTROL_TRANSFER:  from bf8151a9 to bf953f60
FOLLOWUP_NAME:  MachineOwner
MODULE_NAME:  hardware
IMAGE_NAME:  hardware
DEBUG_FLR_IMAGE_TIMESTAMP:  0
STACK_COMMAND:  .trap fffffffff8045b20 ; kb
BUCKET_ID:  CPU_CALL_ERROR
Followup: MachineOwner
---------
 *** Possible invalid call from bf8151a9 ( win32k!ValidateSmwp+0x16 )
 *** Expected target bf953f60 ( win32k!HMValidateHandleNoSecure+0x0 )



Mini063005-01.dmp
STACK_TEXT:  
WARNING: Frame IP not in any known module. Following frames may be wrong.
f8953f08 f864f154 82279978 82279a40 82279978 0x78648380
f8953f1c f864f10a 82279a40 82279a24 f8953fa0 serial+0x7154
f8953f40 f86522b2 82279a40 82279a24 f8953fa0 serial+0x710a
f8953f6c f864f8a2 82279a40 82279a24 f8953fa0 serial+0xa2b2
f8953f94 f864adbf 82279978 00000000 f864ad02 serial+0x78a2
f8953fd0 80532169 82279b64 82279978 00000000 serial+0x2dbf
f8953ff4 80531e3a ef920d44 00000000 00000000 nt!KiRetireDpcList+0x46


FAILED_INSTRUCTION_ADDRESS:
+78648380
78648380 ??               ???

POSSIBLE_INVALID_CONTROL_TRANSFER:  from f864f14f to f8648380
SINGLE_BIT_ERROR:  1
TWO_BIT_ERROR:  1
FOLLOWUP_NAME:  MachineOwner
MODULE_NAME:  hardware
IMAGE_NAME:  hardware
DEBUG_FLR_IMAGE_TIMESTAMP:  0
STACK_COMMAND:  .trap fffffffff8953e98 ; kb
BUCKET_ID:  SINGLE_BIT_CPU_CALL_ERROR

Followup: MachineOwner
---------
 *** Possible invalid call from f864f14f ( serial+0x714f )
 *** Expected target f8648380 ( serial+0x380 )
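
One way to see the single-bit error in the second dump: the top stack frame and FAILED_INSTRUCTION_ADDRESS show 78648380, while the expected call target is f8648380. The Python sketch below compares the two values bit by bit. It is only an illustration of the arithmetic, on the assumption that this address pair is what the debugger is flagging; it is not how the debugger itself arrives at its verdict, and the helper name differing_bits is made up.

```python
def differing_bits(expected, actual):
    """Return the bit positions (0 = least significant) where two values differ."""
    diff = expected ^ actual  # XOR leaves a 1 only where the bits disagree
    return [bit for bit in range(diff.bit_length()) if (diff >> bit) & 1]


expected_target = 0xF8648380  # "Expected target f8648380 ( serial+0x380 )"
actual_target = 0x78648380    # address shown in the top stack frame

print(differing_bits(expected_target, actual_target))  # [31]
```

The two addresses differ only in bit 31, the top bit of a 32-bit address. A single flipped bit of that kind fits a hardware fault better than a software bug.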
ASKER CERTIFIED SOLUTION
cpc2004

ASKER

cpc2004 and all who participated - Thank You

To cpc2004 :

It is the obvious question, how did you translate the minidumps into meaningful English?
Let me know if you need further help with this problem.

ASKER

cpc2004

I got all the help I needed here. As for pursuing the actual failure with DELL: I posted in their various tech support forums (at the same time as my posting here) and, 3 months on, I have still to receive any response whatsoever. Must have been too difficult for them, I guess!

Thanks for checking back. Would still love to know how you translated the minidumps tho'?