chrismanncalgavin


2 PCs Always Crashing Early in the Morning!

I have a baffling issue.
At our company we have two PCs located in the factory (ground floor office).
The two are completely different machines: one is an AMD Athlon 2600+ with 512 MB RAM and a 40 GB HDD; the other is a fairly new Dell Precision 380 workstation with a Pentium D 930, 1 GB RAM and a 200 GB HDD.
Both run XP Professional and have been completely overhauled (chkdsk, defrag, Windows Updates, driver updates, internal cleaning, hardware diagnostics).
The problem is that both machines crash, without fail, nearly every morning, and ONLY in the morning (before 8:00).
On the AMD machine it happens after the user logs on: the machine blue-screens, dumps memory and reboots. On the Dell machine it happens either at logon or after it has been left logged on for a couple of minutes.
I have been unable to recreate the crash on either machine during the day, despite rebooting each one at least 20 times and running lots of diagnostics on RAM, HDD etc.
As a precaution I have put surge protectors on both machines, and also a max/min thermometer in the room, as I wonder whether this is a temperature issue. The temperature does not seem to drop below 10 degrees C.

Any ideas? Or is it just an irritating coincidence?
rhutzel

Do both machines have similar or identical antivirus software, or perhaps a piece of software that needs to run once a day when the computer is booted? If they only crash once, presumably the first time they are booted each day, there must be some link to an event that is trying to fire up at that point, like a "scan hard drive for viruses" task or something of that nature.

Could you give the error code that the blue screen shows (like: 0x00000000, blah, blah)? There is usually also a symbolic name somewhere on the screen describing the type of error, in the form THIS_IS_AN_ERROR, except it will say what kind of error it is.

You can stop the system from automatically rebooting by right-clicking the My Computer icon and choosing Properties, then the Advanced tab. Click the Startup and Recovery Settings button. Under System failure, uncheck "Automatically restart."

I could suggest disabling things, but I would like to know what that blue screen is saying first.

Ryan
Good points, rhutzel. I would also add that you could check both systems' event logs for identical errors. This can help narrow down whatever the common problem is.

-Keola
Mark Poirier
Go into Control Panel > Administrative Tools > Event Viewer > System.
Check there for errors that correspond to the time of startup, and post them here.
The BSOD error would also be very helpful.
Did this just start happening?

10C is at the bottom edge of some components' operating range. If the machines are being turned on cold, it could be a problem that clears once powering up heats things up just enough for the second run to be good. If they are left running all night, they would probably stay warm enough inside even at a 10C room temperature.

Factories are notorious for dirty power. A surge protector will do nothing to clean up power; it only protects against big spikes. It is possible that a particular machine, or combination of machines, switching on or warming up is sending noise onto the building's power circuits. To counter this sort of problem you need to look at a power conditioner or line conditioner; true online UPSs also isolate equipment properly. These cost a lot more than surge protectors but are worth it.

Here's an example,
http://www.tripplite.com/products/conditioners/index.cfm

Monster Power also makes some 'clean power' power bars that would work. Dirty power is a big issue for people with home theatre equipment, so anybody who sells home theatre gear probably sells power conditioners too.

Anyway, a couple of things to consider. There must be some commonality here. Try to change only one thing at a time, so that you find the actual problem rather than have it disappear without ever figuring it out.
I would check the power for that room. Can you hook up a monitor, or check the voltage during the period when the crash occurs? It seems related to the environment, and the prime suspect is the AC power being too low or too high.
...or a grounding problem; have that checked too.
chrismanncalgavin

ASKER

Thanks everyone so far, some interesting comments!

OK, both machines have certain software in common, which we have also installed on the other 13 machines on the network:
-Symantec Antivirus Corporate Edition 10.1
-Microsoft ISA Firewall Client
-Microsoft Office 2003 Pro
We have had none of the problems described on the other 13 machines (4 of which are identical to PC2).
Symantec updates the virus definitions on the server machine at 8:00 every day, and runs a scheduled full scan every Friday at 12:00.

The first machine (Pentium D) has displayed several different blue screens:
SYMTDI.SYS
STOP: 0x00000050
PAGE_FAULT_IN_NONPAGED_AREA
and also STOP: 0x0000008E (0xC0000005, 0x805AFD9E, 0xB99C0BA0, 0x00000000).

The second machine (AMD 2600+) suffers the following errors:
- Before rebooting, the System event log shows event 26, Application Popup: Machine Check: Regs
- After the reboot it shows the following: The computer has rebooted from a bugcheck. The bugcheck was: 0x100000be (0xcdb28808, 0x07462121, 0xf2783524, 0x0000000b). A dump was saved in: C:\WINDOWS\Minidump\Mini120106

I have saved the system and application logs for both PCs.
PC1: http://www.auwi44.dsl.pipex.com/pc1applog.csv
        http://www.auwi44.dsl.pipex.com/pc1systemlog.csv
PC2: http://www.auwi44.dsl.pipex.com/pc2applog.csv
        http://www.auwi44.dsl.pipex.com/pc2systemlog.csv
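To line the saved logs up against the "before 8:00" pattern, the exported CSV files can be filtered by time of day. This is a minimal Python sketch, not a tested tool: the function name early_morning_errors is hypothetical, and it assumes the column order in DEFAULT_FIELDS and an HH:MM:SS time format. Both depend on how Event Viewer exported the file and on the PC's regional settings, so compare against one line of the CSV and adjust first.

```python
import csv
import io
from datetime import datetime, time

# ASSUMPTION: column order of the Event Viewer CSV export. Open the file,
# compare with one line, and reorder these names if your export differs.
DEFAULT_FIELDS = ["date", "time", "source", "type", "category",
                  "event", "user", "computer", "description"]


def early_morning_errors(csv_text, cutoff=time(8, 0),
                         fields=DEFAULT_FIELDS, time_format="%H:%M:%S"):
    """Return the rows of type 'Error' logged before `cutoff` (default 08:00)."""
    hits = []
    # fieldnames is given explicitly, so the first line is treated as data.
    reader = csv.DictReader(io.StringIO(csv_text), fieldnames=fields)
    for row in reader:
        raw_time = (row.get("time") or "").strip()
        try:
            logged_at = datetime.strptime(raw_time, time_format).time()
        except ValueError:
            # Header lines or an unexpected (e.g. locale-specific) time format.
            continue
        is_error = (row.get("type") or "").strip().lower() == "error"
        if is_error and logged_at < cutoff:
            hits.append(row)
    return hits
```

Pass it the text of a file such as pc2systemlog.csv; passing cutoff=time(7, 0) narrows the window. Rows whose time does not parse are skipped rather than raising, so if the result is unexpectedly empty, check time_format before concluding there are no early errors.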

I have asked the users to leave both machines on overnight for a day and to check whether the problem occurs. This will at least rule out the temperature issue.
I don't think it is power-related, but I may be wrong. I've been told that the machines we use in the factory are switched on and off all day long and are on a different circuit, so they should not interfere, but you never know. Unfortunately I have no means of logging the voltage to each computer.

Then check it at 15-minute intervals during the period when it happens!
Nobus,

I don't work at the time it happens, but I can arrange to come in early. I don't believe seeing the error in person will make much difference, though, as I can still see the logs.
fallenknight308

To rule out environmental issues, move the machines out of their current environment one morning, have the users sign in there, and see whether the issue still happens.

This should point you to either an environmental or a machine issue. If it turns out to be environmental, look for large motors starting (e.g. on production lines) or large power supplies nearby.

If it is a machine issue, consider going into msconfig, disabling all startup items and seeing what happens then.
For the first machine with the
SYMTDI.SYS
STOP: 0x00000050
PAGE_FAULT_IN_NONPAGED_AREA
and also STOP: 0x0000008E (0xC0000005, 0x805AFD9E, 0xB99C0BA0, 0x00000000)
error, I found an interesting tidbit on the Symantec website when searching for SYMTDI.SYS there:
http://service1.symantec.com/SUPPORT/nip.nsf/0/108e3e6dbf49f33a88256dc700778e2a?OpenDocument
They don't have a fix for it though.

For the second machine, with the 0x000000BE error:

0x000000BE: ATTEMPTED_WRITE_TO_READONLY_MEMORY
A driver attempted to write to read-only memory. Commonly occurs after installing a faulty device driver, system service, or firmware. If a driver file is named in the error message, try to correct the problem by disabling, removing, or rolling back the driver.
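The event log on that machine reported the code as 0x100000be, while the description above is for 0x000000BE. Windows XP often writes stop codes to the event log with 0x10000000 added, and Microsoft documents variants such as 0x1000008E as having the same meaning as the base code; this sketch assumes the same holds for 0x100000BE. The function decode_bugcheck and its small lookup table are hypothetical helpers covering only the codes seen in this thread.

```python
# Symbolic names for the stop codes that appear in this thread.
STOP_CODE_NAMES = {
    0x50: "PAGE_FAULT_IN_NONPAGED_AREA",
    0x8E: "KERNEL_MODE_EXCEPTION_NOT_HANDLED",
    0xBE: "ATTEMPTED_WRITE_TO_READONLY_MEMORY",
}

# Flag bit seen in event-log codes such as 0x100000be; assumed here to
# leave the meaning of the underlying stop code unchanged.
EXTENDED_FLAG = 0x10000000


def decode_bugcheck(code_text):
    """Return (base stop code, symbolic name) for a code like '0x100000be'."""
    code = int(code_text, 16)  # int() accepts the '0x' prefix when base is 16
    base = code & ~EXTENDED_FLAG  # clear only the flag bit
    return base, STOP_CODE_NAMES.get(base, "not in this table")


base, name = decode_bugcheck("0x100000be")
print(hex(base), name)  # 0xbe ATTEMPTED_WRITE_TO_READONLY_MEMORY
```

Normalising the code this way makes it easier to see that the event-log entry and the blue screen describe the same failure.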

Taken together with the application logs you posted for this machine, I would suspect a problem with Norton on both of these machines.
How you go about getting Norton to behave is up to you. Personally I have no use for Symantec products, as I find them very intrusive and hard to fix when something goes wrong.
You may want to take these computers off the network and then remove Norton to test. It may have something to do with the antivirus scheduling program, since the crash happens at a set time.
OK, an update at last!

I have left both machines on overnight for about a week now.
To begin with there were no problems, but the AMD 2600+ PC crashed at some point during the day.
I also came in early myself on a cold morning, and both machines failed to start up. Only after being left in the crashed state for a while (blue screen on one, black screen on the other) would they finally boot and load Windows.

So it seems the Dell Precision 380 has not crashed since being left on all the time, but the AMD has.
Could there be lasting damage to the hard drives? How could they both be failing?

I have put a new hard drive in the AMD PC and ghosted the old one onto it. Seems to work fine!
I haven't done this on the Dell machine yet, as I'm not sure whether it would void the warranty. I would prefer to use hard drives of my own choosing, though: I've never had a problem with Seagate drives, and the drives in these two PCs are Samsung and Western Digital.
The BSODs, together with your testing with the machines left running (fewer temperature swings), certainly point to what you suspect about the hard drives. Temperature changes make the mechanics of a drive work much harder, and the extra stress shortens its life. As for the BSODs, if the hard drives were failing, BSODs are a very real possibility.
Thanks again,

Just as I thought I was onto something, the AMD machine crashed again with the new hard drive in it!
I have requested an onsite engineer for this machine under warranty. We have had no end of problems with this particular manufacturer (Evesham Technology, cough cough).
The machine in question has had a new motherboard, RAM and hard disk since we got it, and so have the other 4 machines we bought with it!
I suppose it must be a faulty CPU or motherboard? I have tested the RAM with several RAM testers: no errors.

I'm not sure what to do with the Dell machine. They told me to reformat, but that will be difficult with the user needing the PC 24/7.
I may have to buy a second hard drive for it and Ghost onto it while I reformat and reinstall.
When the old Home Computing Initiative was running, my employer teamed up with Evesham to run the scheme (against IT's advice, I may say), and it has been an unmitigated disaster. It doesn't add anything to the case, but I just thought I would have a bitch.

If the machines are suffering from cold environmental conditions, this is one of those rare occasions where I would advise leaving them on constantly; at least they will not suffer thermal shock when they are turned on after being off for a while.
OK, thanks everyone.

Will award points to those I feel gave the most useful answers towards eliminating possible problems.
I don't have a firm answer yet, so I may have to open a new question, as this is proving difficult.