2 PCs Crashing Always Early Morning!

I have a baffling issue.
At our company we have 2 PCs which are located in the factory (ground floor office).
Both are completely different machines: one is an AMD Athlon 2600+, 512MB RAM, 40GB HDD. The other is a Dell Precision 380 workstation, Pentium D 930, 1GB RAM, 200GB HDD, which is quite new.
Both running XP Professional and have been totally overhauled (chkdsk, defrag, win updates, driver updates, internal cleanup, hardware diagnostics).
What we have is both machines seem to crash without fail nearly every morning, and ONLY in the morning (before 8:00).
On the AMD machine it happens after the user logs on: the machine gives a blue screen dump and reboots. On the Dell machine, it happens either at logon or when left logged on for a couple of minutes.
I have been unable to recreate the crash on either machine throughout the day, have rebooted at least 20 times on each and run lots of diagnostics on RAM, HDD etc.
As a precaution I have placed surge protectors on both machines, and also a MAX - MIN thermometer in the room as I wonder if this is a temperature issue. The temperature does not seem to get below 10 degrees C.

Any ideas? Or is it just an irritating coincidence?

kode99 Commented:
Running anything close to its limits will simply shorten its life in general. You said the Dell was new, so it is likely fine as it has had less exposure. Sure, the drive could fail sooner than 'normal', but it is likely not a big deal as long as you don't go back to the 'cold boots'. Probably the best thing would be to run some diagnostics on it and maybe keep an eye on it via its SMART data. In any case, if it does start to act up, the hard drive would likely be a good place to start checking.

Something like this, maybe, would allow you to see how cold it gets even in the middle of the night. You might want to verify for sure what the PCs' internal temperatures are, and also be sure that the temperature in the room does not go much lower than 10 C, as even a running system will have problems as you head on down in temperature.

There are also lots of freeware utilities to look at disk SMART info.

The AMD machine was how old?  
Let's face it, drive failures under normal circumstances are not unheard of after only a year or two (depending somewhat on model/brand). So the less-than-ideal startups likely pushed it over the edge.

I also like the good old SpinRite for putting a disk through its paces.

Don't sweat the Dell warranty. You are probably voiding it by running the machine so cold ;). I favor Seagate and WD Raptors myself.
Do both machines have the same or similar antivirus software, or maybe a piece of software that needs to run once daily when the computer is booted? If they only crash once, presumably the first time they are booted, there must be some kind of link to an event that is trying to fire up, like a "scan hard drive for a virus" or something of that nature.

Could you give the error code that the blue screen gives (like 0x00000000, blah, blah)? There are often also sentences somewhere on the screen disclosing the type of error, like THIS_IS_AN_ERROR, except it will say what kind it is.

You can keep the system from auto-rebooting by right-clicking the My Computer icon and going to Properties, then the Advanced tab. Click the Settings button under Startup and Recovery. Under System failure, uncheck "Automatically restart".

I would tell you to disable stuff, but I would like to know what that blue screen is saying first.

Good points rhutzel.  I would also like to add that you could check both systems for identical error logs.  This can help narrow down whatever the common problem is.
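
For what it's worth, here is a rough sketch of how that comparison could be done once both logs are exported to CSV. The column names (Source, EventID) and the sample rows are made up for illustration; a real Event Viewer export may be laid out differently, so adjust to suit.

```python
import csv
import io

def event_keys(csv_text):
    """Return the set of (source, event_id) pairs found in one exported log."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {(row["Source"].strip(), row["EventID"].strip()) for row in reader}

def common_events(log_a, log_b):
    """Events that appear in both logs, sorted for stable output."""
    return sorted(event_keys(log_a) & event_keys(log_b))

# Hypothetical sample rows, only to show the shape of the input.
pc1 = """Date,Time,Type,Source,EventID
2006-12-01,07:41:10,Error,ExampleDriver,1001
2006-12-01,09:15:00,Information,ExampleService,7036
"""
pc2 = """Date,Time,Type,Source,EventID
2006-12-01,07:44:52,Error,ExampleDriver,1001
2006-12-01,10:02:31,Warning,OtherSource,36
"""

shared = common_events(pc1, pc2)
print(shared)
```

Anything that shows up in both lists is a candidate for the common cause; anything unique to one machine can probably be set aside for now.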


Go into the Control Panel and click Administrative Tools --> Event Viewer --> System.
In here, check for errors that correspond to the time of startup. Post them here.
The BSOD error would also be very helpful.
Did this just start happening?
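
If the log is long, a small script can pull out just the early-morning errors. This is only a sketch: the CSV layout (Date, Time, Type, Source, EventID columns), the date format and the sample rows are assumptions, and the 08:00 cutoff is taken from the description of when the crashes happen.

```python
import csv
import io
from datetime import datetime, time

def early_errors(csv_text, cutoff=time(8, 0)):
    """Return (timestamp, source, event_id) for Error rows logged before cutoff."""
    reader = csv.DictReader(io.StringIO(csv_text))
    hits = []
    for row in reader:
        stamp = datetime.strptime(
            row["Date"] + " " + row["Time"], "%Y-%m-%d %H:%M:%S"
        )
        if row["Type"] == "Error" and stamp.time() < cutoff:
            hits.append((stamp, row["Source"], row["EventID"]))
    return hits

# Hypothetical sample rows, only to show the shape of the input.
sample_log = """Date,Time,Type,Source,EventID
2006-12-01,07:30:05,Information,ExampleService,7036
2006-12-01,07:41:10,Error,ExampleDriver,1001
2006-12-01,12:00:00,Error,ExampleScanner,2002
"""

for stamp, source, event_id in early_errors(sample_log):
    print(stamp.isoformat(), source, event_id)
```

Only the 07:41 error survives the filter here; the midday error and the informational entry are dropped.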

10C is at the bottom edge of some components' operating range. If the machines are being turned on cold, the problem could clear as powering up heats things up just enough that the 2nd run is good. If they are left running all night they would probably be warm enough inside even at 10C room temperature.

Factories can be nasty for dirty power. A surge protector will do nothing to clean up power; it only protects against big spikes. It is possible that a particular machine or combination of machines kicking on/warming up could be sending out some crap on the building power circuits. To counter this sort of problem you need to look at a power conditioner or line conditioner; true online UPSs also isolate equipment properly. These cost a lot more than surge protectors but are worth it.

Here's an example,

Monster Power also has some 'clean power' power bars that would work. Dirty power is a big issue for guys with home theatre stuff; anybody that sells home theatre gear probably sells power conditioners.

Anyway a couple of things to consider.  There must be some commonality here.  Just try and change only 1 thing at a time so that you can find the actual problem and not just have it disappear without figuring it out.
I would check the power for that room. Can you hook up a monitor, or check the voltage during the period when the crash occurs? It seems related to the environment, and the prime suspect is the AC power being too low or too high
...or a grounding problem; have that checked too
chrismanncalgavin (Author) Commented:
Thanks everyone so far, some interesting comments!

Ok, both machines have certain software in common that we have installed on the other 13 machines on the network. This is:
-Symantec Antivirus Corporate Edition 10.1
-Microsoft ISA Firewall Client
-Microsoft Office 2003 Pro
We have had none of the explained problems on the other 13 machines (4 of which are identical to PC2).
Symantec updates the virus definitions on the server machine at 8:00 every day, and runs a scheduled full scan every friday at 12:00.

The first machine (Pentium D) has displayed several differing blue screens. See the following:
STOP: 0x00000050
and also STOP: 0x0000008E (0xC00000005, 0x805AFD9E, 0xB99C0BA0, 0x00000000).

The second machine (AMD 2600+) suffers the following errors:
-System Event Log shows the following before rebooting: 26 Application Popup: Machine Check: Regs
-After reboot shows the following: The computer has rebooted from a bugcheck.  The bugcheck was: 0x100000be (0xcdb28808, 0x07462121, 0xf2783524, 0x0000000b). A dump was saved in: C:\WINDOWS\Minidump\Mini120106

I have saved the system and app logs for both pcs.
PC1: http://www.auwi44.dsl.pipex.com/pc1applog.csv
PC2: http://www.auwi44.dsl.pipex.com/pc2applog.csv

Have asked the users to leave both machines on overnight for a day, and to check if problem occurs. This will at least rule out the temperature issue.
I don't think it is power related, but may be wrong. I've been told that the machines we use in the factory are on and off all day long and use a different circuit, so should not interfere, but you never know. Unfortunately I don't have a means of logging the voltage to each computer etc!

Then check it at 15-minute intervals during the period when it happens!
chrismanncalgavin (Author) Commented:

I do not work during the time it happens, but can arrange to come in early. Although I don't believe seeing the error will make any difference, as I can still see the logs.
fallenknight308 Commented:
(Power is a subject near and dear to my heart, if nobody could tell yet)
Well, enough of that.

chrismann: temperature rarely affects hardware/circuits unless it is extreme. For instance, it's got to be pretty brutally cold in there for an issue to occur. Below 40F with high humidity, I'm thinking you might have trouble.
But computers are ideally run at approx. 60-80F external temperature.
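
Since the thermometer in the room reads in Celsius, here is the conversion worked out as a quick sketch. The 10 C figure is the room minimum reported in the question; the 40F and 60-80F figures are the rough thresholds above, not manufacturer specs.

```python
def f_to_c(deg_f):
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (deg_f - 32) * 5 / 9

room_min_c = 10.0                  # lowest reading reported in the question
trouble_below_c = f_to_c(40)       # rough "might have trouble" threshold
ideal_low_c, ideal_high_c = f_to_c(60), f_to_c(80)

print(f"40F = {trouble_below_c:.1f} C")
print(f"60-80F = {ideal_low_c:.1f} to {ideal_high_c:.1f} C")
print("Room minimum above trouble threshold:", room_min_c > trouble_below_c)
print("Room minimum inside ideal range:", ideal_low_c <= room_min_c <= ideal_high_c)
```

So by these rough numbers a 10 C room is warmer than the point where trouble would be expected, but colder than the range the machines are ideally run in.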
Anyhow, this could be a cold boot issue. By cold boot I mean the first boot of the day, not the actual thermal conditions of said environment.
Have you tried waiting half the day before booting one of them for the very first time that day?
And again: POWER..........POWER............POWER
Power: quality PSUs, like Fortron Group: www.fsp-group.com
And clean power: http://en.wikipedia.org/wiki/Uninterruptible_power_supply
Use together to help avoid things like this:
Caps: http://www.badcaps.net/

all very important!
Just my humble advice.
Good Luck!
David Wall Commented:
To rule out the environmental issues, move the machines out of their current environment one morning, have the users sign in there, and see if the issue still happens.

This should then point you to either an environmental or a machine issue. If it turns out to be environmental, look for large motors starting (e.g. in production lines) or large power supplies nearby.

If it is a machine issue consider going into Msconfig and disabling all startup items and see what happens then.
For the first machine with the
STOP: 0x00000050
and also STOP: 0x0000008E (0xC00000005, 0x805AFD9E, 0xB99C0BA0, 0x00000000).
error, I found an interesting tidbit on the Symantec website. This is in reference to searching for SYMTDI.SYS on the Symantec website.
They don't have a fix for it though.

The 2nd machine with the 0x000000BE error

(Click to consult the online Win XP Resource Kit article.)
A driver attempted to write to read-only memory. Commonly occurs after installing a faulty device driver, system service, or firmware. If a driver file is named in the error message, try to correct the problem by disabling, removing, or rolling back the driver.
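
As a quick reference, the stop codes posted in this thread map to the standard Windows bug check names below. The sketch treats the leading 0x1 in 0x100000be as a flag to be masked off rather than part of the code; that is an assumption, consistent with the way the logged 0x100000be is read as 0x000000BE above. The table only covers the three codes seen here.

```python
# Names for the three bug check codes reported in this thread.
BUGCHECK_NAMES = {
    0x50: "PAGE_FAULT_IN_NONPAGED_AREA",
    0x8E: "KERNEL_MODE_EXCEPTION_NOT_HANDLED",
    0xBE: "ATTEMPTED_WRITE_TO_READONLY_MEMORY",
}

def base_code(code):
    """Drop the top hex digit (assumed to be a flag, not part of the code)."""
    return code & 0x0FFFFFFF

def describe(code):
    """Return the masked code and its name, or 'unknown' if not in the table."""
    base = base_code(code)
    name = BUGCHECK_NAMES.get(base, "unknown")
    return f"0x{base:08X} {name}"

for reported in (0x00000050, 0x0000008E, 0x100000BE):
    print(describe(reported))
```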

In conjunction with the Applogs you posted for this machine, I would suspect a problem with Norton on both these machines.
How you want to go about getting Norton to behave is up to you. I personally have no use for Symantec products as I find them very intrusive and hard to fix when something is wrong.
You may want to remove these computers from the network and then remove Norton to test. It may have something to do with the antivirus scheduling program as it happens at a set time.
chrismanncalgavin (Author) Commented:
Ok an update finally!

I have left both machines on overnight for about a week now.
To begin with no problems, but the first PC (AMD 2600+) crashed sometime during the day.
I also came in early myself on a cold morning, and both machines failed to start up. Only after being left in the crashed state for a while (blue screen on one, black on the other) would they finally boot and load Windows.

So it seems the Dell Precision 380 has not crashed since being left on all the time, but the AMD has.
Could it be lasting damage caused to the hard drives? How can they both be failing?

I have put a new hard drive in the AMD PC and ghosted the old one onto it. Seems to work fine!
Not done this on the Dell machine yet as I'm not sure about the warranty, and possibly voiding it. I would prefer to use my own choice of hard drives though. I've never had a problem with Seagate drives, and the drives in these 2 PCs are Samsung and Western Digital.
The BSODs, along with the testing you have done by leaving the machines running (fewer temperature swings), certainly point to what you suspect about the hard drives. The temperature changes can make the mechanics of the drive work much harder and shorten its life due to the extra stress. As for the BSODs: if the hard drives were failing, BSODs are a very real possibility.
chrismanncalgavin (Author) Commented:
Thanks again,

Just as I thought I was onto something, the AMD machine crashed again with the new hard drive in it!
Have requested an onsite engineer for this machine under warranty. Have had no end of problems with this particular manufacturer. (Evesham Technology cough cough).
The machine in question has had a new motherboard, RAM and hard disk since we had it! And so have the other 4 machines we bought with it!
I suppose the CPU or motherboard must be faulty? Have tested the RAM with several RAM testers, no errors.

Not sure what to do with the Dell machine, they told me to reformat but will be difficult with the user needing the PC 24/7.
May have to buy second hard drive for it and Ghost onto it while I reformat and reinstall.
David Wall Commented:
When the old Home Computing Initiative was going, my employer hooked up with Evesham to run the scheme (against IT's advice, I may say) and it has been an unmitigated disaster. Doesn't add anything to the case, but I just thought I would have a bitch.

If the machines are suffering from cold environmental conditions, this is one of those rare occasions where I would advise leaving them on constantly; at least they will not suffer thermal shock when they are turned on after they have been off for a while.
chrismanncalgavin (Author) Commented:
Ok thanks everyone.

Will award points to those I feel gave the most useful answers towards eliminating possible problems.
Don't have a firm answer so may have to open a new question, as it's proving difficult.