Solved

2 PCs Crashing Always Early Morning!

Posted on 2006-11-30
Last Modified: 2008-02-01
I have a baffling issue.
At our company we have 2 PCs which are located in the factory (ground floor office).
Both are completely different machines: one is an AMD Athlon 2600+, 512MB RAM, 40GB HDD. The other is a Dell Precision 380 workstation, Pentium D 930, 1GB RAM, 200GB HDD, which is quite new.
Both are running XP Professional and have been totally overhauled (chkdsk, defrag, Windows updates, driver updates, internal cleanup, hardware diagnostics).
Both machines seem to crash without fail nearly every morning, and ONLY in the morning (before 8:00).
On the AMD machine it happens after the user logs on: the machine gives a blue screen dump and reboots. On the Dell machine, it happens either at the logon, or when left logged on for a couple of minutes.
I have been unable to recreate the crash on either machine throughout the day; I have rebooted at least 20 times on each and run lots of diagnostics on RAM, HDD etc.
As a precaution I have placed surge protectors on both machines, and also a max-min thermometer in the room, as I wonder if this is a temperature issue. The temperature does not seem to get below 10 degrees C.

Any ideas? Or is it just an irritating coincidence?
Question by:chrismanncalgavin
18 Comments
 
LVL 1

Expert Comment

by:rhutzel
Do both machines have similar or the same antivirus software, or maybe a piece of software that needs to run once daily when the computer is booted?  If they only crash once, presumably the first time they are booted, there must be some kind of link to an event that is trying to fire up, like a "scan hard drive for a virus" or something of that nature.

Could you give the error code that the blue screen gives (like: 0x00000000, blah, blah)?  There are often sentences somewhere on the screen disclosing the type of error, like THIS_IS_AN_ERROR, except it will say what kind it is.

You can keep the system from auto rebooting by right-clicking the My Computer icon and going to Properties, then the Advanced tab.  Click the Settings button under "Startup and Recovery".  Under "System failure", uncheck "Automatically restart".
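
The same switch can also be set through the registry. A minimal sketch, assuming the standard CrashControl key and its AutoReboot value (0 = stay on the blue screen). The function name build_autoreboot_reg is made up for illustration, and the script only prints .reg text rather than touching the registry, so review the output before importing it on either machine:

```python
# Sketch only: builds the text of a .reg file, it does not modify the registry.
# Assumption: the "Automatically restart" checkbox maps to the AutoReboot
# DWORD under the CrashControl key shown below.
CRASH_CONTROL_KEY = (
    r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\CrashControl"
)


def build_autoreboot_reg(enabled: bool) -> str:
    """Return .reg file text that turns automatic restart on or off."""
    value = 1 if enabled else 0
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{CRASH_CONTROL_KEY}]",
        f'"AutoReboot"=dword:{value:08x}',
        "",
    ]
    # .reg files conventionally use Windows line endings.
    return "\r\n".join(lines)


if __name__ == "__main__":
    # Print the text so it can be saved as, for example, no_autoreboot.reg.
    print(build_autoreboot_reg(enabled=False))
```

With auto restart off, the machine should sit on the blue screen so the STOP code can be copied down.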

I would tell you to disable stuff, but I would like to know what that blue screen is saying first.

Ryan
 
LVL 4

Expert Comment

by:keola75
Good points rhutzel.  I would also like to add that you could check both systems for identical error logs.  This can help narrow down whatever the common problem is.
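
As a rough sketch of that comparison: export the System log from each PC to CSV and look for error sources and event IDs that show up in both. The column order below (Date, Time, Source, Type, Category, Event, User, Computer, no header row) is an assumption about the export and may need adjusting, and the function names are hypothetical:

```python
import csv
import io
from collections import Counter

# Assumed column positions in the exported CSV (no header row):
# Date, Time, Source, Type, Category, Event, User, Computer
SOURCE_COL, TYPE_COL, EVENT_COL = 2, 3, 5


def error_signatures(csv_text: str) -> Counter:
    """Count (source, event id) pairs for rows whose type is Error."""
    counts = Counter()
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) <= EVENT_COL:
            continue  # skip short or malformed rows
        if row[TYPE_COL].strip().lower() == "error":
            counts[(row[SOURCE_COL].strip(), row[EVENT_COL].strip())] += 1
    return counts


def common_errors(log_a: str, log_b: str) -> list:
    """Return the error signatures that appear in both logs, sorted."""
    a = error_signatures(log_a)
    b = error_signatures(log_b)
    return sorted(set(a) & set(b))


if __name__ == "__main__":
    # Placeholder file names: point these at the two exported logs.
    with open("pc1systemlog.csv", newline="") as f1:
        pc1 = f1.read()
    with open("pc2systemlog.csv", newline="") as f2:
        pc2 = f2.read()
    for source, event_id in common_errors(pc1, pc2):
        print(source, event_id)
```

Anything printed is only a candidate for the common cause; errors that both machines log for unrelated reasons will show up too.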

-Keola
 
LVL 32

Expert Comment

by:Mark
Go into the Control Panel and click Administrative Tools --> Event Viewer --> System.
In here, check for errors that correspond to the time of startup. Post them here.
The BSOD error would be very helpful also.
 
LVL 25

Expert Comment

by:kode99
Did this just start happening?

10C is at the bottom edge of some components' operating range.  If the machines are being turned on cold, it could be a problem that clears as powering up heats things up just enough that the 2nd run is good.  If they are left running all night, they would probably be warm enough inside even at 10C room temperature.

Factories can be nasty for dirty power.  A surge protector will do nothing to clean up power; it only protects against big spikes.  It is possible that a particular machine or combination of machines kicking on/warming up could be sending out some crap on the building power circuits.  To counter this sort of problem you need to look at a power conditioner or line conditioner; true online UPSs also isolate equipment properly.  These cost a lot more than surge protectors but are worth it.

Here's an example,
http://www.tripplite.com/products/conditioners/index.cfm

Monster Power also has some 'clean power' power bars that would work.  Dirty power is a big issue for guys with home theatre stuff, so anybody that sells home theatre probably sells power conditioners.

Anyway a couple of things to consider.  There must be some commonality here.  Just try and change only 1 thing at a time so that you can find the actual problem and not just have it disappear without figuring it out.
 
LVL 91

Expert Comment

by:nobus
I would check the power for that room. Can you hook on a monitor, or check the voltage during the period the crash occurs? It seems related to the environment, and the prime suspect is the AC power being too low or too high.
 
LVL 91

Expert Comment

by:nobus
...or a grounding problem; have that checked too
 
LVL 8

Author Comment

by:chrismanncalgavin
Thanks everyone so far, some interesting comments!

Ok, both machines have certain software in common that we have installed on the other 13 machines on the network. This is:
-Symantec Antivirus Corporate Edition 10.1
-Microsoft ISA Firewall Client
-Microsoft Office 2003 Pro
We have had none of the described problems on the other 13 machines (4 of which are identical to PC2).
Symantec updates the virus definitions on the server machine at 8:00 every day, and runs a scheduled full scan every Friday at 12:00.

The first machine (Pentium D) has displayed several differing blue screens. See the following:
SYMTDI.SYS
STOP: 0x00000050
PAGE_FAULT_IN_NONPAGED_AREA
and also STOP: 0x0000008E (0xC0000005, 0x805AFD9E, 0xB99C0BA0, 0x00000000).

The second machine (AMD 2600+) suffers the following errors:
-System Event Log shows the following before rebooting: 26 Application Popup: Machine Check: Regs
-After reboot shows the following: The computer has rebooted from a bugcheck.  The bugcheck was: 0x100000be (0xcdb28808, 0x07462121, 0xf2783524, 0x0000000b). A dump was saved in: C:\WINDOWS\Minidump\Mini120106

I have saved the system and app logs for both pcs.
PC1: http://www.auwi44.dsl.pipex.com/pc1applog.csv
        http://www.auwi44.dsl.pipex.com/pc1systemlog.csv
PC2: http://www.auwi44.dsl.pipex.com/pc2applog.csv
        http://www.auwi44.dsl.pipex.com/pc2systemlog.csv

I have asked the users to leave both machines on overnight for a day, and to check if the problem occurs. This will at least rule out the temperature issue.
I don't think it is power related, but may be wrong. I've been told that the machines we use in the factory are on and off all day long and use a different circuit, so should not interfere, but you never know. Unfortunately I don't have a means of logging the voltage to each computer etc!

 
LVL 91

Expert Comment

by:nobus
Then check it at 15-minute intervals during the period it happens!
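
If the readings end up being written down by hand, something like this could flag the suspicious ones. It is only a sketch: the 230 V nominal figure and the -6%/+10% band are assumptions (the thread never states the local mains spec), the helper names are made up, and the sample readings are invented purely to show the call:

```python
from datetime import datetime, timedelta

# Assumption: 230 V nominal mains with a -6% / +10% tolerance band.
NOMINAL_V = 230.0
LOW_PCT = 6.0
HIGH_PCT = 10.0


def tolerance_band(nominal=NOMINAL_V, low_pct=LOW_PCT, high_pct=HIGH_PCT):
    """Return the (lowest, highest) voltage treated as acceptable."""
    return nominal * (1 - low_pct / 100), nominal * (1 + high_pct / 100)


def out_of_tolerance(readings, nominal=NOMINAL_V,
                     low_pct=LOW_PCT, high_pct=HIGH_PCT):
    """Return the (time, volts) pairs that fall outside the band."""
    low, high = tolerance_band(nominal, low_pct, high_pct)
    return [(when, volts) for when, volts in readings
            if not low <= volts <= high]


def sample_times(start, end, minutes=15):
    """List the times at which to take a reading, both ends included."""
    times = []
    current = start
    while current <= end:
        times.append(current)
        current += timedelta(minutes=minutes)
    return times


if __name__ == "__main__":
    # Invented readings for illustration only, not measurements from the thread.
    times = sample_times(datetime(2006, 12, 1, 7, 0),
                         datetime(2006, 12, 1, 8, 0))
    volts = [231.0, 214.0, 229.5, 255.0, 230.2]
    for when, v in out_of_tolerance(zip(times, volts)):
        print(when.strftime("%H:%M"), v)
```

A handheld meter read every 15 minutes will miss short dips and spikes, so a clean set of readings does not rule power out.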
 
LVL 8

Author Comment

by:chrismanncalgavin
Nobus,

I do not work during the time it happens, but can arrange to come in early. Although I don't believe seeing the error will make any difference, as I can still see the logs.
 
LVL 1

Assisted Solution

by:fallenknight308
fallenknight308 earned 250 total points
Hurrah! to kode99 for mentioning: "POWER, AND THE COMMONLY OVERLOOKED ISSUES THEREOF"
(Power is a subject near and dear to my heart, if nobody could tell yet)
Well, enough of that.

chrismann: temp rarely affects hardware/circuits unless it's extreme. For instance, it's got to be pretty brutal in there cold-wise for an issue to occur. Below 40F and high humidity, I'm thinking you'd maybe have trouble.
But comps are ideally run at approx. 60-80F external temp.
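
For anyone comparing these figures against the Celsius readings earlier in the thread, the conversion is C = (F - 32) * 5 / 9. A quick sketch (the helper names are just for illustration): the 10 C minimum from the thermometer works out to 50 F, which is above the 40 F trouble mark but below the 60-80 F range.

```python
def f_to_c(fahrenheit: float) -> float:
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (fahrenheit - 32) * 5 / 9


def c_to_f(celsius: float) -> float:
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32


if __name__ == "__main__":
    # Figures quoted in the thread, converted both ways.
    print("10 C room minimum =", c_to_f(10), "F")
    for f in (40, 60, 80):
        print(f, "F =", round(f_to_c(f), 1), "C")
```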
Anyhow, this could be a cold boot issue. By cold boot I mean the first boot of the day, not the actual thermal conditions of said environment.
Have you tried waiting half the day before booting one of them for the very first time that day?
And again: POWER..........POWER............POWER
Power: Quality PSU's: like fortron group www.fsp-group.com
And clean power: http://en.wikipedia.org/wiki/Uninterruptible_power_supply
Use together to help avoid things like this:
Caps: http://www.badcaps.net/
http://www.liebert.com/support/whitepapers/documents/lbthowto.asp
http://www.smartcomputing.com/editorial/article.asp?article=articles/1992/may92/0515/92n0515.asp&articleid=6115&guid=
http://www.google.com/search?hl=en&lr=&q=+computer+surge+damage++&btnG=Search

all very important!
Just my humble advice.
Good Luck!
 
 
LVL 12

Expert Comment

by:WallD
To rule out environmental issues, move the machines out of their current environment one morning, have the users sign in there, and see if the issue still happens.

This should then point you to either an environmental or a machine issue. If it turns out to be environmental, look for large motors starting (e.g. in production lines) or large power supplies nearby.

If it is a machine issue, consider going into msconfig, disabling all startup items, and seeing what happens then.
 
LVL 32

Expert Comment

by:Mark
For the first machine with the
SYMTDI.SYS
STOP: 0x00000050
PAGE_FAULT_IN_NONPAGED_AREA
and also STOP: 0x0000008E (0xC0000005, 0x805AFD9E, 0xB99C0BA0, 0x00000000).
error, I found an interesting tidbit on the Symantec website when searching for SYMTDI.SYS there.
http://service1.symantec.com/SUPPORT/nip.nsf/0/108e3e6dbf49f33a88256dc700778e2a?OpenDocument
They don't have a fix for it though.

The 2nd machine with the 0x000000BE error

0x000000BE: ATTEMPTED_WRITE_TO_READONLY_MEMORY
(Click to consult the online Win XP Resource Kit article.)
A driver attempted to write to read-only memory. Commonly occurs after installing a faulty device driver, system service, or firmware. If a driver file is named in the error message, try to correct the problem by disabling, removing, or rolling back the driver.

Taken together with the app logs you posted for this machine, I would suspect a problem with Norton on both these machines.
How you want to go about getting Norton to behave is up to you. I personally have no use for Symantec products, as I find them very intrusive and hard to fix when something is wrong.
You may want to remove these computers from the network and then remove Norton to test. It may have something to do with the antivirus scheduling program as it happens at a set time.
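
To make the decoding above repeatable, here is a small sketch that pulls the STOP code and its parameters out of a pasted bugcheck line. The name table only covers the three codes seen in this thread (the 0x8E name comes from Microsoft's bug check reference, not from the posts), and masking off the leading 0x1 in 0x100000be is an assumption that matches how it is read as 0xBE above. decode_bugcheck is a made-up helper name:

```python
import re

# Only the codes that appear in this thread; extend as needed.
BUGCHECK_NAMES = {
    0x50: "PAGE_FAULT_IN_NONPAGED_AREA",
    0x8E: "KERNEL_MODE_EXCEPTION_NOT_HANDLED",
    0xBE: "ATTEMPTED_WRITE_TO_READONLY_MEMORY",
}

HEX_VALUE = re.compile(r"0x[0-9a-fA-F]+")


def decode_bugcheck(text: str):
    """Return (base code, name, parameters) from a STOP or bugcheck line."""
    values = [int(match, 16) for match in HEX_VALUE.findall(text)]
    if not values:
        raise ValueError("no hex values found in: " + repr(text))
    code, params = values[0], values[1:]
    # Assumption: the top nibble (the 0x1 in 0x100000be) is a flag and the
    # remaining bits are the base bugcheck code.
    base = code & 0x0FFFFFFF
    return base, BUGCHECK_NAMES.get(base, "UNKNOWN"), params


if __name__ == "__main__":
    line = ("The bugcheck was: 0x100000be "
            "(0xcdb28808, 0x07462121, 0xf2783524, 0x0000000b).")
    base, name, params = decode_bugcheck(line)
    print(hex(base), name, [hex(p) for p in params])
```

The name only says what kind of fault was hit; it still takes the event logs or the minidump to point at a particular driver.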
 
LVL 8

Author Comment

by:chrismanncalgavin
Ok an update finally!

I have left both machines on overnight for about a week now.
To begin with no problems, but the first PC (AMD 2600+) crashed sometime during the day.
I also came in early myself on a cold morning, and both machines failed to start up. Only after leaving them in the crashed state for a while (blue screen on one, black on the other) would they finally boot and load Windows.

So it seems the Dell Precision 380 has not crashed since being left on all the time, but the AMD has.
Could it be lasting damage caused to the hard drives? How can they both be failing?

I have put a new hard drive in the AMD PC and ghosted the old one onto it. Seems to work fine!
I have not done this on the Dell machine yet, as I'm not sure about the warranty and possibly voiding it. I would prefer to use my own choice of hard drives though. I've never had a problem with Seagate drives, and the drives in these 2 PCs are Samsung and Western Digital.
 
LVL 32

Expert Comment

by:Mark
The BSODs, along with the testing you have done by leaving them in a running state (smaller temperature swings), certainly point to what you suspect with the hard drives. The temperature changes can make the mechanics of the drive work much harder and shorten its life due to the extra stress. As for the BSODs, if the hard drives were failing, BSODs would be a very real possibility.
 
LVL 25

Accepted Solution

by:kode99
kode99 earned 250 total points
Running anything close to its limits will simply shorten its life in general.   You said the Dell was new, so it is likely fine, as it has had less exposure.  Sure, the drive could fail sooner than 'normal', but it is likely not a big deal as long as you don't go back to the 'cold boots'.  Probably about the best thing would be to run some diagnostics on it and maybe keep an eye on it via SMART monitoring.  In any case, if it does start to act up, the hard drive would likely be a good place to start checking.

Something like this, maybe,
http://www.stellarinfo.com/hard-drive-monitor.htm
would allow you to see how cold it gets, even in the middle of the night. You might want to verify for sure what the PCs' internal temperatures are, and also be sure that the temperature in the room does not go much lower than 10 C, as even a running system will have problems as you head on down in temperature.

There are also lots of freeware utilities to look at disk SMART info.

The AMD machine was how old?  
Let's face it, drive failures under normal circumstances are not unheard of after only a year or two (depending somewhat on model/brand).  Combine that with the less-than-ideal startups and it likely pushed the drive over the edge.

I like the good old SpinRite for putting a disk through its paces also,
http://www.grc.com/spinrite.htm

Don't sweat the Dell warranty.  You are probably voiding it by running the machine so cold ;).  I favor Seagate and WD Raptors myself.
 
LVL 8

Author Comment

by:chrismanncalgavin
Thanks again,

Just as I thought I was onto something, the AMD machine crashed again with the new hard drive in it!
I have requested an onsite engineer for this machine under warranty. I have had no end of problems with this particular manufacturer (Evesham Technology, cough cough).
The machine in question has had a new motherboard, RAM and hard disk since we got it! And so have the other 4 machines we bought with it!
I suppose it must be a faulty CPU or motherboard? I have tested the RAM with several RAM testers, no errors.

Not sure what to do with the Dell machine, they told me to reformat but will be difficult with the user needing the PC 24/7.
May have to buy second hard drive for it and Ghost onto it while I reformat and reinstall.
 
LVL 12

Expert Comment

by:WallD
When the old Home Computing Initiative was going, my employer hooked up with Evesham to run the scheme (against IT's advice, I may say) and it has been an unmitigated disaster. Doesn't add anything to the case, but I just thought I would have a bitch.

If the machines are suffering from cold environmental conditions, it is one of those rare occasions where I would advise leaving them on constantly; at least they will not suffer thermal shock when they are turned on after they have been off for a while.
 
LVL 8

Author Comment

by:chrismanncalgavin
Ok thanks everyone.

Will award points to those I feel gave the most useful answers towards eliminating possible problems.
Don't have a firm answer so may have to open a new question, as it's proving difficult.