computer randomly shuts off

I've got a new custom-built machine which I installed CentOS on.  It runs OK for an hour or two, then randomly shuts off.  The only thing I can think of is that it's overheating, but I've got pretty sufficient cooling on it.  Is there any way to check temps in software?  And if it is overheating, is CentOS shutting it down, or would it be the hardware that shuts it down?  I'm new to Linux; are there any logs or anything written to when a machine shuts down?  Any ideas on other things to check for?  I'm pretty stumped.
ctwaley Commented:
You can only check for temperatures if the motherboard supports it........Generally, you can check the running temperature, right after it shuts down, through the BIOS.  There may be a page, or section, which shows the cpu and system temps, fan speed, voltage values, etc., in the BIOS....

If there is such a section in the BIOS, then most likely the shutdown is happening when the system reaches critical temperatures.  That depends on whether that feature is enabled and what the cutoff temps are set at......

For Linux, the software solution would be to install 'lm_sensors', which is a command line tool to check temps (there should be an RPM package you can install).........But this will only work if the motherboard has the sensors built in to begin with........There are GUI apps out there which use lm_sensors, the most widely known one being GKrellM (which also monitors a lot of other system stats, as well)......
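
For anyone who would rather script the check than run a GUI, here is a minimal Python sketch of reading the same sensors. It assumes the kernel sensor driver for the board is already loaded and exposes readings under /sys/class/hwmon as temp*_input files holding millidegrees Celsius; on some older kernels the files sit one level deeper, under a device/ subdirectory, so the path may need adjusting. The function name read_hwmon_temps is made up for illustration and is not part of lm_sensors.

```python
import glob
import os


def read_hwmon_temps(root="/sys/class/hwmon"):
    """Return {"<chip>/<sensor>": degrees_celsius} for each temp*_input file under root."""
    temps = {}
    for path in sorted(glob.glob(os.path.join(root, "hwmon*", "temp*_input"))):
        chip_dir = os.path.dirname(path)
        # The driver name normally lives in a sibling "name" file;
        # fall back to the directory name (e.g. "hwmon0") if it is missing.
        try:
            with open(os.path.join(chip_dir, "name")) as f:
                chip = f.read().strip()
        except OSError:
            chip = os.path.basename(chip_dir)
        try:
            with open(path) as f:
                millidegrees = int(f.read().strip())
        except (OSError, ValueError):
            continue  # sensor present but not readable; skip it
        sensor = os.path.basename(path)[: -len("_input")]  # "temp1_input" -> "temp1"
        temps[f"{chip}/{sensor}"] = millidegrees / 1000.0
    return temps
```

Calling read_hwmon_temps() with no arguments on a machine without sensor support simply returns an empty dict, which matches the caveat above: no sensors on the board, nothing to read.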

Otherwise, if there is no sensor support, and overheating is the cause, then the system is shutting down on its own, which means there will be hardware damage with continued use, and you most likely have a faulty board....

If CPU/System temperature is not the problem, then it could be a RAM problem.....This is easier to check by replacing the RAM stick with a known good one and seeing if the shutdown continues..........Or if there is more than one RAM stick installed, take them all out, except one, and try them one at a time to see if the problem persists.....

Also, physically check to see if all the fans are working properly and there is good airflow inside the case (no IDE ribbons causing airflow problems, etc)

Anyway, could you post your hardware setup, including if you're using rounded cables or ribbons for the hard drives and the number of fans being used?......This will make things a tiny bit easier to diagnose...... ;-)
ICPooreman (Author) Commented:
Bought it off of eBay; here's what it's got:

Motherboard       :           GIGABYTE GA-945P-S3 MAINBOARD 1066MHz FSB SUPPORT
Memory       :       2GB DDR-2 533FSB (PC4200)  Memory
Video Card       :         256 MB nVIDIA Ge-Force 7300GS DVI/TV-OUT PCI EXPRESS VIDEO CARD
Hard Drive       :       250GB 7200RPM SATA II  Hard Drive
Network Card       :       10/100 Fast Ethernet Network Controller
Sound Card       :       CMI 9739A 6 CHANNEL CODEC
Case       :       ATX Case  w/ Power Supply and Front USB Port

The case has only got one fan installed, and the hard drive has a fan.  I'm a real dummy with hardware, but there's another fan running in the case; I believe it's on a heatsink, but maybe I'm wrong.  Most everything is connected with rounded cables and there's actually a good amount of free space in the box.
ibu1 (System Administrator) Commented:
I think there is a problem with the power supply. Change it.

check the memory

first clean and swap the memory modules
Okay, your board supports temp sensors and the BIOS has a health status page to view the cpu and board stats.........So, for that particular cpu, the running temp should be mid 40s deg C (not F) for an air-cooled system, or less........If it's above 60 deg C, then you have an overheating problem......
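
As a worked example of those numbers, the small Python sketch below labels a reading. The two thresholds come from the post; treating "mid 40s" as 46 deg C and calling everything between the two thresholds "elevated" are assumptions made here for illustration, not figures from the CPU vendor.

```python
NORMAL_MAX_C = 46.0  # assumption: a stand-in for "mid 40s or less"
OVERHEAT_C = 60.0    # from the post: above this is an overheating problem


def classify_cpu_temp(celsius):
    """Label a CPU temperature reading (deg C) using the rough thresholds above."""
    if celsius > OVERHEAT_C:
        return "overheating"
    if celsius <= NORMAL_MAX_C:
        return "normal"
    # The post does not name this in-between band; "elevated" is our own label.
    return "elevated"
```

A reading of 35 comes back as "normal" and 65 as "overheating"; a reading of exactly 60 lands in the "elevated" band, so it is worth watching rather than conclusive on its own.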

What you do is reboot the machine after it's been running for a while, say an hour since that's when it has problems.  When it reboots, you will need to press the <Delete> key (you can tap the Delete key repeatedly until the BIOS screen comes up).  Then highlight the "PC Health Status" entry by using the down arrow and hit ENTER......On the next screen, you will see "Current CPU Temperature" about half way down..........To quit the BIOS, press the <Esc> key twice and don't save on exit......or just hit the reset button on the case if it has one.......This will tell you what's going on without opening the case......

If the temp's okay, then the next thing to check is the RAM (memory), which means opening up the case......There are four slots for the RAM sticks, and if it has 2 gigs of RAM, then there will be at least two in there........If you feel confident enough for that, I'll walk you through the process, else take it in to a computer repair shop (one that you trust) and have it looked at.......

ICPooreman (Author) Commented:
Well, when I first got into the BIOS it said the temp was around 60C, but then within a minute or two it was down to 35C and stayed there.  Looks like it's overheating, but why the drastic temp change within a minute or two?
When the OS is running and there are a lot of daemons running in the background (plus any apps being used), the CPU is working harder to handle all that traffic, thus running the CPU hotter..........When you're in the BIOS, the CPU has nothing to do and is just idling, allowing it to run much cooler........

If CentOS is anything like the old RedHat, there are probably a number of daemons running unnecessarily in the background, so you should kill anything you really don't need, and disable them in the startup.....

There should be an app to help you manage the startup scripts and I'm not sure what it's called for your distro.........I run Slackware normally, which uses a BSD-style startup routine, not SysV, but I have run Debian (a while back), which uses the SysV startup routine and had a helper app to manage the startup scripts (although I usually managed them by hand ;-) ).....
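
On Red Hat-family systems of that era the helper is usually chkconfig, so the sketch below assumes that tool is present and that `chkconfig --list` prints one line per SysV service in the form "name 0:off 1:off 2:on 3:on 4:on 5:on 6:off". The Python function enabled_services is a hypothetical helper that takes that text and returns the services switched on for a runlevel, as a starting list of candidates to review.

```python
def enabled_services(chkconfig_output, runlevel=3):
    """Return names of SysV services marked "on" for the given runlevel."""
    wanted = f"{runlevel}:on"
    services = []
    for line in chkconfig_output.splitlines():
        fields = line.split()
        # A service line has the name plus seven "N:on"/"N:off" columns.
        # Shorter lines (blank lines, the xinetd section) are ignored.
        if len(fields) >= 8 and wanted in fields[1:]:
            services.append(fields[0])
    return services
```

The input text can be captured with subprocess.run(["chkconfig", "--list"], capture_output=True, text=True).stdout on a Python 3.7+ interpreter. Anything on the resulting list that is not needed can then be switched off with `chkconfig <name> off`, which changes what starts at boot but does not stop a service that is already running.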

Anyhow, reducing the number of background processes will help lighten the load for the CPU and I suggest installing lm_sensors, along with GKrellM, to monitor the stats in real time while running the OS...........Besides these two, you will also need to make sure you have the correct "I2C" modules loaded for your particular chipset that lm_sensors relies on for retrieving the needed info..........
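
Since the machine powers off without warning, a real-time display will not show what the temperature was at the moment it died. One way around that, sketched here in Python, is to append the load average and temperatures to a file every few seconds and force each line to disk. This is a sketch under assumptions: the sensor driver is loaded and exposes temp*_input files (millidegrees Celsius) under /sys/class/hwmon, and the function names and log path are placeholders chosen for illustration.

```python
import glob
import os
import time


def sample_line(hwmon_root="/sys/class/hwmon"):
    """Build one log line: timestamp, 1-minute load average, all readable temps."""
    parts = [time.strftime("%Y-%m-%d %H:%M:%S"),
             f"load={os.getloadavg()[0]:.2f}"]
    for path in sorted(glob.glob(os.path.join(hwmon_root, "hwmon*", "temp*_input"))):
        try:
            with open(path) as f:
                celsius = int(f.read().strip()) / 1000.0
        except (OSError, ValueError):
            continue  # unreadable sensor; leave it out of this sample
        parts.append(f"{os.path.basename(path)}={celsius:.1f}C")
    return " ".join(parts)


def log_forever(log_path, interval_s=10, hwmon_root="/sys/class/hwmon"):
    """Append a sample every interval_s seconds until the process is killed."""
    with open(log_path, "a") as log:
        while True:
            log.write(sample_line(hwmon_root) + "\n")
            log.flush()
            # Ask the OS to write the line out now, so the most recent
            # reading is more likely to survive an abrupt power-off.
            os.fsync(log.fileno())
            time.sleep(interval_s)
```

Running log_forever("/root/temps.log") in a spare terminal and reading the tail of that file after the next shutdown should show whether the temperature and the load were climbing together just before the machine went down.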

One of the most common causes of overheating, which is easily fixable, would be the CPU fan not working properly (running too slow)......Even a slight drop in fan speed can be critical, especially if the heatsink was improperly installed (such as no thermal compound between the heatsink and cpu surfaces).......Since you got this PC from eBay, you have no assurance it was assembled properly, unless there was a guarantee or some sort of warranty which came with it..............If so, I suggest returning it and having it replaced, if possible......

But all is not lost if no permanent damage has happened so far.................It will require physically inspecting the machine and closely monitoring the stats............If the inside of the case was pretty dusty when you first opened it up, that would be a good indication you've been taken in..........But, even if it was clean, that wouldn't necessarily tell you much, either........What you need to do is check up on the seller you bought it from through eBay, to see what kind of report there is for that seller......

Bottom line is, if you can't get it replaced, then it's just a process of elimination to narrow down the cause of the problem......

Another stupid question............The PC isn't located near any source of heat, such as a heater vent, is it..........and is there plenty of air circulation outside the case?.........
ICPooreman (Author) Commented:
<<The PC isn't located near any source of heat, such as an heater vent, is it
no it's actually in a fairly cool room not next to any source of heat

<<If the inside of the case was pretty dusty when you first opened it up,
no, it's  pretty clean inside the case.

I'll check up with the seller to see if there is anything they'll do.  They actually have a really good rating so I'm a little surprised.  
>  They actually have a really good rating so I'm a little surprised.

With a good rating, you should not have many problems, hopefully........Even if they did some testing before shipping, not all problems will be encountered, so it's not really surprising to find something amiss every once in a while (even for the big vendors).........Which is where RMAs come into the picture...... ;-)
Question has a verified solution.