Linux machine shuts down by itself

Hi

We have an older Linux machine running SLES 9. Sometimes the machine shuts down by itself during the night. I don't see any useful info in /var/log/messages or /var/log/messages-yyyymmdd.gz. How do I find out the reason? Where can I check for useful log info?

Thanks.
asugri Asked:
rindi Commented:
Uncontrolled shutdowns like that are usually caused by hardware problems and overheating.
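Whether the shutdowns really are uncontrolled can be checked against the login records that `last -x` prints. The following is only a rough sketch: it assumes `last -x` lists `reboot` and `shutdown` pseudo-entries newest first, and it treats a boot that follows another boot with no shutdown entry in between as a sign that the machine lost power or crashed. The function name `unclean_boots` is made up for illustration, and the script needs a reasonably recent Python 3, so on a system as old as SLES 9 it may be easier to save the output of `last -x` and analyse it on another machine.

```python
import shutil
import subprocess


def unclean_boots(last_output):
    """Return the boot lines that were not preceded by a clean shutdown entry."""
    # Keep only the boot and shutdown pseudo-entries written to wtmp.
    events = [line for line in last_output.splitlines()
              if line.startswith(("reboot", "shutdown"))]
    events.reverse()  # last prints newest first; walk from oldest to newest

    suspicious = []
    previous = None
    for line in events:
        kind = line.split()[0]
        # Two boots in a row mean the earlier session never logged a shutdown.
        if kind == "reboot" and previous == "reboot":
            suspicious.append(line)
        previous = kind
    return suspicious


if __name__ == "__main__":
    if shutil.which("last"):
        result = subprocess.run(["last", "-x"], capture_output=True, text=True)
        for line in unclean_boots(result.stdout):
            print("boot without prior clean shutdown:", line)
    else:
        print("'last' not found; run 'last -x' on the server and save its output")
```

If every boot is preceded by a `shutdown` entry, something is shutting the system down in an orderly way (a scheduled job or a UPS or thermal daemon, for example) rather than the power simply dropping, which points the search in a different direction.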

Clean out all the dust from your server and make sure all the fans run smoothly. Test the RAM using memtest86+. Most Linux distros include it in their boot menu; if yours doesn't, boot from the UBCD:

http://ultimatebootcd.com
http://pharry.org/data/ubcd523.iso

Also test the hard disks. If you are using a RAID controller, it may have built-in options to test them; if not, the manufacturer's diagnostics are also included on the CD above. Even if the RAID controller has no built-in diagnostics, it should at least tell you what state the disks are in, and if it tells you a disk is bad, replace it.
 
ganesh4282 Commented:
You can configure a disk dump, but you would need to send the core dump to Novell to find the root cause.
 
asugri (Author) Commented:
Rindi,

The machine was purchased about 8 years ago, and we only have limited information about it now. I believe it has a RAID. How do I find out which RAID it has and test whether any drive is bad? More specific steps would be very much appreciated.

Ganesh4282,

The OS is outside of its support period, too. I don't think Novell will handle this case.
Thanks.
 
rindi Commented:
You'll first have to find out what hardware you have and what RAID controller your server has built in. RAID controllers generally include utilities or options that report the general status of the disks connected to them. If it tells you a disk is bad, replace it. How you should do that also depends on the hardware. Many servers have the disks in hot-swap caddies; those should be removed while the server is running and replaced with the new disk, after which the array should rebuild automatically...
 
asugri (Author) Commented:
Rindi,

I was hoping you (or somebody) could provide some kind of Linux command to find out more about the RAID. Perhaps I will post another question.

Thanks.
 
rindi Commented:
It differs between hardware and RAID controller manufacturers. Some provide utilities or tools to diagnose their hardware and some don't; it depends on them.
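Knowing which manufacturer's tools to look for starts with identifying the controller. As a hedged sketch, assuming the `lspci` utility from pciutils is installed, the script below filters its output for device classes that usually belong to disk or RAID controllers. The keyword list and the function name `storage_controllers` are illustrative choices, not a complete classification, and this only names the controller; it says nothing about the health of the disks behind it.

```python
import shutil
import subprocess

# Device-class keywords that usually indicate a disk or RAID controller
# (an illustrative selection, not an exhaustive one).
STORAGE_KEYWORDS = ("raid", "scsi", "sata", "ide interface", "mass storage")


def storage_controllers(lspci_output):
    """Return the lspci lines whose device class looks like a storage controller."""
    matches = []
    for line in lspci_output.splitlines():
        # Each line has the form "<slot> <device class>: <vendor and model>".
        _slot, _, rest = line.partition(" ")
        device_class = rest.split(":", 1)[0].lower()
        if any(word in device_class for word in STORAGE_KEYWORDS):
            matches.append(line.strip())
    return matches


if __name__ == "__main__":
    if shutil.which("lspci"):
        result = subprocess.run(["lspci"], capture_output=True, text=True)
        for line in storage_controllers(result.stdout):
            print(line)
    else:
        print("lspci not found; run it on the server and save its output")
```

The vendor and model string on each matching line is what to search for on the manufacturer's site when looking for the matching management or diagnostic utility.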

To properly check the state of the disks, you also need to run their diagnostics outside of the RAID system. There's no Linux command that can do that.

Only if you are using Linux's built-in software RAID do you have commands to check the state of the array, but even those won't tell you why it failed or whether the hardware and disks are actually good. For that you again have to run the manufacturer's diagnostics.
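For the software RAID case, the kernel reports array state in /proc/mdstat. Here is a minimal sketch, assuming the usual layout where each array starts with a line like `md0 : active raid1 ...` and is followed by a member map such as `[UU]` (all members up) or `[U_]` (one member missing). The function name `degraded_arrays` is hypothetical, and as noted above this only shows the array state, not why a disk dropped out.

```python
import re
from pathlib import Path


def degraded_arrays(mdstat_text):
    """Return the names of md arrays whose member map shows a missing disk."""
    degraded = []
    current = None
    for line in mdstat_text.splitlines():
        header = re.match(r"^(md\w+)\s*:", line)
        if header:
            current = header.group(1)
            continue
        # The member map is a bracketed run of U (up) and _ (missing) characters.
        status = re.search(r"\[([U_]+)\]", line)
        if current and status:
            if "_" in status.group(1):
                degraded.append(current)
            current = None
    return degraded


if __name__ == "__main__":
    mdstat = Path("/proc/mdstat")
    if mdstat.exists():
        names = degraded_arrays(mdstat.read_text())
        print("degraded arrays:", ", ".join(names) if names else "none")
    else:
        print("/proc/mdstat not present; no Linux software RAID on this system")
```

If /proc/mdstat does not exist or lists no arrays, the machine is not using Linux software RAID, and any RAID it has is handled by the controller.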