Server hang problem

For the past 7 days, one of our Linux servers has been
hanging approximately once a day.

I am really stuck here, I can't find out why it does this.

I checked all the logs, and nothing suspicious appears in them.


I have also checked my Tripwire report and nothing has changed.

I port scanned the station but still, nothing abnormal.

The server is a Dell PowerEdge 1550.
Pentium III - 900 MHz
256 MB RAM
running RH 7.1 on kernel 2.4.2

When I say hang, it really hangs: nothing works except
the LEDs on the box. The power LED is still lit and the network LED is still blinking, but the server can't be pinged.

Thanx in advance for any suggestion or comment on this issue.
Computer101 Commented:
Presumably this server has run for some time without hanging, so the question is what has changed. Have any new applications been installed or has something in the system configuration changed recently? Do you have a console screen, not the GUI, up so that you can see console messages?

Have you opened the case to be sure that all of the cooling fans are running? In particular, a failed or slow CPU cooling fan can cause this sort of problem.
Lazypete (Author) Commented:
First of all thanx for your fast answer jlevie

No new software installed...
Only security update...

And yes, it has been running for 8 months without any reboot
( except on kernel update and stuff )

Yeah, I tried to look at the screen for messages, but because of APM the screen is always blanked out before the hang occurs, and I have not found out how to make it stop yet.

The unit is not particularly hot (well, it didn't feel hot to the touch). All the other servers near this one are pretty hot compared to it, and they don't fail like this one.

There is no fan on the CPU, only a heatsink.

As soon as I can I'll rack it out and check the fans, but since it's a mission-critical server and some software refers to it directly, I can't open it now.

Thanx again
Umm, wouldn't a security update count as new software that's been installed?
Hmm, looks like you might be behind a bit on updates. My 7.1 systems are using kernel-2.4.9-31, but I don't know if that's related to what's going on now or not.

To gain control of the console and stop it from blanking I first disable APM in the BIOS. Then I disable apmd with 'chkconfig --level 2345 apmd off; /etc/init.d/apmd stop'. As long as you don't have the system configured for a GUI login (and you shouldn't do that on a server anyway) the console screen should remain visible.
Lazypete (Author) Commented:
No GUI installed.

I think all APM BIOS options are off..
I'll take a look at the APM daemon tho...

Yeah I know my kernel is kinda out of date...

I'll try to update it.. hope it works tho...

Is there a way to update the kernel without removing the old one?
Lazypete (Author) Commented:
No apmd running either...
Yes, you can update the kernel without removing the old one. From the script that I use to intelligently apply updates:

#        Also, if the updates include a new kernel it is best to manually save your
#        existing kernel before applying the updates. This can be done something
#        like:
#        root # cd /lib/modules
#        root # cp -pdr 2.4.2-2 2.4.2-2-old
#        root # cd /boot
#        root # cp -p initrd-2.4.2-2.img initrd-2.4.2-2.img-old
#        root # cp -p kernel.h-2.4.2 kernel.h-2.4.2-old
#        root # cp -p module-info-2.4.2-2 module-info-2.4.2-2-old
#        root # cp -p
#        root # cp -p vmlinuz-2.4.2-2 vmlinuz-2.4.2-2-old
#        After the updates have been applied the saved files/dirs can be moved
#        back to their normal names, like:
#        root # cd /lib/modules
#        root # mv 2.4.2-2-old 2.4.2-2
#        root # cd /boot
#        root # mv initrd-2.4.2-2.img-old initrd-2.4.2-2.img
#        root # mv kernel.h-2.4.2-old kernel.h-2.4.2
#        root # mv module-info-2.4.2-2-old module-info-2.4.2-2
#        root # mv
#        root # mv vmlinuz-2.4.2-2-old vmlinuz-2.4.2-2
#        Doing this allows you to include a boot stanza in /etc/lilo.conf like:
#        image=/boot/vmlinuz-2.4.9-12
#            label=linux
#            initrd=/boot/initrd-2.4.9-12.img
#            read-only
#            root=/dev/sda6
#        enabling you to boot the old kernel if the updated kernel has problems.
#
#        If your boot configuration requires an initrd, you'll need to make
#        a new one with:
#        root # cd /boot
#        root # mkinitrd -v initrd-2.4.9-12.img 2.4.9-12
#        This patch list is complete as of 1 Mar 2002
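
For what it's worth, the manual "save the old kernel" copies described in those comments could be wrapped in a small shell function. This is only a rough sketch, not part of the script quoted above: the function name `backup_kernel` and its arguments are made up for illustration, it only handles the module tree plus the initrd, module-info and vmlinuz files, and it skips kernel.h and anything else not named here. It relies on GNU cp for the -d option.

```shell
# Hypothetical helper (not from the original script): copy the files for one
# kernel version to "-old" names so they survive a kernel update.
#   $1 = kernel version string, e.g. 2.4.2-2
#   $2 = boot directory (assumed default: /boot)
#   $3 = modules directory (assumed default: /lib/modules)
backup_kernel() {
    ver=$1
    boot_dir=${2:-/boot}
    modules_dir=${3:-/lib/modules}

    # Refuse to overwrite an earlier backup of the module tree.
    if [ -e "$modules_dir/$ver-old" ]; then
        echo "backup already exists: $modules_dir/$ver-old" >&2
        return 1
    fi

    # Module tree: keep permissions and symlinks, copy recursively.
    if [ -d "$modules_dir/$ver" ]; then
        cp -pdr "$modules_dir/$ver" "$modules_dir/$ver-old" || return 1
    fi

    # Per-version files in the boot directory; missing ones are skipped.
    for f in "initrd-$ver.img" "module-info-$ver" "vmlinuz-$ver"; do
        if [ -f "$boot_dir/$f" ]; then
            cp -p "$boot_dir/$f" "$boot_dir/$f-old" || return 1
        fi
    done
}
```

Something like `backup_kernel 2.4.2-2` as root before applying the updates would then leave the "-old" copies in place. Trying it on a scratch directory first (second and third arguments) is probably wise.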

If you like a copy of my script, send an email to

Lazypete (Author) Commented:
Well, I checked and there is no such thing as power management on this category of server.
(In fact, I never saw a more empty BIOS config utility...)

But someone told me how to set the screen blanking delay for virtual terminal.

setterm -blank 0

So now I will be able to see if there's any output when it happens.
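
In case the console shows nothing either, a timestamped heartbeat written to disk might at least narrow down when the box dies. A minimal sketch, assuming a made-up `heartbeat` function and a log path of your own choosing (neither comes from this thread):

```shell
# Hypothetical helper: append one timestamped status line to a log file.
#   $1 = log file path (an assumption, pick any writable location)
heartbeat() {
    logfile=$1

    # First three fields of /proc/loadavg are the 1, 5 and 15 minute loads.
    if [ -r /proc/loadavg ]; then
        load=$(cut -d' ' -f1-3 /proc/loadavg)
    else
        load="n/a"
    fi

    printf '%s load=%s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$load" >> "$logfile"

    # Ask the kernel to flush buffers so the line has a chance to survive a hard hang.
    sync
}
```

Called once a minute (from cron, or from a loop with `sleep 60`), the last line in the file gives a rough time of the hang and the load just before it.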
Okay, hopefully there'll be something on the console...

Did you get the script I sent?
Lazypete (Author) Commented:
Yes but I didn't have time to update the kernel yet.

Someone told me about a Dell hardware test utility.
I'm downloading it right now; I'll try it and see if it's a hardware problem.
That sounds like a plan...
An easy way to update the kernel without removing the old one is to install the new package with -i rather than upgrade with -U:

rpm -ivh kernel-2.4.9-31.rpm
Is there any partition that is 100% full, especially the "/" partition where "/tmp" lives?
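
A quick way to check that, sketched as a small filter over `df -P` output. The function name `full_filesystems` and the threshold argument are just for illustration, and mount points containing spaces are not handled:

```shell
# Hypothetical helper: read `df -P` output on stdin and print every mount
# point whose usage is at or above the given percentage (default 100).
full_filesystems() {
    threshold=${1:-100}
    awk -v t="$threshold" '
        NR > 1 {                    # skip the header line
            use = $5                # capacity column, e.g. "97%"
            sub(/%/, "", use)
            if (use + 0 >= t) print $6, $5
        }'
}
```

Run as `df -P | full_filesystems 95` to list anything at 95% or more.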
Lazypete (Author) Commented:
Problem solved!

Sorry for the delay in posting the answer

It was a hardware problem.
A voltage failure on the mainboard.

Dell came and changed the board, and everything is now fine.

Thanx everyone for your help with this.
Lazypete (Author) Commented:

If someone thinks they deserve the points, tell me.
We'll see if we can agree.
I don't think we were really all that much help on this problem. I'd suggest going to Community Support and asking to have the question deleted and the points returned.
Given that it was a hardware problem in the end, and nobody suggested that, I'd say refund the points.
