Solved

Server hang problem

Posted on 2002-03-26
Last Modified: 2013-12-16
For the past 7 days, one of our Linux servers has been
hanging approximately once a day.

I am really stuck here, I can't find out why it does this.

I checked all the logs, and nothing suspicious appears in any of them:

/var/log/
messages
ksyms
secure
mysql

I have also checked my Tripwire report, and nothing has changed.

I port-scanned the machine, but still found nothing abnormal.

The server is a Dell PowerEdge 1550:
Pentium III, 900 MHz
256 MB RAM
running Red Hat 7.1 on kernel 2.4.2

When I say hang, it really hangs: nothing works except
the LEDs on the box. The power LED is still lit and the network LED is still blinking, but the server can't be pinged.

Thanks in advance for any suggestions or comments on this issue.
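
When a machine freezes without leaving anything in its logs, one low-cost way to narrow down when each hang happens is a heartbeat file that is appended to at regular intervals. The following is a minimal sketch only: the function name `write_heartbeat`, the script path, and the log path are illustrative assumptions, not something used in this thread.

```shell
#!/bin/sh
# Hypothetical heartbeat logger: appends a timestamp plus the load average
# to a file so that, after a hang, the last line shows roughly when the
# machine was last alive. All names and paths here are assumptions.

write_heartbeat() {
    logfile="$1"
    # /proc/loadavg is Linux-specific; fall back to "n/a" elsewhere.
    if [ -r /proc/loadavg ]; then
        load=$(cut -d' ' -f1-3 /proc/loadavg)
    else
        load="n/a"
    fi
    printf '%s load=%s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$load" >> "$logfile"
    # Flush buffers to disk so the entry has a chance to survive a hard hang.
    sync
}

# Example: call this once a minute from cron, e.g.
#   * * * * * /usr/local/sbin/heartbeat.sh
# write_heartbeat /var/log/heartbeat.log
```

After the next hang and reboot, the last line of the file gives an approximate time of failure, which can then be compared against cron jobs, backups, or load spikes.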
Question by:Lazypete
20 Comments
 
LVL 40

Expert Comment

by:jlevie
ID: 6897023
Presumably this server has run for some time without hanging, so the question is what has changed. Have any new applications been installed or has something in the system configuration changed recently? Do you have a console screen, not the GUI, up so that you can see console messages?

Have you opened the case to be sure that all of the cooling fans are running? In particular, a failed or slow CPU cooling fan can cause this sort of problem.
 
LVL 1

Author Comment

by:Lazypete
ID: 6897148
First of all, thanks for your fast answer, jlevie.

No new software installed...
Only security updates...

And yes, it has been running for 8 months without any reboot
(except for kernel updates and such).

Yes, I tried to look at the screen for messages, but because of APM the screen is always blanked before the hang occurs, and I have not yet found how to stop that.

The unit is not particularly hot (well, it didn't feel hot to the touch). All the other servers near this one are quite hot compared to it, and they don't fail like this one.

There is no fan on the CPU, only a heatsink.

As soon as I can, I'll pull it out of the rack and check the fans, but since it's a mission-critical server and some software refers to it directly, I can't open it now.

Thanks again
 
LVL 14

Expert Comment

by:chris_calabrese
ID: 6897346
Umm, wouldn't a security update count as new software that's been installed?
 
LVL 40

Expert Comment

by:jlevie
ID: 6897353
Hmm, looks like you might be behind a bit on updates. My 7.1 systems are using kernel-2.4.9-31, but I don't know if that's related to what's going on now or not.

To gain control of the console and stop it from blanking I first disable APM in the BIOS. Then I disable apmd with 'chkconfig --level 2345 apmd off; /etc/init.d/apmd stop'. As long as you don't have the system configured for a GUI login (and you shouldn't do that on a server anyway) the console screen should remain visible.
 
LVL 1

Author Comment

by:Lazypete
ID: 6897372
No GUI installed.

I think all the APM BIOS options are off.
I'll take a look at the APM daemon, though.

Yes, I know my kernel is rather out of date.

I'll try to update it and hope it works.

Is there a way to update the kernel without removing the old one?
 
LVL 1

Author Comment

by:Lazypete
ID: 6897379
No apmd running either...
 
LVL 40

Expert Comment

by:jlevie
ID: 6897441
Yes, you can update the kernel without removing the old one. From the script that I use to intelligently apply updates:

#        Also, if the updates include a new kernel it is best to manually save
#        your existing kernel before applying the updates. This can be done
#        something like:
#
#        root # cd /lib/modules
#        root # cp -pdr 2.4.2-2 2.4.2-2-old
#        root # cd /boot
#        root # cp -p initrd-2.4.2-2.img initrd-2.4.2-2.img-old
#        root # cp -p kernel.h-2.4.2 kernel.h-2.4.2-old
#        root # cp -p module-info-2.4.2-2 module-info-2.4.2-2-old
#        root # cp -p System.map-2.4.2-2 System.map-2.4.2-2-old
#        root # cp -p vmlinuz-2.4.2-2 vmlinuz-2.4.2-2-old
#
#        After the updates have been applied the saved files/dirs can be moved
#        back to their normal names, like:
#
#        root # cd /lib/modules
#        root # mv 2.4.2-2-old 2.4.2-2
#        root # cd /boot
#        root # mv initrd-2.4.2-2.img-old initrd-2.4.2-2.img
#        root # mv kernel.h-2.4.2-old kernel.h-2.4.2
#        root # mv module-info-2.4.2-2-old module-info-2.4.2-2
#        root # mv System.map-2.4.2-2-old System.map-2.4.2-2
#        root # mv vmlinuz-2.4.2-2-old vmlinuz-2.4.2-2
#
#        Doing this allows you to include a boot stanza in /etc/lilo.conf like:
#
#        image=/boot/vmlinuz-2.4.9-12
#            label=linux
#            initrd=/boot/initrd-2.4.9-12.img
#            read-only
#            root=/dev/sda6
#
#        enabling you to boot the old kernel if the updated kernel has problems.
#
#        If your boot configuration requires an initrd, you'll need to create
#        a new one with:
#
#       root# cd /boot
#       root# mkinitrd -v initrd-2.4.9-12.img 2.4.9-12
#
#        This patch list is complete as of 1 Mar 2002

If you'd like a copy of my script, send an email to jim@entrophy-free.net.
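
The manual "save the existing kernel" step described in those comments could be wrapped in a small function along these lines. This is a sketch only, not part of the script mentioned above: `save_kernel` is a hypothetical name, `cp -pPR` is used as a more portable spelling of the `cp -pdr` shown in the comments, and `kernel.h-2.4.2` is left out because its name does not carry the full release string.

```shell
#!/bin/sh
# Sketch of the "save the existing kernel" step as a function.
# save_kernel is a hypothetical helper; paths mirror the comments above.
# Usage: save_kernel <kernel-release> [boot-dir] [modules-dir]

save_kernel() {
    rel="$1"
    boot="${2:-/boot}"
    mods="${3:-/lib/modules}"

    if [ -z "$rel" ]; then
        echo "usage: save_kernel <kernel-release> [boot-dir] [modules-dir]" >&2
        return 1
    fi

    # Refuse to overwrite an earlier saved copy of the module tree.
    if [ -e "$mods/$rel-old" ]; then
        echo "save_kernel: $mods/$rel-old already exists" >&2
        return 1
    fi

    # Copy the module tree, preserving attributes and not following symlinks.
    if [ -d "$mods/$rel" ]; then
        cp -pPR "$mods/$rel" "$mods/$rel-old" || return 1
    fi

    # Copy each per-release boot file that is actually present.
    for f in "initrd-$rel.img" "module-info-$rel" "System.map-$rel" "vmlinuz-$rel"; do
        if [ -f "$boot/$f" ]; then
            cp -p "$boot/$f" "$boot/$f-old" || return 1
        fi
    done
}
```

On the system discussed here this would be run as root, as `save_kernel 2.4.2-2`, before applying the kernel update.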

 
LVL 1

Author Comment

by:Lazypete
ID: 6899226
Well, I checked, and there is no such thing as power management on this category of server.
(In fact, I have never seen a more empty BIOS config utility...)

But someone told me how to set the screen blanking delay for virtual terminals:

setterm -blank 0

So now I will be able to see if there's any output when it happens.
 
LVL 40

Expert Comment

by:jlevie
ID: 6899265
Okay, hopefully there'll be something on the console...

Did you get the script I sent?
 
LVL 1

Author Comment

by:Lazypete
ID: 6899286
Yes, but I haven't had time to update the kernel yet.

Someone told me about a Dell hardware test utility.
I'm downloading it right now; I'll try it and see if it's a hardware problem.
 
LVL 40

Expert Comment

by:jlevie
ID: 6899341
That sounds like a plan...
 
LVL 5

Expert Comment

by:BlackDiamond
ID: 6900279
Easy way to update kernel without removing old one...

rpm -ivh kernel-2.4.9-31.rpm
 
LVL 3

Expert Comment

by:hnminh
ID: 6974441
Is there any partition that is 100% full, especially the "/" partition where "/tmp" lives?
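
A quick way to check this is to filter `df -P` output for filesystems at or above a usage threshold. The helper below is a sketch: `full_filesystems` is a hypothetical name, the 95% default is arbitrary, and mount points containing spaces are not handled.

```shell
#!/bin/sh
# List filesystems at or above a usage threshold, reading `df -P` style
# output on stdin. full_filesystems is a hypothetical helper name.
# Usage: df -P | full_filesystems [threshold-percent]

full_filesystems() {
    threshold="${1:-95}"
    # Skip the header line. In POSIX `df -P` output, column 5 is the
    # capacity (e.g. "100%") and column 6 is the mount point.
    awk -v t="$threshold" 'NR > 1 {
        use = $5
        sub(/%/, "", use)
        if (use + 0 >= t + 0) print $6, $5
    }'
}
```

Running `df -P | full_filesystems 95` prints one line per nearly full filesystem and nothing if all are below the threshold. Inode exhaustion can look like a full disk as well, so `df -i` is worth a glance too.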
 
LVL 1

Author Comment

by:Lazypete
ID: 6975101
Problem solved!

Sorry for the delay in posting the answer.

It was a hardware problem:
a voltage failure on the mainboard.

Dell came and replaced the board, and everything is now fine.

Thanks, everyone, for your help with this.
 
LVL 1

Author Comment

by:Lazypete
ID: 6975108

If anyone thinks they deserve the points, tell me.
We'll see if we can agree.
 
LVL 40

Expert Comment

by:jlevie
ID: 7024271
I don't think we were really all that much help on this problem. I'd suggest going to Community Support and asking to have the question deleted and the points returned.
 
LVL 1

Accepted Solution

by:Computer101
ID: 7127252
Points refunded and placed in PAQ.

Computer101
E-E Moderator
 

Expert Comment

by:CleanupPing
ID: 9077044
Lazypete:
This old question needs to be finalized -- accept an answer, split points, or get a refund.  For information on your options, please click here-> http:/help/closing.jsp#1 
EXPERTS:
Post your closing recommendations!  No comment means you don't care.
 
LVL 1

Expert Comment

by:drewber
ID: 9220329
This question has been classified abandoned. I will make a recommendation to the moderators on its resolution in a week or two. I appreciate any comments that would help me to make a recommendation.
 

Unless it is clear to me that the question has been answered I will recommend delete. It is possible that a Grade less than A will be given if no expert makes a case for an A grade. It is assumed that any participant not responding to this request is no longer interested in its final disposition.

 
If the user does not know how to close the question, the options are here:
http://www.experts-exchange.com/help/closing.jsp
 
drewber
 
LVL 14

Expert Comment

by:chris_calabrese
ID: 9223771
Given that it was a hardware problem in the end, and nobody suggested that, I'd say refund the points.