Solved

Server hang problem

Posted on 2002-03-26
Last Modified: 2013-12-16
For the past 7 days, one of our Linux servers has been hanging
approximately once a day.

I am really stuck here, I can't find out why it does this.

I checked all logs, and nothing suspicious appears in the log.

/var/log/
message
ksyms
secure
mysql
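
One way to dig further into the messages log is to pull out the last few lines written before each reboot, since those are the closest thing to a record of the hang. The following is only a sketch: the function name `last_lines_before_restart` is made up for illustration, and it assumes that syslogd writes a line containing "syslogd" and "restart" each time the system boots, which should be checked against the actual log.

```shell
#!/bin/sh
# Print the last N lines logged before each syslogd restart marker.
# Assumption: every boot leaves a line matching "syslogd .*restart"
# in the log; adjust the pattern if the local log looks different.
last_lines_before_restart() {
    n="${1:-5}"
    awk -v n="$n" '
        /syslogd .*restart/ {
            print "--- before: " $0
            # Replay at most n buffered lines, oldest first.
            start = (count > n) ? count - n + 1 : 1
            for (i = start; i <= count; i++) print buf[i % n]
            count = 0
            next
        }
        # Ring buffer holding the most recent n lines.
        { count++; buf[count % n] = $0 }
    '
}

# Example use on the live system (reads the log on stdin):
#   last_lines_before_restart 10 < /var/log/messages
```

If the lines before each restart show nothing unusual, that is itself a hint that the machine stops too abruptly for the kernel to log anything.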

I have also checked my Tripwire report, and nothing has changed.

I port scanned the station but still, nothing abnormal.

The server is a Dell PowerEdge 1550.
Pentium III - 900 MHz
256 MB RAM
running RH 7.1 on kernel 2.4.2

When I say hang, it really hangs: nothing works except
the LEDs on the box. The power LED is still lit and the network LED is still blinking, but the server can't be pinged.

Thanks in advance for any suggestions or comments on this issue.
Question by:Lazypete
 
LVL 40

Expert Comment

by:jlevie
Presumably this server has run for some time without hanging, so the question is what has changed. Have any new applications been installed or has something in the system configuration changed recently? Do you have a console screen, not the GUI, up so that you can see console messages?

Have you opened the case to be sure that all of the cooling fans are running? In particular, a failed or slow CPU cooling fan can cause this sort of problem.
 
LVL 1

Author Comment

by:Lazypete
First of all, thanks for your fast answer, jlevie.

No new software installed...
Only security updates...

And yes, it has been running for 8 months without any reboot
(except for kernel updates and such).

Yeah, I tried to look at the screen for messages, but because of APM the screen is always blanked before the hang occurs, and I haven't found out how to stop that yet.

The unit is not particularly hot (well, it didn't feel hot to the touch). All the other servers near this one are pretty hot compared to it, and they don't fail like this one.

There is no fan on the CPU, only a heatsink.

As soon as I can I'll pull it out of the rack and check the fans, but since it's a mission-critical server and some software refers to it directly, I can't open it now.

Thanks again
 
LVL 14

Expert Comment

by:chris_calabrese
Umm, wouldn't a security update count as new software that's been installed?
 
LVL 40

Expert Comment

by:jlevie
Hmm, looks like you might be behind a bit on updates. My 7.1 systems are using kernel-2.4.9-31, but I don't know if that's related to what's going on now or not.

To gain control of the console and stop it from blanking I first disable APM in the BIOS. Then I disable apmd with 'chkconfig --level 2345 apmd off; /etc/init.d/apmd stop'. As long as you don't have the system configured for a GUI login (and you shouldn't do that on a server anyway) the console screen should remain visible.
 
LVL 1

Author Comment

by:Lazypete
No GUI installed.

I think all APM BIOS options are off.
I'll take a look at the APM daemon though...

Yeah, I know my kernel is kind of out of date...

I'll try to update it, and hope it works...

Is there a way to update the kernel without removing the old one?
 
LVL 1

Author Comment

by:Lazypete
No apmd running either...
 
LVL 40

Expert Comment

by:jlevie
Yes, you can update the kernel without removing the old one. From the script that I use to intelligently apply updates:

#        Also, if the updates include a new kernel it is best to manually save your
#        existing kernel before applying the updates. This can be done something
#        like:
#
#        root # cd /lib/modules
#        root # cp -pdr 2.4.2-2 2.4.2-2-old
#        root # cd /boot
#        root # cp -p initrd-2.4.2-2.img initrd-2.4.2-2.img-old
#        root # cp -p kernel.h-2.4.2 kernel.h-2.4.2-old
#        root # cp -p module-info-2.4.2-2 module-info-2.4.2-2-old
#        root # cp -p System.map-2.4.2-2 System.map-2.4.2-2-old
#        root # cp -p vmlinuz-2.4.2-2 vmlinuz-2.4.2-2-old
#
#        After the updates have been applied the saved files/dirs can be moved
#        back to their normal names, like:
#
#        root # cd /lib/modules
#        root # mv 2.4.2-2-old 2.4.2-2
#        root # cd /boot
#        root # mv initrd-2.4.2-2.img-old initrd-2.4.2-2.img
#        root # mv kernel.h-2.4.2-old kernel.h-2.4.2
#        root # mv module-info-2.4.2-2-old module-info-2.4.2-2
#        root # mv System.map-2.4.2-2-old System.map-2.4.2-2
#        root # mv vmlinuz-2.4.2-2-old vmlinuz-2.4.2-2
#
#        Doing this allows you to include a boot stanza in /etc/lilo.conf like:
#
#        image=/boot/vmlinuz-2.4.9-12
#            label=linux
#            initrd=/boot/initrd-2.4.9-12.img
#            read-only
#            root=/dev/sda6
#
#        enabling you to boot the old kernel if the updated kernel has problems.
#
#        If your boot configuration requires an initrd, you'll need to create
#        a new one with:
#
#       root# cd /boot
#       root# mkinitrd -v initrd-2.4.9-12.img 2.4.9-12
#
#        This patch list is complete as of 1 Mar 2002

If you'd like a copy of my script, send an email to jim@entrophy-free.net.

 
LVL 1

Author Comment

by:Lazypete
Well, I checked, and there is no such thing as power management on this category of server.
(In fact, I have never seen a more empty BIOS config utility...)

But someone told me how to set the screen blanking delay for the virtual terminals:

setterm -blank 0

So now I will be able to see if there's any output when it happens.
 
LVL 40

Expert Comment

by:jlevie
Okay, hopefully there'll be something on the console...

Did you get the script I sent?
 
LVL 1

Author Comment

by:Lazypete
Yes but I didn't have time to update the kernel yet.

Someone told me about a Dell hardware test utility.
I'm downloading it right now. I'll try it and see if it's a hardware problem.
 
LVL 40

Expert Comment

by:jlevie
That sounds like a plan...
 
LVL 5

Expert Comment

by:BlackDiamond
Easy way to update kernel without removing old one...

rpm -ivh kernel-2.4.9-31.rpm
 
LVL 3

Expert Comment

by:hnminh
Is there any partition that is 100% full, especially the "/" partition, where "/tmp" lives?
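
A quick way to check this, as a sketch: the function below filters `df -P` output and prints every mount point at or above a given usage percentage. The name `full_filesystems` and the 95% value in the example are illustrative choices, not something from this thread, and the parsing assumes device names and mount points contain no spaces.

```shell
#!/bin/sh
# List mount points whose usage is at or above a threshold (default 100%).
# Reads `df -P` style output on stdin: one filesystem per line, with the
# capacity in column 5 and the mount point in column 6.
full_filesystems() {
    threshold="${1:-100}"
    awk -v t="$threshold" 'NR > 1 {
        use = $5
        sub(/%$/, "", use)          # strip the trailing percent sign
        if (use + 0 >= t + 0) print $6, $5
    }'
}

# Example use on the live system:
df -P | full_filesystems 95
```

No output means nothing is at or above the threshold.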
 
LVL 1

Author Comment

by:Lazypete
Problem solved!

Sorry for the delay in posting the answer.

It was a hardware problem.
A voltage failure on the mainboard.

Dell came and changed the board, and everything is now fine.

Thanx everyone for your help with this.
 
LVL 1

Author Comment

by:Lazypete

If anyone thinks they deserve the points, tell me.
We'll see if we can agree.
 
LVL 40

Expert Comment

by:jlevie
I don't think we were really all that much help on this problem. I'd suggest going to Community Support and asking to have the question deleted and the points returned.
 
LVL 1

Accepted Solution

by:
Computer101 earned 0 total points
Points refunded and placed in PAQ.

Computer101
E-E Moderator
 

Expert Comment

by:CleanupPing
Lazypete:
This old question needs to be finalized -- accept an answer, split points, or get a refund.  For information on your options, please click here-> http:/help/closing.jsp#1
EXPERTS:
Post your closing recommendations!  No comment means you don't care.
 
LVL 1

Expert Comment

by:drewber
This question has been classified abandoned. I will make a recommendation to the moderators on its resolution in a week or two. I appreciate any comments that would help me to make a recommendation.
 

Unless it is clear to me that the question has been answered I will recommend delete. It is possible that a Grade less than A will be given if no expert makes a case for an A grade. It is assumed that any participant not responding to this request is no longer interested in its final disposition.

 
If the user does not know how to close the question, the options are here:
http://www.experts-exchange.com/help/closing.jsp
 
drewber
 
LVL 14

Expert Comment

by:chris_calabrese
Comment Utility
Given that it was a hardware problem in the end, and nobody suggested that, I'd say refund the points.