Solved

Server hangs

Posted on 2011-09-10
Last Modified: 2012-05-12
Hello,

After 4 months of flawless operation, my server running openSUSE 11.4 started to freeze. When I try to log in to it from another server on the same network, I get only the following message:
Last login: Sat Sep 10 23:17:38 2011 from blablabla
Have a lot of fun...

and nothing else. From my home PC I can't connect to it at all, only from the neighbouring server.
I have 16 GB of RAM and don't believe that swap gets full. Also, I have a cron job that runs every 2 min and kills those processes that are using more than 90% of cpu, so it shouldn't be an overload issue either.

Could somebody explain this server behaviour and how to find the cause? (There are no error logs.)
Question by:tanel
13 Comments
 
LVL 21

Expert Comment

by:Papertrip
ID: 36517379
Also, I have a cron job that runs every 2 min and kills those processes that are using more than 90% of cpu.

That is not a wise thing to do, and is probably the source of your problem.  Have you tried commenting out that cron job and rebooting to start fresh?  Who knows what that cron job killed that shouldn't have been...
 

Author Comment

by:tanel
ID: 36517413
I have another server where the auto-kill cron script is set to 85% of CPU, and everything is fine there.
And as I mentioned before, the server was stable for almost 5 months, and the script kills only processes with a specific name (top -n 1 -b | grep "hlds_i686"). So no system process can be killed.
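
The actual cron script was not posted, so the following is only a rough sketch of what a name-restricted CPU check like the one described could look like. The function name `hot_pids` is hypothetical, and the column positions assume the default `top` batch layout (PID in column 1, %CPU in column 9, COMMAND in column 12) and a locale that prints decimals with a dot.

```shell
#!/bin/sh
# Hypothetical helper (not the poster's script): read `top -n 1 -b` style
# output on stdin and print the PIDs of processes whose COMMAND column
# equals NAME and whose %CPU column is above LIMIT.
# Assumes the default top layout: $1 = PID, $9 = %CPU, $12 = COMMAND.
hot_pids() {
    name="$1"
    limit="$2"
    awk -v name="$name" -v limit="$limit" '
        $12 == name && $9 + 0 > limit + 0 { print $1 }
    '
}

# Example use, commented out so that sourcing this file kills nothing:
# top -n 1 -b | hot_pids hlds_i686 90 | xargs -r kill
```

Matching on the exact command name rather than on CPU usage alone is what keeps such a check away from system processes; it does not rule out side effects from killing the game servers themselves.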
 
LVL 21

Expert Comment

by:Papertrip
ID: 36517423
the script kills only processes with a specific name (top -n 1 -b | grep "hlds_i686")

You didn't say that at first, you said "those processes that are using more than 90% of cpu".

Are you able to login via console?  Is the problem happening to all users?  Have you tried logging in from more than 1 other server?

What exactly do you mean by it's starting to freeze?  Is the only symptom here that you are having trouble logging in remotely or is there something else that makes you think the system is hanging?

 
LVL 11

Expert Comment

by:maeltar
ID: 36517526
It could well be an issue with Half-Life. Can you post your debug.log, please?
 

Author Comment

by:tanel
ID: 36518273
Thanks for your replies!

Every time, I have to go to the datacentre for a hard reboot, since I can't even send a command to the shell directly: it asks for the password, accepts it, and then nothing appears (the same with all system users). There are no strange log messages, and the drive space is okay.

There are several hlds instances running, and it's hard to find the right debug log. Also, if something gets written to a debug log, the system reports it in "messages" too ("segmentation fault..."), BUT the last log entries are always the usual ones.

I have an SSD (with TRIM), 16 GB of RAM and 2 GB of swap installed. Is it possible that some process can eat all the RAM and swap in 2 hours? I don't even know whether it's a kernel/OS or hardware issue. The first step I'm going to take now is to switch to the default SUSE kernel.
 

Author Comment

by:tanel
ID: 36519362
I've requested that this question be deleted for the following reason:

have to test
 
LVL 21

Expert Comment

by:Papertrip
ID: 36519363
I don't understand why this is being deleted in order to "test".
 

Accepted Solution

by:
tanel earned 0 total points
ID: 36521751
Well, I have replaced my recompiled kernel (optimized for hlds) with the new openSUSE one, 3.0.4-2.
I have also turned off the cron job that changed the priority of the hlds processes to -99 (chrt) every 5 minutes.

The cron job for the CPU checks is still running, but I changed its frequency from every 1 min to every 3.
I also made a new cron job for swap checks: every 4 minutes it logs the "free -m" output to a file. Let's see...
up 1 day  0:34,  1 user,  load average: 0.02, 0.02, 0.05
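
The swap-logging job itself was not shown. As a minimal sketch under stated assumptions, a logger of that kind could be built from two small functions; the names `swap_used_mb` and `log_swap`, the log path and the script path are all hypothetical. It relies on `free -m` printing a `Swap:` row whose third field is the used amount in MiB.

```shell
#!/bin/sh
# Hypothetical helpers (not the poster's script).

# Read `free -m` output on stdin and print the "used" field of the Swap: row.
swap_used_mb() {
    awk '$1 == "Swap:" { print $3 }'
}

# Read `free -m` output on stdin and append one timestamped line to LOGFILE.
log_swap() {
    logfile="$1"
    used=$(swap_used_mb)
    printf '%s swap_used_mb=%s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$used" >> "$logfile"
}

# Example use from a script run by cron every 4 minutes (paths are assumptions):
#   free -m | log_swap /var/log/swap-usage.log
# Matching crontab line, assuming the script is saved as /usr/local/bin/swaplog.sh:
#   */4 * * * * /usr/local/bin/swaplog.sh
```

One timestamped number per line makes it easy to see, after a freeze, whether swap usage was climbing in the minutes before the last entry.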

 

Author Comment

by:tanel
ID: 36536100
Hello,

After some hours of detailed log investigation I have found the CAUSE of freezing:

Sep 11 12:11:52 cs kernel: [  802.825331] [drm:pch_irq_handler] *ERROR* PCH poison interrupt
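
For anyone hunting for a similar line, a small filter can narrow a large system log down to kernel entries flagged `*ERROR*`. This is only a sketch: the function name `kernel_errors` is hypothetical, and the log location `/var/log/messages` is an assumption based on the "messages" log mentioned earlier in the thread.

```shell
#!/bin/sh
# Hypothetical helper: read syslog-style text on stdin and keep only the
# lines that come from the kernel and contain the literal marker *ERROR*.
# grep -F treats the pattern as a fixed string, so the asterisks are literal.
kernel_errors() {
    grep -F 'kernel:' | grep -F '*ERROR*'
}

# Example use, assuming the system log lives at /var/log/messages:
# kernel_errors < /var/log/messages | tail -n 20
```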



Any thoughts ?
 

Author Comment

by:tanel
ID: 36536128
I googled it, and it has to be some issue with the 2.6.38 kernel, which I was using before.
If someone has more information please let me know.
 

Author Closing Comment

by:tanel
ID: 36902138
Fixed by myself.