Solved

Server unresponsive issue

Posted on 2008-11-19
Last Modified: 2012-05-05
I had a Linux server running RHEL4 become unresponsive recently, and it required a reboot. When it came back up I looked through the logs and dmesg and couldn't see anything conclusive as to why it had rebooted. I'm posting below some of what I saw in /var/log/messages at the time this started, so hopefully someone can tell me what may have caused the server to become unresponsive. Thanks!


Oct 5 06:26:00 servername kernel: request_module: runaway loop modprobe net-pf-10
Oct 5 06:26:00  servername hald[3763]: Timed out waiting for hotplug event 626. Rebasing to 631
Oct 5 06:26:00  servername kernel: usb 1-1.1: USB disconnect, address 4
Oct 5 06:26:00  servername kernel: usb 1-1.1: new full speed USB device using address 5
Oct 5 06:26:00  servername kernel: oom-killer: gfp_mask=0x1d2
Oct 5 06:26:00  servername kernel: Mem-info:
Oct 5 06:26:00  servername kernel: Node 0 DMA per-cpu:
Oct 5 06:26:00  servername kernel: cpu 0 hot: low 2, high 6, batch 1
Oct 5 06:26:00  servername kernel: cpu 0 cold: low 0, high 2, batch 1
Oct 5 06:26:00  servername kernel: cpu 1 hot: low 2, high 6, batch 1
Oct 5 06:26:00  servername kernel: cpu 1 cold: low 0, high 2, batch 1
Oct 5 06:26:00  servername kernel: cpu 2 hot: low 2, high 6, batch 1
Oct 5 06:26:00  servername kernel: cpu 2 cold: low 0, high 2, batch 1
Oct 5 06:26:00  servername kernel: cpu 3 hot: low 2, high 6, batch 1
Oct 5 06:26:00  servername kernel: cpu 3 cold: low 0, high 2, batch 1
Oct 5 06:26:00  servername kernel: Node 0 Normal per-cpu:
Oct 5 06:26:00  servername kernel: cpu 0 hot: low 32, high 96, batch 16
Oct 5 06:26:00  servername kernel: cpu 0 cold: low 0, high 32, batch 16
Oct 5 06:26:00  servername kernel: cpu 1 hot: low 32, high 96, batch 16
Oct 5 06:26:00  servername kernel: cpu 1 cold: low 0, high 32, batch 16
Oct 5 06:26:00  servername kernel: cpu 2 hot: low 32, high 96, batch 16
Oct 5 06:26:00 servername kernel: cpu 2 cold: low 0, high 32, batch 16
Oct 5 06:26:00 servername kernel: cpu 3 hot: low 32, high 96, batch 16
Oct 5 06:26:00 servername kernel: cpu 3 cold: low 0, high 32, batch 16
Oct 5 06:26:00 servername kernel: Node 0 HighMem per-cpu: empty
Oct 5 06:26:00 servername kernel:
Oct 5 06:26:00 servername kernel: Free pages:       19348kB (0kB HighMem)
Question by:linuxpig

Expert Comment

by:cjl7
ID: 22995216
Hi,


Have you got sysstat installed? Running 'sar' will give you a hint as to whether the system was busy swapping and the CPU was waiting for disk I/O.

If the server is really slow, that is the most common reason.

//jonas

Accepted Solution

by:macker
ID: 23000896
There's one big clue in the output:

Oct 5 06:26:00  servername kernel: oom-killer: gfp_mask=0x1d2

OOM is an acronym for Out Of Memory.  Generally speaking, this suggests that the server ran out of memory (RAM + swap) and the kernel had to resort to killing processes.  The log messages immediately before it may give clues as to the circumstances which led to this.
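To make that search less manual, here is a rough Python 3 sketch that scans syslog lines for the oom-killer message and returns the few lines that preceded each hit. The function name `find_oom_events`, the `context` parameter, and the sample list are my own illustration (the sample lines are copied from the log in the question); the pattern only matches the message format shown above, and other kernel versions may word it differently.

```python
import re

# Matches the OOM-killer announcement in the format seen in the posted log.
# Assumption: other kernels may print a different message.
OOM_PATTERN = re.compile(r"kernel: oom-killer:")

# Sample lines copied from the /var/log/messages excerpt in the question.
SAMPLE_LOG = [
    "Oct 5 06:26:00 servername kernel: request_module: runaway loop modprobe net-pf-10",
    "Oct 5 06:26:00  servername hald[3763]: Timed out waiting for hotplug event 626. Rebasing to 631",
    "Oct 5 06:26:00  servername kernel: usb 1-1.1: USB disconnect, address 4",
    "Oct 5 06:26:00  servername kernel: usb 1-1.1: new full speed USB device using address 5",
    "Oct 5 06:26:00  servername kernel: oom-killer: gfp_mask=0x1d2",
]


def find_oom_events(lines, context=3):
    """Return a list of (line_index, preceding_lines) for each oom-killer hit."""
    events = []
    for i, line in enumerate(lines):
        if OOM_PATTERN.search(line):
            start = max(0, i - context)
            events.append((i, lines[start:i]))
    return events


if __name__ == "__main__":
    for index, before in find_oom_events(SAMPLE_LOG):
        print("oom-killer at line", index + 1)
        for earlier in before:
            print("  before:", earlier)
```

To run it against a real log, read the file into a list of lines (for example with `open("/var/log/messages").read().splitlines()`) and pass that list instead of `SAMPLE_LOG`.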

As Jonas suggested, having sysstat installed (and running; it's off by default) will help you to track trends.  The default sampling period is 10 minutes, but you may want to sample more frequently, such as every 5 minutes or every minute.  The primary cost is disk space, which is a small price to pay.

Lastly, if you suspect a kernel panic, there are options such as kdump and netdump, depending on your version of RHEL.  (RHEL3 and RHEL4 still use netdump.)  This software lets you capture a detailed kernel dump, including a memory image, to local disk or over the network, and then reboot the server.  This gives you the opportunity for a very detailed post-mortem, though usually the detail is excessive for the average user.  I'd consider it more appropriate for systems running a specialized software stack that crashes repeatedly, where vendor support exists for troubleshooting the cause.

For the average situation, which seems to describe your usage, sysstat logging every 5 or 10 minutes, and watching memory usage / swap usage, will be a good start.  Make sure you don't have an excess of swap on an IDE or SATA drive, e.g. 2G of RAM and 8G of swap.  Use sysstat (sar) to watch trends, and snapshots from tools like top, free (`free -m`; the `-h` flag is not available in the procps version shipped with RHEL4), and iostat (`iostat -x 5 5`) for current status.
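If you want a quick snapshot without extra tools, something like the Python 3 sketch below reads `/proc/meminfo` and reports free memory, swap in use, and the swap-to-RAM ratio. The function names and the sample figures are made up for illustration (the sample mirrors the 2G RAM / 8G swap case); `MemTotal`, `MemFree`, `SwapTotal` and `SwapFree` are the standard field names in `/proc/meminfo`. Note that `MemFree` does not count buffers and cache, so it understates the memory that is really available.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (kB)."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines without a "Name: value" shape
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[name.strip()] = int(fields[0])
    return values


def summarize(values):
    """Return free-memory %, swap-used %, and the swap-to-RAM ratio."""
    total = values["MemTotal"]
    swap_total = values.get("SwapTotal", 0)
    swap_used = swap_total - values.get("SwapFree", 0)
    return {
        "mem_free_pct": 100.0 * values["MemFree"] / total,
        "swap_used_pct": 100.0 * swap_used / swap_total if swap_total else 0.0,
        "swap_to_ram_ratio": swap_total / total,
    }


# Hypothetical figures, roughly 2G of RAM and 8G of swap.
SAMPLE_MEMINFO = """MemTotal:      2000000 kB
MemFree:        500000 kB
SwapTotal:     8000000 kB
SwapFree:      6000000 kB
"""

if __name__ == "__main__":
    with open("/proc/meminfo") as handle:
        report = summarize(parse_meminfo(handle.read()))
    for key, value in sorted(report.items()):
        print("%s: %.1f" % (key, value))
```

A high swap-to-RAM ratio combined with rising swap usage is the pattern to watch for; this only gives a point-in-time view, so sar is still the better tool for trends.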

Author Closing Comment

by:linuxpig
ID: 31518230
The response steered me in the right direction.