Solved

runaway apache processes

Posted on 2007-08-05
Last Modified: 2013-12-16
The Red Hat Linux server that hosts my website has recently started crashing with 100% memory usage.

When I look at top I see the following:

top - 19:12:23 up 51 min,  1 user,  load average: 145.93, 144.28, 120.78
Tasks: 248 total,   1 running, 247 sleeping,   0 stopped,   0 zombie
Cpu(s):  1.8% us,  1.2% sy,  0.0% ni, 35.7% id, 61.4% wa,  0.0% hi,  0.0% si
Mem:   2066044k total,  1984556k used,    81488k free,    14988k buffers
Swap:   522072k total,   522072k used,        0k free,    30108k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 2644 mysql     15   0  558m 110m 2272 S  1.0  5.5   1:44.00 mysqld
 4904 apache    15   0 46276  28m 2916 D  0.0  1.4   0:07.84 httpd
 4804 apache    15   0 46316  28m 2916 D  0.0  1.4   0:07.74 httpd
 4797 apache    15   0 46380  27m 2916 D  0.0  1.4   0:07.84 httpd
 4773 apache    18   0 46384  27m 2916 D  0.0  1.4   0:07.30 httpd
 4803 apache    15   0 46380  27m 2916 D  0.3  1.4   0:07.75 httpd
 4684 apache    15   0 46360  27m 2908 D  0.0  1.4   0:07.73 httpd
 4796 apache    15   0 46344  27m 2916 D  0.0  1.4   0:07.86 httpd
 4650 apache    15   0 46412  27m 2916 D  0.0  1.4   0:07.92 httpd
 4732 apache    15   0 46252  27m 2908 D  0.0  1.4   0:07.78 httpd
 4774 apache    15   0 46416  27m 2916 D  0.0  1.4   0:08.13 httpd
 4742 apache    18   0 46276  27m 2808 D  0.0  1.4   0:07.30 httpd
 4870 apache    15   0 46308  27m 2916 D  0.0  1.4   0:07.85 httpd
 4679 apache    18   0 46168  27m 2416 D  0.0  1.3   0:07.19 httpd
 4721 apache    18   0 46168  26m 2416 D  0.0  1.3   0:07.28 httpd
 4794 apache    18   0 46168  26m 2416 D  0.0  1.3   0:07.35 httpd
 4802 apache    18   0 46168  26m 2416 D  0.0  1.3   0:07.30 httpd
 4943 apache    15   0 46288  26m 2908 D  0.0  1.3   0:07.88 httpd
 4666 apache    15   0 46368  26m 2876 D  0.0  1.3   0:08.60 httpd
 3578 apache    15   0 46368  26m 2916 D  0.0  1.3   0:08.75 httpd
 4508 apache    15   0 46324  26m 2916 D  0.0  1.3   0:08.07 httpd
 4772 apache    18   0 46168  26m 2416 D  0.3  1.3   0:07.28 httpd
 4501 apache    15   0 46248  26m 2908 D  0.0  1.3   0:07.73 httpd
 3439 apache    15   0 46328  26m 2916 D  0.0  1.3   0:08.60 httpd
 4511 apache    15   0 46376  26m 2916 D  0.0  1.3   0:07.80 httpd
 4510 apache    15   0 46356  26m 2916 D  0.0  1.3   0:08.29 httpd
 4505 apache    15   0 46408  26m 2916 D  0.0  1.3   0:08.05 httpd
 2812 apache    15   0 46340  25m 2916 D  0.0  1.3   0:08.61 httpd
 5002 apache    18   0 46168  25m 2416 D  0.0  1.3   0:07.39 httpd
 4731 apache    15   0 46332  25m 2916 D  0.0  1.3   0:08.15 httpd
 4548 apache    15   0 46372  25m 2908 D  0.0  1.2   0:07.65 httpd
 5197 apache    15   0 46256  25m 2916 D  0.0  1.2   0:07.84 httpd
 4869 apache    15   0 46372  24m 2916 D  0.0  1.2   0:08.57 httpd
 4425 apache    15   0 46340  24m 2916 D  0.0  1.2   0:07.85 httpd
 4341 apache    15   0 46364  24m 2908 D  0.0  1.2   0:08.23 httpd
 4866 apache    15   0 46340  24m 2916 D  0.0  1.2   0:08.69 httpd
 4418 apache    15   0 46336  24m 2908 D  0.0  1.2   0:07.74 httpd
 4509 apache    15   0 46220  24m 2908 D  0.0  1.2   0:07.69 httpd

As you can see there are lots of "D" httpd processes, D meaning dead??

I've read that this is related to "runaway" processes, but what can I do to debug further and identify the root cause of these processes, so that I can resolve this issue?

At peak website times, it takes less than an hour for the server to become totally bogged down like this, with the only resolution being to reboot the server.
Question by:dealclickcouk
20 Comments
 
LVL 15

Expert Comment

by:DonConsolio
ID: 19635682
"D" means uninterruptible sleep - usually this means the process is waiting for I/O.

It might be either a hardware problem (check /var/log/messages for disk errors
or similar trouble) or a hanging CGI script.
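
To see which processes are stuck in that state, and what they are blocked on, something along these lines may help. It is only a sketch: the function name `d_state` is made up for illustration, and it assumes a procps-style `ps` that accepts `-eo pid,stat,wchan,comm`.

```shell
#!/bin/sh
# d_state (hypothetical helper): read "ps -eo pid,stat,wchan,comm" output on
# stdin and print only the rows whose STAT column starts with D
# (uninterruptible sleep). The header row is skipped.
d_state() {
    awk 'NR > 1 && $2 ~ /^D/ { print }'
}

# Typical use on a live box (not run here):
#   ps -eo pid,stat,wchan,comm | d_state
```

The WCHAN column names the kernel function each process is sleeping in, which can hint at whether the wait is on disk, swap or a network filesystem.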

 

Author Comment

by:dealclickcouk
ID: 19636114
Thx for the tip, I've looked through the msg log, and around the time when I saw lots of D processes I see lots of these types of msgs:

Aug  5 13:00:07 localhost kernel:
Aug  5 13:00:07 localhost kernel: Free pages:       13796kB (512kB HighMem)
Aug  5 13:00:08 localhost kernel: Active:360561 inactive:138632 dirty:0 writeback:0 unstable:0 free:3449 slab:4983 mapped:500559 pagetables:4517
Aug  5 13:00:08 localhost kernel: DMA free:12556kB min:16kB low:32kB high:48kB active:0kB inactive:0kB present:16384kB pages_scanned:628 all_unreclaimable? yes
Aug  5 13:00:08 localhost kernel: protections[]: 0 0 0
Aug  5 13:00:08 localhost kernel: Normal free:728kB min:928kB low:1856kB high:2784kB active:523784kB inactive:330912kB present:901120kB pages_scanned:12222012 all_unreclaimable? yes
Aug  5 13:00:08 localhost kernel: protections[]: 0 0 0
Aug  5 13:00:09 localhost kernel: HighMem free:512kB min:512kB low:1024kB high:1536kB active:918460kB inactive:223616kB present:1170368kB pages_scanned:10911002 all_unreclaimable? no
Aug  5 13:00:09 localhost kernel: protections[]: 0 0 0
Aug  5 13:00:09 localhost kernel: DMA: 5*4kB 5*8kB 3*16kB 5*32kB 4*64kB 2*128kB 2*256kB 2*512kB 2*1024kB 2*2048kB 1*4096kB = 12556kB
Aug  5 13:00:09 localhost kernel: Normal: 0*4kB 1*8kB 1*16kB 0*32kB 1*64kB 1*128kB 0*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 728kB
Aug  5 13:00:10 localhost kernel: HighMem: 0*4kB 8*8kB 4*16kB 4*32kB 0*64kB 0*128kB 1*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 512kB
Aug  5 13:00:10 localhost kernel: Swap cache: add 132469, delete 132468, find 865/1078, race 0+6
Aug  5 13:00:10 localhost kernel: 0 bounce buffer pages
Aug  5 13:00:10 localhost kernel: Free swap:            0kB
Aug  5 13:00:10 localhost kernel: 521968 pages of RAM
Aug  5 13:00:11 localhost kernel: 292592 pages of HIGHMEM
Aug  5 13:00:11 localhost kernel: 5555 reserved pages
Aug  5 13:00:11 localhost kernel: 100263 pages shared
Aug  5 13:00:11 localhost kernel: 1 pages swap cached
Aug  5 13:00:12 localhost kernel: Out of Memory: Killed process 5057 (httpd).
Aug  5 13:00:12 localhost kernel: oom-killer: gfp_mask=0xd0
Aug  5 13:00:12 localhost kernel: Mem-info:
Aug  5 13:00:12 localhost kernel: DMA per-cpu:
Aug  5 13:00:12 localhost kernel: cpu 0 hot: low 2, high 6, batch 1
Aug  5 13:00:13 localhost kernel: cpu 0 cold: low 0, high 2, batch 1
Aug  5 13:00:13 localhost kernel: cpu 1 hot: low 2, high 6, batch 1
Aug  5 13:00:13 localhost kernel: cpu 1 cold: low 0, high 2, batch 1
Aug  5 13:00:13 localhost kernel: Normal per-cpu:
Aug  5 13:00:14 localhost kernel: cpu 0 hot: low 32, high 96, batch 16
Aug  5 13:00:14 localhost kernel: cpu 0 cold: low 0, high 32, batch 16
Aug  5 13:00:14 localhost kernel: cpu 1 hot: low 32, high 96, batch 16
Aug  5 13:00:14 localhost kernel: cpu 1 cold: low 0, high 32, batch 16
Aug  5 13:00:15 localhost kernel: HighMem per-cpu:
Aug  5 13:00:15 localhost kernel: cpu 0 hot: low 32, high 96, batch 16
Aug  5 13:00:15 localhost kernel: cpu 0 cold: low 0, high 32, batch 16
Aug  5 13:00:15 localhost kernel: cpu 1 hot: low 32, high 96, batch 16
Aug  5 13:00:16 localhost kernel: cpu 1 cold: low 0, high 32, batch 16
Aug  5 13:00:16 localhost kernel:

I'm not sure if this is a problem, or normal, but it did stick out.

As this server is used only as a webserver, is there any way to limit the % of memory and CPU that each process uses, and how long it may run before it is terminated automatically? I.e. even if a process is in D state, if it has been that way for more than 30 secs the end user will most probably have got bored and left or hit refresh, so there is really no point waiting for the I/O regardless.
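
For what it's worth, the `Out of Memory: Killed process 5057 (httpd)` and `Free swap: 0kB` lines in that excerpt show the kernel's OOM killer stepping in once RAM and swap were both used up. A rough way to tally what it has been killing is sketched below. `oom_victims` is a hypothetical helper name, and the pattern assumes the message wording shown in the log above.

```shell
#!/bin/sh
# oom_victims (hypothetical helper): read syslog lines on stdin and count
# OOM-killer victims by command name, most frequent first.
# Assumes the "Out of Memory: Killed process PID (name)" wording seen above.
oom_victims() {
    sed -n 's/.*Out of Memory: Killed process [0-9]* (\([^)]*\)).*/\1/p' |
        sort | uniq -c | sort -rn | awk '{ print $2, $1 }'
}

# Typical use (not run here):
#   oom_victims < /var/log/messages
```

If httpd dominates the tally, that would point at the number or size of Apache children rather than at some other process.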
 

Author Comment

by:dealclickcouk
ID: 19636129

Also, looking a little further back in the log I saw lots and lots of this type of entry:

Aug  5 08:54:02 localhost sshd(pam_unix)[1031]: authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=hs-622.dedicated.hostalia.com  user=root
Aug  5 08:54:31 localhost sshd(pam_unix)[1097]: authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=hs-622.dedicated.hostalia.com  user=root
Aug  5 08:54:32 localhost sshd(pam_unix)[1100]: check pass; user unknown
Aug  5 08:54:32 localhost sshd(pam_unix)[1100]: authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=hs-622.dedicated.hostalia.com
Aug  5 08:54:34 localhost sshd(pam_unix)[1105]: authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=hs-622.dedicated.hostalia.com  user=root
Aug  5 08:54:35 localhost sshd(pam_unix)[1108]: check pass; user unknown
Aug  5 08:54:35 localhost sshd(pam_unix)[1108]: authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=hs-622.dedicated.hostalia.com
Aug  5 08:54:37 localhost sshd(pam_unix)[1112]: authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=hs-622.dedicated.hostalia.com  user=root
Aug  5 08:54:38 localhost sshd(pam_unix)[1115]: check pass; user unknown
Aug  5 08:54:38 localhost sshd(pam_unix)[1115]: authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=hs-622.dedicated.hostalia.com
Aug  5 08:54:39 localhost sshd(pam_unix)[1119]: authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=hs-622.dedicated.hostalia.com  user=root
Aug  5 08:54:41 localhost sshd(pam_unix)[1121]: check pass; user unknown
Aug  5 08:54:41 localhost sshd(pam_unix)[1121]: authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=hs-622.dedicated.hostalia.com
Aug  5 08:54:42 localhost sshd(pam_unix)[1123]: authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=hs-622.dedicated.hostalia.com  user=root
Aug  5 08:54:43 localhost sshd(pam_unix)[1125]: check pass; user unknown
Aug  5 08:54:43 localhost sshd(pam_unix)[1125]: authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=hs-622.dedicated.hostalia.com
Aug  5 08:54:45 localhost sshd(pam_unix)[1127]: check pass; user unknown
Aug  5 08:54:45 localhost sshd(pam_unix)[1127]: authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=hs-622.dedicated.hostalia.com
Aug  5 08:54:46 localhost sshd(pam_unix)[1131]: check pass; user unknown
Aug  5 08:54:46 localhost sshd(pam_unix)[1131]: authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=hs-622.dedicated.hostalia.com
Aug  5 08:54:48 localhost sshd(pam_unix)[1133]: check pass; user unknown
Aug  5 08:54:48 localhost sshd(pam_unix)[1133]: authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=hs-622.dedicated.hostalia.com
Aug  5 08:54:49 localhost sshd(pam_unix)[1136]: authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=hs-622.dedicated.hostalia.com  user=admin
Aug  5 08:54:50 localhost sshd(pam_unix)[1141]: check pass; user unknown
Aug  5 08:54:50 localhost sshd(pam_unix)[1141]: authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=hs-622.dedicated.hostalia.com
Aug  5 08:54:52 localhost sshd(pam_unix)[1144]: authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=hs-622.dedicated.hostalia.com  user=admin
Aug  5 08:54:53 localhost sshd(pam_unix)[1147]: check pass; user unknown
Aug  5 08:54:53 localhost sshd(pam_unix)[1147]: authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=hs-622.dedicated.hostalia.com
Aug  5 08:54:54 localhost sshd(pam_unix)[1152]: check pass; user unknown
ruser= rhost=hs-622.dedicated.hostalia.com
Aug  5 08:55:01 localhost crond(pam_unix)[1166]: session opened for user root by (uid=0)
Aug  5 08:55:01 localhost crond(pam_unix)[1167]: session opened for user root by (uid=0)
Aug  5 08:55:01 localhost crond(pam_unix)[1167]: session closed for user root
Aug  5 08:55:01 localhost crond(pam_unix)[1166]: session closed for user root
Aug  5 08:55:02 localhost sshd(pam_unix)[1183]: authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=hs-622.dedicated.hostalia.com  user=admin
Aug  5 08:55:03 localhost sshd(pam_unix)[1188]: authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=hs-622.dedicated.hostalia.com  user=mysql
Aug  5 08:55:04 localhost sshd(pam_unix)[1192]: authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=hs-622.dedicated.hostalia.com  user=admin
Aug  5 08:55:05 localhost sshd(pam_unix)[1195]: check pass; user unknown
Aug  5 08:55:05 localhost sshd(pam_unix)[1195]: authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=hs-622.dedicated.hostalia.com
Aug  5 08:55:07 localhost sshd(pam_unix)[1197]: check pass; user unknown

Is this some kind of hack/attack, ie so many authentication failure & user unknown logs?
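
A quick way to size up entries like these is to count the failures per remote host. The sketch below assumes the pam_unix line format shown above; `ssh_failures_by_host` is just an illustrative name.

```shell
#!/bin/sh
# ssh_failures_by_host (hypothetical helper): read syslog lines on stdin and
# count sshd "authentication failure" entries per rhost= value,
# most frequent first.
ssh_failures_by_host() {
    grep 'sshd.*authentication failure' |
        sed -n 's/.*rhost=\([^ ]*\).*/\1/p' |
        sort | uniq -c | sort -rn | awk '{ print $2, $1 }'
}

# Typical use (not run here):
#   ssh_failures_by_host < /var/log/messages
```

One host with a long run of failures for users such as root and admin, a second or two apart, is the usual signature of an automated password-guessing run.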
 
LVL 14

Expert Comment

by:ygoutham
ID: 19636563
i have my own custom script that i use to kill runaway processes.

might require some clean up for your purposes

#######
<?php

$pid = getmypid();
$sid = session_id();

// mcheck variable is for limiting the process to a certain amount of memory in MB
$mcheck = 50;

print "Your PID is $pid ---- Session id = $sid";
print "<pre>";
print "<h1>All HTTP Processes list</h1><br><h2>Memory Limited to $mcheck MB only</h2>";
exec("ps lax | grep httpd", $output, $return);
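$ctr=0;
// start from empty strings so the appends in the loop below do not use undefined variables
$pa = $ma = $st = "";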

//run the 'ps -lax | grep httpd' command and take the output to parse line by line

foreach($output as $file) {
      print "$file<br>";
      while($test=strstr($file, "  ") ){$file = str_replace("  ", " ", $file);}
      $p = explode(" ", $file);
      $pa =  $pa.$p[2].",";
      $ma =  $ma.$p[7].",";
      $st = $st.$p[8].",";
      }
      $pa = explode(",", $pa);
      $ma = explode(",", $ma);
      $max = count($pa);
      $st = explode(",", $st);

      //run the 'ps -lax | grep httpd' command and take the output to parse line by line and identify the
               //process running in excess of 50 mb to be killed
      // below block of code is only for GOUTHAM. Not useable outside!!!
      foreach($ma as $val2){
      $mem = ($val2 / 1024) - $mcheck;
      if($val2 > ($mcheck * 1024)){
            $expid = $pa[$ctr] ;
            $p = mysql_query("select username from login where pid = $expid order by pidtime desc limit 1");
            while ($c_row = mysql_fetch_row($p) ){foreach($c_row as $field8) {$user=$field8; }}
            print "<br>".$pa[$ctr]." pid running in excess of $mem mb of memory from user $user";
            exec("kill -9 ".$pa[$ctr], $misc, $ret2);
            }
      $ctr++;
      }

      //below code is for killing any process that returns a "interr" status while running 'ps -lax | grep httpd' command
      
      $ctr=0;
      foreach($st as $val3){
      if($val3 == "interr"){
            $expid = $pa[$ctr];
            exec("kill -9 ".$pa[$ctr], $misc, $ret2);
            }
      $ctr++;   // step through $pa in line with $st so the matching pid is used
      }
      
            //for($i=0;$i<$max;$i++){print " ".$st[$i]." status  - ".$pa[$i]."  .<br>";  }

print "</pre>";

print "<p><p>Content Auto Refreshes in 1 minute</p></p>";
?>

####### END OF CODE ##########
 
LVL 14

Expert Comment

by:ygoutham
ID: 19636569
it sure looks like someone is trying to get into your server through SSH.  edit the file

/etc/hosts.deny

and add a line at the end

sshd: ALL EXCEPT your.ip.address.here, your_other.ip.address.here, so.on_and.so.forth

that means anyone trying to connect to your server through ssh from an address that is not on that list is denied immediately.
 

Author Comment

by:dealclickcouk
ID: 19636657

ygoutham, thx for the script, I will give that a try, but am I right in assuming that this will not kill the "D" processes, because, as DonConsolio says above, "D" means uninterruptible sleep and so they can't be killed?
 
LVL 14

Expert Comment

by:ygoutham
ID: 19636700
mine checks the eighth field of the ps output ($p[7]), which is the memory used. you can however use the tenth field ($p[9]), which shows the process state, and act accordingly.

check for the D status, modify the script accordingly, and you should be done.
 

Author Comment

by:dealclickcouk
ID: 19636819

ygoutham, thx again, but what I meant was: is it actually possible to kill these D processes, given that they are uninterruptible?
 
LVL 14

Expert Comment

by:ygoutham
ID: 19636867
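kill -9 kills a running process, so that part should not be a problem.  a process in D state, though, only acts on the signal once it comes out of its uninterruptible sleep, so it can hang around until the I/O it is waiting on finishes.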
 

Author Comment

by:dealclickcouk
ID: 19636877

ygoutham, just going through the code, just to check is the part with the sql query, is that something internal?

ie if I remove those two lines will it still function:

        $p = mysql_query("select username from login where pid = $expid order by pidtime desc limit 1");
        while ($c_row = mysql_fetch_row($p) ){foreach($c_row as $field8) {$user=$field8; }}


 
LVL 14

Expert Comment

by:ygoutham
ID: 19636892
yes.  i was trying to check from my login table as to who is using too much of system resources and therefore the comment before the block.  

in fact all references to mysql can be safely removed.
 
LVL 14

Expert Comment

by:ygoutham
ID: 19636895
in fact you can just populate the array with pid numbers for $pa (which is my pid array and pardon my quixotic ways of naming variables), and you can pick up the 10th field into your $st array (with $st = $st.$p[9].","; )

that would give an array of pid numbers with the corresponding live or D statuses, and you can proceed from there.
 

Author Comment

by:dealclickcouk
ID: 19636967
ygoutham: many thx for this, it looks like a winner. Could this be adjusted to kill any process using too much CPU as well?
 
LVL 14

Expert Comment

by:ygoutham
ID: 19637049
you need to specify what process to kill.  if you write something and end up killing "init" the entire system crashes!!!
 

Author Comment

by:dealclickcouk
ID: 19637065

OK, but using your existing script, i.e. only killing httpd child processes, how would you identify the amount of CPU being used?

The issue is when I look at the process list in top I see things like this:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
12551 mysql     15   0  557m 162m 3204 S 51.5  8.1   7:08.89 mysqld
15617 apache    21   0 47068  36m 7384 R 99.9  1.8   0:07.95 httpd

As you can see here, PID 15617 is using 99.9% of the CPU, which can't be good, and sometimes I see several of these processes, which in turn kill the server for other processes.
 
LVL 14

Expert Comment

by:ygoutham
ID: 19637294
he, you can do a multiple check by taking the pid and seeing if it has a D status and if the CPU usage and time run is beyond a particular number and then proceed to give the kill command.  when you combine all the three then it makes sense.  if you pick up only one aspect to proceed, then as you put it, it becomes a dangerous tool to be run.
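
A sketch of that kind of combined check, which only prints candidate PIDs and leaves any kill decision to you. Assumptions: a procps-style `ps -eo pid,stat,pcpu,etime,comm`, the made-up helper name `stuck_httpd`, and thresholds passed as arguments (minimum age in seconds, then minimum %CPU). Bear in mind that the `pcpu` figure from ps is CPU time divided by the time the process has been running, not the instantaneous value top shows.

```shell
#!/bin/sh
# stuck_httpd MIN_AGE_SECONDS MIN_CPU (hypothetical helper)
# Reads "ps -eo pid,stat,pcpu,etime,comm" output on stdin and prints the PIDs
# of httpd processes that are in D state, at least MIN_AGE_SECONDS old and at
# or above MIN_CPU percent. Pass 0 as MIN_CPU to ignore CPU.
stuck_httpd() {
    awk -v min_age="$1" -v min_cpu="$2" '
        # Convert the ps etime format ([[dd-]hh:]mm:ss) to seconds.
        function secs(t,    d, n, p) {
            d = 0
            if (index(t, "-")) { split(t, p, "-"); d = p[1]; t = p[2] }
            n = split(t, p, ":")
            if (n == 3) return d * 86400 + p[1] * 3600 + p[2] * 60 + p[3]
            return d * 86400 + p[1] * 60 + p[2]
        }
        NR > 1 && $5 == "httpd" && $2 ~ /^D/ &&
            $3 + 0 >= min_cpu && secs($4) >= min_age { print $1 }
    '
}

# Typical use (not run here): D-state httpd older than 30 seconds
#   ps -eo pid,stat,pcpu,etime,comm | stuck_httpd 30 0
```

Requiring all three conditions together keeps the filter narrow, in line with the warning above about acting on a single criterion.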
 

Author Comment

by:dealclickcouk
ID: 19637442
That's exactly what I want to do, but I'm not sure which parameter in your script would be the CPU usage...
 
LVL 15

Expert Comment

by:DonConsolio
ID: 19638076
You could try to impose limits on apache and/or php

php.ini: max_execution_time=xxx

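apache: RLimitCPU, RLimitMEM, RLimitNPROC (http://httpd.apache.org/docs/2.2/mod/core.html#rlimitcpu); note that these apply to processes forked by the httpd children (CGI scripts, for example), not to the httpd children themselves
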
 
LVL 14

Expert Comment

by:ygoutham
ID: 19638376
unfortunately it does not have the cpu usage time.  probably you need to include only the memory usage and the status "D" with the time run to see if it exceeds a particular time limit...  is that an option or you want to specifically check on CPU usage???
 
LVL 14

Accepted Solution

by:
ygoutham earned 2000 total points
ID: 19638397
if it is that kind of a command you want to run then try using something like

top -b -n 1 | grep httpd

this would give you the exact top listing that you are looking for but only the httpd processes and you can count the number of columns to see which ones you want to pick up...  a work around if that satisfies the need
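
Building on that command, a filter like the following could pick out the busy httpd rows. It assumes the column order from the top listings earlier in this thread (PID in column 1, S in 8, %CPU in 9, COMMAND in 12), which can differ between top versions; `busy_httpd` is a hypothetical name.

```shell
#!/bin/sh
# busy_httpd LIMIT (hypothetical helper)
# Reads "top -b -n 1" output on stdin and prints PID, state and %CPU for
# every httpd row whose %CPU is above LIMIT.
# Column positions are an assumption taken from the listings in this thread.
busy_httpd() {
    awk -v limit="$1" '$12 == "httpd" && $9 + 0 > limit { print $1, $8, $9 }'
}

# Typical use (not run here):
#   top -b -n 1 | busy_httpd 90
```

Printing the rows rather than killing straight away makes it easy to check the column positions against your own top output first.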