Solved

High server CPU utilization that appears to be Apache: how do I determine the cause?

Posted on 2014-07-28
Last Modified: 2014-08-11
We have multiple servers behind a load balancer. Our usage is currently low because it is summertime and the majority of our websites are used by teachers and students at various schools. For example, during the school year we get around 30k users a day; currently we have about 3k users a day.

I have recently begun to notice that the CPU load on the servers behind the load balancer is much higher than expected (load averages of .42, .41, .35), and it appears that a couple of Apache processes are causing it, running between 7% and 14% CPU usage. There are constantly 1 or 2 of these, while the rest of the Apache processes are at about 0-4% CPU.

In the past, when we had high utilization, it only ever reached about these levels. So I am concerned that when we get back to 10x the number of users per day, the servers are going to be overloaded.

Since there are multiple websites on the servers, I don't know how to tell which site is causing this.

I have enabled Server Status for the server, but what it shows for the processes that are using a lot of CPU doesn't make sense: for example, the stylesheet of one of the sites, or a page that does nothing but redirect...

Is there any way to get better insight into which page or which Apache thread/process is causing the processor utilization to spike?
Question by:jrm213jrm213
19 Comments
 
LVL 12

Expert Comment

by:Kent W
ID: 40224671
How many processors do you have?  (cat /proc/cpuinfo)

That is the only way to gauge the relevance of your "uptime" output.
For instance, a load average of 4 means the system is maxed if you have 4 cores, but only half maxed if you have 8.
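The comparison can be sketched in shell. This is only an illustration: `load_per_core` is a hypothetical helper name, and the usage line assumes a Linux system where /proc/loadavg and nproc are available. A result near 1.00 means the cores are roughly saturated; well below 1.00 means there is headroom.

```shell
# Hypothetical helper: divide a load average by the number of cores.
# $1 = load average (e.g. from uptime), $2 = number of cores.
load_per_core() {
    awk -v avg="$1" -v cores="$2" 'BEGIN { printf "%.2f\n", avg / cores }'
}

# Usage on a live Linux system (1-minute load average, all online cores):
# load_per_core "$(cut -d' ' -f1 /proc/loadavg)" "$(nproc)"
```

For example, `load_per_core 4 8` prints 0.50, matching the "half maxed" case above.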

You can do a "top" and sort things by mem / cpu, etc. to find out where the hogs are.  

If it is apache, then this is most likely a symptom of someone scanning or trying to exploit your server in some way, especially if you have not changed any code.  Watching your web logs will give you insight as to who and how it's being hit.
 
LVL 62

Expert Comment

by:gheist
ID: 40224694
Most commonly it is PHP and/or DEFLATE
Can you detail your setup a bit more?
Which MPM?
Which Apache version?
Distribution version?
Disable server status. There is a recent security bug in it that needs patching... It will not help at all.

Stylesheet? - is it dynamic?
Redirect page? - maybe you add google rank to some XXX site?
 
LVL 13

Assisted Solution

by:duncanb7
duncanb7 earned 200 total points
ID: 40224726
Besides buying commercial software, I think there are 25 good Linux commands for monitoring server performance at this site: http://www.tecmint.com/command-line-tools-to-monitor-linux-performance/

I hope I understood your question completely. If not, please point it out.

Duncan
 
LVL 13

Assisted Solution

by:duncanb7
duncanb7 earned 200 total points
ID: 40224743
You can also combine those commands into a shell script that loops and monitors the whole system, which is better than trying third-party software.

Duncan
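A minimal sketch of such a monitoring loop, under stated assumptions: `busy_procs` is a hypothetical filter name, the process name apache2 matches Debian's packaging, and the 7% threshold and 10-second interval are arbitrary. Note that ps reports CPU averaged over each process's lifetime, so short spikes may show up better in top.

```shell
# Hypothetical filter: from "ps -eo pcpu,pid,comm" output on stdin, keep
# processes named $1 whose CPU percentage is at least $2.
# Prints "PID %CPU" for each match; the header line is skipped.
busy_procs() {
    awk -v name="$1" -v min="$2" \
        'NR > 1 && $3 == name && $1 + 0 >= min + 0 { print $2, $1 }'
}

# Usage: sample every 10 seconds and log the busy Apache processes.
# while true; do
#     date
#     ps -eo pcpu,pid,comm | busy_procs apache2 7
#     sleep 10
# done >> /tmp/apache-cpu.log
```

The logged PIDs can then be matched against whatever the server is recording for those processes at the same timestamps.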
 
LVL 17

Author Comment

by:jrm213jrm213
ID: 40224825
Hello, here is some more information

@mugojava

Server Version: Apache/2.2.16 (Debian) mod_jk/1.2.30 PHP/5.3.3-7+squeeze19 with Suhosin-Patch

There are 4 CPUs [0-3]
processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 16
model           : 4
model name      : Quad-Core AMD Opteron(tm) Processor 2374 HE
stepping        : 2
cpu MHz         : 97529.786
cache size      : 512 KB
fpu             : yes
fpu_exception   : yes
cpuid level     : 5
wp              : yes
flags           : fpu de tsc msr pae cx8 cmov pat clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt lm 3dnowext 3dnow constant_tsc rep_good nonstop_tsc pni cx16 popcnt lahf_lm cmp_legacy extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch
bogomips        : 4400.17
TLB size        : 1024 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 48 bits physical, 48 bits virtual
power management: ts ttp tm stc 100mhzsteps hwpstate



I will look at the logs. They are huge: the access log is currently 6 GB. The error log is only 175 KB.


@gheist
The stylesheet is not dynamic, it is just a .css file.
The redirect page is basically a page that sets some location cookies and redirects back to the site or, if the user is from specific states, redirects them to a different website.
 
LVL 17

Author Comment

by:jrm213jrm213
ID: 40224865
It is Apache that has the high cpu usage, I just don't know why.  I am not seeing anything strange in the access or error logs.
 
LVL 12

Assisted Solution

by:Kent W
Kent W earned 175 total points
ID: 40224876
One easy method I use to find out which site is being slammed (if that is the case), assuming your log files are in
/var/log/httpd
cd into your web log directory and do an
ls -ltra
This lists everything sorted by modification time, so you can see which log is being written to continually.
(Drop the -r if you would rather have the newest at the top. I prefer the bottom.)

You can also do a
watch "ls -ltra"
to watch the list with a default 2-second update.  

Then tail that puppy with -f (follow) to watch what the culprit is hitting and how:
tail -f web-log-to-watch-access.log
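If the logs are spread across subdirectories rather than one directory, the same idea can be sketched recursively. This is an illustration only: `recent_logs` is a hypothetical helper name, it relies on GNU find's -printf, and it assumes log files end in .log.

```shell
# Hypothetical helper: list the N most recently modified *.log files under a
# directory tree, newest first (default N = 5). Requires GNU find.
# $1 = top-level log directory, $2 = how many files to show.
recent_logs() {
    find "$1" -type f -name '*.log' -printf '%T@ %p\n' |
        sort -rn | head -n "${2:-5}" | cut -d' ' -f2-
}

# Usage (the path is an assumption; adjust to where the logs live):
# recent_logs /var/log/httpd 5
# tail -f "$(recent_logs /var/log/httpd 1)"
```

The file at the top of the list is the one written to most recently, which is the same thing the ls -ltra listing shows at the bottom.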
 
LVL 62

Assisted Solution

by:gheist
gheist earned 125 total points
ID: 40224879
Apache should be
2.2.16-6+squeeze12 and not older.

Do you use mod_jk at all??? It can be replaced by mod_proxy in apache 2.2
Do you think you have time to convert apache to apache-worker and mod_fcgid and php-cgi?
 
LVL 62

Expert Comment

by:gheist
ID: 40224907
You must rotate logs. I think on Debian, if you install the logrotate package, logs are rotated the next night.
 
LVL 12

Expert Comment

by:Kent W
ID: 40224937
I agree with gheist: try rotating those logs so Apache is not trying to append to a 6 GB file, that is, if you see nothing else in your access logs and no other indications that something mischievous is going on.
 
LVL 62

Expert Comment

by:gheist
ID: 40225146
You need cron in addition to logrotate too.
 
LVL 17

Author Comment

by:jrm213jrm213
ID: 40225322
@gheist
Yes and no on mod_jk. It was used for a few of our legacy sites, until about a month ago when we moved those sites to their own server. I don't believe anything is using it on these servers anymore.

I am not sure how to convert to apache-worker, mod_fcgid or php-cgi

What is the benefit of making those changes?

We do rotate the logs monthly, and we keep a year's worth of logs.
 
LVL 17

Author Comment

by:jrm213jrm213
ID: 40225334
So you guys think the problem might just be that it's the end of the month and the log is big? I will be honest, I didn't set the log rotation up, so I have no idea how to change it. I am guessing it is maybe as easy as moving it from cron.monthly to cron.weekly? If so, will it still keep a year's worth or just 12 weeks? I guess that is neither here nor there at the moment.
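For reference, when logrotate is what does the rotating, the frequency and retention are normally set by directives in its configuration (such as monthly or weekly, and rotate N) rather than by which cron directory is used. A small sketch for inspecting them follows; `show_rotation` is a hypothetical helper name, and /etc/logrotate.d/apache2 is an assumed path that may not match this setup.

```shell
# Hypothetical helper: print the frequency and retention directives found in
# a logrotate configuration file, with leading indentation removed.
# $1 = path to the logrotate config to inspect.
show_rotation() {
    grep -E '^[[:space:]]*(daily|weekly|monthly|yearly|rotate[[:space:]]+[0-9]+)' "$1" |
        sed 's/^[[:space:]]*//'
}

# Usage (assumed Debian path; adjust if the config lives elsewhere):
# show_rotation /etc/logrotate.d/apache2
```

With logrotate, "rotate 12" keeps 12 rotated files, so switching monthly to weekly without changing it would keep about 12 weeks; keeping a year of weekly logs would need "rotate 52".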

The logs are actually all in their own directories specific to each site.
 
LVL 62

Expert Comment

by:gheist
ID: 40225344
Let's wait 24 hours to see if log rotation solves the problem.
 
LVL 62

Expert Comment

by:gheist
ID: 40233908
Did you manage to set up log rotation? Is it better now?
 
LVL 17

Accepted Solution

by:
jrm213jrm213 earned 0 total points
ID: 40243515
The log rotation did not seem to help. I have been slowly tracking things down and have identified a couple of ongoing attacks that were repeatedly hitting the same valid files on the server, files I would normally expect to see in the logs. It took a while, but after sifting through a couple million log lines I was able to identify the attacks and stop them with a combination of .htaccess rules and turning off some features of the blogs. It's tough to tell the good from the bad when, as in our situation, it is normal to have a couple hundred thousand requests from the same IPs daily because they are coming from schools and businesses. So it's normal to see 50,000 POST requests to the login script, etc.
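The kind of sifting described above can be sketched as a small shell helper. This is not the exact procedure that was used here; `top_posts` is a hypothetical name, and it assumes access logs in Apache's common or combined format, where the client IP is field 1, the method is field 6, and the path is field 7.

```shell
# Hypothetical helper: count POST requests per client IP and URL path in an
# Apache access log (common/combined format), busiest pairs first.
# $1 = access log file, $2 = how many rows to show (default 10).
top_posts() {
    awk '$6 == "\"POST" { hits[$1 " " $7]++ }
         END { for (k in hits) print hits[k], k }' "$1" |
        sort -rn | head -n "${2:-10}"
}

# Usage (the path is an assumption):
# top_posts /var/log/apache2/example-site/access.log 20
```

Grouping by IP and path together, rather than by IP alone, helps when many legitimate users share one address, since a single address hammering one script stands out more clearly than raw per-IP totals.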

The files I found that were specifically being targeted were:
wp-login.php - someone was trying to brute-force a user account. I denied access to this page via an .htaccess rule on the blogs that we don't allow users to register on, and added an environment check to allow our work IP address to access it.

xmlrpc.php - I don't know what they were trying to do with this, but I deleted this file from the blogs.
I also edited wp-config.php and disabled wp-cron (I manually update the blogs and plugins, and we don't schedule posts, so I think that should be fine). I don't know if this was a problem on my machine, but I read about it being used as an exploit, so I figured I would disable it now.

I am still getting fake comment posts on the blogs on a regular basis. Even though they are not getting through, because we require moderation of all comments and have some anti-spam measures in place, we are still getting too many. I am unsure how to stop this one since we do want people to comment on our posts... but that is a completely different question.

So even though WordPress was not allowing any of the attacks to actually do anything and no comments were getting through, people were still beating on the server in short waves, causing the CPU spikes. Currently things are sitting at a 15-minute load average of between .10 and .15, which is more along the lines of what I expect of the system under the current load, and I no longer see any Apache processes going above 10% CPU; if they do, it's only for a second and then they are back to 2-5%.

I will continue to monitor the servers more closely, but it appears that my issue was jerks on the internet.

I appreciate everyone's time and help with this issue.
 
LVL 62

Expert Comment

by:gheist
ID: 40247566
The attacks target insecure WordPress installs. If you have no WordPress, just disregard them. If you do use WordPress, rename those admin files to a long random name and force access over SSL.
If you have a hard time containing CPU usage, evaluate whether mod_fcgid + apache-worker is a viable option for you (the examples in the fcgid documentation cover using it with PHP).
 
LVL 17

Author Closing Comment

by:jrm213jrm213
ID: 40252656
The solution ended up being that we had people attacking certain sites on the server. Even though the attacks were not causing problems with the sites and were as far as I can tell all blocked, they were still causing a good amount of load on the server.
 
LVL 62

Expert Comment

by:gheist
ID: 40254084
Since you have shared hosting, you have every right to go after insecure WordPress owners (or at least offer them a 1% cheaper WordPress virtual instance that you patch).


Question has a verified solution.
