Solved

High server CPU utilization, apparently from Apache: how do I determine the cause?

Posted on 2014-07-28
Last Modified: 2014-08-11
We have multiple servers behind a load balancer. Our usage is currently low because it is summertime, and most of our websites are used by teachers and students at various schools. For example, during the school year we get around 30k users a day; currently we have about 3k users a day.

I have recently begun to notice that the CPU usage on the servers behind the load balancer is much higher than expected (load averages of .42, .41, .35). It appears that a couple of Apache processes are causing it, running at between 7% and 14% CPU. There are constantly one or two of these, while the rest of the Apache processes sit at about 0-4% CPU.

In the past, even at peak utilization, it only ever reached about these levels. So I am concerned that when we get back to 10x the number of users per day, our servers will be overloaded.

Since there are multiple websites on the servers, I don't know how to tell which site is causing this.

I have enabled Server Status on the server, but what it shows for the processes using a lot of CPU doesn't make sense: for example, the stylesheet of one of the sites, or a page whose only job is a redirect.

Is there any way to get better insight into which page an Apache thread/process is serving when the processor utilization spikes?
Question by:jrm213jrm213
 
LVL 12

Expert Comment

by:Kent W
ID: 40224671
How many processors do you have? (cat /proc/cpuinfo)

That is the only way to judge what the load averages in your "uptime" output mean.
For instance, a load of 4.0 means the system is maxed out if you have 4 cores, but only half loaded if you have 8.
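To put load averages in context, divide them by the number of cores. Below is a minimal Python sketch of that check; `load_per_core` and `current_load_per_core` are hypothetical helper names, and `os.getloadavg()` only exists on Unix-like systems. On Linux the load average also counts processes waiting on disk I/O, so it is not a pure CPU measure.

```python
import os


def load_per_core(load_avg: tuple[float, float, float], cores: int) -> tuple[float, ...]:
    """Divide the 1-, 5- and 15-minute load averages by the core count.

    Around 1.0 means the run queue roughly matches the number of cores;
    values well below 1.0 mean there is spare capacity.
    """
    if cores < 1:
        raise ValueError("cores must be >= 1")
    return tuple(x / cores for x in load_avg)


def current_load_per_core() -> tuple[float, ...]:
    # os.getloadavg() is Unix-only; os.cpu_count() may return None.
    return load_per_core(os.getloadavg(), os.cpu_count() or 1)
```

With the figures from the question (.42, .41, .35) on the 4 cores reported later in the thread, this gives roughly 0.1 per core, i.e. the cores are mostly idle on average.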

You can run "top" and sort the list by memory or CPU to find out where the hogs are.
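If you want the same view from a script, a minimal sketch is to parse the output of `ps -eo pid,pcpu,comm` and keep only the Apache workers. Note that `ps` reports %CPU averaged over each process's lifetime, whereas top shows usage over its refresh interval, so short spikes look smaller here. The process name `apache2` is the Debian default (Red Hat uses `httpd`); the function names are hypothetical.

```python
import subprocess


def top_processes(ps_output: str, name: str, n: int = 5) -> list[tuple[int, float, str]]:
    """Return up to n processes whose command name contains `name`, highest %CPU first.

    Expects the text printed by `ps -eo pid,pcpu,comm`: a header row,
    then one "PID %CPU COMMAND" row per process.
    """
    rows = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        parts = line.split(None, 2)  # the command name itself may contain spaces
        if len(parts) < 3:
            continue
        pid, pcpu, comm = parts
        if name in comm:
            rows.append((int(pid), float(pcpu), comm))
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows[:n]


def apache_cpu_hogs(n: int = 5) -> list[tuple[int, float, str]]:
    """Run ps and return the busiest apache2 processes (Debian's process name)."""
    out = subprocess.run(
        ["ps", "-eo", "pid,pcpu,comm"], capture_output=True, text=True, check=True
    ).stdout
    return top_processes(out, "apache2", n)
```

The PIDs this returns can then be matched against the PID column in the server-status page to see which request each busy worker is handling.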

If it is Apache, then this is most likely a symptom of someone scanning or trying to exploit your server in some way, especially if you have not changed any code. Watching your web logs will show you who is hitting it and how.
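Because access logs on a busy server can run to several GB, a quick first pass is to stream the log line by line and count requests per client IP. A minimal Python sketch, assuming the logs use Apache's common or combined log format; `requests_per_ip` is a hypothetical name. Behind shared school or business IPs a high count alone is not proof of an attack.

```python
import re
from collections import Counter
from collections.abc import Iterable

# Apache "combined" format: %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-agent}i"
# ("common" is the same without the last two fields, so it matches too).
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?P<method>[A-Z]+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) '
)


def requests_per_ip(lines: Iterable[str], n: int = 10) -> list[tuple[str, int]]:
    """Count requests per client IP and return the n busiest clients."""
    counts: Counter[str] = Counter()
    for line in lines:  # iterating a file object reads one line at a time
        m = LOG_LINE.match(line)
        if m:  # skip lines that do not look like common/combined entries
            counts[m.group("ip")] += 1
    return counts.most_common(n)
```

Pass an open file object directly (opened with `errors="replace"` to survive odd bytes), e.g. `requests_per_ip(f, 20)`, so the whole log is never loaded into memory.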
 
LVL 61

Expert Comment

by:gheist
ID: 40224694
Most commonly it is PHP and/or DEFLATE
Can you detail your setup a bit more?
Which MPM?
Which apache version?
Distribution version?
Disable server-status. There is a recent security bug in it that needs patching, and it will not help at all.

Stylesheet? Is it dynamic?
Redirect page? Maybe it is being used to pass Google rank to some XXX site?
 
LVL 13

Assisted Solution

by:duncanb7
duncanb7 earned 200 total points
ID: 40224726
Besides buying commercial software, I think these 25 Linux commands for monitoring server performance are a good start: http://www.tecmint.com/command-line-tools-to-monitor-linux-performance/

I hope I understood your question correctly. If not, please point it out.

Duncan
 
LVL 13

Assisted Solution

by:duncanb7
duncanb7 earned 200 total points
ID: 40224743
You can also combine those commands into a shell script that loops to monitor the whole system, which is better than trying third-party software.
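Here is the same looping idea sketched in Python instead of shell: sample the load averages at a fixed interval and flag readings where the 1-minute load per core crosses a threshold. `sample_load` and `busy_readings` are hypothetical names, `os.getloadavg()` is Unix-only, and the 0.7 threshold is an arbitrary assumption to tune.

```python
import os
import time
from collections.abc import Callable


def sample_load(
    samples: int,
    interval: float = 5.0,
    read: Callable[[], tuple[float, float, float]] = os.getloadavg,
) -> list[tuple[float, float, float, float]]:
    """Take `samples` readings of the 1/5/15-minute load, `interval` seconds apart.

    Each reading is (unix_timestamp, load1, load5, load15).
    """
    readings = []
    for i in range(samples):
        readings.append((time.time(), *read()))
        if i < samples - 1:  # no need to sleep after the last reading
            time.sleep(interval)
    return readings


def busy_readings(readings, cores: int, threshold: float = 0.7):
    """Return readings whose 1-minute load per core is at or above `threshold` (assumed value)."""
    return [r for r in readings if r[1] / cores >= threshold]
```

For example, `busy_readings(sample_load(60, 5.0), cores=4)` watches for five minutes and returns only the moments worth cross-checking against the access logs.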

Duncan
 
LVL 17

Author Comment

by:jrm213jrm213
ID: 40224825
Hello, here is some more information

@mugojava

Server Version: Apache/2.2.16 (Debian) mod_jk/1.2.30 PHP/5.3.3-7+squeeze19 with Suhosin-Patch

There are 4 CPUs [0-3]:
processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 16
model           : 4
model name      : Quad-Core AMD Opteron(tm) Processor 2374 HE
stepping        : 2
cpu MHz         : 97529.786
cache size      : 512 KB
fpu             : yes
fpu_exception   : yes
cpuid level     : 5
wp              : yes
flags           : fpu de tsc msr pae cx8 cmov pat clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt lm 3dnowext 3dnow constant_tsc rep_good nonstop_tsc pni cx16 popcnt lahf_lm cmp_legacy extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch
bogomips        : 4400.17
TLB size        : 1024 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 48 bits physical, 48 bits virtual
power management: ts ttp tm stc 100mhzsteps hwpstate


I will look at the logs; they are huge. The access log is currently 6 GB, while the error log is only 175 KB.


@gheist
The stylesheet is not dynamic; it is just a .css file.
The redirect page basically sets some location cookies and redirects back to the site or, if the user is from specific states, redirects them to a different website.
 
LVL 17

Author Comment

by:jrm213jrm213
ID: 40224865
It is Apache that has the high CPU usage; I just don't know why. I am not seeing anything strange in the access or error logs.
 
LVL 12

Assisted Solution

by:Kent W
Kent W earned 175 total points
ID: 40224876
One easy method I use to find out which site is being slammed (if that is the case): I'll assume your log files are in
/var/log/httpd
cd into your web log directory and run
ls -ltra
This lists everything by modification time, so you can see which log is being written to continually.
(Drop the -r if you would rather have the newest at the top. I prefer the bottom.)

You can also run
watch "ls -ltra"
to watch the list with the default 2-second update.
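The same "which log changed last" check can be scripted, which helps when, as here, each site logs into its own directory. A minimal Python sketch; `newest_logs` is a hypothetical name, and the directory you pass is an assumption (Kent's example uses /var/log/httpd, while Debian's default is /var/log/apache2).

```python
from pathlib import Path


def newest_logs(log_dir: str, pattern: str = "*.log", n: int = 5) -> list[Path]:
    """Return up to n files matching `pattern` under log_dir, most recently modified first.

    The search is recursive, so per-site subdirectories are included.
    """
    files = [p for p in Path(log_dir).rglob(pattern) if p.is_file()]
    files.sort(key=lambda p: p.stat().st_mtime, reverse=True)
    return files[:n]
```

Running it twice a few seconds apart shows which site's log keeps jumping to the top, much like watching `ls -ltra`.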

Then tail that puppy with -f (follow) to watch what the culprit is hitting and how:
tail -f web-log-to-watch-access.log
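If you want to filter or count the followed lines rather than just read them, `tail -f` can be approximated in Python by remembering a byte offset and reading only what was appended since. A minimal sketch; `read_new_lines` and `follow` are hypothetical names. Rotation is handled only crudely: if the file shrinks below the saved offset, reading restarts at 0.

```python
import os
import time
from collections.abc import Iterator


def read_new_lines(path: str, offset: int = 0) -> tuple[list[str], int]:
    """Return the complete lines added to `path` after byte `offset`, and the new offset.

    An incomplete last line is left for the next call. If the file is now
    smaller than `offset` (truncated or replaced by rotation), start over at 0.
    """
    if os.path.getsize(path) < offset:
        offset = 0
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read()
    end = data.rfind(b"\n") + 1  # bytes up to and including the last newline
    lines = data[:end].decode("utf-8", errors="replace").splitlines()
    return lines, offset + end


def follow(path: str, interval: float = 1.0) -> Iterator[str]:
    """Yield lines as they are appended, roughly like `tail -f` (stop with Ctrl-C)."""
    offset = os.path.getsize(path)  # begin at the current end of the file
    while True:
        lines, offset = read_new_lines(path, offset)
        yield from lines
        time.sleep(interval)
```

For example, `for line in follow(path): ...` can print only lines containing "POST", which a plain `tail -f` would need a pipe through grep for.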
 
LVL 61

Assisted Solution

by:gheist
gheist earned 125 total points
ID: 40224879
Apache should be at 2.2.16-6+squeeze12 or newer.

Do you use mod_jk at all? It can be replaced by mod_proxy in Apache 2.2.
Do you think you have time to convert Apache to apache-worker with mod_fcgid and php-cgi?
 
LVL 61

Expert Comment

by:gheist
ID: 40224907
You must rotate your logs. I think on Debian, if you install the logrotate package, the logs are rotated the next night.
 
LVL 12

Expert Comment

by:Kent W
ID: 40224937
I agree with gheist: try rotating those logs so Apache is not trying to append to a 6 GB file. That is, if you see nothing else in your access logs or any other indication that something mischievous is going on.
 
LVL 61

Expert Comment

by:gheist
ID: 40225146
You also need cron in addition to logrotate.
 
LVL 17

Author Comment

by:jrm213jrm213
ID: 40225322
@gheist
Yes and no on mod_jk. It was used for a few of our legacy sites until about a month ago, when we moved those sites to their own server. I don't believe anything on these servers uses it anymore.

I am not sure how to convert to apache-worker, mod_fcgid or php-cgi.

What is the benefit of making those changes?

We do rotate the logs monthly, and we keep a year's worth of logs.
 
LVL 17

Author Comment

by:jrm213jrm213
ID: 40225334
So you guys think the problem might just be that it's the end of the month and the log is big? To be honest, I didn't set up the log rotation, so I have no idea how to change it. I am guessing it may be as easy as moving it from cron.monthly to cron.weekly? If so, will it still keep a year's worth, or just 12 weeks? I guess that is neither here nor there at the moment.
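If the rotation is done by logrotate (the thread does not show the actual config), the frequency comes from a `daily`/`weekly`/`monthly` directive in the logrotate configuration and the number of old files kept from `rotate N`; on Debian, logrotate itself is normally run every day from /etc/cron.daily. Retention is therefore roughly N times the period: assuming the current setup is `monthly` with `rotate 12`, switching to `weekly` without changing N would keep only about 12 weeks. A small sketch of that arithmetic; the helper names are hypothetical and a month is approximated as 30 days.

```python
import math

# Approximate length of each logrotate period, in days (months assumed to be 30 days).
PERIOD_DAYS = {"daily": 1, "weekly": 7, "monthly": 30}


def retained_days(rotate: int, frequency: str) -> int:
    """Approximate days of history kept in rotated files by `rotate N` plus `frequency`."""
    return rotate * PERIOD_DAYS[frequency]


def rotate_for(days: int, frequency: str) -> int:
    """Smallest `rotate` count whose rotated files cover at least `days` of history."""
    return math.ceil(days / PERIOD_DAYS[frequency])
```

So `rotate_for(365, "weekly")` gives 53, and `rotate 52` with `weekly` keeps 364 days, close enough to a year.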

The logs are actually all in their own directories specific to each site.
 
LVL 61

Expert Comment

by:gheist
ID: 40225344
Let's wait 24 hours to see if log rotation solves the problem.
 
LVL 61

Expert Comment

by:gheist
ID: 40233908
Did log rotation help? Is it better now?
 
LVL 17

Accepted Solution

by:jrm213jrm213
jrm213jrm213 earned 0 total points
ID: 40243515
Log rotation did not seem to help. I have been slowly tracking things down and have identified a couple of attacks that were repeatedly hitting the same valid files on the server, files I would normally expect to see in the logs. It took a while, but after sifting through a couple million rows of logs I was able to identify the attacks and stop them with a combination of .htaccess rules and turning off some features of the blogs. It's tough to tell the good from the bad when, in a situation like ours, it is normal to get a couple hundred thousand requests a day from the same IPs, because they are coming from schools and businesses. So it's normal to see 50,000 POST requests to the login script, etc.
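Part of that sifting can be automated by counting requests per (method, path) pair, so the most-hammered URLs stand out even when the client IPs look legitimate. A minimal Python sketch, assuming Apache's common or combined log format; query strings are stripped so variants of the same script are grouped, and `top_requests` is a hypothetical name. It reads line by line, so a multi-GB log is never loaded into memory.

```python
import re
from collections import Counter
from collections.abc import Iterable

# Apache common/combined format: %h %l %u %t "%r" %>s %b ...
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?P<method>[A-Z]+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) '
)


def top_requests(lines: Iterable[str], n: int = 10) -> list[tuple[tuple[str, str], int]]:
    """Return the n most frequent (method, path) pairs, ignoring query strings."""
    counts: Counter[tuple[str, str]] = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if m:  # skip lines that do not parse
            path = m.group("path").split("?", 1)[0]
            counts[(m.group("method"), path)] += 1
    return counts.most_common(n)
```

A POST-heavy entry such as `("POST", "/wp-login.php")` near the top of the list is exactly the kind of pattern that pointed to the attacks described here.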

The files I found that were specifically being targeted were
wp-login.php - someone was trying to brute-force a user's login. On the blogs where we don't allow users to register, I denied access to this page with an .htaccess rule, and added an environment check that allows our work IP address to access it.
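Because legitimate users share school and business IPs, one rough way to single out brute-force sources is to count only POSTs to the login and XML-RPC scripts per IP, skip allowlisted addresses (such as your own office IP), and report IPs above a threshold. A minimal Python sketch, assuming common or combined log format; `suspected_bruteforce`, the default threshold of 100 and the target file names are assumptions to tune, not values from the thread.

```python
import re
from collections import Counter
from collections.abc import Container, Iterable

LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?P<method>[A-Z]+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) '
)


def suspected_bruteforce(
    lines: Iterable[str],
    threshold: int = 100,  # assumed cut-off; tune to your normal traffic
    targets: tuple[str, ...] = ("wp-login.php", "xmlrpc.php"),
    allow: Container[str] = frozenset(),
) -> list[tuple[str, int]]:
    """Return (ip, count) for IPs with at least `threshold` POSTs to a target script."""
    counts: Counter[str] = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if not m or m.group("method") != "POST":
            continue
        ip = m.group("ip")
        if ip in allow:
            continue
        # Compare only the file name, so /blog/wp-login.php?x=1 also counts.
        script = m.group("path").split("?", 1)[0].rsplit("/", 1)[-1]
        if script in targets:
            counts[ip] += 1
    hits = [(ip, c) for ip, c in counts.items() if c >= threshold]
    return sorted(hits, key=lambda item: item[1], reverse=True)
```

For example, `suspected_bruteforce(f, threshold=500, allow={"203.0.113.10"})` (a documentation-range address standing in for an office IP) produces a candidate list to review before blocking anything in .htaccess.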

xmlrpc.php - I don't know what they were trying to do with this, but I deleted this file from the blogs.
I also edited wp-config.php and disabled wp-cron (I update the blogs and plugins manually and we don't schedule posts, so I think that should be fine). I don't know whether it was a problem on my machine, but I read about it being used as an exploit, so I figured I would disable it now.

I am still getting fake comments posted to the blogs on a regular basis. They are not getting through, because we require moderation of all comments and have some anti-spam measures in place, but we are still getting too many. I am unsure how to stop this one, since we do want people to comment on our posts... but that is a completely different question.

So even though WordPress was not letting any of the attacks actually do anything (no comments were getting through), people were still beating on the server in short waves, causing the CPU spikes. Currently the 15-minute load average sits between .10 and .15, which is more in line with what I expect of the system under the current load, and I no longer see any Apache threads going above 10% CPU; when they do, it's only for a second before they drop back to 2-5%.

I will continue to monitor the servers more closely, but it appears that my issue was jerks on the internet.

I appreciate everyone's time and help with this issue.
 
LVL 61

Expert Comment

by:gheist
ID: 40247566
The attacks target insecure WordPress installs. If you have no WordPress, just disregard them. If you do use WordPress, rename those admin files to long random names and force access over SSL.
If you have a hard time containing CPU usage, evaluate whether mod_fcgid + apache-worker is a viable option for you (the examples in the mod_fcgid documentation cover using it with PHP).
 
LVL 17

Author Closing Comment

by:jrm213jrm213
ID: 40252656
The solution ended up being that we had people attacking certain sites on the server. Even though the attacks were not causing problems with the sites and were as far as I can tell all blocked, they were still causing a good amount of load on the server.
 
LVL 61

Expert Comment

by:gheist
ID: 40254084
Since you run shared hosting, you have every right to go after insecure WordPress owners (or at least offer them a 1% cheaper WordPress virtual instance that you patch).