High server CPU utilization, apparently from Apache: how do I determine the cause?

We have multiple servers behind a load balancer. Our usage is currently low because it is summertime and most of our websites are used by teachers and students at various schools. For example, during the school year we get around 30k users a day; currently we have about 3k users a day.

I have recently noticed that CPU usage on the servers behind the load balancer is much higher than expected (load averages of .42, .41, .35). It appears that a couple of Apache processes are causing it, running at between 7% and 14% CPU. There are constantly 1 or 2 of these, while the rest of the Apache processes sit at about 0-4% CPU.

In the past, when we had high utilization, it only ever reached about these levels. So I am concerned that when we get back to 10x the number of users per day, we are going to have server overload.

Since there are multiple websites on the servers, I don't know how to tell which site is causing this.

I have enabled server-status on the server, but what it shows for the processes that are using a lot of CPU doesn't make sense: for example, the stylesheet of one of the sites, or a page whose only job is a redirect...

Is there any way to get better insight into which page an Apache thread/process is serving when the processor utilization spikes?
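One possible way to get that per-page insight (a sketch, not something anyone in this thread ran) is to have Apache log how long each request takes and then total those times per URL. Apache 2.2's mod_log_config offers %D, the time taken to serve the request in microseconds. The Python below assumes a hypothetical LogFormat that is the standard combined format with %D appended as the last field; since each site here has its own log, it would be run once per site. Wall-clock time is only a proxy for CPU, because slow clients and I/O waits inflate it too.

```python
import re
import sys
from collections import defaultdict

# Assumed (hypothetical) LogFormat: "combined" with %D appended as the final field, e.g.
# LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-agent}i\" %D" timed
REQUEST_RE = re.compile(r'\] "[A-Z]+ (?P<path>\S+)')


def time_by_path(lines):
    """Count requests and sum %D microseconds per URL path (query string dropped)."""
    totals = defaultdict(lambda: [0, 0])  # path -> [request count, total microseconds]
    for line in lines:
        m = REQUEST_RE.search(line)
        parts = line.rsplit(None, 1)
        if m is None or len(parts) != 2 or not parts[1].isdigit():
            continue  # skip lines that do not match the assumed format
        entry = totals[m["path"].split("?", 1)[0]]
        entry[0] += 1
        entry[1] += int(parts[1])
    # Most total time first: these are the pages worth looking at.
    return sorted(totals.items(), key=lambda kv: kv[1][1], reverse=True)


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        for path, (count, micros) in time_by_path(log)[:20]:
            print(f"{micros / 1e6:10.1f}s {count:8d}  {path}")
```

Sorting by total time rather than average surfaces cheap pages that are hit very often, which is the pattern an automated attack tends to produce.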
Asked by jrm213jrm213

Kent W (Sr. Network / Systems Admin) commented:
How many processors do you have? (cat /proc/cpuinfo)

That is the only way to judge what the load averages in your "uptime" output mean.
For instance, a load average of 4.0 would mean a 4-core system is maxed out, but an 8-core system is only half loaded.
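Kent's point is that load averages only mean something relative to the number of CPUs. A minimal, Linux-only Python sketch that reads /proc/cpuinfo and /proc/loadavg and divides one by the other:

```python
def cpu_count_from_cpuinfo(cpuinfo_text):
    """Count logical CPUs by counting the 'processor : N' entries in /proc/cpuinfo."""
    return sum(
        1
        for line in cpuinfo_text.splitlines()
        if line.split(":", 1)[0].strip() == "processor"
    )


def load_per_core(loadavg_text, cpus):
    """Divide the 1-, 5- and 15-minute load averages from /proc/loadavg by the CPU count."""
    one, five, fifteen = (float(field) for field in loadavg_text.split()[:3])
    return one / cpus, five / cpus, fifteen / cpus


if __name__ == "__main__":
    # Linux-specific: both files are provided by the kernel under /proc.
    with open("/proc/cpuinfo") as f:
        cpus = cpu_count_from_cpuinfo(f.read())
    with open("/proc/loadavg") as f:
        per_core = load_per_core(f.read(), cpus)
    print(f"{cpus} CPUs; load per core (1/5/15 min): "
          + ", ".join(f"{x:.2f}" for x in per_core))
```

With the figures posted in this thread (4 CPUs, load averages of .42, .41, .35), that is roughly 0.1 per core, well below saturation by this measure. Linux load averages count runnable tasks plus tasks in uninterruptible I/O wait, so they are not the same thing as CPU percentage.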

You can run "top" and sort by memory or CPU to find out where the hogs are.
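A rough Python equivalent of sorting by CPU, restricted to Apache: it runs procps ps for processes named apache2 (Debian's binary name; an assumption on other distributions) and sorts them by %CPU. Keep in mind that ps reports %CPU as CPU time divided by the process's lifetime, unlike top's near-instantaneous figure, so a long-lived Apache child can look busy even if the request it is serving right now is a static file.

```python
import shutil
import subprocess


def parse_ps(output):
    """Parse headerless 'pid pcpu args' lines into (pcpu, pid, args), highest CPU first."""
    rows = []
    for line in output.splitlines():
        fields = line.split(None, 2)
        if len(fields) < 3:
            continue
        pid, pcpu, args = fields
        rows.append((float(pcpu), int(pid), args))
    return sorted(rows, reverse=True)


def apache_cpu_hogs(command_name="apache2", limit=5):
    # procps ps: -C selects processes by command name; a trailing '=' suppresses each header.
    out = subprocess.run(
        ["ps", "-C", command_name, "-o", "pid=,pcpu=,args="],
        capture_output=True, text=True,
    ).stdout
    return parse_ps(out)[:limit]


if __name__ == "__main__":
    if shutil.which("ps") is None:
        print("ps (procps) not found")
    else:
        for pcpu, pid, args in apache_cpu_hogs():
            print(f"{pcpu:5.1f}%  pid {pid}  {args}")
```

The PIDs it prints can be matched against the PID column of the full server-status page to see what each of those children has been serving.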

If it is Apache, this is most likely a symptom of someone scanning your server or trying to exploit it in some way, especially if you have not changed any code. Watching your web logs will show you who is hitting it and how.
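Watching the logs can be partly automated. A Python sketch, assuming the logs use Apache's common or combined format, that counts requests per client IP and per method and path across one access log:

```python
import re
import sys
from collections import Counter

# Start of a "common"/"combined" line: client IP, ident, user, [timestamp],
# "METHOD /path PROTOCOL", three-digit status.
LINE_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?P<method>[A-Z]+) (?P<path>\S+)[^"]*" \d{3} '
)


def top_talkers(lines, n=10):
    """Return the n busiest client IPs and the n most requested (method, path) pairs."""
    by_ip = Counter()
    by_request = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            continue  # malformed request lines such as "-" are skipped
        by_ip[m["ip"]] += 1
        by_request[(m["method"], m["path"].split("?", 1)[0])] += 1
    return by_ip.most_common(n), by_request.most_common(n)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Streams the file line by line, so a multi-gigabyte log is never loaded into memory.
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        ips, requests = top_talkers(log)
    for ip, count in ips:
        print(f"{count:9d}  {ip}")
    for (method, path), count in requests:
        print(f"{count:9d}  {method} {path}")
```

It takes the log path as its only argument; dropping the query string groups requests such as /index.php?p=2 and /index.php?p=3 under one path.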
gheist commented:
Most commonly it is PHP and/or DEFLATE (mod_deflate compression).
Can you detail your setup a bit more?
Which MPM?
Which Apache version?
Which distribution version?
Disable server-status: there is a recent security bug in it that needs patching, and it will not help you here anyway.

Stylesheet? Is it dynamic?
Redirect page? Could it be passing your Google rank to some XXX site?
duncanb7 commented:
Apart from buying commercial software, I think the 25 Linux commands for monitoring server performance listed at this site are a good start: http://www.tecmint.com/command-line-tools-to-monitor-linux-performance/

I hope I understood your question correctly. If not, please point it out.

Duncan

duncanb7 commented:
You can also combine those commands into a shell script that loops and monitors the whole system, which is better than trying third-party software.

Duncan
jrm213jrm213 (Author) commented:
Hello, here is some more information:

@mugojava

Server Version: Apache/2.2.16 (Debian) mod_jk/1.2.30 PHP/5.3.3-7+squeeze19 with Suhosin-Patch

There are 4 CPUs [0-3]:
processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 16
model           : 4
model name      : Quad-Core AMD Opteron(tm) Processor 2374 HE
stepping        : 2
cpu MHz         : 97529.786
cache size      : 512 KB
fpu             : yes
fpu_exception   : yes
cpuid level     : 5
wp              : yes
flags           : fpu de tsc msr pae cx8 cmov pat clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt lm 3dnowext 3dnow constant_tsc rep_good nonstop_tsc pni cx16 popcnt lahf_lm cmp_legacy extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch
bogomips        : 4400.17
TLB size        : 1024 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 48 bits physical, 48 bits virtual
power management: ts ttp tm stc 100mhzsteps hwpstate



I will look at the logs; they are huge. The access log is currently 6 GB, while the error log is only 175 KB.


@gheist
The stylesheet is not dynamic, it is just a .css file.
The redirect page basically sets some location cookies and redirects back to the site or, if the user is from specific states, redirects them to a different website.
jrm213jrm213 (Author) commented:
It is Apache that has the high CPU usage; I just don't know why. I am not seeing anything strange in the access or error logs.
Kent W (Sr. Network / Systems Admin) commented:
One easy method I use to find out which site is being slammed (if that is the case): I'll assume your log files are in
/var/log/httpd
(on Debian the default is /var/log/apache2). cd into your web log directory and run
ls -ltra
This lists everything by modification time, so you can see which log is being written to continually.
(Drop the -r if you would rather have the newest at the top. I prefer the bottom.)

You can also run
watch "ls -ltra"
to watch the list with the default 2-second update.
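Modification time shows which file was touched last, not which one is growing fastest. A Python sketch of a variant that snapshots file sizes twice and reports bytes written per second, walking subdirectories because the logs here live in per-site directories. The log root is passed as an argument (for example /var/log/apache2 on a stock Debian install; the real per-site location is an assumption).

```python
import os
import sys
import time


def snapshot(root):
    """Map every file path under root (recursively) to its current size in bytes."""
    sizes = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                sizes[path] = os.path.getsize(path)
            except OSError:
                pass  # the file vanished between listing and stat
    return sizes


def growth(before, after, seconds):
    """Bytes per second written to each file, fastest first; shrunk (rotated) files are skipped."""
    rates = [
        ((after[path] - size) / seconds, path)
        for path, size in before.items()
        if path in after and after[path] > size
    ]
    return sorted(rates, reverse=True)


if __name__ == "__main__" and len(sys.argv) > 1:
    root, interval = sys.argv[1], 10
    first = snapshot(root)
    time.sleep(interval)
    for rate, path in growth(first, snapshot(root), interval)[:10]:
        print(f"{rate:10.1f} B/s  {path}")
```

The file at the top of the list is the natural candidate to tail.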

Then tail that puppy with -f (follow) to watch what the culprit is hitting and how:
tail -f web-log-to-watch-access.log
gheist commented:
Apache should be at
2.2.16-6+squeeze12 or newer.

Do you use mod_jk at all? It can be replaced by mod_proxy (with mod_proxy_ajp) in Apache 2.2.
Do you think you have time to convert Apache to the worker MPM (apache-worker) with mod_fcgid and php-cgi?
gheist commented:
You must rotate your logs. I think on Debian, if you install the logrotate package, the logs are rotated the next night.
Kent W (Sr. Network / Systems Admin) commented:
I agree with gheist: try rotating those logs so Apache is not appending to a 6 GB file. That is, if you see nothing else in your access logs and no other indications that something mischievous is going on.
gheist commented:
You need cron in addition to logrotate.
jrm213jrm213 (Author) commented:
@gheist
Yes and no on mod_jk. It was used for a few of our legacy sites until about a month ago, when we moved those sites to their own server. I don't believe anything on these servers uses it anymore.

I am not sure how to convert to apache-worker, mod_fcgid or php-cgi.

What is the benefit of making those changes?

We do rotate the logs monthly, and we keep a year's worth of logs.
jrm213jrm213 (Author) commented:
So you guys think the problem might just be that it's the end of the month and the log is big? To be honest, I didn't set up the log rotation, so I have no idea how to change it. I am guessing it may be as easy as moving it from cron.monthly to cron.weekly? If so, will it still keep a year's worth or just 12 weeks? I guess that is neither here nor there at the moment.
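On the retention question: logrotate's rotate N directive keeps N rotated files, not N months. So if the current monthly stanza uses rotate 12 (an assumption; the config is not shown in the thread), switching it to weekly without changing N would keep only about 12 weeks. Also, on Debian logrotate is normally run daily from /etc/cron.daily/logrotate, and the rotation frequency comes from the weekly or monthly directive in the config rather than from the cron directory. A small Python sketch of the arithmetic:

```python
import math


def rotate_count(retention_days, period_days):
    """Smallest 'rotate N' value whose N rotated files cover retention_days."""
    return math.ceil(retention_days / period_days)


if __name__ == "__main__":
    print("weekly, keep 1 year :", rotate_count(365, 7))   # 53
    print("monthly, keep 1 year:", rotate_count(365, 31))  # 12
```

Rounding up errs on the side of keeping one extra file rather than losing the oldest week.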

The logs are actually all in their own directories specific to each site.
gheist commented:
Let's wait 24 hours to see if log rotation solves the problem.
gheist commented:
Did you manage to get log rotation working? Is it better now?
jrm213jrm213 (Author) commented:
The log rotation did not seem to help. I have been slowly tracking things down and have identified a couple of attacks that were repeatedly hitting the same valid files on the server, files I would normally expect to see in the logs. It took a while, but after sifting through a couple million rows of logs I was able to identify the attacks and stop them with a combination of .htaccess rules and turning off some features of the blogs. It's tough to tell the good from the bad when, in our situation, it is normal to get a couple hundred thousand requests a day from the same IPs, because they are coming from schools and businesses. So it's normal to see 50,000 POST requests to the login script, etc.

The files I found that were specifically being targeted were
wp-login.php - someone was trying to brute-force a user account. I denied access to this page with an .htaccess rule on the blogs where we don't allow users to register, and added an environment check to allow our work IP address to access it.

xmlrpc.php - I don't know what they were trying to do with this, but I deleted this file from the blogs.
I also edited wp-config.php and disabled wp-cron (I update the blogs and plugins manually, and we don't schedule posts, so I think that should be fine). I don't know if it was a problem on my machine, but I read about it being used as an exploit, so I figured I would disable it now.
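Because legitimate school traffic can put huge volumes behind a single IP, raw totals alone do not separate attacks from normal use; short bursts do stand out. A Python sketch, assuming combined-format logs, that counts POSTs to watched endpoints per client IP per hour and keeps only the buckets above a threshold. The watch list and the default threshold of 100 per hour are hypothetical placeholders that would need tuning against your own baseline.

```python
import re
import sys
from collections import Counter

# Hypothetical watch list: the WordPress endpoints discussed in this thread.
WATCHED = ("/wp-login.php", "/xmlrpc.php")

# Client IP, the timestamp truncated to the hour ("10/Jul/2014:13"), then the request.
LINE_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<hour>[^:\]]+:\d{2})[^\]]*\] "(?P<method>[A-Z]+) (?P<path>\S+)'
)


def post_bursts(lines, threshold=100):
    """Count POSTs to watched endpoints per (ip, hour, path); keep buckets >= threshold."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if m is None or m["method"] != "POST":
            continue
        path = m["path"].split("?", 1)[0]
        if path.endswith(WATCHED):  # also matches e.g. /blog/wp-login.php
            counts[(m["ip"], m["hour"], path)] += 1
    return [(key, n) for key, n in counts.most_common() if n >= threshold]


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        for (ip, hour, path), n in post_bursts(log):
            print(f"{n:7d}  {ip}  {hour}h  {path}")
```

Bucketing by hour trades precision for readability; a shorter window would need a different slice of the timestamp.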

I am still getting fake comment posts on the blogs on a regular basis. Even though they are not getting through, because we require moderation of all comments and have some anti-spam measures in place, we are still getting too many. I am unsure how to stop this one, since we do want people to comment on our posts... but that is a completely different question.

So even though WordPress was not allowing any of the attacks to actually do anything (no comments were getting through), people were still beating on the server in short waves, causing the CPU spikes. Things are currently sitting at a 15-minute load average of between .10 and .15, which is more in line with what I expect from the system under the current load, and I no longer see any Apache threads above 10% CPU; if one does go higher, it is only for a second before it drops back, usually to 2-5%.

I will continue to monitor the servers more closely, but it appears that my issue was jerks on the internet.

I appreciate everyone's time and help with this issue.

gheist commented:
These attacks target insecure WordPress installs. If you have no WordPress, just disregard them. If you do use WordPress, rename those admin files to long random names and force them over SSL.
If you have a hard time containing CPU usage, evaluate whether mod_fcgid + apache-worker is a viable option for you (the examples in the mod_fcgid documentation cover using it with PHP).
jrm213jrm213 (Author) commented:
The solution ended up being that people were attacking certain sites on the server. Even though the attacks were not causing problems with the sites and were, as far as I can tell, all blocked, they were still putting a good amount of load on the server.
gheist commented:
Since you run shared hosting, you have every right to go after owners of insecure WordPress installs (or at least offer them a 1% cheaper WordPress virtual instance that you patch).
Question has a verified solution.
