Where does the high load on a web server come from?

Hello,

I have this chain of top outputs here: http://www.hostme.ro/top_mare
Also, I have a single output of top, at another time, here: http://www.hostme.ro/top2

Which process(es) cause the high load? How can I find out in more detail what exactly is bringing my server down?

Thank you.

ngailfus

Digg.com or Slashdot?

softexp23

ASKER

ngailfus, what do you mean?

ngailfus

I was referring to websites that tend to drive a lot of traffic to other websites.

softexp23

ASKER

Oh, yes, I understand. But I don't think that's it. I want a script or a piece of code that shows exactly what is increasing the load: which processes, and how much each one contributes. From top you can't tell (see the top2 output, for example).

Also, if it's httpd, I need to know which users (the web users, i.e. the virtual hosts, not the Unix users) consume the most CPU or I/O.

softexp23

ASKER

To answer my question, please also look at the links with my specific top outputs.

evilaim

Well, before anyone comes up with some weird answer, I'll tell you that the server load comes from what you're running. It looks like your Apache (httpd) and PHP processes are using A LOT of memory. I would try stopping those two and see whether that helps at all. I'd like to see top/uptime output without httpd/Apache and PHP running.

noci

Well, your top output indicates there are >100 concurrently active processes, of which a lot are httpd and exim.

That probably means you get a lot of requests for web pages and quite some mail gets delivered.

Both applications have logfiles.
Please check the Apache access_log (if it is not available, turn it on) and analyze it.
Then you know which URLs are hit, and that might give a clue.
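
As a rough illustration of that analysis, here is a minimal sketch that counts hits and bytes sent per URL. It assumes the access_log uses Apache's Common or Combined Log Format; the regular expression, the `summarize` helper and the sample lines are all made up for this example and are not taken from the server in question.

```python
import re
from collections import Counter

# Matches the start of a Common/Combined Log Format line (assumed format):
# host ident user [time] "METHOD path PROTOCOL" status bytes ...
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[[^\]]*\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)


def summarize(lines):
    """Return (hits per path, bytes sent per path) for the lines that parse."""
    hits = Counter()
    sent = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines that are not in the assumed format
        path = match.group("path")
        hits[path] += 1
        size = match.group("size")
        if size.isdigit():  # "-" means no response body was sent
            sent[path] += int(size)
    return hits, sent


# Hypothetical sample lines, only to show the shape of the input.
SAMPLE = [
    '10.0.0.1 - - [01/Jan/2008:10:00:00 +0200] "GET /index.php HTTP/1.1" 200 5120',
    '10.0.0.2 - - [01/Jan/2008:10:00:01 +0200] "GET /index.php HTTP/1.1" 200 5120',
    '10.0.0.1 - - [01/Jan/2008:10:00:02 +0200] "GET /logo.png HTTP/1.1" 304 -',
]

if __name__ == "__main__":
    hits, sent = summarize(SAMPLE)
    for path, count in hits.most_common(10):
        print(count, sent[path], path)
```

To run it on a real log, pass an open file instead of `SAMPLE`. Hit counts and bytes only hint at where the work goes; they are not CPU time. Totals per virtual host would need the vhost name in every line, for example through a LogFormat that includes `%v`, which this sketch does not assume.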

The same goes for exim, which logs either to its own log files or to syslog. That can also tell you what mail you receive.

Based on that knowledge you may conclude that the traffic contains unwanted items and take measures, or decide that it's OK.

Besides that you could try to take a snapshot of your network traffic and see what hits your server.

softexp23

ASKER

noci, if I tail -f Apache's access_log or exim_mainlog, I get a lot of output that I can't decipher on the fly. And the logs don't tell me how busy the CPU is with one request or another. Apache's mod_status is closer to what I need, but it doesn't make much sense to me; I don't really understand its CPU-related info.

I think I need 3 things:
1. Over a period of 1 minute or 60 minutes, see how the CPU time was sliced. Also, I don't understand the difference between the CPU (percent) column and the CPU time column in top. What contributes to the load: a process with a high CPU percentage in top, or one with a high CPU time?

2. Which processes were active during that period, keeping the CPU busy.

3. Sometimes the load comes from high I/O time. How can I tell, at a particular moment, which processes are writing like crazy to my hard disk?

ASKER CERTIFIED SOLUTION

noci


noci

W.r.t. tail -f: if the output scrolls by too fast, then there is a lot of traffic.

But tail can also be used to list just the last, say, 100 lines:

tail -n 100 access_log

softexp23

ASKER

Thanks for your detailed answer.

softexp23

ASKER

noci: "below this the process had 0% (or a small fraction of CPU used.) , the cpu column addsup to 21%,
so there is still ~8% not accounted for, maybe small fractions of activity further on in the list"

Thanks noci. One more thing: in the output you've analyzed, if the CPU column adds up to 21%, there is still 79% left, right (not ~8%)? So does this 79% come from little pieces under 1%? Because top doesn't show it.

In your opinion, looking at the top2 output, what caused the huge load?

noci

23% user
 4% system
 2% non-normal priority
------ +
29% (31% if fractions are taken into account): the CPU is BUSY
So 29% of the CPU is really used.
Sum of the per-process slices => 21% ==> 8% (10%) of the BUSY time is split over many processes and not individually accounted for.

51% idle + 18% = 69% in total: the CPU is doing nothing worthwhile
(= roughly 70%).
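
The same arithmetic as a small sketch, in case you want to repeat it on other top snapshots. The function name `cpu_breakdown` and the parameter names are made up for this illustration. The 18% figure is passed in as `other_not_busy` because all we use here is that it is added to the idle share.

```python
def cpu_breakdown(user, system, non_normal_prio, idle, other_not_busy,
                  per_process_sum):
    """Split the top header percentages into busy and not-busy shares.

    per_process_sum is the total of the %CPU column over the listed
    processes; whatever busy time it does not cover is spread over
    processes that each show (close to) 0%.
    """
    busy = user + system + non_normal_prio
    not_busy = idle + other_not_busy
    unattributed = busy - per_process_sum
    return busy, not_busy, unattributed


if __name__ == "__main__":
    # Whole-percent figures from the breakdown above.
    busy, not_busy, unattributed = cpu_breakdown(23, 4, 2, 51, 18, 21)
    print("busy:", busy, "not busy:", not_busy, "unattributed:", unattributed)
```

With the whole-percent figures this gives 29% busy, 69% not busy and 8% of busy time that the per-process column does not show. The remaining 79% you asked about is therefore mostly time in which the CPU was not working at all, not many small slices.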

From top2 I really can't tell.

It might well be swapping.
That is usually hard to see, except in a 'vmstat 10' (in place of 10 you can give another interval in seconds).
But it will cause long queues for the swap space (and for other I/O, if that goes to the same physical disk).

It will be a job of locating the bottleneck and then removing it.
It might work very well if you are able to limit the number of processes. For example, do not accept an unlimited number of Apache/exim connections.

It might be better to keep a few on hold and get better throughput.
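
If you would rather script this check than watch vmstat, here is a minimal sketch. It assumes a Linux kernel that exposes the `pswpin` and `pswpout` counters in `/proc/vmstat`; the helper names are made up for this illustration. It samples the counters twice and reports the pages swapped in and out per second.

```python
import time


def parse_vmstat(text):
    """Parse the 'name value' lines of /proc/vmstat into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def swap_rate(before, after, seconds):
    """Pages swapped in and out per second between two snapshots."""
    swapped_in = (after["pswpin"] - before["pswpin"]) / seconds
    swapped_out = (after["pswpout"] - before["pswpout"]) / seconds
    return swapped_in, swapped_out


def read_vmstat(path="/proc/vmstat"):
    """Read and parse the counters (assumes Linux; the path is the default)."""
    with open(path) as handle:
        return parse_vmstat(handle.read())


if __name__ == "__main__":
    interval = 10  # seconds, the same role as the 10 in 'vmstat 10'
    first = read_vmstat()
    time.sleep(interval)
    second = read_vmstat()
    print("pages/s swapped in, out:", swap_rate(first, second, interval))
```

The counters are in pages, so the numbers may not be in the same unit as the swap columns vmstat prints. Rates that stay well above zero over several intervals would point towards swapping.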

softexp23

ASKER

Thank you noci.