We are currently having problems running SquirrelMail on a Red Hat Enterprise Linux 3 ES box. We have approximately 1500 users on this system, with around 200 using it at any given time. The load average spikes to over 20 during the day and the httpd daemon pegs the CPU at 100% for extended periods. The output of top looks like this:
11:12:13 up 30 days, 1:23, 1 user, load average: 15.32, 7.89, 7.18
130 processes: 107 sleeping, 23 running, 0 zombie, 0 stopped
CPU states: cpu user nice system irq softirq iowait idle
total 98.5% 0.0% 1.3% 0.0% 0.0% 0.0% 0.0%
cpu00 98.2% 0.0% 1.5% 0.0% 0.1% 0.0% 0.0%
cpu01 98.8% 0.0% 1.1% 0.0% 0.0% 0.0% 0.0%
Mem: 3868520k av, 2423932k used, 1444588k free, 0k shrd, 268992k buff
631976k active, 1260996k inactive
Swap: 8193140k av, 0k used, 8193140k free 1288976k cached
PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME CPU COMMAND
7677 apache 19 0 16624 16M 7372 R 31.5 0.4 0:03 0 httpd
7224 apache 20 0 17052 16M 7392 S 23.8 0.4 0:40 0 httpd
7754 apache 21 0 16668 16M 7372 R 7.1 0.4 0:05 1 httpd
6462 apache 21 0 17656 17M 7468 R 1.9 0.4 1:23 0 httpd
6736 apache 21 0 17400 16M 7432 R 1.9 0.4 0:46 1 httpd
6827 apache 21 0 17352 16M 7436 R 1.9 0.4 1:03 0 httpd
6905 apache 21 0 18092 17M 7652 R 1.9 0.4 0:58 1 httpd
7183 apache 21 0 17260 16M 7424 R 1.9 0.4 0:30 0 httpd
7199 apache 21 0 17536 17M 7432 R 1.9 0.4 0:26 1 httpd
7204 apache 21 0 17404 16M 7424 R 1.9 0.4 0:39 0 httpd
7395 apache 21 0 17160 16M 7428 R 1.9 0.4 0:23 1 httpd
7404 apache 22 0 17172 16M 7396 R 1.9 0.4 0:19 1 httpd
7405 apache 21 0 17448 17M 7408 R 1.9 0.4 0:20
And my httpd.conf file looks like this:
# prefork MPM
# StartServers: number of server processes to start
# MinSpareServers: minimum number of server processes which are kept spare
# MaxSpareServers: maximum number of server processes which are kept spare
# MaxClients: maximum number of server processes allowed to start
# MaxRequestsPerChild: maximum number of requests a server process serves
<IfModule prefork.c>
StartServers 20
MinSpareServers 20
MaxSpareServers 35
MaxClients 250
MaxRequestsPerChild 10000
</IfModule>
I have increased the settings from the defaults, but it doesn’t seem to help. I have also tuned SquirrelMail as much as I can…
Any help would be much appreciated.
Anonymouslemming, Accepted Solution on 2005-11-17 at 12:28:14, ID: 15314908
Well, my first thought was that you're waiting on IO, but from the top output you've posted (iowait is at 0.0%), that doesn't look to be a problem.
Are you running PHP as an Apache module, or in CGI mode? Your top output shows about 3.7GB of RAM in the machine, with roughly 1.4GB free and no swap in use, so memory should be enough.
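As a rough check on the memory side, the per-process figures in the top output can be turned into a worst-case estimate. The sketch below is only back-of-the-envelope arithmetic: the function name is made up for illustration, and it assumes that RSS minus SHARE approximates each child's private memory and that every child grows to the size shown.

```python
# Figures copied from the top output in the question (all in KB).
TOTAL_RAM_KB = 3868520      # "Mem: 3868520k av"
RSS_KB = 16 * 1024          # RSS column shows 16M for most httpd children
SHARE_KB = 7372             # SHARE column of PID 7677


def estimate_httpd_memory_kb(max_clients, rss_kb=RSS_KB, share_kb=SHARE_KB):
    """Worst-case footprint if max_clients children all reach this size.

    Assumption: RSS - SHARE approximates each child's private memory,
    and the shared pages are counted only once.
    """
    private_kb = rss_kb - share_kb
    return share_kb + max_clients * private_kb


if __name__ == "__main__":
    for max_clients in (250, 500):
        need_kb = estimate_httpd_memory_kb(max_clients)
        verdict = "fits in" if need_kb <= TOTAL_RAM_KB else "exceeds"
        print(f"MaxClients {max_clients}: ~{need_kb / 1024:.0f} MB, "
              f"{verdict} physical RAM")
```

Under these assumptions a full complement of 250 children needs roughly 2.2GB, which fits, while 500 children would need roughly 4.3GB, more than the box has. Treat that as a ceiling to keep in mind when raising MaxClients rather than a prediction of actual usage.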
This next bit isn't strictly related, but it might help in the future. You say that you have 200 users on at any given time. With that in mind, I'd make the following changes:
StartServers 150
MinSpareServers 20
MaxSpareServers 50
MaxClients 500
MaxRequestsPerChild 10000
Setting it up that way, you're likely to generally have enough httpd processes already running when a user connects. (With the prefork MPM, a MaxClients value above 256 only takes effect if ServerLimit is raised to match.) With your current settings, your system will be spending a lot of time forking off new processes and killing off old ones. To explain a bit better, I'll write a timeline as the server would see a busy day. This isn't exactly right as Apache tries to be more intelligent than this, but it's close.
When I start, I kick off 50 processes - this is my 20 start, plus 30 spare (more than MinSpare but less than MaxSpare). Two minutes later, 100 users connect, so I fork-exec another 50 processes. A minute after that another 100 users connect so I fork-exec another 100 processes. 5 minutes after that, 50 users depart, so I kill off 15 processes to get back down to my MaxSpare. A minute later, 15 users connect, so I fork-exec 35 processes (enough for the users, and 20 spare).
You can see from the above that if you don't start enough servers and have enough spare, you'll spend a lot of system time in the fork-exec world, and that means less user time to actually Do Work.
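To put numbers on that kind of churn, here is a minimal simulation of a deliberately simplified spare-server model. It is a sketch, not how Apache actually schedules its children: the function name and the model are assumptions made for illustration, each user is treated as occupying exactly one child, and the pool is adjusted instantly after every change in load, whereas real Apache adjusts gradually. The counts will therefore not match the timeline above line for line, which is itself approximate.

```python
def simulate(start, min_spare, max_spare, max_clients, load_changes):
    """Count forks and kills under a simplified spare-server model.

    Assumption: after every change in load the parent instantly forks or
    kills children until the idle count is back inside
    [min_spare, max_spare]. Real Apache adjusts gradually.
    """
    total = start   # children currently alive
    busy = 0        # children currently serving a user
    forks = kills = 0
    # The leading 0 applies the spare limits once at startup, before any load.
    for delta in [0, *load_changes]:
        busy = max(0, min(busy + delta, max_clients))
        idle = total - busy
        if idle < min_spare:
            # Too few idle children: fork up to min_spare idle, capped at max_clients.
            target = max(total, min(busy + min_spare, max_clients))
            forks += target - total
            total = target
        elif idle > max_spare:
            # Too many idle children: kill the excess above max_spare.
            kills += idle - max_spare
            total = busy + max_spare
    return forks, kills, total


if __name__ == "__main__":
    # Load changes from the timeline: +100, +100, -50, +15 users.
    timeline = [100, 100, -50, 15]
    # StartServers, MinSpareServers, MaxSpareServers, MaxClients from the question.
    forks, kills, total = simulate(20, 20, 35, 250, timeline)
    print(f"forks={forks} kills={kills} children at end={total}")
```

With the settings from the question, this model reports 200 forks and 35 kills over the four load changes. Rerunning it with other values gives a feel for how the spare window affects churn. Note that the model trims idle children down to MaxSpareServers at startup, so a StartServers value far above MaxSpareServers does not stay in effect for long while the server is idle.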
Having said all of this, these changes won't really fix the problem you're seeing. You've got just a few processes hogging a huge amount of the CPU. How long have these been running? If they are long-running processes, try decreasing MaxRequestsPerChild.
Also, what are your KeepAlive settings?