Why does my php-fpm keep taking up ram

I am running a Plesk server and am using php-fpm with nginx.

I've looked for a lot of different recommended settings and no matter what I change it seems to still have this issue.

I'm on Ubuntu 18.04 and have about 40 websites running on a cloud server with 8 CPUs and 32 GB of RAM.

The biggest suck is the mysql queries. I am looking at moving the databases to RDS as plesk has a plugin to work with that.

The main issue is that I have a lot of php-fpm processes for these sites that are just sleeping and they just suck down more and more ram until it hits 100% usage. I have to go in by hand every few hours and kill the sleeping processes.

I want to have the processes just die when they are not in use.

What is recommended here?

gr8gonzo

Can you share your php-fpm configs?

Or at the very least, share the value of pm and pm.process_idle_timeout.

If pm isn't already set to ondemand, try using that (you'll need to restart fpm after changing the config), and see if the workers are being cleaned up right away.

If they are still sticking around after switching to ondemand, make sure you're looking at the right config / php-fpm install (sometimes managed control panels like plesk will have their own separate packages and you can end up with multiple copies of PHP installed).
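
To confirm which settings a pool actually uses, you can print the pm directives straight from the pool file. This is only a sketch: the helper name show_pm_settings is made up for illustration, and the Plesk pool path in the usage comment is an assumption that may differ on your install.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the active (uncommented) pm directives from the
# php-fpm pool config file given as $1.
show_pm_settings() {
    grep -E '^[[:space:]]*pm(\.[a-z_]+)?[[:space:]]*=' "$1"
}

# Usage (the path is an ASSUMPTION about where Plesk keeps per-domain pools):
#   show_pm_settings /opt/plesk/php/7.1/etc/php-fpm.d/example.com.conf
# The running master process title shows which main config it loaded:
#   ps -eo pid,args | grep '[p]hp-fpm: master'
```

Keep in mind that pm.process_idle_timeout only applies when pm = ondemand, and it only reaps workers that are idle, not workers that are in the middle of a request.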

Dustin King

ASKER

I do have pm set to ondemand by default.

I'm not seeing an option in Plesk to actually set pm.process_idle_timeout, but I can add it in the additional directives.

I added:
[php-fpm-pool-settings]
pm.process_idle_timeout = 10s

Inside of the additional directives for a specific domain to confirm if it will work or not.

Dustin King

ASKER

Here is a screenshot of the settings here. I'm happy to change them. I'm also not familiar with what config file I would need to go to in order to set defaults.

gr8gonzo

Okay, so at this point, you should see the worker processes being destroyed after 10 seconds of being idle. If they're not being destroyed, then that indicates that they're not idle.

So in that scenario, you'll need to find out why they're not idle - go check your web server logs and look for anything indicating constant traffic (e.g. a script that has no timeout and is in an infinite loop, or things like that). You might consider enabling the slowlog for PHP-FPM, like this:

request_slowlog_timeout = 20s
slowlog = /var/log/fpm-slowlog.log

And then look for any requests that show up in the log. Twenty seconds is usually an eternity for normal PHP scripts, so the only stuff you should see in the log are things that would be considered edge cases (e.g. a long-running data import script) or abnormal traffic / bad scripts.
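
Once the slowlog has some entries, a rough way to see which scripts show up most often is to count them. This is a minimal sketch: the helper name slowlog_top_scripts is made up, and it relies on the "script_filename = ..." line that PHP-FPM writes for each slow request.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count slowlog entries per script, busiest first.
# Reads slowlog text on stdin.
slowlog_top_scripts() {
    grep '^script_filename = ' \
        | sed 's/^script_filename = //' \
        | sort | uniq -c | sort -rn
}

# Usage (same path as the slowlog setting above):
#   slowlog_top_scripts < /var/log/fpm-slowlog.log
```

Each output line is a count followed by the script path, so anything that dominates the list is the first place to look.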

Dr. Klahn

gr8gonzo, IMO, has a good point. What's keeping the server busy so that those processes fail to exit?

It may be that your server is being repeatedly hit by vulnerability probes which are keeping it just busy enough to not release processes that would otherwise be idle. Most hosted servers are provided on a "defend it yourself" basis. Any server living in a cloud CIDR block is doubly a target as hostiles will just bang on the CIDR block IP addresses without even knowing the server names.

Suggestion: Look at your filtering and, if necessary, install xtables-addons (which provides the geoip match for iptables) and use GeoIP filtering to block problem countries. The following two lines alone will cut intrusion attempts by 30%.

# ==== CN -- China
iptables -t filter -A INPUT -m geoip --src-cc CN -j REJECT
# ==== RU -- Russia
iptables -t filter -A INPUT -m geoip --src-cc RU -j REJECT

David Favor

Refer to my answer on your other question: https://www.experts-exchange.com/questions/29156271/How-to-kill-a-process-by-name-and-state-in-Ubuntu.html

Note: There's no context for your question. Provide your site URL + a description of your site function.

The real consideration is how site is tooled.

1) If real traffic is creating this situation, then site will require some design/tooling change.

2) If attack traffic is creating this situation, then you'll install Fail2Ban + write recipes to adaptively block/unblock attack traffic.

You can refer to your Apache + FPM logs for more detail about exact nature of problems.
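
One thing worth counting in the FPM error log is how often each pool hits its worker limit. PHP-FPM logs a "server reached pm.max_children setting" warning when that happens. A minimal sketch, with a made-up helper name and an assumed log path:

```shell
#!/usr/bin/env bash
# Hypothetical helper: count "reached pm.max_children" warnings per pool,
# busiest pool first. Reads the php-fpm error log on stdin.
max_children_hits() {
    grep 'reached pm.max_children' \
        | sed -n 's/.*\[pool \([^]]*\)\].*/\1/p' \
        | sort | uniq -c | sort -rn
}

# Usage (the log path is an ASSUMPTION; adjust to wherever your FPM logs):
#   max_children_hits < /var/log/php7.2-fpm.log
```

Pools that appear at the top are the ones receiving more concurrent requests than their limit allows.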

Note: For example, a common attack relates to crafting 404 links against a WordPress site, especially if that site is foolishly running heavy security plugins like WordFence, rather than using Fail2Ban.

404s are never cached, so enough 404s sent to a site, fast enough, can cause the exact problem you describe.

Also, an old kernel, like the 2+ year old kernel you'll find with CentOS 7, tends to be much easier to take down than the 4.15+ kernels which all distros should be using at this point.

Note: Context is everything... to guess at a correct solution...

Dustin King

ASKER

I do not believe it is an attack.

I am using a software called espocrm. I am also hosting wordpress sites.

I'm not seeing any fpm log being created and the fixes mentioned so far are not killing the sleeping processes. I'm not the one that created the sites so I'm not familiar with what may be causing the issues as far as coding is concerned.

Do I need to put pm. in front of:
request_slowlog_timeout = 20s
slowlog = /var/log/fpm-slowlog.log

I do have an error log in plesk-php71-fpm

Mostly it says that the max_children setting (5) is being reached and to consider raising it, plus signal 9 (SIGKILL) after 32533.338213 seconds from start, on a lot of the sites.

That's just repeated over and over again on all sites.

I also loaded the main syslog file and for the most part it's just a record of the crons running.

Aug 27 06:41:01 server CRON[21493]: (root) CMD (cd /var/www/vhosts/misite.nett; /opt/plesk/php/7.1/bin/php -f cron.php > /dev/null 2>&1)

I am noticing a few of these

Aug 27 06:42:06 server plesk_saslauthd[22032]: listen=6, status=5, dbpath='/plesk/passwd.db', keypath='/plesk/passwd_db_key', chroot=1, unprivileged=1
Aug 27 06:42:06 server plesk_saslauthd[22032]: privileges set to (112:115) (effective 112:115)
Aug 27 06:42:06 server plesk_saslauthd[22032]: No such user 'guest@site.com' in mail authorization database
Aug 27 06:42:06 server plesk_saslauthd[22032]: failed mail authentication attempt for user 'guest@site.com' (password len=7)
Aug 27 06:42:06 server postfix/smtpd[22110]: warning: unknown[80.82.77.18]: SASL LOGIN authentication failed: authentication failure
Aug 27 06:42:07 server postfix/smtpd[22110]: disconnect from unknown[80.82.77.18] ehlo=1 auth=0/1 rset=1 quit=1 commands=3/4

Just a few here and there though.

gr8gonzo

"Do I need to put pm. in front of:"
Nope.

"I'm not seeing any fpm log being created"
If the configuration is successful, then you should see /var/log/fpm-slowlog.log, even if it's an empty file. If it's not being created, then either the configuration isn't in the right area (it should be in the pool config, not the global config), or the service didn't reload/restart.

It's crucial that the FPM process is reloaded after you save config changes - just saving will not make them take effect.

Are all the sites sharing the 5 children, or does each site have its own pool?
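
If you're not sure, the process list can tell you: worker processes are titled "php-fpm: pool <name>". Here is a rough sketch that totals workers and resident memory per pool. The helper name fpm_pool_memory is made up, and it assumes your workers use that standard process title.

```shell
#!/usr/bin/env bash
# Hypothetical helper: summarise worker count and resident memory per pool.
# Expects `ps -eo rss,args` output on stdin (RSS is reported in KiB).
fpm_pool_memory() {
    awk '$2 == "php-fpm:" && $3 == "pool" {
             count[$4]++
             rss[$4] += $1
         }
         END {
             for (p in count)
                 printf "%s workers=%d rss_mb=%d\n", p, count[p], rss[p] / 1024
         }' | sort
}

# Usage:
#   ps -eo rss,args | fpm_pool_memory
```

RSS counts shared pages (such as the opcache segment) once per worker, so the totals overstate real usage somewhat, but they are good enough to spot which pools are growing.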

Dustin King

ASKER

Each site has its own pool of 5. I'll check on the reload.

gr8gonzo

Oh, and the SASL stuff in your error log is related to mail authentication, so it's unrelated here.
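
Back on the memory question: with about 40 sites and a separate pool of 5 for each, the worst case is a lot of workers alive at once. A back-of-the-envelope sketch follows; the per-worker size is purely an assumed figure for illustration, so measure your own before drawing conclusions.

```shell
#!/usr/bin/env bash
sites=40          # from the question: about 40 websites
max_children=5    # per-pool limit, as reported in the error log
worker_mb=100     # ASSUMPTION: average resident size of one worker, in MB

workers=$(( sites * max_children ))
total_mb=$(( workers * worker_mb ))
echo "worst case: ${workers} workers, about ${total_mb} MB"
```

At that assumed size, 200 busy workers would take about 20000 MB of a 32 GB server before MySQL gets anything, so the per-pool limit and the real worker size both matter.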

David Favor

1) I do not believe it is an attack.

Logs must be reviewed to determine this.

2) I'm not seeing any fpm log being created.

Ubuntu, by default, creates FPM logs in /var/log named based on PHP version you're running.

3) Hum, you're referencing /opt/plesk/php/7.1/bin/php which is some non-standard PHP install.

This means you'll have to set up logging yourself.

David Favor

A standard PHP install will look something like this...

net16 # which php
/usr/bin/php

net16 # php --version
PHP 7.3.8-1+ubuntu18.04.1+deb.sury.org+1 (cli) (built: Aug  7 2019 09:52:12) ( NTS )
Copyright (c) 1997-2018 The PHP Group
Zend Engine v3.3.8, Copyright (c) 1998-2018 Zend Technologies
    with Zend OPcache v7.3.8-1+ubuntu18.04.1+deb.sury.org+1, Copyright (c) 1999-2018, by Zend Technologies

Dustin King

ASKER

I said I doubt it's an attack because this is a brand new server. Also, it only happens during the day when my clients are actively using their sites. These sites are used heavily during the day by internal users to run tasks for each business.

The old server was just running a standard LAMP setup and never had Ram or CPU issues. The moment I started with plesk and switched to PHP-FPM is when this all started.

Once everyone has gone for the day the server doesn't pick back up. The more traffic the server gets the faster the ram is used up.

I did not change any defaults. The Plesk server was auto-installed from Vultr, so all the files would have come however they set it up. It's possible they changed the defaults.

I did check the /var/log and there is a php7.2-fpm.log as php 7.2 is the default php. I have to use 7.1 due to a requirement of client software that is running.

request_slowlog_timeout = 20s
slowlog = /var/log/plesk/fpm-slowlog.log

[php-fpm-pool-settings]
pm.process_idle_timeout = 10s

This is the current code I have in the Additional configuration directives. Please let me know if I've inserted it wrong.

I made sure everything was reloaded by restarting the server just in case.

I'm not seeing the slowlog file created.

I won't know if the other settings made a difference until clients start using the server again in the morning.

Dustin King

ASKER

So I changed it to

[php-fpm-pool-settings]
pm.process_idle_timeout = 10s
request_slowlog_timeout = 20s
slowlog = /var/log/plesk/fpm-slowlog.log

Now there is a log file in that location. Will have to wait and see what the logs show now.

I have noticed a couple of php errors on a couple of sites being logged, but only on a couple. The ram usage is consistent over all sites that have a cron enabled it seems.

The command that starts the cron is:

*       *       *       *       *       cd /var/www/vhosts/site.com; /opt/plesk/php/7.1/bin/php -f cron.php > /dev/null 2>&1

Also, the server is running just fine currently with hardly any usage of RAM or CPU at all. So I'm not sure why the crons would make a difference. I mainly see the spikes when the database calls start ramping up from system usage.

David Favor

Setting...

pm.process_idle_timeout = 10s

Creates massively corrupted data in many cases.

Consider the case where you call an external API which takes 11+ seconds to respond.

Then 100% of all data between your site + external API will be corrupt.

To determine how to fix this, you must first define what's truly occurring.

Your logs will tell you this.

You'll just look for long PHP processes + drill down into how code is running in the related PHP code.
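
As a starting point for spotting long-lived PHP processes, something like the following can list workers by age. It is a sketch only: the helper name old_fpm_workers is made up, and it assumes a procps ps that supports the etimes (elapsed seconds) column.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list php-fpm workers that have been alive longer than
# $1 seconds. Expects `ps -eo pid,etimes,args` output on stdin.
old_fpm_workers() {
    awk -v min="$1" '$3 == "php-fpm:" && $4 == "pool" && $2 + 0 > min + 0 {
        printf "pid=%s age_s=%s pool=%s\n", $1, $2, $5
    }'
}

# Usage: show workers older than one hour
#   ps -eo pid,etimes,args | old_fpm_workers 3600
```

Worker age is not the same as request duration, since a worker that keeps receiving requests never goes idle, so treat this as a pointer to which pools to examine rather than proof of a stuck script.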
