sharingsunshine
asked on
Help On Why Memory Being Used Up On Linux
I built an Amazon Linux 2 instance running Apache 2.4 with php-fpm. I have had several segfaults lately, and then the server crashes. I installed atop and noticed that when the server crashed I had several php-fpm pools open. At 1 PM today I had 1.4 GB free memory and now I have 900 MB free memory.
Something seems to be wrong with php-fpm because when I restart it the free memory jumps back up to 1.3 GB.
Here is my current atop screenshot (https://gyazo.com/5028ec3f184ff973409d6cba079d3767). You can see from the last column that only a small percentage of the processes are using memory, yet I keep running out of memory. Does anyone have any suggestions? I have atop and sar logging.
ASKER
I appreciate your response and I know I can increase memory. In my opinion, that is just masking the problem. MySQL has been stuck at 10% for several hours now, and all last week it was operating between 1% and 3%. So I prefer to find the problem first, and if necessary I'll increase memory.
Currently, I am on a C5.large which has 4 GB memory instead of 2.
I looked at OVH and I will stick with Amazon for now.
Please could you post atop output as text rather than a picture - it is much easier to read that way. Post as a file attachment, or within <code> delimiters (you can get a pair of these by pressing the CODE button).
[Example: in another window, enter cat >atop.txt. Go to the atop window and press the "z" key to freeze the display. Highlight the entire window (triple-click the top line and drag to the bottom line; on some systems, right-clicking the last line works instead of dragging). In the cat window, middle-click (this should paste in the atop window contents), then press Control-D (to terminate input to cat). Go back to the atop window and press the "z" key again (to restart refresh). Post atop.txt.]
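Since atop logging is already enabled, the samples recorded before a crash can also be replayed as plain text once the server is back up, which avoids copying from the screen. A minimal sketch follows; the helper name atop_log_path is hypothetical, and it assumes the packaged default of daily raw logs named /var/log/atop/atop_YYYYMMDD (override ATOP_LOGDIR if your setup differs).

```shell
# Build the path of atop's daily raw log for a given date (YYYY-MM-DD).
# Assumption: logs are stored as $ATOP_LOGDIR/atop_YYYYMMDD,
# with /var/log/atop as the default directory.
atop_log_path() {
    printf '%s/atop_%s\n' "${ATOP_LOGDIR:-/var/log/atop}" \
        "$(printf '%s' "$1" | sed 's/-//g')"
}

# Example (run on the server after the reboot): replay the memory view (-M)
# for a time window (-b begin, -e end) and save it as text.
#   atop -r "$(atop_log_path 2018-07-29)" -b 01:00 -e 01:30 -M > atop.txt
```

Because the output is redirected to a file, atop writes ordinary text that can be attached to a post.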
The memory in the photo looks quite healthy: 1.5G cache with 0.1G dirty (so 1.4G reclaimable). slab / slrec similarly. It is normal for "free" memory to decrease as cache increases with time. You actually have over 2G reclaimable memory.
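The arithmetic above (cache minus dirty, plus reclaimable slab) can be reproduced from /proc/meminfo. This is a rough sketch only: the function name reclaimable_kb is hypothetical, and the formula is a simplification of what the kernel really considers reclaimable.

```shell
# Estimate reclaimable memory in kB from a meminfo-style file:
# (Cached - Dirty) + SReclaimable, mirroring the cache/dirty/slrec reasoning.
# Argument: path to the file (defaults to /proc/meminfo).
reclaimable_kb() {
    awk '
        $1 == "Cached:"       { cached = $2 }
        $1 == "Dirty:"        { dirty  = $2 }
        $1 == "SReclaimable:" { slrec  = $2 }
        END { print cached - dirty + slrec }
    ' "${1:-/proc/meminfo}"
}
```

Running reclaimable_kb with no argument reads the live system. The kernel also publishes its own estimate as the MemAvailable line in /proc/meminfo (shown as "available" by free), which is usually the better number to watch than "free".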
As for the increased MySQL CPU usage: I would suspect that a page you are serving has changed (or the way a page is used has changed, though that is less likely to cause the observed symptom). Could you look into that?
ASKER
I really appreciate your expertise. However, the server went dead again and I wasn't able to run the cat command since I can't shell in.
Here is the screenshot - https://gyazo.com/e62027c71830acdbd43ff31dc9ccbb6c
You have certainly run out of memory this time. The php-fpm processes running as apache seem to be using most of it. Can you limit how many of these there are? That might not help, though, because the ones that are left might just use more memory until they crash the system.
Which brings me back to what are they doing? I still suspect a recently modified php script is leaking memory (e.g. continually creating data structures and never freeing them). Suggest you search in the script area for recently modified files (find xxx -mtime yyy).
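On limiting the number of php-fpm workers: a common rule of thumb is to divide the memory you are willing to give PHP by the size of one worker. The sketch below is only that arithmetic; the function name fpm_max_children and the numbers in the example are illustrative assumptions, not measurements from this server.

```shell
# Rough upper bound for php-fpm's pm.max_children:
# memory budget for PHP (MB) divided by the resident size of one worker (MB).
fpm_max_children() {
    budget_mb=$1
    worker_mb=$2
    echo $(( budget_mb / worker_mb ))
}

# Example: a 1500 MB budget with workers of about 150 MB each.
#   fpm_max_children 1500 150
```

The result would go into the pool configuration as pm.max_children (on many installs the pool file is /etc/php-fpm.d/www.conf, but that path is an assumption here). The pm.max_requests directive, which recycles a worker after a set number of requests, can also contain a slow leak while the cause is being tracked down.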
ASKER
This is a WordPress site, so the only scripts would be plugins. I am not clear on how I would run the find command. By the way, it did crash the system and I had to do a reboot.
"I wasn't able to the cat command since I can't shell in."
But you were able to take a screenshot. As long as the atop display will highlight on multi-click, you have the data on your local system, which is where I meant for you to do the cat. Although I know sometimes that doesn't work from the display of a VM: please have a practise when it's up again.
The php-fpm scripts are doing the damage. You need to find out what they're doing and stop it.
ASKER
-------------- 13s elapsed
PRC | sys 0.13s | user 0.59s | | #proc 283 | #trun 1 | #tslpi 204 | | #tslpu 0 | #zombie 0 | clones 167 | | #exit 165 |
CPU | sys 2% | user 5% | irq 0% | | idle 193% | wait 0% | | | steal 0% | guest 0% | curf 3.43GHz | curscal ?% |
cpu | sys 1% | user 4% | irq 0% | | idle 95% | cpu000 w 0% | | | steal 0% | guest 0% | curf 3.45GHz | curscal ?% |
cpu | sys 1% | user 1% | irq 0% | | idle 97% | cpu001 w 0% | | | steal 0% | guest 0% | curf 3.41GHz | curscal ?% |
CPL | avg1 0.06 | avg5 0.05 | | avg15 0.09 | | | csw 7500 | intr 4316 | | | numcpu 2 | |
MEM | tot 3.6G | free 1.9G | cache 538.7M | dirty 0.5M | buff 27.0M | slab 54.7M | slrec 35.6M | shmem 1.1M | shrss 0.0M | vmbal 0.0M | hptot 0.0M | hpuse 0.0M |
SWP | tot 0.0M | free 0.0M | | | | | | | | | vmcom 2.5G | vmlim 1.8G |
NET | transport | tcpi 158 | tcpo 159 | udpi 2 | udpo 2 | tcpao 6 | tcppo 2 | tcprs 0 | tcpie 0 | tcpor 0 | udpnp 0 | udpie 0 |
NET | network | ipi 162 | ipo 158 | | ipfrw 0 | deliv 162 | | | | | icmpi 2 | icmpo 0 |
NET | eth0 ---- | pcki 162 | pcko 161 | sp 0 Mbps | si 11 Kbps | so 17 Kbps | coll 0 | mlti 0 | erri 0 | erro 0 | drpi 0 | drpo 0 |
Window resized to 189x50... PAUSED
PID SYSCPU USRCPU VGROW RGROW RDDSK WRDSK RUID EUID ST EXC THR S CPUNR CPU CMD 1/6
4011 0.01s 0.35s 2184K 2188K 0K 0K apache apache -- - 1 S 1 3% php-fpm
16364 0.00s 0.12s -68.0M -68.0M 0K 0K apache apache -- - 1 S 1 1% php-fpm
6262 0.08s 0.01s -268K -160K 0K 12K root root -- - 1 S 1 1% atop
21316 0.03s 0.05s 0K 0K 0K 0K root root -- - 1 R 0 1% atop
2491 0.01s 0.01s 0K 0K 0K 844K mysql mysql -- - 52 S 1 0% mysqld
21324 0.00s 0.02s 0K 0K - - root - NE 0 0 E - 0% <dhclient-scr>
15471 0.00s 0.01s 0K 0K 0K 4K apache apache -- - 22 S 0 0% httpd
2270 0.00s 0.01s 0K 0K 0K 4K root root -- - 1 S 1 0% dhclient
21357 0.00s 0.01s 0K 0K - - root - NE 0 0 E - 0% <dhclient-scr>
8943 0.00s 0.00s 0K 0K 0K 0K apache apache -- - 6 S 0 0% httpd
18577 0.00s 0.00s 0K 0K 0K 0K apache apache -- - 6 S 1 0% httpd
1904 0.00s 0.00s 0K 0K 0K 4K root root -- - 3 S 1 0% rsyslogd
1391 0.00s 0.00s 0K 8K 0K 208K root root -- - 1 S 1 0% systemd-journa
1 0.00s 0.00s 0K 0K 0K 0K root root -- - 1 S 0 0% systemd
21270 0.00s 0.00s 0K 0K 0K 0K ec2-user ec2-user -- - 1 S 1 0% sshd
21458 0.00s 0.00s 122.0M 3996K 0K 0K root root N- - 1 S 0 0% dhclient-scrip
1900 0.00s 0.00s 0K 0K 0K 0K root root -- - 1 S 0 0% systemd-logind
1894 0.00s 0.00s 0K 0K 0K 0K dbus dbus -- - 1 S 1 0% dbus-daemon
21482 0.00s 0.00s 112.0M 760K 0K 0K root root N- - 1 S 1 0% sleep
1921 0.00s 0.00s 0K 0K 0K 0K root root -- - 1 S 1 0% atopacctd
8 0.00s 0.00s 0K 0K 0K 0K root root -- - 1 I 1 0% rcu_sched
1326 0.00s 0.00s 0K 0K 0K 0K root root -- - 1 S 0 0% xfsaild/nvme0n
1985 0.00s 0.00s 0K 0K 0K 4K root root -- - 1 S 0 0% jbd2/nvme1n1-8
21326 0.00s 0.00s 0K 0K - - root - NE 0 0 E - 0% <hostname>
21325 0.00s 0.00s 0K 0K - - root - NE 0 0 E - 0% <dhclient-scr>
21328 0.00s 0.00s 0K 0K - - root - NE 0 0 E - 0% <cat>
21327 0.00s 0.00s 0K 0K - - root - NE 0 0 E - 0% <dhclient-scr>
21330 0.00s 0.00s 0K 0K - - root - NE 0 0 E - 0% <cat>
21329 0.00s 0.00s 0K 0K - - root - NE 0 0 E - 0% <dhclient-scr>
21331 0.00s 0.00s 0K 0K - - root - NE 1 0 E - 0% <dbus-send>
21332 0.00s 0.00s 0K 0K - - root - NE 0 0 E - 0% <dhclient-scr>
21333 0.00s 0.00s 0K 0K - - root - NE 0 0 E - 0% <dhclient-scr>
21334 0.00s 0.00s 0K 0K - - root - NE 0 0 E - 0% <dhclient-scr>
21335 0.00s 0.00s 0K 0K - - root - NE 0 0 E - 0% <ip>
21336 0.00s 0.00s 0K 0K - - root - NE 0 0 E - 0% <seq>
21337 0.00s 0.00s 0K 0K - - root - NE 0 0 E - 0% <sleep>
21339 0.00s 0.00s 0K 0K - - root - NE 0 0 E - 0% <ip>
ASKER
This is after the reboot. The other screen is already gone. I am not sure how to find out what the php-fpm scripts are doing. Are there shell commands that can tell me?
Something else you could try is to configure a large swap file - say 8G. The system will run slowly when it starts to use it, except that, just maybe, if the leaked memory is never accessed again it won't be too bad.
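For reference, creating a swap file usually takes four commands. The sketch below only prints them so they can be reviewed first; the function name swapfile_commands and the /swapfile path are assumptions for illustration. It uses dd rather than fallocate because preallocated swap files are not accepted on every filesystem.

```shell
# Print (not run) the commands that create and enable a swap file.
# Arguments: file path and size in GiB. Review the output, then run it as root.
swapfile_commands() {
    path=$1
    size_gb=$2
    echo "dd if=/dev/zero of=$path bs=1M count=$(( size_gb * 1024 ))"
    echo "chmod 600 $path"
    echo "mkswap $path"
    echo "swapon $path"
}

# Example: show the commands for an 8 GiB swap file.
#   swapfile_commands /swapfile 8
```

Once the output looks right it can be piped to a root shell (for example, swapfile_commands /swapfile 8 | sudo sh), and swapon --show should then list the new file. The volume needs that much free disk space.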
For how long was your system running problem-free before the segfaults started?
That atop display is better. But this time it's sorted by CPU - your previous displays were from atop -M (or you can type M interactively).
ASKER
2018/07/29 01:24:15 -------------- 13s elapsed
PRC | sys 0.13s | user 0.59s | | #proc 283 | #trun 1 | #tslpi 204 | | #tslpu 0 | #zombie 0 | clones 167 | | #exit 165 |
CPU | sys 2% | user 5% | irq 0% | | idle 193% | wait 0% | | | steal 0% | guest 0% | curf 3.43GHz | curscal ?% |
cpu | sys 1% | user 4% | irq 0% | | idle 95% | cpu000 w 0% | | | steal 0% | guest 0% | curf 3.45GHz | curscal ?% |
cpu | sys 1% | user 1% | irq 0% | | idle 97% | cpu001 w 0% | | | steal 0% | guest 0% | curf 3.41GHz | curscal ?% |
CPL | avg1 0.06 | avg5 0.05 | | avg15 0.09 | | | csw 7500 | intr 4316 | | | numcpu 2 | |
MEM | tot 3.6G | free 1.9G | cache 538.7M | dirty 0.5M | buff 27.0M | slab 54.7M | slrec 35.6M | shmem 1.1M | shrss 0.0M | vmbal 0.0M | hptot 0.0M | hpuse 0.0M |
SWP | tot 0.0M | free 0.0M | | | | | | | | | vmcom 2.5G | vmlim 1.8G |
NET | transport | tcpi 158 | tcpo 159 | udpi 2 | udpo 2 | tcpao 6 | tcppo 2 | tcprs 0 | tcpie 0 | tcpor 0 | udpnp 0 | udpie 0 |
NET | network | ipi 162 | ipo 158 | | ipfrw 0 | deliv 162 | | | | | icmpi 2 | icmpo 0 |
NET | eth0 ---- | pcki 162 | pcko 161 | sp 0 Mbps | si 11 Kbps | so 17 Kbps | coll 0 | mlti 0 | erri 0 | erro 0 | drpi 0 | drpo 0 |
PAUSED
PID TID MINFLT MAJFLT VSTEXT VSLIBS VDATA VSTACK VSIZE RSIZE PSIZE VGROW RGROW SWAPSZ RUID EUID MEM CMD 1/8
2491 - 263 0 17452K 18448K 934.6M 132K 2.0G 257.9M 0K 0K 0K 0K mysql mysql 7% mysqld
4010 - 0 0 4112K 25168K 143.9M 132K 511.4M 149.6M 0K 0K 0K 0K apache apache 4% php-fpm
4013 - 0 0 4112K 25168K 135.6M 132K 439.2M 147.0M 0K 0K 0K 0K apache apache 4% php-fpm
16364 - 344 0 4112K 25168K 133.8M 132K 437.4M 145.2M 0K -68.0M -68.0M 0K apache apache 4% php-fpm
4011 - 1027 0 4112K 25168K 119.1M 132K 486.6M 124.8M 0K 2184K 2188K 0K apache apache 3% php-fpm
4008 - 0 0 4112K 25168K 111.4M 132K 478.8M 115.3M 0K 0K 0K 0K apache apache 3% php-fpm
4009 - 0 0 4112K 25168K 91356K 132K 456.7M 97096K 0K 0K 0K 0K apache apache 3% php-fpm
4007 - 0 0 4112K 25176K 49288K 132K 353.7M 60944K 0K 0K 0K 0K apache apache 2% php-fpm
11809 - 0 0 4112K 25168K 55116K 132K 421.3M 59100K 0K 0K 0K 0K apache apache 2% php-fpm
4006 - 0 0 4112K 25084K 5508K 132K 304.9M 20488K 0K 0K 0K 0K root root 1% php-fpm
2313 - 0 0 520K 12844K 3024K 132K 269.1M 12856K 0K 0K 0K 0K root root 0% httpd
8943 - 0 0 520K 12844K 45400K 132K 310.5M 12748K 0K 0K 0K 0K apache apache 0% httpd
14574 - 0 0 520K 12844K 45472K 132K 310.6M 12704K 0K 0K 0K 0K apache apache 0% httpd
12695 - 0 0 520K 12844K 45368K 132K 310.5M 12668K 0K 0K 0K 0K apache apache 0% httpd
18577 - 0 0 520K 12844K 45380K 132K 310.5M 12616K 0K 0K 0K 0K apache apache 0% httpd
14572 - 0 0 520K 12844K 45260K 132K 310.4M 12492K 0K 0K 0K 0K apache apache 0% httpd
15471 - 0 0 520K 12844K 172.2M 132K 502.3M 12180K 0K 0K 0K 0K apache apache 0% httpd
19482 - 0 0 520K 12844K 44936K 132K 310.1M 12168K 0K 0K 0K 0K apache apache 0% httpd
19462 - 0 0 520K 12844K 45112K 132K 310.2M 12144K 0K 0K 0K 0K apache apache 0% httpd
19954 - 0 0 520K 12844K 44984K 132K 310.1M 12112K 0K 0K 0K 0K apache apache 0% httpd
1904 - 0 0 592K 5624K 18100K 132K 477.1M 10424K 0K 0K 0K 0K root root 0% rsyslogd
20611 - 0 0 520K 12844K 44520K 132K 309.7M 9876K 0K 0K 0K 0K apache apache 0% httpd
1391 - 52 0 312K 3420K 420K 132K 39156K 8940K 0K 0K 8K 0K root root 0% systemd-journa
6262 - 150 0 188K 3696K 4432K 704K 25460K 8444K 0K -268K -160K 0K root root 0% atop
3050 - 0 0 796K 12656K 1124K 132K 144.7M 8292K 0K 0K 0K 0K root root 0% sshd
21268 - 0 0 796K 12656K 1124K 132K 144.7M 8220K 0K 0K 0K 0K root root 0% sshd
21316 - 0 0 188K 3772K 4108K 132K 28736K 8056K 0K 0K 0K 0K root root 0% atop
2640 - 0 0 796K 12100K 884K 132K 107.9M 6972K 0K 0K 0K 0K root root 0% sshd
2547 - 0 0 324K 12844K 876K 132K 82124K 5996K 0K 0K 0K 0K postfix postfix 0% qmgr
17879 - 0 0 256K 12844K 760K 132K 81940K 5780K 0K 0K 0K 0K postfix postfix 0% pickup
1 - 0 0 1404K 3776K 17900K 132K 122.7M 5500K 0K 0K 0K 0K root root 0% systemd
3078 - 0 0 128K 6704K 864K 132K 191.6M 4924K 0K 0K 0K 0K root root 0% sudo
21296 - 0 0 128K 6704K 864K 132K 191.6M 4720K 0K 0K 0K 0K root root 0% sudo
21270 - 1 0 796K 12656K 1124K 132K 144.7M 4676K 0K 0K 0K 0K ec2-user ec2-user 0% sshd
2542 - 0 0 152K 12844K 760K 132K 81836K 4572K 0K 0K 0K 0K root root 0% master
2270 - 63 0 396K 14308K 1928K 132K 100.7M 4508K 0K 0K 0K 0K root root 0% dhclient
3052 - 0 0 796K 12656K 1268K 132K 144.8M 4492K 0K 0K 0K 0K ec2-user ec2-user 0% sshd
ASKER
I prefer not to use swap, since it will be too slow for Google PageSpeed guidelines once we solve the problem.
"Are there shell commands that can tell me?"
Not that I know of, but I've yet to use PHP myself. You need to look at the scripts yourself. Once you figure out where they are, you can use ls -ltR to sort by date modified and maybe see what was changed around the time your problems started. Or I can give you more help with find. This all assumes you had a working system for some time previously - is that the case?
ASKER
Yes, this seemed to start around two weeks ago. How will I know I have stopped the drain? Assuming I deactivate the correct plugin.
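One way to tell whether the drain has stopped is to record the total resident memory of the php-fpm workers at intervals and see whether it levels off after a plugin is deactivated. The sketch below is a suggestion rather than anything from the thread; the function name fpm_rss_mb and the log file name are hypothetical.

```shell
# Sum the resident memory (RSS) of all php-fpm processes, in whole MB.
# Reads "PID RSS_KB COMMAND" lines on stdin, e.g. from: ps -eo pid=,rss=,comm=
fpm_rss_mb() {
    awk '$3 == "php-fpm" { kb += $2 } END { printf "%d\n", kb / 1024 }'
}

# Example: append a timestamped total every 5 minutes.
#   while sleep 300; do
#       echo "$(date '+%F %T') $(ps -eo pid=,rss=,comm= | fpm_rss_mb)"
#   done >> fpm_rss.log
```

A total that keeps climbing between php-fpm restarts suggests the leak is still there; a total that plateaus suggests the offending code is gone.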
To find where the scripts are, try locate .php. That requires that your distribution installed locate - what Linux Distribution do you have?
ASKER
Amazon Linux 2 running Apache 2.4.
ASKER
I used locate and I have thousands of PHP programs. So how do I narrow it down?
Identify the directories with the most PHP files in them. In my case, but probably not yours, that is /usr/lib64/php. So I would enter find /usr/lib64/php -type f -mtime +10 -mtime -20 to get files (but not directories) modified more than 10 days ago and less than 20 days ago. To sort this output by date modified, add | xargs ls -lt, i.e. find /usr/lib64/php -type f -mtime +10 -mtime -20 | xargs ls -lt. If there are multiple screenfuls, pipe into less.
Files ending in .so are PHP extension modules (compiled shared libraries), btw, not WordPress plugins.
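A variant of the find command above that copes with spaces in file names and looks only at .php files is sketched below. The function name php_modified_within is hypothetical, and the WordPress plugin path in the example is an assumption about where the site lives.

```shell
# List PHP files under DIR modified within the last DAYS days, newest first.
# -exec ... {} + hands the names to ls safely even if they contain spaces.
php_modified_within() {
    dir=$1
    days=$2
    find "$dir" -type f -name '*.php' -mtime -"$days" -exec ls -lt {} +
}

# Example: plugin files touched in roughly the last three weeks.
#   php_modified_within /var/www/html/wp-content/plugins 20 | less
```

With a very large number of matches, find may run ls more than once, so the sort order is only guaranteed within each batch; the same applies to xargs.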
ASKER
I found the problem. I had a program that I use for mailings, and I had an old version that didn't do well with my new configuration. I upgraded the program and now it no longer shows up among the php-fpm PIDs.
Until this morning!
2018/07/30 11:59:36 -------------- 10s elapsed
PRC | sys 0.03s | user 0.14s | | #proc 327 | #trun 1 | #tslpi 212 | | #tslpu 0 | #zombie 0 | clones 204 | | #exit 205 |
CPU | sys 2% | user 2% | irq 0% | | idle 196% | wait 0% | | | steal 0% | guest 0% | curf 3.42GHz | curscal ?% |
cpu | sys 1% | user 1% | irq 0% | | idle 98% | cpu000 w 0% | | | steal 0% | guest 0% | curf 3.43GHz | curscal ?% |
cpu | sys 1% | user 1% | irq 0% | | idle 98% | cpu001 w 0% | | | steal 0% | guest 0% | curf 3.41GHz | curscal ?% |
CPL | avg1 0.12 | avg5 0.09 | | avg15 0.02 | | | csw 5628 | intr 3162 | | | numcpu 2 | |
MEM | tot 3.6G | free 1.6G | cache 908.2M | dirty 0.1M | buff 131.9M | slab 132.1M | slrec 108.8M | shmem 1.6M | shrss 0.0M | vmbal 0.0M | hptot 0.0M | hpuse 0.0M |
SWP | tot 0.0M | free 0.0M | | | | | | | | | vmcom 2.2G | vmlim 1.8G |
NET | transport | tcpi 165 | tcpo 118 | udpi 1 | udpo 1 | tcpao 9 | tcppo 3 | tcprs 0 | tcpie 0 | tcpor 1 | udpnp 0 | udpie 0 |
NET | network | ipi 167 | ipo 113 | | ipfrw 0 | deliv 167 | | | | | icmpi 1 | icmpo 0 |
NET | eth0 ---- | pcki 159 | pcko 111 | sp 0 Mbps | si 106 Kbps | so 22 Kbps | coll 0 | mlti 0 | erri 0 | erro 0 | drpi 0 | drpo 0 |
NET | lo ---- | pcki 8 | pcko 8 | sp 0 Mbps | si 0 Kbps | so 0 Kbps | coll 0 | mlti 0 | erri 0 | erro 0 | drpi 0 | drpo 0 |
PAUSED
PID TID MINFLT MAJFLT VSTEXT VSLIBS VDATA VSTACK VSIZE RSIZE PSIZE VGROW RGROW SWAPSZ RUID EUID MEM CMD 1/10
2491 - 165 0 17452K 18448K 981.5M 132K 2.0G 333.5M 0K 0K 8K 0K mysql mysql 9% mysqld
11045 - 0 0 4112K 25168K 87000K 132K 452.4M 93004K 0K 0K 0K 0K apache apache 2% php-fpm
9268 - 0 0 4112K 25168K 75248K 132K 441.0M 81356K 0K 0K 0K 0K apache apache 2% php-fpm
1391 - 89 0 312K 3420K 732K 132K 114.6M 72148K 0K 0K 0K 0K root root 2% systemd-journa
10172 - 0 0 4112K 25176K 65352K 132K 433.3M 71472K 0K 0K 0K 0K apache apache 2% php-fpm
20969 - 206 2 4112K 25168K 65236K 132K 431.2M 71228K 0K -20.0M -20.0M 0K apache apache 2% php-fpm
10890 - 0 0 4112K 25168K 63416K 132K 429.4M 69476K 0K 0K 0K 0K apache apache 2% php-fpm
1904 - 0 0 592K 5624K 19120K 132K 548.9M 68372K 0K 0K 0K 0K root root 2% rsyslogd
27295 - 0 0 4112K 25168K 57980K 132K 424.1M 63928K 0K 0K 0K 0K apache apache 2% php-fpm
29775 - 0 0 4112K 25168K 57092K 328K 423.4M 63408K 0K 0K 0K 0K apache apache 2% php-fpm
10198 - 0 0 4112K 25168K 48852K 132K 415.2M 54828K 0K 0K 0K 0K apache apache 1% php-fpm
11046 - 0 0 4112K 25168K 45684K 132K 412.1M 51756K 0K 0K 0K 0K apache apache 1% php-fpm
7415 - 0 0 4112K 25168K 37660K 132K 404.3M 41708K 0K 0K 0K 0K apache apache 1% php-fpm
4006 - 0 0 4112K 25084K 5508K 132K 304.9M 20496K 0K 0K 0K 0K root root 1% php-fpm
2313 - 12 0 520K 12844K 3128K 132K 269.2M 12916K 0K 0K 0K 0K root root 0% httpd
4031 - 0 0 520K 12844K 172.6M 132K 502.7M 12864K 0K 0K 0K 0K apache apache 0% httpd
2852 - 0 0 520K 12844K 45252K 132K 310.4M 12656K 0K 0K 0K 0K apache apache 0% httpd
2262 - 4 0 520K 12844K 45380K 132K 310.5M 12652K 0K 0K 0K 0K apache apache 0% httpd
7895 - 0 0 520K 12844K 45128K 132K 310.2M 12348K 0K 0K 0K 0K apache apache 0% httpd
8359 - 0 0 520K 12844K 45204K 132K 310.3M 12248K 0K 0K 0K 0K apache apache 0% httpd
8133 - 1 0 520K 12844K 45096K 132K 310.2M 12228K 0K 0K 64K 0K apache apache 0% httpd
9041 - 0 0 520K 12844K 45192K 132K 310.3M 11980K 0K 0K 0K 0K apache apache 0% httpd
8577 - 0 0 520K 12844K 44900K 132K 310.0M 11308K 0K 0K 0K 0K apache apache 0% httpd
10141 - 146 0 520K 12844K 44488K 132K 309.6M 10612K 0K 264K 2704K 0K apache apache 0% httpd
10135 - 0 0 520K 12844K 44488K 132K 309.6M 10544K 0K 0K 0K 0K apache apache 0% httpd
17302 - 0 0 188K 3772K 4800K 132K 29428K 8876K 0K 0K 0K 0K root root 0% atop
9835 - 0 0 188K 3696K 4852K 704K 25880K 8720K 0K 0K 0K 0K root root 0% atop
17536 - 0 0 796K 12656K 1124K 132K 144.7M 8436K 0K 0K 0K 0K root root 0% sshd
17252 - 0 0 796K 12656K 1124K 132K 144.7M 8340K 0K 0K 0K 0K root root 0% sshd
14783 - 0 0 796K 12656K 1124K 132K 144.7M 8288K 0K 0K 0K 0K root root 0% sshd
2640 - 0 0 796K 12100K 884K 132K 107.9M 7528K 0K 0K 0K 0K root root 0% sshd
2547 - 0 0 324K 12844K 876K 132K 82124K 5996K 0K 0K 0K 0K postfix postfix 0% qmgr
6033 - 0 0 256K 12844K 760K 132K 81940K 5892K 0K 0K 0K 0K postfix postfix 0% pickup
1 - 0 0 1404K 3776K 17968K 132K 122.8M 5576K 0K 0K 0K 0K root root 0% systemd
17280 - 0 0 128K 6704K 864K 132K 191.6M 4956K 0K 0K 0K 0K root root 0% sudo
17571 - 0 0 128K 6704K 864K 132K 191.6M 4884K 0K 0K 0K 0K root root 0% sudo
This same program began to show up in the PIDs today.
I have been in touch with the developer and I am going to try something and reboot the server and observe it again and let you know. The screenshot above is showing this morning before the reboot and the fixes being implemented.
ASKER
After the reboot the offending program isn't showing up. However, I keep seeing this in the PIDs that keep growing in size:
php-fpm 7799 apache 0u unix 0xffff8800b6815800 0t0 44473 /run/php-fpm/www.sock
ASKER
This is how it looks after the reboot and several hours later
2018/07/30 17:03:22 -------------- 10s elapsed
PRC | sys 0.06s | user 0.78s | | #proc 122 | #trun 3 | #tslpi 207 | | #tslpu 0 | #zombie 0 | clones 0 | | #exit 1 |
CPU | sys 1% | user 7% | irq 0% | | idle 192% | wait 0% | | | steal 0% | guest 0% | curf 3.40GHz | curscal ?% |
cpu | sys 0% | user 4% | irq 0% | | idle 95% | cpu001 w 0% | | | steal 0% | guest 0% | curf 3.40GHz | curscal ?% |
cpu | sys 0% | user 3% | irq 0% | | idle 97% | cpu000 w 0% | | | steal 0% | guest 0% | curf 3.40GHz | curscal ?% |
CPL | avg1 0.08 | avg5 0.03 | | avg15 0.07 | | | csw 5595 | intr 2904 | | | numcpu 2 | |
MEM | tot 3.6G | free 1.8G | cache 816.7M | dirty 0.0M | buff 40.8M | slab 75.0M | slrec 52.6M | shmem 1.8M | shrss 0.0M | vmbal 0.0M | hptot 0.0M | hpuse 0.0M |
SWP | tot 0.0M | free 0.0M | | | | | | | | | vmcom 2.3G | vmlim 1.8G |
NET | transport | tcpi 29 | tcpo 39 | udpi 1 | udpo 1 | tcpao 0 | tcppo 3 | tcprs 0 | tcpie 0 | tcpor 0 | udpnp 0 | udpie 0 |
NET | network | ipi 31 | ipo 31 | | ipfrw 0 | deliv 31 | | | | | icmpi 1 | icmpo 0 |
NET | eth0 ---- | pcki 31 | pcko 40 | sp 0 Mbps | si 3 Kbps | so 25 Kbps | coll 0 | mlti 0 | erri 0 | erro 0 | drpi 0 | drpo 0 |
PAUSED
PID TID MINFLT MAJFLT VSTEXT VSLIBS VDATA VSTACK VSIZE RSIZE PSIZE VGROW RGROW SWAPSZ RUID EUID MEM CMD 1/4
2516 - 784 0 17452K 18448K 940.2M 132K 2.0G 307.8M 0K 0K 0K 0K mysql mysql 8% mysqld
8012 - 1152 0 4112K 25168K 96760K 132K 462.0M 101.0M 0K 16384K 16384K 0K apache apache 3% php-fpm
7798 - 0 0 4112K 25168K 90772K 132K 456.1M 97164K 0K 0K 0K 0K apache apache 3% php-fpm
7799 - 0 0 4112K 25168K 89024K 132K 454.4M 95904K 0K 0K 0K 0K apache apache 3% php-fpm
7797 - 365 0 4112K 25168K 82564K 132K 448.1M 89232K 0K 0K 0K 0K apache apache 2% php-fpm
19956 - 0 0 4112K 25168K 77744K 132K 443.4M 83704K 0K 0K 0K 0K apache apache 2% php-fpm
1390 - 0 0 312K 3420K 708K 132K 124.2M 79660K 0K 0K 0K 0K root root 2% systemd-journa
9573 - 0 0 4112K 25168K 67440K 132K 433.3M 73524K 0K 0K 0K 0K apache apache 2% php-fpm
1939 - 0 0 592K 5624K 20048K 132K 578.2M 59068K 0K 0K 0K 0K root root 2% rsyslogd
19949 - 0 0 4112K 25168K 50736K 132K 417.0M 57316K 0K 0K 0K 0K apache apache 2% php-fpm
19451 - 353 0 4112K 25168K 35956K 132K 338.7M 47724K 0K 4096K 4096K 0K apache apache 1% php-fpm
23833 - 0 0 4112K 25168K 41576K 132K 408.1M 45864K 0K 0K 0K 0K apache apache 1% php-fpm
8514 - 0 0 4112K 25168K 27160K 132K 394.0M 33200K 0K 0K 0K 0K apache apache 1% php-fpm
1952 - 0 0 188K 3696K 22512K 704K 43540K 26384K 0K 0K 0K 0K root root 1% atop
7794 - 0 0 4112K 25084K 5508K 132K 304.9M 20460K 0K 0K 0K 0K root root 1% php-fpm
2345 - 0 0 520K 12844K 3024K 132K 269.1M 12752K 0K 0K 0K 0K root root 0% httpd
14698 - 0 0 520K 12844K 45376K 132K 310.5M 12692K 0K 0K 0K 0K apache apache 0% httpd
8826 - 0 0 520K 12844K 45484K 132K 310.6M 12660K 0K 0K 0K 0K apache apache 0% httpd
21525 - 0 0 520K 12844K 172.5M 132K 502.6M 12608K 0K 0K 0K 0K apache apache 0% httpd
22200 - 0 0 520K 12844K 45408K 132K 310.5M 12604K 0K 0K 0K 0K apache apache 0% httpd
19352 - 0 0 520K 12844K 45268K 132K 310.4M 12536K 0K 0K 0K 0K apache apache 0% httpd
11435 - 31 0 520K 12844K 45308K 132K 310.4M 12528K 0K 0K 0K 0K apache apache 0% httpd
21537 - 0 0 520K 12844K 45300K 132K 310.4M 12524K 0K 0K 0K 0K apache apache 0% httpd
22202 - 0 0 520K 12844K 45192K 132K 310.3M 12460K 0K 0K 0K 0K apache apache 0% httpd
30434 - 0 0 520K 12844K 44800K 132K 309.9M 11956K 0K 0K 0K 0K apache apache 0% httpd
6058 - 0 0 188K 3772K 5108K 132K 29736K 9288K 0K 0K 0K 0K root root 0% atop
30892 - 0 0 520K 12844K 44524K 132K 309.7M 9008K 0K 0K 0K 0K apache apache 0% httpd
6270 - 0 0 796K 12656K 1124K 132K 144.7M 8452K 0K 0K 0K 0K root root 0% sshd
6010 - 0 0 796K 12656K 1124K 132K 144.7M 8364K 0K 0K 0K 0K root root 0% sshd
5405 - 0 0 796K 12656K 1124K 132K 144.7M 8312K 0K 0K 0K 0K root root 0% sshd
2659 - 0 0 796K 12100K 884K 132K 107.9M 7720K 0K 0K 0K 0K root root 0% sshd
17859 - 0 0 256K 12844K 760K 132K 81940K 5708K 0K 0K 0K 0K postfix postfix 0% pickup
2610 - 0 0 324K 12844K 876K 132K 82124K 5664K 0K 0K 0K 0K postfix postfix 0% qmgr
1 - 0 0 1404K 3776K 17900K 132K 122.7M 5516K 0K 0K 0K 0K root root 0% systemd
7056 - 0 0 896K 2104K 2360K 132K 123.2M 5452K 0K 0K 0K 0K root root 0% bash
6038 - 0 0 128K 6704K 864K 132K 191.6M 4856K 0K 0K 0K 0K root root 0% sudo
7055 - 0 0 128K 6704K 864K 132K 191.6M 4792K 0K 0K 0K 0K root root 0% sudo
Using top is the simple way to tell if you have memory exhaustion.
Just look at swap amount. Anything > 0 (any swap usage) will tell you to add memory.
To guess at the amount of memory to add, create a swap file of say 16G + let your system run for a few hours... or long enough to represent the normal site work load you expect.
Then look at max swap space used. Good rule of thumb: install RAM of 2x the amount of swap used.
Then keep re-running this test till swap usage goes to zero + stays at zero.
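For the swap check described above, the figure top displays can also be read directly from /proc/meminfo, which is easier to log. This is a minimal sketch with a hypothetical function name, swap_used_kb; on a system with no swap configured it simply prints 0.

```shell
# Print swap currently in use, in kB: SwapTotal - SwapFree.
# Argument: path to a meminfo-style file (defaults to /proc/meminfo).
swap_used_kb() {
    awk '
        $1 == "SwapTotal:" { total  = $2 }
        $1 == "SwapFree:"  { unused = $2 }
        END { print total - unused }
    ' "${1:-/proc/meminfo}"
}
```

Sampling this at intervals during a normal workload and keeping the largest value gives the "max swap space used" figure mentioned above.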
ASKER
I don't have swap enabled.
Looks a lot better. You still have over 2GB available memory. The php-fpm processes are all still below 4%, where they used to climb above 10%. Another key indicator: cache dirty is staying at 0%. So all of cache is available memory (although Linux runs faster when it can cache).
I would say it's fixed, but you might want to keep an eye on it for a day or 2 more
ASKER
I want to award you the points because your command led me right to the problem. Thanks so much, saved me having to increase memory by getting another instance. Doing that would have been some work so glad I am not having to do that nor pay the extra.
Sure glad you knew how to read the atop screen.
That's very tight memory.
I'd go for a minimum of 16G-32G, which will likely resolve your problem.
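Processes get killed when the OOM (out of memory) Killer runs + picks a process to kill (strictly, the OOM Killer sends SIGKILL, so the victim is logged as killed rather than as a segfault). How the process is selected is complex + relates to your OOM config.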
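To see which of the two actually happened, the kernel log can be searched for OOM killer and segfault messages. The filter below is a sketch with a hypothetical function name, oom_or_segfault; the patterns are fragments of the messages the kernel normally writes, and where the log lives (dmesg, journalctl -k, or /var/log/messages) depends on the system.

```shell
# Keep only kernel log lines that mention the OOM killer or a segfault.
# Reads log text on stdin.
oom_or_segfault() {
    grep -Ei 'out of memory|oom-killer|killed process|segfault'
}

# Examples:
#   dmesg | oom_or_segfault
#   journalctl -k | oom_or_segfault
```

Lines containing "invoked oom-killer" or "Killed process" point to memory exhaustion, while lines containing "segfault at" point to a crash inside the named program.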
You can attempt debugging this + whatever information you acquire will all point back to installing more memory.
You can attempt to tune Apache + FPM + Database (MariaDB/MySQL) down to using only a few threads + this will simply cause visitors to see errors or WPD (White pages of death).
Your options are to install more memory + visitors will see content, or attempt to debug the problem + tune your LAMP Stack so visitors see errors rather than content.
If you really must debug this, wow... fairly complex to do... There's no... one way to even begin...
Best to open a Gig + hire someone to get into your system... or better, add memory.
BTW, compare prices of OVH physical machines to Amazon Instances. You'll be surprised at the cost savings (especially debugging problems) when you install a physical server with Ubuntu latest.