CentOS Server dying at high load average

I have several CentOS VPS units running cPanel. At regular intervals, the server load goes extremely high (248+) and the server is dead in the water, leaving all customer websites etc. unavailable. I usually end up resetting the server, which is not good for the MySQL databases, so I need to find out what is really going on and stop it from happening.

While the server is unresponsive, the console shows hundreds of lines saying "kill process ID or sacrifice child". When this is happening there is little chance of getting into the console, as it's too busy going round in circles.

I had someone from cPanel support take a look and they say it has nothing to do with cPanel. Here is what the tech wrote:

I monitored your server for the last 30 minutes and the server load was stable, but the site https://www.tgis.co.uk/ is taking time to load.
This indicates an issue with the site's scripting or database that is eating resources.
For reference, I checked the server's older logs and found the same domain eating resources:

top - 07:40:15 up 13:12, 1 user, load average: 53.75, 110.05, 170.98
Tasks: 267 total, 49 running, 216 sleeping, 0 stopped, 2 zombie
Cpu(s): 3.8%us, 3.9%sy, 0.0%ni, 30.2%id, 61.3%wa, 0.0%hi, 0.8%si, 0.0%st
Mem: 3925368k total, 3243796k used, 681572k free, 50444k buffers
Swap: 4128764k total, 353920k used, 3774844k free, 662552k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
14221 tgis 20 0 331m 18m 636 R 4.9 0.5 0:58.55 /usr/bin/php /home/
14241 tgis 20 0 331m 19m 144 R 4.9 0.5 0:47.39 /usr/bin/php /home/
14255 tgis 20 0 253m 21m 1732 R 4.9 0.6 0:52.20 /usr/bin/php /home/
17441 tgis 20 0 395m 105m 9552 R 4.9 2.8 0:05.24 /usr/bin/php /home/
17466 tgis 20 0 309m 84m 9320 R 4.9 2.2 0:03.90 /usr/bin/php /home/
17576 tgis 20 0 248m 7440 5260 R 4.9 0.2 0:00.22 php -q /home/tgis/p
14111 tgis 20 0 251m 24m 4372 R 4.3 0.6 0:57.59 /usr/bin/php /home/
14602 root 20 0 92712 8924 1508 R 4.3 0.2 0:37.19 /usr/local/cpanel/s
17508 tgis 20 0 340m 51m 9368 R 4.3 1.3 0:01.82 /usr/bin/php /home/
2892 root 20 0 32832 7764 1928 R 3.7 0.2 0:00.41 /usr/local/cpanel/3
13111 tgis 20 0 253m 18m 2176 R 3.7 0.5 1:16.94 /usr/bin/php /home/
14234 tgis 20 0 335m 18m 76 R 3.7 0.5 0:44.64 /usr/bin/php /home/
14252 tgis 20 0 247m 21m 2692 R 3.7 0.6 0:56.60 /usr/bin/php /home/
17337 tgis 20 0 420m 130m 9596 R 3.7 3.4 0:10.07 /usr/bin/php /home/
17421 tgis 20 0 417m 127m 9596 R 3.7 3.3 0:06.73 /usr/bin/php /home/
17444 tgis 20 0 398m 109m 9596 R 3.7 2.9 0:05.56 /usr/bin/php /home/
17445 root 20 0 74540 17m 2580 R 3.7 0.5 0:03.92 /usr/local/cpanel/s
17465 tgis 20 0 372m 83m 9248 R 3.7 2.2 0:04.15 /usr/bin/php /home/
17496 tgis 20 0 346m 56m 9212 R 3.7 1.5 0:02.19 /usr/bin/php /home/
17500 tgis 20 0 348m 59m 9416 R 3.7 1.5 0:02.08 /usr/bin/php /home/
17503 tgis 20 0 340m 51m 9216 R 3.7 1.3 0:01.92 /usr/bin/php /home/
17504 tgis 20 0 340m 51m 9200 R 3.7 1.3 0:01.94 /usr/bin/php /home/
17506 tgis 20 0 332m 42m 9204 R 3.7 1.1 0:01.86 /usr/bin/php /home/
17525 tgis 20 0 326m 36m 8940 R 3.7 1.0 0:01.19 /usr/bin/php /home/
17549 clamav 20 0 116m 41m 1084 R 3.7 1.1 0:00.70 /usr/local/cpanel/3
14250 tgis 20 0 320m 20m 916 R 3.0 0.5 0:54.08 /usr/bin/php /home/
14277 tgis 20 0 327m 19m 704 R 3.0 0.5 0:45.58 /usr/bin/php /home/
14507 tgis 20 0 330m 19m 96 R 3.0 0.5 0:43.13 /usr/bin/php /home/
17252 tgis 20 0 416m 127m 9596 R 3.0 3.3 0:09.15 /usr/bin/php /home/

It seems to be a dynamic site with a busy database, as the MySQL error log shows plenty of open threads:

180209 5:24:21 [Warning] /usr/sbin/mysqld: Forcing close of thread 244 user: 'tgis_whmcs'

180209 5:24:21 [Warning] /usr/sbin/mysqld: Forcing close of thread 234 user: 'root'

180209 5:24:21 [Warning] /usr/sbin/mysqld: Forcing close of thread 230 user: 'tgis_whmcs'

180209 5:24:21 [Warning] /usr/sbin/mysqld: Forcing close of thread 228 user: 'tgis_oneadmin'

180209 5:24:21 [Warning] /usr/sbin/mysqld: Forcing close of thread 227 user: 'tgis_oneadmin'

180209 5:24:21 [Warning] /usr/sbin/mysqld: Forcing close of thread 222 user: 'tgis_oneadmin'

180209 5:24:21 [Warning] /usr/sbin/mysqld: Forcing close of thread 221 user: 'tgis_oneadmin'

180209 5:24:21 [Warning] /usr/sbin/mysqld: Forcing close of thread 220 user: 'tgis_oneadmin'

I have done a malware scan just in case and it seemed to come up OK.

Can anyone help diagnose these issues? It's happening not only on this VPS but on others, even those on different hypervisor hosts. We use the licensed version of ESXi.

Many thanks
Chris

arnold

It sounds like the VPS is under-specced for the requests and services it handles, or it could potentially be a DDoS attack.
Either would drive up the requests generated against the backend MySQL database.

Does the issue correlate with a specific time, such as peak usage of a site?

In your situation, tuning MySQL might not be enough to cure the resource depletion.

I'm not sure whether you can give the VPS additional resources (memory, processing).

What are the host's performance stats during that time frame?
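
If no stats are being recorded yet, a small sampler can leave a trail to read after the next reset. Below is a minimal Python sketch, assuming a Linux guest where /proc/loadavg and /proc/meminfo are readable. The log path /var/log/loadwatch.log, the 60-second interval and all function names are hypothetical choices for illustration, not anything taken from the server in question.

```python
import time

def parse_loadavg(text):
    """Return the 1-, 5- and 15-minute load averages from /proc/loadavg text."""
    one, five, fifteen = text.split()[:3]
    return float(one), float(five), float(fifteen)

def parse_meminfo(text):
    """Return /proc/meminfo fields as a dict of integers (kB for sized fields)."""
    fields = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields

def format_sample(loadavg_text, meminfo_text, now=None):
    """Build one log line: timestamp, load averages, free memory, swap in use."""
    load = parse_loadavg(loadavg_text)
    mem = parse_meminfo(meminfo_text)
    swap_used = mem.get("SwapTotal", 0) - mem.get("SwapFree", 0)
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    return "%s load=%.2f/%.2f/%.2f memfree_kb=%d swapused_kb=%d" % (
        stamp, load[0], load[1], load[2], mem.get("MemFree", 0), swap_used)

def run(log_path="/var/log/loadwatch.log", interval=60):
    """Append one sample per interval; the file is reopened for each write."""
    while True:
        with open("/proc/loadavg") as f:
            loadavg_text = f.read()
        with open("/proc/meminfo") as f:
            meminfo_text = f.read()
        with open(log_path, "a") as out:
            out.write(format_sample(loadavg_text, meminfo_text) + "\n")
        time.sleep(interval)

# To start sampling, call run() from a script launched at boot.
```

Comparing the timestamps where the load starts to climb against cron schedules, backup windows and web access logs should show whether the spikes line up with a specific time or with peak traffic.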

gr8gonzo

How many connections is your web server configured to have (min/max servers and server type)? Also, are your PHP scripts establishing persistent connections to the database?
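
The reason the max-servers setting matters can be shown with rough arithmetic on the numbers in the posted top output. This is only a sketch: the total memory (3925368k) and the largest PHP resident size (130m) come from that listing, while the 1 GB reserved for MySQL and the OS is an assumption, and max_workers is a hypothetical helper name.

```python
def max_workers(total_kb, reserved_kb, per_worker_kb):
    """Estimate how many worker processes fit in RAM without swapping."""
    if per_worker_kb <= 0:
        raise ValueError("per_worker_kb must be positive")
    usable_kb = total_kb - reserved_kb
    return max(usable_kb // per_worker_kb, 0)

total_kb = 3925368          # "Mem: 3925368k total" from the top output
reserved_kb = 1024 * 1024   # assumption: about 1 GB kept for MySQL and the OS
per_worker_kb = 130 * 1024  # largest PHP RES value in the listing (130m)

print(max_workers(total_kb, reserved_kb, per_worker_kb))  # prints 21
```

RES includes shared pages, so this is a conservative figure, but it suggests that a max-servers limit (for example MaxClients or MaxRequestWorkers, if the web server is Apache) far above a couple of dozen heavy PHP processes would let the box run out of memory under load.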