Mike Paradis asked:

Mariadb blew up... why? What is causing this?

We've been chasing a MariaDB problem for months and have eliminated everything we could, to the point of staring blankly when everything would stop dead and the load on clients would climb to 100.


I have been monitoring using pmm for a week or two now and nothing really obvious has come up, yet we'd still watch things slow to a crawl.

No locks were found, but today I finally found something in the log. It's beyond me that this was never looked at, but here we are.


The server is MariaDB 10.5 running on CentOS 7.9, and I found that the server got to the point where it actually restarted. But what caused this, and what IS causing it?

(I corrected the above from the 10.6 originally posted.)

Here is some of the logging; note that there are also a huge number of these:
[Warning] Aborted connection 1- Got an error reading communication packets

May  1 09:05:28 db01 mariadbd: number of readers 0, waiters flag 1, lock_word: 0
May  1 09:05:28 db01 mariadbd: Last time write locked in file mtr0mtr.inl line 146
May  1 09:05:28 db01 mariadbd: 2022-05-01  9:04:17 0 [Note] InnoDB: A semaphore wait:
May  1 09:05:28 db01 mariadbd: --Thread 139651702830848 has waited at row0ins.cc line 2633 for 252.00 seconds the semaphore:
May  1 09:05:28 db01 mariadbd: SX-lock on RW-latch at 0x7f186411cb78 created in file buf0buf.cc line 1226
May  1 09:05:28 db01 mariadbd: a writer (thread id 139764553910016) has reserved it in mode  exclusive
May  1 09:05:28 db01 mariadbd: number of readers 0, waiters flag 1, lock_word: 0
May  1 09:05:28 db01 mariadbd: Last time write locked in file mtr0mtr.inl line 146
May  1 09:05:28 db01 mariadbd: 2022-05-01  9:04:17 0 [Note] InnoDB: A semaphore wait:
May  1 09:05:28 db01 mariadbd: --Thread 139651687778048 has waited at row0ins.cc line 2633 for 252.00 seconds the semaphore:
May  1 09:05:28 db01 mariadbd: SX-lock on RW-latch at 0x7f186411cb78 created in file buf0buf.cc line 1226
May  1 09:05:28 db01 mariadbd: a writer (thread id 139764553910016) has reserved it in mode  exclusive
May  1 09:05:28 db01 mariadbd: number of readers 0, waiters flag 1, lock_word: 0
May  1 09:05:28 db01 mariadbd: Last time write locked in file mtr0mtr.inl line 146
May  1 09:05:28 db01 mariadbd: 2022-05-01  9:04:17 0 [Note] InnoDB: A semaphore wait:
May  1 09:05:28 db01 mariadbd: --Thread 139627240601344 has waited at btr0cur.cc line 1487 for 252.00 seconds the semaphore:
May  1 09:05:28 db01 mariadbd: SX-lock on RW-latch at 0x7efdfc031760 created in file dict0dict.cc line 2165
May  1 09:05:28 db01 mariadbd: a writer (thread id 139764553910016) has reserved it in mode  SX
May  1 09:05:28 db01 mariadbd: number of readers 136, waiters flag 1, lock_word: fffff78
May  1 09:05:28 db01 mariadbd: Last time write locked in file btr0cur.cc line 1487
May  1 09:05:28 db01 mariadbd: 2022-05-01  9:04:17 0 [Note] InnoDB: A semaphore wait:
May  1 09:05:28 db01 mariadbd: --Thread 139650269484800 has waited at row0ins.cc line 2633 for 252.00 seconds the semaphore:
May  1 09:05:28 db01 mariadbd: SX-lock on RW-latch at 0x7f186411cb78 created in file buf0buf.cc line 1226
May  1 09:05:28 db01 mariadbd: a writer (thread id 139764553910016) has reserved it in mode  exclusive
May  1 09:05:28 db01 mariadbd: number of readers 0, waiters flag 1, lock_word: 0
May  1 09:05:28 db01 mariadbd: Last time write locked in file mtr0mtr.inl line 146
May  1 09:05:28 db01 mariadbd: 2022-05-01  9:04:17 0 [Note] InnoDB: A semaphore wait:
May  1 09:05:28 db01 mariadbd: --Thread 139651696072448 has waited at btr0cur.cc line 1487 for 252.00 seconds the semaphore:
May  1 09:05:28 db01 mariadbd: SX-lock on RW-latch at 0x7efdfc031760 created in file dict0dict.cc line 2165
May  1 09:05:28 db01 mariadbd: a writer (thread id 139764553910016) has reserved it in mode  SX
May  1 09:05:28 db01 mariadbd: number of readers 136, waiters flag 1, lock_word: fffff78
May  1 09:05:28 db01 mariadbd: Last time write locked in file btr0cur.cc line 1487
May  1 09:05:28 db01 mariadbd: 2022-05-01  9:04:17 0 [Note] InnoDB: A semaphore wait:
May  1 09:05:28 db01 mariadbd: --Thread 139626700867328 has waited at row0ins.cc line 2633 for 252.00 seconds the semaphore:
May  1 09:05:28 db01 mariadbd: SX-lock on RW-latch at 0x7f186411cb78 created in file buf0buf.cc line 1226
May  1 09:05:28 db01 mariadbd: a writer (thread id 139764553910016) has reserved it in mode  exclusive
May  1 09:05:28 db01 mariadbd: number of readers 0, waiters flag 1, lock_word: 0
May  1 09:05:28 db01 mariadbd: Last time write locked in file mtr0mtr.inl line 146
May  1 09:05:28 db01 mariadbd: 2022-05-01  9:04:17 0 [Note] InnoDB: A semaphore wait:
May  1 09:05:28 db01 mariadbd: --Thread 139626718992128 has waited at btr0cur.cc line 1487 for 252.00 seconds the semaphore:
May  1 09:05:28 db01 mariadbd: SX-lock on RW-latch at 0x7efdfc031760 created in file dict0dict.cc line 2165
May  1 09:05:28 db01 mariadbd: a writer (thread id 139764553910016) has reserved it in mode  SX
May  1 09:05:28 db01 mariadbd: number of readers 136, waiters flag 1, lock_word: fffff78
May  1 09:05:28 db01 mariadbd: Last time write locked in file btr0cur.cc line 1487
May  1 09:05:28 db01 mariadbd: 2022-05-01  9:04:17 0 [Note] InnoDB: A semaphore wait:
May  1 09:05:28 db01 mariadbd: --Thread 139764948297472 has waited at btr0cur.cc line 1487 for 252.00 seconds the semaphore:
May  1 09:05:28 db01 mariadbd: SX-lock on RW-latch at 0x7efdfc031760 created in file dict0dict.cc line 2165
May  1 09:05:28 db01 mariadbd: a writer (thread id 139764553910016) has reserved it in mode  SX

May  1 09:06:03 db01 mariadbd: --Thread 139650272556800 has waited at btr0cur.cc line 1487 for 357.00 seconds the semaphore:
May  1 09:06:03 db01 mariadbd: SX-lock on RW-latch at 0x7efdfc031760 created in file dict0dict.cc line 2165
May  1 09:06:03 db01 mariadbd: a writer (thread id 139764553910016) has reserved it in mode  SX
May  1 09:06:03 db01 mariadbd: number of readers 136, waiters flag 1, lock_word: fffff78
May  1 09:06:03 db01 mariadbd: Last time write locked in file btr0cur.cc line 1487
May  1 09:06:03 db01 mariadbd: 2022-05-01  9:06:02 0 [Note] InnoDB: A semaphore wait:
May  1 09:06:03 db01 mariadbd: --Thread 139650530449152 has waited at btr0cur.cc line 1487 for 357.00 seconds the semaphore:
May  1 09:06:03 db01 mariadbd: SX-lock on RW-latch at 0x7efdfc031760 created in file dict0dict.cc line 2165
May  1 09:06:09 db01 kernel: INFO: task xfsaild/dm-0:664 blocked for more than 120 seconds.
May  1 09:06:09 db01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May  1 09:06:09 db01 kernel: xfsaild/dm-0    D ffff9ebffa4c9660     0   664      2 0x00000000
May  1 09:06:09 db01 kernel: Call Trace:
May  1 09:06:09 db01 kernel: [<ffffffffa798c3e9>] schedule+0x29/0x70
May  1 09:06:09 db01 kernel: [<ffffffffc0438727>] xfs_log_force+0x157/0x2e0 [xfs]
May  1 09:06:09 db01 kernel: [<ffffffffa72dadf0>] ? wake_up_state+0x20/0x20
May  1 09:06:09 db01 kernel: [<ffffffffc0444a70>] ? xfs_trans_ail_cursor_first+0x90/0x90 [xfs]
May  1 09:06:09 db01 kernel: [<ffffffffc0444c00>] xfsaild+0x190/0x780 [xfs]
May  1 09:06:09 db01 kernel: [<ffffffffc0444a70>] ? xfs_trans_ail_cursor_first+0x90/0x90 [xfs]
May  1 09:06:09 db01 kernel: [<ffffffffa72c5e61>] kthread+0xd1/0xe0
May  1 09:06:09 db01 kernel: [<ffffffffa72c5d90>] ? insert_kthread_work+0x40/0x40
May  1 09:06:09 db01 kernel: [<ffffffffa7999df7>] ret_from_fork_nospec_begin+0x21/0x21
May  1 09:06:09 db01 kernel: [<ffffffffa72c5d90>] ? insert_kthread_work+0x40/0x40
May  1 09:06:09 db01 kernel: INFO: task mariadbd:18328 blocked for more than 120 seconds.
May  1 09:06:09 db01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May  1 09:06:09 db01 kernel: mariadbd        D ffff9ea942cfe8e0     0 18328      1 0x00000080
May  1 09:06:09 db01 kernel: Call Trace:
May  1 09:06:09 db01 kernel: [<ffffffffa798c3e9>] schedule+0x29/0x70
May  1 09:06:09 db01 kernel: [<ffffffffa798a0b1>] schedule_timeout+0x221/0x2d0
May  1 09:06:09 db01 kernel: [<ffffffffa7553f29>] ? __blk_run_queue+0x39/0x50
May  1 09:06:09 db01 kernel: [<ffffffffa7557c5c>] ? blk_queue_bio+0x31c/0x400
May  1 09:06:09 db01 kernel: [<ffffffffa726d39e>] ? kvm_clock_get_cycles+0x1e/0x20
May  1 09:06:09 db01 kernel: [<ffffffffa798bc9d>] io_schedule_timeout+0xad/0x130
May  1 09:06:09 db01 kernel: [<ffffffffa798ca1d>] wait_for_completion_io+0xfd/0x140
May  1 09:06:09 db01 kernel: [<ffffffffa72dadf0>] ? wake_up_state+0x20/0x20
May  1 09:06:09 db01 kernel: [<ffffffffa75596a4>] blkdev_issue_flush+0xb4/0x110
May  1 09:06:09 db01 kernel: [<ffffffffc04310f9>] xfs_blkdev_issue_flush+0x19/0x20 [xfs]
May  1 09:06:09 db01 kernel: [<ffffffffc0418e99>] xfs_file_fsync+0x199/0x1c0 [xfs]
May  1 09:06:09 db01 kernel: [<ffffffffa7484227>] do_fsync+0x67/0xb0
May  1 09:06:09 db01 kernel: [<ffffffffa7999ec9>] ? system_call_after_swapgs+0x96/0x13a
May  1 09:06:09 db01 kernel: [<ffffffffa7484533>] SyS_fdatasync+0x13/0x20
May  1 09:06:09 db01 kernel: [<ffffffffa7999f92>] system_call_fastpath+0x25/0x2a
May  1 09:06:09 db01 kernel: [<ffffffffa7999ed5>] ? system_call_after_swapgs+0xa2/0x13a
May  1 09:06:09 db01 kernel: INFO: task mariadbd:18806 blocked for more than 120 seconds.
May  1 09:06:09 db01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May  1 09:06:09 db01 kernel: mariadbd        D ffff9ebffd9f5860     0 18806      1 0x00000080
May  1 09:06:09 db01 kernel: Call Trace:
May  1 09:06:09 db01 kernel: [<ffffffffa798c3e9>] schedule+0x29/0x70
May  1 09:06:09 db01 kernel: [<ffffffffc0438c03>] xfs_log_force_lsn+0x1a3/0x370 [xfs]
May  1 09:06:09 db01 kernel: [<ffffffffa72dadf0>] ? wake_up_state+0x20/0x20
May  1 09:06:09 db01 kernel: [<ffffffffc0418dfd>] xfs_file_fsync+0xfd/0x1c0 [xfs]
May  1 09:06:09 db01 kernel: [<ffffffffa7484227>] do_fsync+0x67/0xb0
May  1 09:06:09 db01 kernel: [<ffffffffa7999ec9>] ? system_call_after_swapgs+0x96/0x13a
May  1 09:06:09 db01 kernel: [<ffffffffa7484533>] SyS_fdatasync+0x13/0x20
May  1 09:06:09 db01 kernel: [<ffffffffa7999f92>] system_call_fastpath+0x25/0x2a
May  1 09:06:09 db01 kernel: [<ffffffffa7999ed5>] ? system_call_after_swapgs+0xa2/0x13a
May  1 09:06:09 db01 kernel: INFO: task kworker/3:0:17452 blocked for more than 120 seconds.
May  1 09:06:09 db01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May  1 09:06:09 db01 kernel: kworker/3:0     D ffff9eb22df65860     0 17452      2 0x00000080
May  1 09:06:09 db01 kernel: Workqueue: xfs-cil/dm-0 xlog_cil_push_work [xfs]
May  1 09:06:09 db01 kernel: Call Trace:

May  1 09:06:55 db01 pmm-agent: INFO[2022-05-01T09:06:55.745-07:00] 2022-05-01T16:06:29.836Z error VictoriaMetrics/lib/promscrape/scrapework.go:354 error when scraping "http://127.0.0.1:42000/metrics?collect%5B%5D=buddyinfo&collect%5B%5D=cpu&collect%5B%5D=diskstats&collect%5B%5D=filefd&collect%5B%5D=filesystem&collect%5B%5D=loadavg&collect%5B%5D=meminfo&collect%5B%5D=meminfo_numa&collect%5B%5D=netdev&collect%5B%5D=netstat&collect%5B%5D=processes&collect%5B%5D=standard.go&collect%5B%5D=standard.process&collect%5B%5D=stat&collect%5B%5D=textfile.hr&collect%5B%5D=time&collect%5B%5D=vmstat" from job "node_exporter_agent_id_42c51671-7d6b-4426-be45-59f78a59dbf7_hr-5s" with labels {agent_id="/agent_id/42c51671-7d6b-4426-be45-59f78a59dbf7",agent_type="node_exporter",instance="/agent_id/42c51671-7d6b-4426-be45-59f78a59dbf7",job="node_exporter_agent_id_42c51671-7d6b-4426-be45-59f78a59dbf7_hr-5s",machine_id="/machine_id/cc4015d81dd64439ae059694373f1630",node_id="/node_id/34b0314f-1f30-42b2-94f4-0950ace8320e",node_name="db01.phx.loc",node_type="generic"}: error when scraping "http://127.0.0.1:42000/metrics?collect%5B%5D=buddyinfo&collect%5B%5D=cpu&collect%5B%5D=diskstats&collect%5B%5D=filefd&collect%5B%5D=filesystem&collect%5B%5D=loadavg&collect%5B%5D=meminfo&collect%5B%5D=meminfo_numa&collect%5B%5D=netdev&collect%5B%5D=netstat&collect%5B%5D=processes&collect%5B%5D=standard.go&collect%5B%5D=standard.process&collect%5B%5D=stat&collect%5B%5D=textfile.hr&collect%5B%5D=time&collect%5B%5D=vmstat" with timeout 4.5s: timeout  agentID=/agent_id/b0fbda86-3080-411e-805c-1be676a7c9f8 component=agent-process type=vm_agent


May  1 09:09:11 db01 mariadbd: 2022-05-01  9:09:11 0 [Warning] InnoDB: A long semaphore wait:
May  1 09:09:11 db01 mariadbd: --Thread 139645429868288 has waited at btr0cur.cc line 1487 for 546.00 seconds the semaphore:
May  1 09:09:11 db01 mariadbd: SX-lock on RW-latch at 0x7efdfc031760 created in file dict0dict.cc line 2165
May  1 09:09:11 db01 mariadbd: a writer (thread id 139764553910016) has reserved it in mode  SX
May  1 09:09:11 db01 mariadbd: number of readers 136, waiters flag 1, lock_word: fffff78
May  1 09:09:11 db01 mariadbd: Last time write locked in file btr0cur.cc line 1487
May  1 09:09:11 db01 mariadbd: 2022-05-01  9:09:11 0 [Warning] InnoDB: A long semaphore wait:
May  1 09:09:11 db01 mariadbd: --Thread 139626710697728 has waited at row0ins.cc line 2633 for 546.00 seconds the semaphore:
May  1 09:09:11 db01 mariadbd: SX-lock on RW-latch at 0x7f186411cb78 created in file buf0buf.cc line 1226
May  1 09:11:22 db01 rsyslogd: imjournal: 8218 messages lost due to rate-limiting
May  1 09:14:41 db01 systemd: mariadb.service: main process exited, code=killed, status=6/ABRT
May  1 09:14:41 db01 systemd: Unit mariadb.service entered failed state.
May  1 09:14:41 db01 systemd: mariadb.service failed.
May  1 09:14:46 db01 systemd: mariadb.service holdoff time over, scheduling restart.
May  1 09:14:46 db01 systemd: Stopped MariaDB 10.5.15 database server.
May  1 09:14:46 db01 systemd: Starting MariaDB 10.5.15 database server...
May  1 09:14:46 db01 mariadbd: 2022-05-01  9:14:46 0 [Note] /usr/sbin/mariadbd (mysqld 10.5.15-MariaDB) starting as process 20403 ...
May  1 09:14:46 db01 mariadbd: 2022-05-01  9:14:46 0 [Warning] Could not increase number of max_open_files to more than 32768 (request: 81075)
May  1 09:14:46 db01 mariadbd: 2022-05-01  9:14:46 0 [Note] Using unique option prefix 'myisam_recover' is error-prone and can break in the future. Please use the full name 'myisam-recover-options' instead.
May  1 09:14:46 db01 mariadbd: 2022-05-01  9:14:46 0 [Note] mariadbd: Aria engine: starting recovery
May  1 09:14:47 db01 mariadbd: tables to flush: 1 0
May  1 09:14:47 db01 mariadbd: (0.5 seconds);


ASKER CERTIFIED SOLUTION
John Tsioumpris
skullnobrains

the system did not crash but was actually killed by the OS
it may or may not have recovered by itself if left alone

this looks like a deadlock but the log lines are a bit different from the usual.
googling suggests the lock may occur in the adaptive hash index. you probably should try disabling it to check whether it is indeed the root cause.
other actions include searching for the actual deadlock, starting with the innodb status page.
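For reference, the adaptive hash index can be checked and toggled at runtime, so this is cheap to test; a sketch (no restart required):

```sql
-- see whether the adaptive hash index is currently enabled
SHOW GLOBAL VARIABLES LIKE 'innodb_adaptive_hash_index';

-- innodb_adaptive_hash_index is a dynamic variable, so it can be
-- switched off live to test whether the contention disappears
SET GLOBAL innodb_adaptive_hash_index = OFF;
```

If the semaphore waits stop after this, add the setting to my.cnf to make it permanent.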
Hi,

This line in the log may be causing this:
May  1 09:14:46 db01 mariadbd: 2022-05-01  9:14:46 0 [Warning] Could not increase number of max_open_files to more than 32768 (request: 81075)


The log indicates that the database is waiting for locks, and that may be because you have reached the open_files_limit.

How many tables do you have in your database? If it is 81K, then you should increase open_files_limit to either 100000 or infinity. See the links below on how to.

Run this command to see how many files MariaDB can open.
SHOW GLOBAL VARIABLES LIKE 'open_files_limit';




https://haydenjames.io/could-not-increase-number-of-max_open_files-to-more-than/
https://mariadb.com/kb/en/could-not-increase-number-of-max_open_files-to-more-than-1024-request-1835/
You can also set this value in my.cnf (the default is 0). If it is not set, i.e. the default is used, open_files_limit takes its value from LimitNOFILE, which is described in the links above.
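As a sketch, both places this can be set (the paths and the 100000 value are examples, not taken from this system):

```ini
# systemd override, e.g. /etc/systemd/system/mariadb.service.d/limits.conf
# (run `systemctl daemon-reload` and restart mariadb afterwards)
[Service]
LimitNOFILE=100000

# my.cnf, [mysqld] section
[mysqld]
open_files_limit = 100000
```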

Regards,
    Tomas Helgi
hmm as far as i know, this message is just maria trying to raise the limit on startup and complaining. it probably tries infinity. chances that 32k descriptors are actually used are slim. you can set the limit to a sane value if you do not want this message to pop up.
Mike Paradis

ASKER

I'm not really focusing on trying to prevent messages as these are the only leads we have as to what is going on.
To answer some of the questions.

The network is fine. For months, I've tested everything to do with connectivity between machines including having zabbix and outagesio clients/agents installed on everything to make sure there are no network losses. I also do iperf testing and on the 10G network, I'm getting over 8Gbps consistently. I too thought it might be a network issue but discounted that.

I spent a lot of time looking for locks and slow queries and while there are slow queries, they are known to us and normal. They are simply long running and cannot be avoided at this time.

I know it's not hardware or the VM because the problem has been happening across machines. Our first sign of the problem was when the server was a dedicated 64GB, 16-core, bare-metal machine with nothing else running on it, and we started seeing these 'lock-ups', for lack of a better word. I posted many times on this site and others looking for ideas, but almost everything suggested I was already doing, and the suggestions I wasn't doing, I am now. It still happens.

Two months ago now, a new blade was fired up, this time 256GB and 48 cores, and still the problem shows up. I decided that was overkill, so that blade now runs proxmox with only one VM, the DB; this gives me more flexibility with the DB. I gave it 128GB and 44 cores, and performance-wise it's fine. Watching top, which I run 24/7, it's never gone over 50GB, and a load of around 40 is, I suspect, when this problem is manifesting itself. I say that because otherwise the load averages 2-3 even with the constant 24/7 data flow. I've eliminated hardware and the VM at this point.

I do agree a complete rebuild is an option, but I usually do that when I've exhausted everything else and/or suspect something happened during installation that gives me that uncomfortable 'something unknown' feeling. I've never gotten that in this case.

I monitor with percona and have not seen any locks. I've also manually worked through the log files and found nothing but the slow queries we're aware of. I've thought it might be locks but there have never been any reported.

The open files limit has never been reached. I've thought it could be anything from a file limit to a socket limit, but that doesn't seem to be it either, especially across OS's now. The main db has only 108 tables and the open_files_limit is set to 32768. Not sure why it's so high.

The only custom things I have in the OS are:
sysctl.conf
vm.swappiness = 1
net.ipv4.tcp_fastopen = 3
net.ipv4.tcp_max_syn_backlog = 5120
net.core.netdev_max_backlog = 5000
net.core.somaxconn = 1024

And something else somewhere where I raise the OS limit from 1024 to 8000 or something but would have to look in my notes.

At this point, I've been thinking there is an application problem for quite some time but cannot find it. I've spoken to the db dev and we've dug for hours and cannot find anything.

I use things like mysqltuner and tuning-primer along with percona and cannot find anything.

The only thing I can think of is a conversion step missed when we imported the db from MariaDB 10.4 to 10.5. I read some things about needing to take certain steps, but we didn't know about them at the time. Since this is a live running db that is continuously updated, we cannot go back to an old backup.
However, to try and find such a problem, we've exported the databases and re-imported them into temp databases to see if anything would go wrong since there is an integrity check happening during the import and again, no problems. I thought maybe a control character or something is lost in the db file but doesn't seem to be the case, it exports and re-imports just fine.

I hoped this log would show something that would finally give up the mystery.


Hi Mike,

I think I have spotted your problem. You say you are running MariaDB 10.6, right?
However, your log indicates that you are running a 10.5.15 instance, although the binaries may be 10.6.x.

May  1 09:14:46 db01 systemd: Stopped MariaDB 10.5.15 database server.
May  1 09:14:46 db01 systemd: Starting MariaDB 10.5.15 database server...


After you did the binary upgrade from 10.5 to 10.6, did you run the command
mariadb-upgrade (a.k.a. mysql_upgrade) on the database instance?

It is a crucial step to run after installing/upgrading the binaries between major versions. You can run into strange problems skipping that step.
https://mariadb.com/kb/en/mysql_upgrade/

Regards,
   Tomas Helgi
Hi Tomas,

It's almost funny you mention that because it's exactly what I said to another person here yesterday.
I explained that I recently built a 10.6 galera cluster and had to run 'some upgrade' command but I could not find my note on it so it was left at that.
When I migrated the 10.5 db to the recently rebuilt 10.6 server, I never knew about that command so no, it was never run.

I hope the data is not trashed?!
I'll check that link right now. Yes, it's what I did before. Can it be run after the fact, now that the db has been running this way for maybe a couple of months?

Doesn't explain what was happening before but I don't care if this fixed the current problem.
> I'll check that link right now. Yes, it's what I did before. Can it be run after the fact, now that the db has been running this way for maybe a couple of months?
Yes it can be. It is mostly the catalog that is updated to reflect the binary changes and maybe some other tables.
It is always good practice to take a backup of the database prior to running mariadb-upgrade.

As you are running a Galera Cluster configuration, you might take a look at these links:
https://mariadb.com/kb/en/upgrading-from-mariadb-103-to-mariadb-104-with-galera-cluster/
https://fromdual.com/upgrading-from-mariadb-10.4-to-mariadb-10.5-galera-cluster
and run the upgrade with the --skip-write-binlog option on each node as mentioned.

Regards,
    Tomas Helgi
This server is separate from the clustered ones. I'll back up and give this a shot.

This installation of MariaDB is already upgraded to 10.5.15-MariaDB.
There is no need to run mysql_upgrade again for 10.5.15-MariaDB.
You can use --force if you still want to run mysql_upgrade

Well, that's a bit confusing. I got mixed up. This server is 10.5, the cluster is 10.6 but I already did the upgrade on those.
So, I guess that's not the problem then?


What do these commands tell you?

SHOW GLOBAL VARIABLES LIKE 'open_files_limit';


SHOW GLOBAL VARIABLES LIKE 'max_connections';


show status where variable_name = 'threads_connected';



Regards,
   Tomas Helgi
I've kept an eye on all of these variables using both the above and the scripts mentioned.
The connections are high because there are situations where we get to 800+ connections now and then.


MariaDB [(none)]> SHOW GLOBAL VARIABLES LIKE 'open_files_limit';
+------------------+-------+
| open_files_limit | 32768 |
+------------------+-------+
1 row in set (0.001 sec)

MariaDB [(none)]> SHOW GLOBAL VARIABLES LIKE 'max_connections';
+-----------------+-------+
| max_connections | 1000  |
+-----------------+-------+
1 row in set (0.001 sec)

MariaDB [(none)]> show status where variable_name = 'threads_connected';
+-------------------+-------+
| Threads_connected | 11    |
+-------------------+-------+
1 row in set (0.001 sec)


I suggest you increase open_files_limit to 100000 as I suggested in previous comments, and see if that helps when you have a high peak of connections and load on your system.

Regards,
    Tomas Helgi

Let me run some tests before I change anything.
I've been watching all of these things for months. There's always room for improvement but I don't see any problems in how things are configured at the moment. I'm aware of why some show a bit out of whack but those are partly due to the problem we're experiencing.

mysqltuner says no problem with files open:
-------- Performance Metrics -----------------------------------------------------------------------
[--] Up for: 2d 5h 38m 26s (290M q [1K qps], 17M conn, TX: 6601G, RX: 71G)
[--] Reads / Writes: 89% / 11%
[--] Binary logging is disabled
[--] Physical Memory     : 122.8G
[--] Max MySQL memory    : 123.7G
[--] Other process memory: 0B
[--] Total buffers: 105.3G global + 18.9M per thread (1000 max threads)
[--] P_S Max memory usage: 72B
[--] Galera GCache Max memory usage: 0B
[!!] Maximum reached memory usage: 118.7G (96.63% of installed RAM)
[!!] Maximum possible memory usage: 123.7G (100.76% of installed RAM)
[!!] Overall possible memory usage with other process exceeded memory
[!!] Slow queries: 16% (48M/290M)
[OK] Highest usage of available connections: 72% (725/1000)
[OK] Aborted connections: 0.00%  (5/17936966)
[OK] Query cache is disabled by default due to mutex contention on multiprocessor machines.
[OK] Sorts requiring temporary tables: 1% (232K temp sorts / 16M sorts)
[!!] Joins performed without indexes: 47041862
[!!] Temporary tables created on disk: 33% (47M on disk / 142M total)
[OK] Thread cache hit rate: 94% (944K created / 17M connections)
[OK] Table cache hit rate: 99% (284M hits / 284M requests)
[OK] table_definition_cache(4096) is upper than number of tables(302)
[OK] Open file limit used: 0% (62/32K)
[OK] Table locks acquired immediately: 100% (44K immediate / 44K locks)



tuning-primer:
MAX CONNECTIONS
Current max_connections = 1000
Current threads_connected = 14
Historic max_used_connections = 725
The number of used connections is 72% of the configured maximum.
Your max_connections variable seems to be fine.

OPEN FILES LIMIT
Current open_files_limit = 32768 files
The open_files_limit should typically be set to at least 2x-3x
that of table_cache if you have heavy MyISAM usage.
Your open_files_limit value seems to be fine





What is accessing the data? Might you be requesting locks unnecessarily and not releasing them?
open files limit is NOT the issue.

what about the adaptive hash index? is it active? did you try disabling it? given the 48-core server, i see many more chances this could be the cause.

have you actually looked for innodb deadlocks? i do not believe this is how they appear in the log, but checking with the actual regular tool does not harm. you can log said locks with innodb_print_all_deadlocks, but beware as the error log might get huge.
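For what it's worth, the usual checks look like this (the deadlock section only appears in the status output if one has actually been detected):

```sql
-- the "LATEST DETECTED DEADLOCK" section of this output is the
-- regular place InnoDB reports deadlocks
SHOW ENGINE INNODB STATUS\G

-- dynamic variable: write every detected deadlock to the error log
SET GLOBAL innodb_print_all_deadlocks = ON;
```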

also note that running such a performant server requires setting up adequate buffer sizes, probably splitting the buffer pool into multiple instances, and defining a number of read/write threads adapted to the number of disks and the raid level... some of these may help with your issue.
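A sketch of the kind of my.cnf tuning meant here; the numbers are placeholders for a 128GB dedicated VM, not recommendations for this exact workload (and note that MariaDB 10.5 deprecated innodb_buffer_pool_instances, so splitting the pool applies mainly to older versions and MySQL):

```ini
[mysqld]
innodb_buffer_pool_size  = 96G   # roughly 75% of RAM on a dedicated box
innodb_read_io_threads   = 16    # scale to disk count / RAID level
innodb_write_io_threads  = 8
innodb_io_capacity       = 2000  # rough figure for SSD-backed storage
```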
And a few notes regarding your status:

 Joins performed without indexes: 47041862


this is crazy and indicates you are missing indexes


Maximum possible memory usage: 123.7G (100.76% of installed RAM)


this requires attention: you need to be under 100%, including leaving enough space for the OS. maybe limit the number of connections or buffers.


Hi,

skullnobrains is right. This is crazy.
Joins performed without indexes: 47041862


You should log all queries not using indexes, look at them and add suitable indexes on your tables.
I personally configure a dedicated DB server with no more than 70-75% of available memory, leaving 25-30% to the OS and other usage. It is considered best practice to do so. ;)

Is the log you posted from a period of high load on your system?
If you have high load and overallocated memory, you risk starving your OS, which may lead to unpredictable behavior.

Regards,
    Tomas Helgi
Add the slow query log and, to start, log queries that take longer than 10 seconds.
This can be activated dynamically by updating variables.

Possibly you have an overlap.
Do you also have phpMyAdmin? It will include suggestions based on the system stats that could improve/resolve your issues.
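The dynamic activation mentioned above might look like this (the 10-second threshold follows the suggestion; tighten it later):

```sql
-- all three are dynamic globals: no restart needed, but they
-- revert to the my.cnf values on the next restart
SET GLOBAL slow_query_log = ON;
SET GLOBAL long_query_time = 10;
SET GLOBAL log_queries_not_using_indexes = ON;
```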
Wow, I don't know where to start :).

>What is accessing the data might you be requesting locks unnecessarily and not releasing them?

In all of the digging I've done, including a very long thread thinking the problem was in fact locking or client limits being reached, this was discounted as not being the problem.
https://www.experts-exchange.com/questions/29236783/High-TIME-WAIT-php-mariadb-causing-problems.html

>what about the adaptive hash index ? is it active ? did you try disabling it ?
>given the 48 core server, i see many more chances this could be the cause.

Disabled, not in use.

>have you actually looked for innodb dead locks ? i do not believe this is how they appear in the log

It doesn't show in what I posted, but logging is enabled for the percona agent that's installed on the server, sending data to a pmm server. I set those from the CLI because I don't want to take the server down to change my.cnf. No locks are reported by pmm, nor by either of the scripts I'm using to monitor the server, as shown in the snips I posted.

>but checking with the actual regular tool does not harm. you can log said locks with
>innodb_print_all_deadlocks but beware as the error log might get huge

Just need a bit of clarity on this. What is the 'actual tool'?
Do you mean enabling another log such as mentioned above, innodb_print_all_deadlocks?

>also note that running such a performant server requires to setup adequate buffer sizes, probably split
>the buffer pool into multiple ones, define a number of read/write threads adapted to the number of
>disks and raid level... some of these may help with your issue.
 
I've never read anywhere that the buffer pool could be split. I'll have to read up on how this works and how it might help.

>Joins performed without indexes: 47041862
>this is crazy and indicates you are missing indexes

Yes, this has been an ongoing thing :). I've brought it up to the db dev many many times and he assures me that everything is indexed. Every report I run however says otherwise but he insists all is indexed so I'm kind of stuck on this one.

>Maximum possible memory usage: 123.7G (100.76% of installed RAM)this requires attention
>you need to be under 100 including leaving enough space for the OS. maybe limit the number of
>connections or buffers.

I've posted about memory usage many times on this site and the end result seems to be that these reports are not quite accurate. Looking at top at least, the highest amount of memory the system has used is 50GB and 0 swap.
In the other posts, we tested over and over again and it always showed high memory usage so I'm not sure what to do about this because no matter what I do, it never says it's good.

>You should log all queries not using indexes, look at them and add suitable indexes on your tables.

I do log and I've checked this for months.

>I personally configure a dedicated DB server with no more than 70 - 75% of available memory leaving
>25 - 30 % to OS and other usage. It is considered Best Practice to do so. ;)

In terms of buffer size, I always use 80% or less of the total amount of memory. In this case, I gave the server (vm) 128GB of memory so it has a buffer size of 104GB. The entire server is only using 50GB.

>Is the log you posted from a high load in your system ?
>If you have high load and overallocated memory you risk starving your OS which may lead to unpredictable behavior.

Even when the db service starts locking up, the system load and memory usage aren't high; they remain the same. I thought it would totally use everything up, but whatever this lockup is, it's not taking up all the memory or resources; it's just locking up the database itself in some way.

If anyone is interested, here are some of the other posts on the same topic of trying to find this problem and/or optimizing the system. The problem has been there for some time and early on, I thought I was homing in on it, but each time, even if it optimized the system, this weird lockup continues.

https://www.experts-exchange.com/questions/29226416/Mariadb-use-more-available-mem-lower-temp-tables.html

https://www.experts-exchange.com/questions/29127284/Mariadb-suddenly-blocks-requests.html

https://www.experts-exchange.com/questions/29178902/Mariadb-kill-sleep-connections-ASAP.html

https://www.experts-exchange.com/questions/29025382/Mariadb-temp-tables.html



>Yes, this has been an ongoing thing :). I've brought it up to the db dev many many times and he assures me that everything is indexed. Every report I run however says otherwise but he insists all is indexed so I'm kind of stuck on this one.

The best way to address this is to log the slow queries with a proper configuration: set log_queries_not_using_indexes to ON in my.cnf, gather all the queries that get logged into a report, and present it to the db dev.

Take a sample of those queries and run EXPLAIN on them to reveal the access path for each query, and copy that into the report. It will show clearly what is using indexes and what is not.
Or, if you want the logged slow queries to be explained automatically, you can simply put this into the config:
log_slow_verbosity=query_plan,explain


See the chapter "Configuring the Slow Query Log Verbosity" in the first link and the example in the second link.
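As a minimal sketch of what reading a plan looks like (the table and column names below are made up for illustration, not from the actual schema):

```sql
-- Pick a query from the slow log and ask the optimizer for its plan.
EXPLAIN
SELECT o.id, c.name
FROM orders o
JOIN customers c ON c.id = o.customer_id
WHERE o.created_at > '2022-04-01';

-- In the output, type: ALL on a large table means a full table scan,
-- and key: NULL means no index was chosen for that table.
-- After a suitable index is added, re-running EXPLAIN should show
-- type: ref or range, with key naming the new index.
```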



See these links for more info
https://mariadb.com/kb/en/slow-query-log-overview/
https://mariadb.com/kb/en/explain-in-the-slow-query-log/
https://mariadb.com/docs/reference/mdb/system-variables/log_queries_not_using_indexes/
https://minervadb.com/index.php/mariadb-slow-query-log-explained/

Regards,
    Tomas Helgi
your TIME_WAIT issue is because your writers do not use serialisation or connection caches, so they connect and disconnect at a rapid pace. you can play with sysctls to limit their numbers but they do not really do harm. better make sure you do not connect every time you spawn a query.

noted : adaptive hash is not the issue


the flag that logs locks makes them appear in the error log. that said, this is probably not your issue : it seems to be related to mutex contention rather than an innodb deadlock, but there is no harm making sure.

memory usage is an actual issue. it will only reach the maximum when you reach the max number of clients.

forget about swapping : there is no point in having swap at all on such a server.

you did miss something regarding indexes. the query log does not give enough information unless you run explain on the queries as well. this is very time consuming so we usually start with the long queries or use the builtin accounting db.

can you post information regarding the number of read and write threads and the disks (number, raid level, hard vs soft raid...)

is the machine in a galera cluster ? if it is, chances are the deadlocks are sql deadlocks across multiple machines.
I guess I should share the cnf so that things are clear.
Also, even if you don't see the logging in the my.cnf, I enable it from the command line.

[mysqld]
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
symbolic-links=0
user=mysql

collation-server = utf8_unicode_ci
init-connect='SET NAMES utf8'
character-set-server = utf8

#slow_query_log = 1
#slow_query_log_file = /var/log/mariadb/mariadb_slow.log
log-queries-not-using-indexes = 1
long_query_time = 1

performance_schema             = On

# CACHES AND LIMITS
max_connections                = 1000                   # Changed from 800 to 1000 9-24-2021

# Effective in-memory tmp_table_size is limited to max_heap_table_size
# Changed to 1GB from 536870912 4/2/22
# SET max_heap_table_size=1073741824;
# SET tmp_table_size=1073741824;

                                                         # Larger blew up the server by filling /tmp
max_heap_table_size            = 1024M                   # From 512M to 1024M 4/3/22
tmp_table_size                 = 1024M                   # Changed together. Done in memory on this date.

open_files_limit               = 65535
query_cache_type               = Off
query_cache_size               = 0
table_definition_cache         = 4096
table_open_cache               = 5000
thread_cache_size              = 50

# TIMEOUTS
#connect_timeout                = 10
#interactive_timeout            = 3600
wait_timeout                    = 120                   # Changed from 8 days to 120 10/17/21

# MyISAM
#key_buffer_size                = 32M                   # Commented out Jan/29/2021
myisam_recover                 = FORCE,BACKUP

# SAFETY
innodb                         = FORCE
innodb_strict_mode             = 1

# This doesn't need to be any larger than the max blob/packet/message size.
max_allowed_packet             = 16M                    # Changed from 1G to 16M 4/3/22
max_connect_errors             = 1000000
skip_name_resolve                                       # Enabled 9-24-2021

# INNODB
innodb_buffer_pool_size        = 104G                    # ~80% of 128GB
innodb_flush_method            = O_DIRECT
innodb_flush_log_at_trx_commit = 0
#innodb_file_per_table         = 1                      # Commented out 9-24-2021
#innodb_log_files_in_group     = 2                      # Commented out 9-24-2021

# Was 16GB. Changed to 1GB 4/2/22
innodb_log_file_size           = 1G
default-storage-engine         = INNODB

[mysqld_safe]
log-error=/var/log/mariadb/mariadb.log
pid-file=/var/run/mariadb/mariadb.pid

!includedir /etc/my.cnf.d




Run in the CLI for Percona:
MariaDB [(none)]> SET GLOBAL long_query_time=5.0;
MariaDB [(none)]> SET GLOBAL slow_query_log=1;
MariaDB [(none)]> SET GLOBAL log_queries_not_using_indexes=ON;
MariaDB [(none)]> SET GLOBAL log_slow_verbosity='query_plan,explain';
MariaDB [(none)]> SET GLOBAL slow_query_log_file='slow_query_log';

I do this from the cli so that when I'm done logging, I don't have to restart the service.

>Or if you want the logged slow queries to be automatically explained then you can simply put into the config

Do the percona reports show the results you're mentioning? I have percona pmm running 24/7.

>your TIME_WAIT issue is because your writers do not use serialization or connection caches so they connect
>and disconnect at a rapid pace.
>better make sure you do not connect every time you spawn a query.

Wait now, I don't recall that ever being mentioned in that very long question thread. You're saying you know this for a fact?

I can tell you that the client traffic is constant, non-stop. I was trying to understand whether I should close the connection as fast as possible or keep it open so the client doesn't have to open another one. It was never clear, and still isn't, which is best, and at which end this needs to be done: the client/web server side or the db server side.

>you can play with sysctls to limit their numbers but they do not really harm.

My sysctls settings are something I got used to doing each time I set up a db server. I've not really looked lately to even know if I should be doing that.

BTW, I'm not a database trained person, I just have way too many hats on because I trust me more than people I've hired that keep leaving things broken. I've ended up having to deal with this because of how long it takes to find reliable help but I'm no expert at all. I've just been tuning and tuning and working it for a couple of years now as a sideline to the countless other things I'm overwhelmed with.

The same way you enable options you can disable them, and then edit my.cnf to make the change permanent.

Can you add phpMyAdmin to your web server and see what recommendations it offers?

If you have your slow query log, run EXPLAIN on the queries in it.
It will tell you what consumes time.
I wondered about slow query results for a long time and posted about it here (the other questions).
There aren't any that we're not aware of; some are simply long queries and can't be helped.
I've also posted asking about using a ram drive.

I use command line but have phpmyadmin installed for the db dev. There isn't much I see in there that helps as those are all things that always show no matter what is done or are simply not the problem but could be tiny performance gains.

Issue / Recommendation

Issue: There is a lot of slow queries compared to the overall amount of Queries. Recommendation: You might want to increase long_query_time or optimize the queries listed in the slow query log.
Issue: The query cache is not enabled. Recommendation: The query cache is known to greatly improve performance if configured correctly. Enable it by setting query_cache_size to a 2 digit MiB value and setting query_cache_type to 'ON'. Note: If you are using memcached, ignore this recommendation.
Issue: Too many sorts are causing temporary tables. Recommendation: Consider increasing sort_buffer_size and/or read_rnd_buffer_size, depending on your system memory limits.
Issue: There are lots of rows being sorted. Recommendation: While there is nothing wrong with a high amount of row sorting, you might want to make sure that the queries which require a lot of sorting use indexed columns in the ORDER BY clause, as this will result in much faster sorting.
Issue: There are too many joins without indexes. Recommendation: This means that joins are doing full table scans. Adding indexes for the columns being used in the join conditions will greatly speed up table joins.
Issue: The rate of reading the first index entry is high. Recommendation: This usually indicates frequent full index scans. Full index scans are faster than table scans but require lots of CPU cycles in big tables; if those tables have or had high volumes of UPDATEs and DELETEs, running 'OPTIMIZE TABLE' might reduce the amount of and/or speed up full index scans. Other than that, full index scans can only be reduced by rewriting queries.
Issue: The rate of reading data from a fixed position is high. Recommendation: This indicates that many queries need to sort results and/or do a full table scan, including join queries that do not use indexes. Add indexes where applicable.
Issue: The rate of reading the next table row is high. Recommendation: This indicates that many queries are doing full table scans. Add indexes where applicable.
Issue: Many temporary tables are being written to disk instead of being kept in memory. Recommendation: Increasing max_heap_table_size and tmp_table_size might help. However some temporary tables are always being written to disk, independent of the value of these variables. To eliminate these you will have to rewrite your queries to avoid those conditions (within a temporary table: presence of a BLOB or TEXT column, or presence of a column bigger than 512 bytes), as mentioned in the beginning of an article by the Pythian Group.
Issue: MyISAM key buffer (index cache) % used is low. Recommendation: You may need to decrease the size of key_buffer_size, re-examine your tables to see if indexes have been removed, or examine queries and expectations about what indexes are being used.
Issue: The rate of opening tables is high. Recommendation: Opening tables requires disk I/O which is costly. Increasing table_open_cache might avoid this.
Issue: There are too many threads that are slow to launch. Recommendation: This generally happens in case of general system overload, as it is a pretty simple operation. You might want to monitor your system load carefully.
Issue: Too many clients are aborted. Recommendation: Clients are usually aborted when they did not close their connection to MySQL properly. This can be due to network issues or code not closing a database handler properly. Check your network and code.
Issue: The InnoDB log file size is inadequately large. Recommendation: It is usually sufficient to set innodb_log_file_size to 25% of the size of innodb_buffer_pool_size. A very big innodb_log_file_size slows down the recovery time after a database crash considerably. See also this blog entry.

Not using MyISAM.
The InnoDB log file size is a weird suggestion from phpMyAdmin. I raised it at one point and it was simply too large, taking much too long to recover the db when stopping/starting. We don't do any critical things like banking, so there's not much need for a large log.
Temp tables are something all the tools always complain about, yet around 25% is pretty good, which is where I'm usually at. For some reason, it's a bit higher than usual in the past few weeks.
The indexing stuff I've talked about with the db dev repeatedly, and he tells me everything is indexed.

The slow thread start is something I've not seen before, but it could be caused by whatever is wrong with the system. Meaning, it's a symptom, not the cause, so changing a bunch of things to fine-tune just doesn't seem that important right now. Finding the cause of the main problem is.

The largest db is 1GB, which doesn't seem that large for mariadb, but maybe it is if it's having to be fully scanned over and over.

Which leads me to the things that do catch my eye...

Too many sorts, lots of rows being sorted, too many joins without indexes.

I asked in other questions if putting more things into memory would help, and yes, of course it would, but all of these things still don't seem to point to what is happening: why the whole db locks up while the OS is barely under pressure. Lots of memory, CPU, etc., load is not high, yet everything stops and, as mentioned in my question, it actually restarted the server.

So, yes, I can tune, but I'd like to find the actual problem before 'tuning' things. And of course, sometimes tuning could be the answer, so I'm in a bit of a pickle since I cannot restart this service.
Consider a long query where the client disconnects (connection aborts);
i.e. the client timeout might be too small compared to the length of time a response might take.

Joins without indexes also consume resources, as they trigger full table scans. It is one thing when a table is definitional, i.e. has only 10 rows.
If you see that the slow queries are joins,
it is best to look at the columns on which they are joined and see whether there is an index defined on those columns.
You could add an index (not unique) and see; the index can always be removed.
Note which table, which column, and the index you are adding.
If you see joins on multiple columns, you can add a multi-column index.
As data gets added to DBs (more rows), the initial design might not have taken that into account; the issues often manifest down the line.
I think you can add an index using phpMyAdmin.
Initially, concentrate on the tables with large numbers of rows that are used in the slow queries.
If successful, these queries will stop showing up in the log.
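A sketch of what adding (and, if it doesn't help, removing) such an index looks like; the table, column, and index names here are hypothetical:

```sql
-- Single-column index on a join column:
ALTER TABLE orders ADD INDEX idx_orders_customer_id (customer_id);

-- Multi-column index when the join/filter uses two columns together.
-- Column order matters: put the column that is always filtered on first.
ALTER TABLE orders ADD INDEX idx_orders_cust_date (customer_id, created_at);

-- If it does not help, the index can simply be dropped again:
ALTER TABLE orders DROP INDEX idx_orders_cust_date;
```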


Adding more memory is one way, but the memory consumption might be attributable to the non-indexed joins loading entire tables into memory.



Today, we decided to go with the following.

mysql_upgrade --force and
mysqlcheck -o --all-databases -uroot -p --auto-repair

Nothing went wrong, no errors so maybe that will help, no idea.
One thing that keeps coming up... is it possible the install itself is somehow messed up?
There were no obvious problems during installation, at least; I usually note those.

@Arnold, I'm going to share this with the dev. Before ending the day, I suggested that he take a really close look at the database and what the reports including phpmyadmin are telling us.

To me, it has seemed for a long time to have something to do with these constant quick connections layering over each other. I've wanted to either keep those connections open or close them as soon as the data is done, but I've posted asking for help on this and never found anything conclusive.

Then there has also been talk about using ramdisk but still not conclusive enough to take any steps.

I guess after running these cleaning commands to see if that helped and reviewing what you've also added now is where we'll be tomorrow.
Issues with tables show up differently. If I'm not mistaken,
often data cannot be added or changed.
The error entry might also point to table corruption.

I think as the non-indexed joins and non-indexed queries on large tables are adjusted, it may improve performance, potentially eliminating these issues, or help point to something else, possibly in the code accessing the data.
I.e. does the code in use include reporting or logging of events?


>The largest db is 1GB
1GB or 1TB?
1GB is... nothing... it's the size of the DBs running on my TV Box (Armbian).
we are a little off topic here but since you ask

<< time wait ... know for a fact >>

yes : that happens when your server shuts down connections. on a normal tcp stack, this should last for a couple of minutes after you close any tcp connection. note that local clients had better use the socket but are not impacted anyway since localhost connections behave differently.



<<I was trying to understand if I should close the connection as fast as possible or keep it open so the client doesn't have to open another one>>

if you can afford to leave the connections open or expect little to no gain from closing them, don't close them. something that performs an insertion every 10 seconds or less definitely should use a persistent connection. once per hour is more debatable.

if you have sufficient ram and provisioned for the max number of connections during the busy hour, there is no point in trying to limit their numbers during non-busy hours as they cost nothing. if you have a huge number of clients each running a few daily queries, the situation is the opposite.

such issues actually depend on the workflow but the general rule would be keep them open unless you have a very good reason to do things differently. use persistent connections. use connection pools if possible. or even play with the mysql proxy. at your own risk.
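one way to put numbers on the connection churn is to compare total connection attempts against threads actually created (these are standard mariadb/mysql status counters):

```sql
-- Total connection attempts since startup vs. threads created to serve them.
SHOW GLOBAL STATUS LIKE 'Connections';
SHOW GLOBAL STATUS LIKE 'Threads_created';
SHOW GLOBAL STATUS LIKE 'Threads_connected';

-- If Threads_created is more than a few percent of Connections, clients are
-- churning faster than the thread cache absorbs; persistent connections or
-- a larger thread_cache_size would help.
```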



__ are you in a galera cluster or is this a single machine ?
__ what about the number of read/write threads and the physical disks ?
i would probably use software raid 10 of at least 8 drives on bare metal, or possibly dedicated disks given the rest of the specs. on a bsd system, probably a similar setup with a zfs pool without a filesystem, or file_per_table and directIO.



<<Some are simply long queries and can't be helped>>
unless you run queries that perform complex analytics, this is probably not true. the number of joins without index and the number of temporary tables (on disk for many of them) is a clear indication you are missing some optimisation. and even complex analytical queries usually can be avoided by building said data incrementally. or run offline on a replica. it is not sane to mix long analytical queries and shorter ones. also, you can change the isolation level to use non or less locking mechanisms such as read uncommitted (if suitable).


__ there is good advice in the table you posted
namely your innodb log file is obviously too large



<<Too many sorts, lots of rows being sorted, too many joins without indexe>>
yes. as with most dbs, the first thing you need to focus on is misuse. the mutex contention is probably related to the bunch of temporary tables all over the place combined with saturated disks. you mention the os is not stressed: it is probably dying under the I/O load, as far as i can guess.
THE PROPER WAY TO DEAL WITH THIS IS USING QUERY PLANS AND IN YOUR CASE THIS IS WHERE YOU NEED TO START. beware most devs cannot read them and very few of them understand how a db works under the hood and how to use it without killing it.



<<is it possible the install itself is somehow messed up>>
i seriously doubt that


<<ramdisks>>
i have no idea why you got this advice but as a general rule, databases won't benefit from playing with ram disks and the risk of data loss is huge. the exception is when you have small databases which you can afford to run entirely from memory and can afford to lose due to replication or some architectural twist. this happens seldom to never, as such cases essentially indicate dbs are not the right tool for the job.
multiple buffer pools and IO threads or playing with thread concurrency probably will help as well but this needs to be experimented with AFTER dealing with the bulk of bad queries and knowing the server specs. also note that the results can be excellent or very disappointing.
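for reference, the io thread knobs he mentions live in my.cnf; a sketch only, since the right values depend on the disks (the mariadb defaults are 4/4), and — as said above — worth touching only after the bad queries are cleaned up:

```ini
[mysqld]
# Number of background threads InnoDB uses for read and write I/O.
# Raise these only on storage that can actually serve parallel I/O
# (many spindles, decent SSDs); values here are illustrative.
innodb_read_io_threads  = 8
innodb_write_io_threads = 8
```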
@John, yes, the largest tables are only 1GB. We offload raw data that isn't needed for day to day onto other servers.

@arnold, I've asked the dev to take a look at this question.

@skullnobrains,
 
>i have no idea why you got this advice but as a general rule, databases won't benefit from playing with
>ram disks and the risk of data loss is huge

This point was definitely mentioned when this came up.

>multiple buffer pools and IO threads or playing with thread concurrency probably will help as well but this
>needs to be experimented with AFTER dealing with the bulk of bad queries and knowing the server specs.

What do you mean by knowing the server specs? Is there something else I should be sharing and haven't?

>__ are you in a galera cluster or is this a single machine ?

Single machine. I want to solve this problem before doing anything else. The next step is to either go with a galera cluster we have or add another DB server as master/master or master/slave, whatever the case will be when that next step comes.

I feel whatever this problem is, it needs to be solved before adding more complexity.

>__ what about the number of read/write threads and the physical disks ?
>i would probably use software raid 10 of at least 8 drives on barebone or possibly dedicated disks

Sure, I can always add more hardware but I don't think that's the problem right now.
This server has 256GB of memory of which only 50GB is being used and 1TB of SSD space. The DB in this application is only around 6GB in size so the host has plenty of resources.

You mention the InnoDB file is too large.
innodb_log_file_size           = 1G

It used to be much larger but I lowered it because recovery time took way too long.
innodb_log_file_size should be (=26G) if possible, so InnoDB total log files size equals to 25% of buffer pool size.

This is how I had set it before (based on mysql-tuner advice) to test.
Way too much and now you're saying 1G is too much?

The dev and I just found something interesting and will update shortly.
Right now, for no obvious reason, load on the web servers went to 100, stopping all clients from reaching the DB. The DB server load is only 2.xx with only 58GB of memory used. Nothing obvious whatsoever, and nothing in the mariadb log or anywhere else.

Screen cap of web server


Two web servers which aren't even in the load balancer, on standby with web servers/php running are both at 100 also. They aren't even serving anything. This is super weird.
makes little to no sense given top says it is 99.7% idle.

Do you have a DoS/DDoS shielding mechanism?
You have to look at the web log to see whether it is getting inundated with requests/connections, or whether something merely opens TCP connections, tying up resources.
# ps aux | grep www -c
101
# systemctl reload httpd php-fpm
# ps aux | grep www -c
21

Load is going down after reloading the web services.
It doesn't answer why a server that isn't even being used went to high load while sitting idle, not accepting connections. Yet those and the live ones showed exactly the same number of php-fpm processes.

The one thing they have in common is using an NFS share but that share was always available even when this happened.

When I originally posted trying to figure out what is happening, I suspected NFS share and other network / communications things but all checked out.
I also checked and tested the apache/php configs but those checked out as well.

How could a web server that is not even included in the load balancer even do that?

The external communications things that all of these servers have in common would be the web share and a zabbix client.

All day long, I'm seeing these on the DB server

Aborted connection ... Got timeout reading communication packets

Something is preventing communications, and maybe the problem really isn't the DB, other than that the DB could always use a little extra tuning. I'm stumped.
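For reference, the server keeps separate counters for the two kinds of aborts behind these messages; these are standard status variables:

```sql
-- Aborted_connects = failed connection attempts (auth errors, handshake timeouts).
-- Aborted_clients  = established connections dropped without a proper close
--                    (network issues, client crashes, wait_timeout kicking in).
SHOW GLOBAL STATUS LIKE 'Aborted%';
```

If Aborted_clients climbs in step with the lockups, that points at the network side rather than the DB itself.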




>Do you have dos/DDOS shielding mechanism?

Yes, fully protected.
Here we go, found something.
This is exactly when the problem started.
It's blaming the NFS share, but NFS was definitely up and running, as we can see data kept coming in from other servers using the same share. To me, this indicates a problem, but maybe it's the symptom, not the cause.

May  5 15:20:00 cp01 kernel: lockd: server tn01 not responding, still trying
**snip**
May  5 15:20:31 cp01 kernel: lockd: server tn01 OK
May  5 15:20:32 cp01 kernel: lockd: server tn01 not responding, still trying
**snip**
May  5 15:22:11 cp01 kernel: lockd: server tn01 OK
May  5 15:22:12 cp01 kernel: lockd: server tn01 not responding, still trying
**snip**
May  5 15:25:31 cp01 kernel: xs_tcp_setup_socket: connect returned unhandled error -107
May  5 15:25:31 cp01 kernel: lockd: server tn01 not responding, still trying
May  5 15:25:31 cp01 kernel: xs_tcp_setup_socket: connect returned unhandled error -107
**snip**
May  5 15:25:31 cp01 kernel: lockd: server tn01 not responding, still trying
May  5 15:25:31 cp01 kernel: xs_tcp_setup_socket: connect returned unhandled error -107
May  5 15:25:31 cp01 kernel: lockd: server tn01 not responding, still trying
May  5 15:25:31 cp01 kernel: xs_tcp_setup_socket: connect returned unhandled error -107
**snip**
May  5 15:25:31 cp01 kernel: xs_tcp_setup_socket: connect returned unhandled error -107
May  5 15:25:31 cp01 kernel: lockd: server tn01 not responding, still trying
May  5 15:25:31 cp01 kernel: xs_tcp_setup_socket: connect returned unhandled error -107
**snip**
May  5 15:25:31 cp01 kernel: xs_tcp_setup_socket: connect returned unhandled error -107
May  5 15:25:32 cp01 kernel: lockd: server tn01 OK
May  5 15:25:32 cp01 kernel: lockd: server tn01 OK
May  5 15:26:32 cp01 kernel: lockd: server tn01 not responding, still trying
May  5 15:26:32 cp01 kernel: lockd: server tn01 not responding, still trying
May  5 15:27:38 cp01 kernel: lockd: server tn01 OK
**snip**
May  5 15:27:38 cp01 kernel: lockd: server tn01 OK
May  5 15:28:38 cp01 kernel: lockd: server tn01 not responding, still trying
**snip**
May  5 15:28:38 cp01 kernel: lockd: server tn01 not responding, still trying
May  5 15:29:44 cp01 kernel: call_decode: 90 callbacks suppressed
May  5 15:29:44 cp01 kernel: lockd: server tn01 OK
May  5 15:30:18 cp01 systemd-logind[1019]: New session 6 of user root.
May  5 15:30:18 cp01 systemd[1]: Started Session 6 of user root.
May  5 15:30:59 cp01 kernel: lockd: server tn01 OK
May  5 15:33:38 cp01 kernel: xs_tcp_setup_socket: connect returned unhandled error -107
**snip**
May  5 15:33:38 cp01 kernel: xs_tcp_setup_socket: connect returned unhandled error -107
**snip**
May  5 15:33:38 cp01 kernel: xs_tcp_setup_socket: connect returned unhandled error -107
May  5 15:33:39 cp01 kernel: lockd: server tn01 OK
May  5 16:07:06 cp01 systemd[1]: Reloading The Apache HTTP Server.
May  5 16:07:06 cp01 systemd[1]: Reloading The PHP FastCGI Process Manager.
May  5 16:07:06 cp01 systemd[1]: Reloaded The PHP FastCGI Process Manager.
May  5 16:07:06 cp01 systemd[1]: Reloaded The Apache HTTP Server.
May  5 16:07:06 cp01 httpd[1442]: Server configured, listening on: port 443, port 80
May  5 16:10:18 cp01 kernel: rpc_check_timeout: 89 callbacks suppressed
May  5 16:10:18 cp01 kernel: lockd: server tn01 not responding, still trying
May  5 16:10:18 cp01 kernel: xs_tcp_setup_socket: connect returned unhandled error -107
May  5 16:10:18 cp01 kernel: lockd: server tn01 not responding, still trying
**snip**
May  5 16:10:18 cp01 kernel: xs_tcp_setup_socket: connect returned unhandled error -107
May  5 16:10:55 cp01 kernel: rpc_check_timeout: 182 callbacks suppressed
May  5 16:10:55 cp01 kernel: lockd: server tn01 not responding, still trying
May  5 16:14:11 cp01 kernel: xs_tcp_setup_socket: connect returned unhandled error -107
May  5 16:21:38 cp01 kernel: xs_tcp_setup_socket: connect returned unhandled error -107
May  5 16:21:43 cp01 systemd[1]: Starting Cleanup of Temporary Directories...
May  5 16:21:43 cp01 systemd[1]: systemd-tmpfiles-clean.service: Succeeded.
May  5 16:21:43 cp01 systemd[1]: Started Cleanup of Temporary Directories.
May  5 16:22:13 cp01 systemd[1]: Starting dnf makecache...
May  5 16:22:14 cp01 dnf[62326]: CentOS Linux 8 - AppStream                       20 kB/s | 4.3 kB     00:00
May  5 16:22:14 cp01 dnf[62326]: CentOS Linux 8 - BaseOS                          20 kB/s | 3.9 kB     00:00
May  5 16:22:15 cp01 dnf[62326]: CentOS Linux 8 - Extras                          20 kB/s | 1.5 kB     00:00
May  5 16:22:15 cp01 dnf[62326]: Extra Packages for Enterprise Linux Modular 8 -  52 kB/s |  19 kB     00:00
May  5 16:22:15 cp01 dnf[62326]: Extra Packages for Enterprise Linux 8 - x86_64   53 kB/s |  16 kB     00:00
May  5 16:22:16 cp01 dnf[62326]: Remi's Modular repository for Enterprise Linux  5.5 kB/s | 3.5 kB     00:00
May  5 16:22:17 cp01 dnf[62326]: Safe Remi's RPM repository for Enterprise Linux 5.8 kB/s | 3.0 kB     00:00
May  5 16:22:19 cp01 dnf[62326]: Webmin Distribution Neutral                     1.4 kB/s | 2.9 kB     00:02
May  5 16:22:19 cp01 dnf[62326]: Zabbix Official Repository - x86_64              39 kB/s | 2.9 kB     00:00
May  5 16:22:19 cp01 dnf[62326]: Zabbix Official Repository non-supported - x86_  82 kB/s | 3.0 kB     00:00
May  5 16:22:20 cp01 dnf[62326]: Module yaml error: Unexpected key in data: static_context [line 9 col 3]
May  5 16:22:20 cp01 dnf[62326]: Module yaml error: Unexpected key in data: static_context [line 9 col 3]
May  5 16:22:20 cp01 dnf[62326]: Metadata cache created.
May  5 16:22:20 cp01 systemd[1]: dnf-makecache.service: Succeeded.
May  5 16:22:20 cp01 systemd[1]: Started dnf makecache.
May  5 16:25:23 cp01 kernel: xs_tcp_setup_socket: connect returned unhandled error -107
**snip**
May  5 16:25:23 cp01 kernel: xs_tcp_setup_socket: connect returned unhandled error -107


What do you mean by knowing the server specs? Is there something else I should be sharing and haven't?
are you running a galera cluster ? is this a vm ? what is your disks setup including raid levels ?
on a 48-core machine, you would normally match the number of io threads to the i/o the disks can actually deliver

if you actually use a single 1TB drive, there is no point in bothering to increase anything else. that many cores and that quantity of memory is way too much for a single-drive setup. if you want to monitor something hardware related, check disk I/O.

--

you are correct regarding solving this first. and solving it starts with cleaning up the rogue queries and missing indexes. you may have reasons to think this is expected but unless you run lots of analytical large queries, there is no way that many full scans and tmp tables are normal.

note that a galera cluster of 2 members is a bad idea but you can use garbd to emulate a 3rd member and get a working master-master setup.

--

i misread, sorry : a 1G log file size is reasonable. actually i would probably use around 4 or maybe 8 given the other parameters. 1 or 2 tops with a single drive. 25G is waaay too much.

--

Yes, fully protected. 

that is totally unreasonable. shielding against a properly crafted dos attack is not feasible. believing you are implies a clear misunderstanding of said attacks.

--

load of 100
are you talking of load average ? if that is the case and the CPU is still low, you have an issue with I/O. if the cpu is the one reaching 100, you probably have a loop in whichever program is using 100% cpu.

you need to keep one of the spare web servers alive when the issue arises if you want to debug properly. if fpm is the one at fault, you can identify the pool based on the process name in most cases and use lsof in others. if you are lucky enough, you'll even know the script using lsof. ltrace can usually help to pinpoint the location of the loop if it is not obvious.

--

also note that serving web files from nfs is quite a bad idea. the issues will vastly depend on nfs 3 or 4 and nfs settings such as hard vs soft mounts and timeout but you WILL experience issues.
sorry we cross posted.
the lockd message indeed indicates an issue with nfs. nfs3 over udp if i am not mistaken.
this setup will break, though not entirely, when you lose merely a few packets.
there is no outright fix, though tcp and nfs4 might help reduce the frequency of the issue.
serving web files over nfs is simply a bad idea.
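if nfs has to stay for a while, an illustrative fstab line along those lines (export and mountpoint are placeholders):

```
# nfs4 runs over tcp; soft+timeo bounds how long a dead server can hang the
# client, at the cost of possible i/o errors instead of indefinite waits
tn01:/export/web  /var/www/shared  nfs4  soft,timeo=100,retrans=2,noatime  0  0
```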
We noticed this in some other servers using the same share.
/var/log/messages-20220410:Apr  7 15:42:41 cp159 kernel: lockd: server tn01 not responding, still trying

And looking at the web server that is not serving, it also shows the same errors at the exact same time.

**snip**
May  5 15:23:00 cp02 kernel: lockd: server tn01 not responding, still trying
May  5 15:23:02 cp02 kernel: lockd: server tn01 not responding, still trying
May  5 15:25:01 cp02 systemd[1]: Started Session 1987 of user root.
May  5 15:25:01 cp02 systemd[1]: session-1987.scope: Succeeded.
May  5 15:25:34 cp02 kernel: xs_tcp_setup_socket: connect returned unhandled error -107

So, this seems to be an NFS problem BUT, we moved the files to the local server the other day and the problem showed up. I think we'll get the NFS out of the mix to see what happens. Doing it now.

note that when nfs makes a mess, the load average will reach crazy values but the cpu will usually stay low. this is the best indication you can get that you experience nfs issues as nothing else besides a dying disk or disk controller will cause the same behavior. no idea whether that situation matches yours. nb : the above would clearly indicate nfs issues but nfs issues can also manifest differently.
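one quick check along those lines, assuming a linux client: tasks blocked on nfs or a dying disk sit in uninterruptible sleep (state D), which inflates the load average while the cpu stays idle:

```shell
# a pile of D-state processes with low cpu use points at storage or nfs,
# not at a compute problem
ps -eo state,pid,comm | awk '$1 ~ /^D/ { print }'
```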
>Are you running a galera cluster ? is this a vm ?
>what is your disks setup including raid levels ?

This server is a proxmox host with one single virtual machine running on it, this mariadb DB server.
The disks are ZFS RAID1.

I was considering using the entire blade to run the DB but I decided to go with a vm for better flexibility since soon enough, we'll be changing to something else, cluster or some other redundancy. it's kind of a temporary thing.

>on a 48 core machine, you would normally match the io threads with the expectable i/o

Have not really gotten to higher levels of optimization due to all of the problems we've been experiencing but those kinds of things, I'd like to get to once this is solved.

This blade only has 4 slots though I think it can have six so that would be the max number of drives I could install on this.

>solving it starts with cleaning up the rogue queries and missing indexes. you may have reasons to think
>this is expected but unless you run lots of analytical large queries, there is no way that many full scans
>and tmp tables are normal.

While there is a lot of constant traffic, it's not analytical, it's mainly read with 10/15% writes.

>note that a galera cluster of 2 members is a bad idea but you can use garbd to emulate a 3rd member
>and get a working master-master setup.

The cluster we have is three nodes. What I was saying is once this is solved, I have to look at what will be next, either a cluster or a dual DB setup like master/master or master/slave. We need redundancy and this is especially needed right now due to this problem.

>Yes, fully protected. that is totally unreasonable. shielding against a properly crafted dos attack
>is not feasible. believing you are implies a clear misunderstanding of said attacks.

I should have said we have full DDOS monitoring and some limitations possible.

>you need to keep one of the spare web servers alive when the issue arises if you want to debug properly

This is the interesting part. I took down the load balancing for the servers that are affected so there is only one server handling all requests at the moment. However, when things exploded again a while ago, that server and the others that were not handling requests all went to almost 100 load.

In the other questions I posted, I didn't think it was the DB but am not 100% sure so wanted some input. While the DB can always use some tuning and still will once this is solved, the problem has not been clear. On one hand, it seems to be an NFS issue; on the other, it seems to be a DB issue. Maybe it's both, but one is worse than the other.

>if fpm is the one at fault, you can identify the pool based on the process name in most cases and use lsof
>in others. if you are lucky enough, you'll even know the script using lsof. ltrace can usually help to
>pinpoint the location of the loop if it is not obvious.

I think this might fall into the tuning area again because I spent quite a bit of time digging into that as well thinking there is a configuration issue with Apache/PHP. I tried various changes and the problem still showed up so I discounted those items as the cause.

>also note that serving web files from nfs is quite a bad idea. the issues will vastly depend on
>nfs 3 or 4 and nfs settings such as hard vs soft mounts and timeout but you WILL experience issues.

The reason is because there are several servers that would have to be kept in sync in terms of their files.
Using NFS, we can update one set of files and all servers are updated.

The files are only read, nothing ever changes on those other than updates which is why they are centralized.
We did do testing on that as well. I removed the NFS share for these web servers and had all of the files directly on the server. The problem still showed up.

However, as of today, it sure is starting to look like it's something with the NFS causing this because as mentioned in another comment, servers that are connected to the NFS but not being used right now all went to near 100 load when this happened again earlier.

>note that when nfs makes a mess, the load average will reach crazy values but the cpu will usually stay low.
>this is the best indication you can get that you experience nfs issues as nothing else besides a dying disk
>or disk controller will cause the same behavior.

>no idea whether that situation matches yours. nb : the above would clearly indicate nfs issues but nfs
>issues can also manifest differently.

At this point, only one web server is serving, no load balancing in the mix.
This server has the files locally and no NFS mount.
I kept the rest of the servers not taking requests but connected to NFS.

The point is to see if this happens again and if it affects only those NFS connected servers even if not serving files or if it affects the single server serving files.

Now comes the fun part.... as mentioned above, files are directly on the web server.
On the DB server, I'm still seeing
'Aborted connection - Got timeout reading communication packets'

What weirdness. Hard to fix mystery problems.
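For reference, those "timeout reading communication packets" warnings are governed by a handful of server variables; a my.cnf fragment showing the usual suspects (values are illustrative, not recommendations):

```
[mysqld]
# how long the server waits mid-conversation for the client
net_read_timeout    = 30
net_write_timeout   = 60
# idle connections are dropped after this many seconds; clients that don't
# reconnect cleanly then show up as aborted connections
wait_timeout        = 600
interactive_timeout = 600
```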
>the lockd message indeed indicates an issue with nfs. nfs3 over udp if i am not mistaken.
>there is nothing to improve though tcp and nfs4 might help reducing the frequency of the issue.
>serving web files over nfs is simply a bad idea.

Didn't think it would be since the files are mainly only read.
More aborted connections to the DB, some are from servers that don't have any NFS shares BTW.

The very odd thing is that even if this is an NFS problem, why is it not affecting all of the NFS connected servers?
The set that is data only are never affected nor are some others which are what we call control servers.

The only ones affected are the ones that folks use to reach their control panel.

I could eliminate the NFS shares in the mix by creating a method that lets each web server know when there is a file update (web pages). Each server could then connect to the NFS share, rsync their files to local then disconnect from the NFS share, thus serving files locally.

Of course, then I start thinking about all these servers, all these operating systems to maintain, all these files all over the place...
https://www.experts-exchange.com/questions/29238988/Cluster-running-same-OS.html


Updated above, please refresh.
With a 6GB database, we are talking peanuts at enterprise level.
I think you need to narrow down your possible problems, and the most likely seems to be the application... somehow it looks like it enters a deadlock and causes everything else to collapse.
I suggest you try to replicate the whole setup in a lab environment... maybe even push everything to the cloud to be 100% sure it is not something infrastructure related, and start "hammering". I don't know what platform/application you have, but one way or another you should be able to simulate the traffic... If it works, you have something in your infrastructure... if not, the application is the issue.
Everything 'seems' like it could be this or that but that is what we are trying to narrow down before spending countless hours in the wrong direction.

Last night, I decided to disconnect NFS from all the web servers that were being affected.
I also left only one web server running and again, with its web files locally stored.

This morning (and still digging), there don't seem to have been any problems overnight but I see this in the log for the 6th, today. This was around 3/4am.

kernel: xs_tcp_setup_socket: connect returned unhandled error -107
kernel: lockd: server tn01 OK


Considering that the NFS share is disconnected, it is strange to see these errors in the log now.
I checked the other servers, just in case there was a process or something running that used NFS, and there isn't, so yet another mystery.

The db is not stressed at all, no limits reached, barely any use. Usually, when things are calm, the data flow is very constant and there are quite obvious highs and lows.
We found some things yesterday that may explain why the reports keep talking about indexing and it might be related to a schema issue. The dev will be digging into that.

As of this morning, I've removed the NFS shares from all servers so their files are local.
Interestingly, when I stopped the web services on each server, I saw these errors on the DB server.

[Warning] Aborted connection to db. (Got an error writing communication packets)
I assume this is because the web server was closing connections that it had open to the DB server.

Meanwhile, as I watch the DB log, I continue to see these.
[Warning] Aborted connection to db. (Got timeout reading communication packets)

Now to let a little time pass and see what happens. It seems there are three things happening.
One was NFS causing problems.
Two is still unknown, the timeouts.
Three, more configuring of the DB server once things calm down. I don't see anything glaring that could be breaking the DB server, just some things that could be improved.

If I'm missing something, I'm more than appreciative to know about it :).

Keep postponing reading the more recent comments as this interface is not optimal.

Double check the output of mount.
It is not clear what your file system setup is.

Is tn01 this system or another node?

What is the file system type?
Might the system be using drbd?
Do you have some rsync setup?

Rsync is how we've been keeping web server and other files synced up but it's a PITA since it needs ssh.
Because we've been moving things around so much lately, I thought it would make sense, at least for a while to use centralized NFS shares.

Mount doesn't show anything that shouldn't be there, at least at the moment.

The tn01 is a physical server dedicated to running TrueNAS-12.0-U6.
I installed some 90GB of memory, 10 SAS SSD disks set up with a main pool that is dual 5 disk mirrors and an Optane ssd for slog.
It's got 10G Ethernet ports and using iperf, I keep an eye on things and seeing over 8Gbps from physical hosts and vms on those physical hosts across the network.
The OS is FreeBSD which doesn't support DRBD as far as I know.

The NFS service on TN can use NFSv4 and a few other options such as using UDP.
It can also serve iSCSI shares and others.

I just noticed this on the server while digging around.
"Specify how many servers to create. Increase if NFS client responses are slow."
"Keep this less than or equal to the number of CPUs to limit CPU context switching."
This TN server has 32 cores and the default setting was set to only 4. Maybe it wasn't able to keep up.

Looking back at your logs, lockd refers to NFS locks, not DB related issues.
NFS is a fairly stable and common protocol used on everything from netfilers to linux systems.
It could very well be that the NAS and NFS service were under-resourced for the number of systems relying on them.

i.e. the top you posted some time ago with the load pegged at 90 while the state reflected 97% idle

If you can, double the cpu allocation to 8 on TN01 and see if it makes a difference.
Yes, I raised the CPU count to 16 but I'm not using NFS at the moment to try to keep isolating what ever is going on.

Yet another odd thing just happened. I was on the DB server making a backup of the databases in case of a crash today.
No problem using mysqldump but as I was tar'ing up the files, the server locked everything up again.

The load went to over 800 by the time I noticed it, so all members got the dreaded gateway timeout error. No NFS in the mix.

There are no hardware errors and nothing in the logs showing any resource exhaustion, so I'm wondering if I have to suspect proxmox.
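One low-risk thing to try before blaming proxmox, on the guess (not a diagnosis) that the tar run is starving InnoDB of disk bandwidth: run the backup at idle I/O and CPU priority:

```shell
# -c3 = idle i/o class: tar only gets disk time nothing else wants;
# nice -n19 does the same for cpu
ionice -c3 nice -n19 tar cf dump.tar dump.sql
```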

 Definitely losing members over this mess.

User generated image

Unfortunately, troubleshooting issues requires extensive knowledge of the environment and setup. As you go through the suggestions and find a log entry or provide some additional info, I am, and likely the rest of us are, trying to navigate toward the item that you can confirm is the cause of the issue, at which point the remedy might be addressed.
Until you can identify what is causing your issues, it might be a matter of try this, see what happened, now try that.

The question also mixes in suggestions on how to mitigate other issues, which makes it harder to tell whether a given suggestion is meant to resolve the issue at hand or to describe what should be considered to avoid this type of issue in the future, i.e. a more resilient, redundant setup that can handle the higher demand and number of requests.

To recap, this was to address a web site/MariaDB setup with some 6GB of databases whose logs reflected locking errors.
It used NFS for some things and rsync to synchronize the web site portal.
The DB setup generates non-indexed joins and queries, which is a concern, but if the tables in question are small it might not be an issue.

Proxmox shows this while things are normal.
The high CPU shown is normal, it's just when I happened to capture this.


User generated image

>Until you can identify what is causing your issues, it might be a try this, what happened now try that.

Yes, fully understanding this, looking for leads, trying to eliminate and so on.
It's exactly what's happening and the help being offered is definitely nice as we try to figure this out.
Another odd thing is that when this DB was moved from a physical server recently, we were already experiencing the gateway timeouts which I believe is one of the posts I listed way above.

I'm sorry to keep adding messages but when editing previous comments, the changes can be missed.

I gave this vm 125GB of memory of the 256GB on this blade.
I notice that proxmox says we're using over 80% of the memory continuously yet looking on the db server itself shows a maximum use of 55GB or so.

And the timeouts were happening on the previous server even when the memory/load was not exhausted.

Truly confusing.

This is one of those situations where someone comes along and goes 'I saw that once, this is what we found'.



Got another lead.

I dumped the database.
# time tar cvf dump.tar dump.sql
dump.sql

I lost connection to proxmox. Load was only showing 28 from the CLI of the DB server.
Memory seems to stay around the same, 80% used according to proxmox.

^C^C^C^C^C^@^C
Message from syslogd@db01 at May  6 12:31:28 ...
 kernel:NMI watchdog: BUG: soft lockup - CPU#7 stuck for 22s! [mariadbd:38124]



Looking on the net, there are some posts talking about a kernel error in proxmox that doesn't seem to be fixed yet. My version is 7.1.7.

So maybe I have two problems: proxmox, and whatever this gateway timeout thing is that has been happening since the last server as well.



That event might correspond to a slow query.

On the proxmox platform you are using, can you prioritize VMs? i.e. the VM with the mariadb should have higher, if not the highest, priority for memory, storage access, etc.
The prioritization/scheduling of resources might not be what you need it to be.
Often it is balanced, meaning each VM gets the same amount of time/priority. See if it is possible to raise the DB server's priority.
Whew, this is probably going to be the longest question in the history of this site. Maybe I'll win a prize or something.
I have to admit, it's good to have to write all this down because it's not only a history of what's happening and could help someone else but it's also nice to know there are others reading it that at points here and there could notice something we're not.

I fired up another proxmox host with nothing on it yet and noticed this.
User generated image
Forums posts on proxmox seem to indicate there is a known problem with the CPU bug I posted about so maybe these two (above image) are related.

I've been able to eliminate some issues and have two main ones to focus on at this point.
-No known reason for gateway timeouts to members since it's happened across a physical server before being rebuilt into a proxmox vm.

-No confirmation if the problem is actually with proxmox or hardware but I'm not seeing any hardware error messages anywhere. If the problem is with proxmox, it's only manifesting itself on this instance because I have several other blades running the exact same proxmox versions and none of those have any problems to date.
Mind you, none of them have this amount of the hardware's resources assigned either.

In this case, the hardware has 48 cores and I've given the DB 44 of those leaving the rest for the OS. However, maybe that is a problem. I could change that but only the next time I have a problem or when I absolutely have to restart the service.

Options

I could migrate this vm to another host then rebuild this host as an ESX 6.7 or install Debian/BSD so I can take advantage of ZFS as I prefer not to use the hardware RAID built into the hardware.

Once built, to save on down time, I could prepare the existing server to become a galera node and make the new DB server the master. Once I got them communicating, rather than spending an hour down, the existing server would update the new one. At that point, I could point all clients to the new server using DNS or proxysql.

I could then destroy the current DB and rebuild the blade into a second DB server so I could run master/master and get to the redundancy we need.
Or, I could keep them as they are, one physical and one virtual, and do backups and everything on the physical hardware so as not to push proxmox into whatever this problem is. It seems to run just fine when it's not having to do anything else.

After a beer (or multiples), I could try to figure out what's going on with the timeouts assuming the current problems went away.

Or maybe someone has a better idea :).

@arnold, it's the only vm on this host. I purposely went with a vm with most of the host's resources just to have the flexibility of a vm.

@John,
>I guess that you should try to emulate the whole case to lab environment

We actually have a dev environment but cannot replicate the problem. At this point, it's clear that whatever is happening is on the db server itself but not with the databases.

If just using tar to put some files into a single file crashes the server, then that is a good lead.
The one thing I noticed is that usually, when a Linux server runs out of resources, it just complains about it, like 'exhausted memory' for example, but in this case there are no such warnings/notices, which is a bit peculiar.


Often, memory is a more valuable resource than the number of cores. Also, there is no need to over-allocate CPU; it might just be overcompensating for a prior decision.
It deals with whether you swing from side to side, versus building knowledge of how much resource a specific task actually needs....
percona toolkit also has a slow-query log analyzer.   pt-query-digest.
or alternatively: pt-index-usage   same query log, different angle to grind it.
(also read both tool's man pages).
It also pulls in the indexing etc. used for queries if available. It should  also show the worst query first....
If dev works fine, just swap dev to production and monitor. Yes, it should be slower, but for a 6GB db I don't expect too much.
If the dev-->production swap works fine, then you should know what is wrong.
@noci;
>There is overhead involved in emulation, the Host will show including overhead, the guest won't
>know about the overhead.

In all the turmoil, I didn't think once to log into the host itself and monitor its resources from the CLI rather than from the GUI. I thought I gave the host plenty with 4 cores and it's got 128GB of memory left over that nothing is using.
I failed to look into what the hypervisor might need but have always read the OS's are tiny, highly efficient, not requiring much for its own functions.

>percona toolkit also has a slow-query log analyzer.   pt-query-digest.

Yes, but since just tar alone causes the vm's load to go crazy within seconds of using it, would I not be looking in the wrong place by digging into db configs and performance right now? When it's working properly, it's usually well under 5 load, jumping to 40 or so with the heavy processes that we need to analyze later.

@John,

>If dev works fine just swap dev to production and monitor. Yes it should be slower but for 6gb db I don't expect
>too much . If the dev -- >production works fine  then you should know what is wrong

Our dev environment is more for testing code and while we can and do load the production data into the dev env, it's not the same as production at all and would not tell me what's going on with this server/db instance anyhow.

I have however come to the conclusion that something has to be done since even in this long question, there is nothing obvious that says here's what is happening.

We've had so much down time that I need to not touch the production stuff right now. Moving to test will cost at least one hour of down time. Even though the prod databases are small, it takes about an hour to import.

We offload what we consider old data every day onto other DB servers where performance isn't as important so once we tune the production stuff, things should fly nicely.

Just to add to the fun, there is a power failure in my area so now I'm working using my cell as a hot spot.
This is when real world experiences come in handy for us. I can't get any more than 2.5 to 3.5Mbps on this 4G phone and those wireless companies keep on bragging about how fast their networks are. <sigh>.

I was going to spend the day testing some ideas. I built a freebsd server last night and was going to try and find a way to simulate a heavy load on it. If it worked fine, I was going to migrate the DB vm over to another host so I could rebuild that server into a single OS rather than a hypervisor.

I don't want to use the hardware RAID on the server, I'd rather use something that has ZFS on it so am limited on OS choices.

We've been trying to get to a fail over solution but never seem to get there. Now I'm not sure if I should use a galera cluster with proxysql on each client or simply two DB's on two physical blades being master/master for example.
Cannot get to that until whatever is happening now is solved, of course.

The joys of IT... late nights, weekends, friends and family wondering why you're working. If only people understood what it takes to keep this digital world going :).
Maybe the problem is resource exhaustion and nothing else. Someone asked if we had looked at the hosts load or only the vm. In all that was happening, we didn't think of checking the host from the command line because we were busy looking at the vm and the proxmox gui.

I took a look and see the following.
The top one is the vm, the bottom one is the host with only the single vm running on it.
Is proxmox reserving that 162GB it's showing?

Yesterday or the day before, each time we would run tar, it would bring the system to its knees. I decided to watch both the vm and host loads while running a tar, and this time it didn't bring anything down, though it did slow things a little, again with not much additional load showing in top on either.

However, something different happened this time, which I share below. I had not seen that error any other time until now. Could it be related to what's bringing the system down? It happened at the exact same time as I ran tar.

User generated image
Looks like the pmm agent is causing some sort of problem.

May  7 16:54:08 db01 pmm-agent: INFO[2022-05-07T16:54:08.513-07:00] 2022-05-07T23:54:08.512Z error VictoriaMetrics/lib/promscrape/scrapework.go:354 error when scraping "http://127.0.0.1:42001/metrics?collect%5B%5D=buddyinfo&collect%5B%5D=cpu&collect%5B%5D=diskstats&collect%5B%5D=filefd&collect%5B%5D=filesystem&collect%5B%5D=loadavg&collect%5B%5D=meminfo&collect%5B%5D=meminfo_numa&collect%5B%5D=netdev&collect%5B%5D=netstat&collect%5B%5D=processes&collect%5B%5D=standard.go&collect%5B%5D=standard.process&collect%5B%5D=stat&collect%5B%5D=textfile.hr&collect%5B%5D=time&collect%5B%5D=vmstat" from job "node_exporter_agent_id_5d859f79-ba0b-4aea-af91-d51a5a4a4fb5_hr-5s" with labels {agent_id="/agent_id/5d859f79-ba0b-4aea-af91-d51a5a4a4fb5",agent_type="node_exporter",instance="/agent_id/5d859f79-ba0b-4aea-af91-d51a5a4a4fb5",job="node_exporter_agent_id_5d859f79-ba0b-4aea-af91-d51a5a4a4fb5_hr-5s",machine_id="/machine_id/cc4015d81dd64439ae059694373f1630",node_id="/node_id/2fc49a42-7c1a-459b-9d8c-5c6311b78ec2",node_name="db01.x.loc",node_type="generic"}: error when scraping "http://127.0.0.1:42001/metrics?collect%5B%5D=buddyinfo&collect%5B%5D=cpu&collect%5B%5D=diskstats&collect%5B%5D=filefd&collect%5B%5D=filesystem&collect%5B%5D=loadavg&collect%5B%5D=meminfo&collect%5B%5D=meminfo_numa&collect%5B%5D=netdev&collect%5B%5D=netstat&collect%5B%5D=processes&collect%5B%5D=standard.go&collect%5B%5D=standard.process&collect%5B%5D=stat&collect%5B%5D=textfile.hr&collect%5B%5D=time&collect%5B%5D=vmstat" with timeout 4.5s: timeout  agentID=/agent_id/1b0d973f-f754-43ce-9731-4c04909f3331 component=agent-process type=vm_agent



This looks like info gathering.
Scan the log for pmm-agent and see if there is a five minute interval at which this event occurs.
At 16:54 it collected polling data.

When you run anything extensive on the host, it takes precedence over the scheduling of VM resources, in any virtualization environment.

The host shares resources with the hypervisor.
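A quick way to test the interval hypothesis from the log (path assumes the CentOS default /var/log/messages):

```shell
# bucket the pmm-agent scrape errors by minute to see whether they follow a
# fixed polling cadence or line up with the tar runs
grep 'pmm-agent' /var/log/messages \
  | grep 'error when scraping' \
  | awk '{ print $1, $2, substr($3, 1, 5) }' \
  | uniq -c
```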
if both tops are at the same time, you are probably emulating more cpus in the vm than you actually have available. this may be the indirect cause of the mutex related messages from your other thread, and it is something you should hardly ever do, and clearly not in this situation.

It's just interesting how the percona client just happened to log at the very instant that I ran tar.

@John,
>And that's about it...the real problem is that you don't get a clear indication of what is wrong...you get
>all kind of signals but nothing like..."The database has issue because this occurred"

Agreed. I hoped maybe some clarity would come over the next few days of simply monitoring and maybe finding a lead but no more. Anyhow, yes, I'm done with this. Today, I'll prepare a couple of hosts and get going on these changes and go bare metal for db servers again.

>Side note: from the old server to the new server did you bring something along like an SSD , a CF or whatever.

Sorry, I'm not 100% clear on the question. Text will do that :).

Everything is physically two hours from my location. Are you asking if I have a way to backup?

In our situation, we do backup but the backups are for extreme cases only. Data is flowing in 24/7, non stop so backups would have missing data. This is partly how all this started, working toward a db redundancy solution.

The only option is to put the app into maintenance mode and stop all servers receiving data as the clients will continue to try sending until the services are back up. It's an interesting rush of traffic to see when this happens.

At that point, I could use the NAS storage to do what ever I need. Typically, I'll export the databases including the mysql one which is the first one I add back onto a new server then re-import the data.
Assuming the db services are ready to go before shutting down, I can expect the whole process to take around an hour.


@skullnobrains, no problem it's a very long thread.

i have no time to read through everything so sorry if i repeat what someone else said.

>rsync does not require ssh. you can also use rsyncd. there are many other ways to sync including
>wget mirror which i use very frequently. even cp or rsync between an nfs mount and local files will
>work decently. also nfs using soft mounts through automount would cause less issues.

Based on the other question and your responses, I now have a path in terms of how to not only keep things in sync but also how to deal with server sprawl in a uniform way.

>i believe your apparent resource issues are related to zfs. virtualising a db is not a very good idea. doing it
>over zfs and proxmox is a terrible one as zfs on linux is NOT mature enough for busy servers. at least
>not without lots of tweaking.

We became overly confident and trusting of proxmox after using it for a few months now and really liking it.
It seems perfect for everything else we're running but not db as you say.

As mentioned, at this point, I'll prepare two bare metal hosts dedicated to running mariadb only.

I like to run RAID1 at least so that if a drive fails, I don't have to instantly drive to the data center.
My options will be the hardware RAID which I have cache cards installed for or using a ZFS based OS like BSD or Debian.
I built a FreeBSD host the other day just to play with it since it's been a long time. I've used it but rarely.
Same for Debian, not a lot of hands on time but then once the DB is set up, there's not much to do other than keep the OS updated, tweak, tune, etc. It won't be serving anything else.

I prefer to use an OS that does it during install as I've not tackled manually creating a ZFS partition and am nervous about adding too much complexity to this change. I'll have to do some reading before I do all this.

I am just asking if you have some piece of equipment that you reused from the old server... e.g. for ESXi it is common to load it from a USB stick to save/separate from the disk bays.
In such a strange case there is always the probability that something trivial is simply ruining everything.
I had a strange, strange case years ago... A server that was kind of decommissioned... but I wanted to use it for testing... so I formatted it and loaded Windows 2003 with almost nothing... it would work for a couple of days... then freeze for good... reformatted with some Linux distro... it would go on for weeks... again Windows... again freeze... again Linux... again working... I was pissed. I took it apart. Some capacitors had issues... tossed it for good.
But the strange thing was that under Linux I didn't have issues... to this day it is my biggest mystery.

So maybe something that seems irrelevant might be the culprit for the erratic behaviour. To be honest, I would even suggest transferring the database to a Windows environment to get an alternative point of view, but from a little searching that may not be that straightforward.
freebsd on zfs is supported at install time.

running maria does not require more than "pkg add mariadb" or maybe "pkg install mariadb" it is a good fit.

if you only plan on using 2 servers, you need an extra garbd. garbd can run in a vm.
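For reference, a minimal sketch of what that looks like on FreeBSD; the package name assumes the 10.5 branch (check `pkg search mariadb` for the exact one on your release):

```sh
# Install server and client from packages (names assume the 10.5 branch)
pkg install mariadb105-server mariadb105-client
# Enable at boot and start; the rc script is named mysql-server
sysrc mysql_enable=YES
service mysql-server start
```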
@John, when I first started in the business, MS was a part of our solutions. Then I learned about something called Slackware which took something like 40 disks to install. No such thing as an installer back then. That was my introduction to Linux. I built and ran SLIP/PPP servers and never looked back at Windows so would really be lost if I went there :).

As for did something migrate, only the data, nothing in terms of hardware. After moving the DB to a completely different host and going from Ubuntu to Centos 7.9, I noticed similar behavior to the last install, which got me on this quest to solve whatever is happening there. Then this mystery problem came along, so I was sidetracked until that is solved.

@skullnobrains, I do have a FreeBSD bare metal system up and running with mariadb 10.5 right now but it was a temp build, just to play/test. It only has two 1TB drives so could not do RAID 10 for example.

The only systems I can reach from remote that could work have 128GB and 4 x 1TB SAS disks but not SSD.
I could prepare three of them to get a proper cluster going but garbd would save the overkill of a third server. I've not used garbd so a bit nervous to try learning something new while making a production change. Doesn't seem complicated, just a 'fake' node to gain quorum.
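For what it's worth, a garbd invocation is close to a one-liner; this is a sketch with placeholder addresses and group name, which must match the cluster's wsrep_cluster_address/wsrep_cluster_name:

```sh
# Arbitrator for a 2-node Galera cluster; the node IPs are placeholders
garbd --address "gcomm://10.0.0.11:4567,10.0.0.12:4567" \
      --group my_galera_cluster \
      --log /var/log/garbd.log --daemon
```

It joins group communication and votes in quorum but stores no data, which is why it can live in a small VM.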

If safe to migrate the db over to another proxmox host to be used temporarily, I could make some changes from remote.

Still would like to better understand why I'm seeing the proxmox host using 160GB+ of memory while the vm is showing only 50GB+ of memory being used. There is 256GB in that server, so even if it saved/dedicated the 128GB I assigned to the vm, why would it be using 160GB+?


I do not think there is a true SAS 1TB disk, those are likely Nearline SAS drives, meaning they are sata drives with SAS translation.
One can tell by the rotational speed:
SAS is commonly 10k or 15k RPM;
SATA is 5400 or 7200 RPM and uses disk cache to compensate.

Your DBs are comparably small, so it is not clear why the system is consuming so much memory.
The 162GB used is the total across ALL processes.

The KVM process on top is your DB vm; it looks close to 128GB. The overhead there is 131.5 - 128 = 3.5GB, roughly 2%, with 125GB resident in memory.
(The OS inside the VM is included in that 128GB.) (The memory there is allocated by the initialisation of the guest's OS.)
The disk emulators (cpq...*) take about 1GB each.
The OS of proxmox will take some space as well.

If load is the figure on top, that is the number of active processes. Another consideration: having 44 CPUs also means 44 active elements contending for single resources (table rows). You may need fewer CPUs and find that this causes fewer locking issues as the work gets aligned more by queueing.
Yup, nearline, enterprise, 7.2K RPM.

I have a pile of true SAS drives from IBM storage I was using which are 10K or 15K but they are only 600GB.
The DB is small enough right now that I could put 4 of those in the blade and run RAID10.

I'd have to order more SAS SSD's to fill each server with four which means I could not do that today.
This change needs to be done asap.

Since I only have on average 25% disk writes, how important are the drives really?

Thanks for the information about the load/memory noci. Nice to have a little clarity on that.
>Since I only have on average 25% disk writes, how important are the drives really?

Also wanting to try and lower that by fine tuning once all is back to normal.

So really, how important is drive speed?
I have lots of 600GB 10K or 15K SAS drives handy I could install today.
I have plenty of the SAS and SATA 7.2K enterprise drives but only a couple of spare SAS SSD.
Today, I could have two blades with two SAS SSD each with RAID1 in a galera cluster.


The reads are potentially what queue up the disks. Faster drives provide both faster reads and faster writes.

The throughput might also differ,
i.e. 6Gb/s versus 12Gb/s.
Not sure whether NL-SAS supports 12Gb/s.

unfortunately, your situation is difficult to attribute the issue to.
>unfortunately, your situation is difficult to attribute the issue to.

Yes, I hope that in the end, it helps in some way those who have been taking part. We all face mystery problems now and then and sometimes, we remember something that was mentioned, that turns into a lead, maybe a solution.

There really is nothing worse than mystery problems where nothing is obvious so having so many offering thoughts is nice to have. I hope I can contribute and help some here at some point as I've faced a fair share of such things, especially lately it seems.

raid1 is merely a case of raid 10 ;). with only 2 drives and with redundant servers, you may even consider raid0 if you need performance.

it might seem weird to many people, but sata drives do not perform significantly worse than sas on many or most db workloads. if the db is smaller than the available ram, there won't be a difference at all unless you write a lot.

actually, even for write intensive workloads, a sata array and a single tiny nvme drive for the zfs intent log or mariadb binlog will outperform a sas array on a raid card and be MUCH more cost effective. and you can use that cost effectiveness to add more drives. 4 sata drives will definitely outperform 2 sas drives even without the extra intent log drive on a db.

if you use platter drives and can throw in a number of them, you may want to use the beginning of each drive. you will find out that platter sata drives are about twice as fast when you read or write the first few gigs compared to the last ones.
it is unlikely but not entirely impossible that freebsd does not have the required driver. it is not uncommon that you need to kldload the adequate driver. the generic kernel only packs common ones.

look for your card here
https://www.freebsd.org/releases/13.0R/hardware/

can't help you with the proxmox issue

good luck ;)
No problem but thanks for the lead. I accidentally deleted that message mentioning proxmox.

As an update, I spent the night at the data center and installed new servers that will become a galera cluster.

I wish there was something I could add to this like an ah ha moment that solved the mystery but I don't have any as those who took part know.

It's unfortunate that we were not able to determine if this was because of something messed up in the database structure itself or not but based on the input, it may have been as simple as the host not being able to deliver what was configured for the db vm. Obviously, we will be working on the items mentioned here and in other threads.

Once I've gotten the cluster up and running and see how things go, I'll update this post.

most likely zfs becoming unresponsive occasionally due to poor buffering strategies and inadequate usage of fuse. this is a known issue on early linux versions.

mariadb would complain about mutex contention because io threads are stuck waiting for one another

but there is no way to be entirely sure

let us know if freebsd and bare metal works better.
BTW: the error 107 ( -107) mentioned before means:
/usr/include/asm-generic/errno.h:#define        ENOTCONN        107     /* Transport endpoint is not connected */

Apparently RPC is not ready yet, or has already been torn down (RPC is the lower layer of communication that NFS uses to manage items).
The message might be an indication of race condition issues around closing sockets:
https://patchwork.kernel.org/project/linux-nfs/patch/20211029200421.65090-3-trondmy@kernel.org/
or packet loss given the lack of robustness of the implementation. do not expect cifs to be any better btw. network filesystems tend to be either robust or performant but not both.
>most likely zfs becoming unresponsive occasionally due to poor buffering strategies and inadequate
>usahe of fuse. this is a known issue on early linux versions.

Add NFS on top of that and maybe tar and the already 'heavy' underlying load and maybe there is an answer. I guess that could be tested in a lab environment if someone had time.


been there, done that both in labs and on production systems. multiple times. but only YOUR settings will produce the exact behavior you experience. systems are rarely alike.

i can confirm for sure that older linux zfs implementations have such issues and they were quite easy to reproduce. i am not sure recent ones are impervious to said issues.

i can also confirm that a regular nfs hard mount frequently ends up in situations similar to yours. if you run a very basic stress test on your servers, such as fetching / in a loop, and gently tap the cables, you will probably reproduce the issue; if the rootfs is on nfs, or you insist for a while, it gets to the point the machine becomes totally unresponsive and needs to be rebooted.


I won't do persistent NFS mapping anymore to virtual machines after all this.

Since this, I'm using the NAS to centralize updates, then sending a command to each server to mount and rsync the updated files. This is a lot lighter overall.

As an update, I have a couple of freenas servers up and running with mariadb 10.5. The last step is to get them into a cluster, which means I need to dig into garbd today and either build a vm for that or run it on an existing one. Not sure of the options yet but will look it up.

What's frustrating is that this whole time, there are constant aborted connections to the vm db which is losing some data. Badly need to get this going today and be done with this mess. Then we can look at all the comments about tuning and see what we can find.
nfs issues are the same on busy bare metal servers. this is especially not suitable for web servers. you can get some stability with tweaking, but avoiding it altogether whenever feasible is a good lesson.

beware: garbd generates quite a lot of traffic so make sure you do not saturate the network links. a 3rd server would be better in the long run. you can also run a cluster of 2 nodes if you can either craft smart scripts or accept a manual intervention if a node fails without warning.

you may also consider persistent connections, preferably over unix sockets. there should be a rather easy way to run fpm in a vm or container on freenas. said arch might actually vastly outperform your current one using significantly less hardware, and will scale up to a dozen nodes in a single location for read-intensive apps.
I have one bare metal freebsd system up and running as a db server only, nothing else on it.
The DB was moved over to that and is obviously a lot faster than the vm was even with the same resources.
I'm trying to read up on how to monitor things like aborted connections and all those since bsd is different from centos I was used to.

What's interesting is the server has only been running for an hour or so and already being warned of the usual things. Early to tell but that's the first most obvious thing without any time on it yet.

[screenshot attached]

Yes, I just re-used the my.cnf for now as I was in the rush to get something up and running.

That was the interesting part of my post :). Without any server time yet, it's already complaining about the same things. As I mentioned, we'll tackle the tuning problems later as this question was more about why the server restarted which seems very rare.

The temp replacement DB is on the same hardware, just no longer running proxmox. The blade is dedicated to freebsd and we could tell the responsiveness difference instantly when all went live.

the issues you mention (index scans and open table rates) have no relation to my.cnf. they can only be solved with proper indexing and possibly rewriting some queries.

you probably should run an explain on each and every suspect query.

the message regarding the key buffer is normal. you can use the default size or smaller. this is used with myisam tables and not innodb as far as i remember

the speed difference is normal as well even with no optimisation. you removed the virtio overhead, the kvm cpu emulation overhead (note that kvm is slower than qemu-kmod on bsd and both are very significantly slower than bare metal), you now run a production ready zfs version on a system that has a better io subsystem design ...

let's see if it stops. i highly doubt it will unless you toy with zfs settings. but then i seldom used freenas as a db server base

note that directIO on zfs files, raw disks, or a raw zfs pool are faster setups though you probably do not care if you have ram to spare.
Oh, I see.

Using the slow query log / non-indexed joins:
one option is to deal with quantity first, addressing/resolving the queries used the most, the most frequent;
then work on the queries hitting large tables.
(Use explain on each query you choose and save the execution plan/results so that you can compare with the plan after you make a change, e.g. adding an index or two based on the common query: an index on column1 and column2, unique or not. You would be consuming storage space while gaining performance.)
Do you have queries that might benefit from using views, i.e. a view reflecting a combination of data from several tables but limited to the last 30 days?
Perhaps your initial design did not look far enough ahead: the working table now includes a great many rows, but the queries only deal with less than 1% of the rows in the table.
I.e. it might be beneficial if the working table is limited to active (not closed) issues, that type of setup.

Let's say you reach a point post-tuning: what then? Or have you already concluded you are going to a Galera cluster to introduce redundancy into the prior setup, such that the tuning fixes will be included there?
Percona Toolkit has interesting scripts that can help prioritize queries from the slow query log.
Mentioned before: pt-query-digest, or alternatively pt-index-usage; both also use explain to give preprocessed data to the DBA.
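To feed pt-query-digest, the slow query log has to be enabled first; a minimal my.cnf sketch (the path and thresholds are examples, not values from this thread):

```ini
[mariadb]
slow_query_log                = 1
slow_query_log_file           = /var/log/mysql/slow.log
long_query_time               = 1        # seconds; lower to catch more queries
log_queries_not_using_indexes = 1        # also flags the non-indexed joins
```

Then something like `pt-query-digest /var/log/mysql/slow.log` produces the ranked summary of the worst offenders.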
>the issues you mention : index scans and open table rates have no relation to my.cnf. they can
>only be solved with proper indexing and possibly rewriting some queries.

No, I'm just saying I re-used my big config (for lots of memory) as a quick way to get the new server up.
When I posted this, it had only been up an hour, or a few at most.

>you probably should run an explain on each and every suspect query.

I have to look into all your suggestions in this thread, there were many. Is this something the dev needs to add to the queries for example?

>note that directIO on zfs files, raw disks, or a raw zfs pool are faster setups though you probably do
>not care if you have ram to spare.

Yes, the new servers are running zfs.
This person's notes are interesting. I think he's creating the raw pool that was mentioned.
https://davidpdischer.com/2017/11/06/sysadmin-notes-freebsd-mariadb-10-2-and-galera-cluster/

I just used the mirrored RAID1 zfs option while installing freebsd on two SAS SSDs.

Now that it's running well on physical hardware, we'll have to get on discovering what those actual db services/settings problems are.

Everything is much faster now, server load is always low, memory use is next to nothing. Have to let it run for a week or so to build up stats while I work on the new cluster that will replace this server. Exact same hardware, just want to have the redundancy of a second instance.

Oddly, still seeing aborted connections from clients on the network.

2022-05-13 11:22:47 5148694 [Warning] Aborted connection 5148694 to db: 'xxx' user: 'xxx' host: '10.0.0.159' (Got timeout reading communication packets)

Using iperf, I'm seeing around 700Mbps on the 1Gb NIC. I'm trying to find drivers for the 10Gb cards but no luck.

Bit confusing.

>Might your initial design, did not look far enough dealing with whether the working table now includes
>so many rows, but the queries only deal with less than 1% of the rows in the table. i.e. it might be
>beneficial if the working table is limited to active (not closed issues) type of a setup.

The idea behind the single server was to get away from the vm asap. That took quite a bit of work but it was faster than messing with a new cluster to get everything right to then move prod over. Now prod is fine while we have time to work on a cluster and yes, hopefully, with a more optimized DB.

I think what I need the most is to hire someone that could spend an hour or two with my dev going over what we have to see how it could be improved. It's always possible that his knowledge doesn't include the kind of environment that we have. I really don't know since the conversation always goes the same way. I task him to explain, what he explains makes sense but I think we need an outside view/input to really know what we're not seeing.

>Percona-tools has interesting scripts that can help prioritizing queries from the slow-query log.
>mentioned before pt-query-digest or alternatively: pt-index-usage also use explain to give 
>preprocessed data for the DBA. 

Oh yes, have to get a percona client on this new server.

Hi,

Yes, there were many; personally, migrating to a new setup is the most drastic move, last resort in my opinion.

>I have to look into all your suggestions in this thread, there were many. Is this something the dev needs to add to the queries for example?
NO, explain is a diagnostic tool not to be used as part of the application's interaction.

explain is an informational directive. The dev can be provided, as noci pointed out, with the slow queries / non-indexed joins, etc., and run the explain outside of the software/application being used for production.
I.e. they use the mysql CLI or a MySQL/MariaDB GUI management tool such as Workbench to run: explain select somecolumns, anothercolumn from mytable
The result is the execution plan, i.e. what MariaDB in this case does to return a response; it lists information including the size of the table (rows) involved in the query.
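A concrete sketch of that workflow; the table and column names here are invented for illustration:

```sql
-- Prefix the suspect query with EXPLAIN instead of running it
EXPLAIN SELECT o.id, o.total, c.name
FROM orders o
JOIN customers c ON c.id = o.customer_id
WHERE o.status = 'open';
-- In the plan: "type" = ALL means a full table scan, ref/eq_ref an
-- index lookup; "rows" is the estimated number of rows examined.

-- The table definition, including its indexes:
SHOW CREATE TABLE orders;
```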
 
Depending on what your dev has access to: can the dev issue a show create table tablename?
This will return the definition of the table, which should include the list of defined indexes.

The issue is not whether it performs now without the issues you encountered before; the point is that you moved to a bare metal system such that, as was noted earlier, certain overhead has been alleviated. With time, the demand on resources will get to the point that the same issue returns.
The point and suggestions deal with addressing the errors and the slow queries, or you will at some point be getting yet another server, newer and more powerful, and going through this exact same thing:
i.e. set up a new server, transfer the DB, re-point all the apps, and once again the performance is improved compared to the latest situation.
The forced server migration outside the normal hardware/software upgrade cycle always follows Murphy's Law, "If something can go wrong, it will," and likely at the most inconvenient time,
i.e. a few hours before you have to go on a long-planned vacation/trip.

>2022-05-13 11:22:47 5148694 [Warning] Aborted connection 5148694 to db: 'xxx' user: 'xxx' host: '10.0.0.159' (Got timeout reading communication packets)

A slow query taking 30 seconds, as an example, might exceed the client timeout setting, at which point the client disconnects. Consider you and I talking on the phone: at some point the one talking realizes they have not heard any sound from the other side, stops talking, asks whether the other is still on, then presumes the connection is broken and hangs up.
This is potentially what is going on: the client connects to the server and issues a join request combining data from three different tables; the query takes 30, 40, 50 seconds; the client is configured to wait 20 seconds for a response; seeing no data, it drops the connection and either returns an error or, if it is persistent, reconnects and once again sends the same request. (In your case, I think one of the suggestions reflects no query caching; with query caching, at some point the cache will have the completed data set from a previous run.)
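To see how often this is happening and which server-side timeouts are in play, these two queries are safe to run on production:

```sql
-- Aborted_connects = failed connection attempts;
-- Aborted_clients  = clients that disconnected without a proper close
SHOW GLOBAL STATUS LIKE 'Aborted%';

-- The timeouts most often involved in "reading communication packets"
SHOW GLOBAL VARIABLES
WHERE Variable_name IN ('wait_timeout', 'net_read_timeout',
                        'net_write_timeout', 'connect_timeout');
```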


IMHO, fixing the queries, adding indexes, etc. will improve the performance of the DB as well as the user experience.
Reference for 10Gbps interface cards (if you need to select one):
https://forums.freebsd.org/threads/10gb-ethernet-nic-pcie-recommendation.76442/

If you use a Blade (c3000/c7000)  also check the broadcom drivers.
https://docs.broadcom.com/doc/957410-PB101

There was a combo that had a limitation on jumbo frames (max 4K instead of 9K, but that might have been Itanium blades only).
Wrt. percona-tools: the Percona tools are not the same as the percona client; the tools are a set of scripts to analyze things.
aborted connections are in most cases timeout mismatch between server and clients. most are harmless though dumb and annoying. using persistent connections and no server side timeouts will solve the non meaningful issues.

developers can run explain on whichever queries they issue. they should. if they do not, you need to monitor queries and do that yourself. if you have lots of queries, the accounting database or the slow log can help prioritizing.

there are no reasons not to run extra nodes right now. it is sensible to use slightly different setups on different nodes if you want to tackle subtle performance issues at some point. but as mentioned above, unless you manage to solve the full table scans and similar issues, they will only be solved temporarily, and this should definitely be your main focus.

feel free to post query plans for some specific help

explain is by far the most useful tool, if not the ONLY one, to tackle poorly crafted algorithms. the rows column will provide a clear view of the current cost. the meta information, such as full table scan or index merge, helps you understand what is going on under the hood. relying on the slow query log helps prioritizing but is merely the best way to wait for an explosion, while explain will warn you whenever a tiny spark fires.
i often instruct developers to add comments before each query with the location in the code and information regarding the expected max query time. this is yet another way to locate the most meaningful issues. the comments appear in "show full processlist". in some cases, automagically reporting and/or killing longer-than-expected queries can help significantly
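A sketch of that commenting convention; the file name and threshold are made up. Note that some clients strip comments unless asked not to (the mysql CLI has a --comments/-c flag):

```sql
-- The comment travels with the query and shows up in
-- SHOW FULL PROCESSLIST, making slow offenders easy to locate
SELECT /* billing/report.php:214 expected<200ms */
       SUM(total)
FROM invoices
WHERE created_at >= CURDATE() - INTERVAL 30 DAY;
```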
Mike, do you still have the VM version of the DB? Do you have a sample VM for the app?

This deals with testing the setup of indexes, etc. on it and then using the app;
it will likely only reflect queries that actually run long, outside the load-related hit.
I.e. take advantage of the "test" environment.
Kind of an overwhelming amount of new information.

I've mentioned this a few times but let's make sure we're in sync.
There are two problems.

1: SOLVED: In our use, a vm based db server was not able to keep up. DB service is now a dedicated physical server which has ended the original reason for posting.

2: There is a database problem which needs to be addressed and this is something we will start looking at this coming week.

In the end, there were a lot of what seemed like mysteries but perhaps it was simply the two problems above adding up to services completely bogging down. Everything is working nicely now, much faster, no crashes, all good.

>Mike, do you still have the VM version of the DB, do you have a sample VM for the app.

Yes, I've not deleted that yet and won't until I'm sure there is no reason to keep it. I usually keep such things around for a while in case we forgot something we might want to get back or copy a config instead of fully rebuilding, etc.

If using it for testing, it would only be for the database which is already running on the replacement server, meaning, there would be no load on the db but everything else remains intact.

I'll start digging around the net for information on how to set up this explain stuff. Keep in mind, I'm a generalist, not a db guy. I'm just doing this because it's my business that hurts if things don't get done, and so far this problem has not been solved, so I've jumped into it myself.

I think I really need to find someone to spend an hour or two with my dev to go over things, make some suggestions, etc. Maybe an online meeting for example with my dev. I've had terrible luck with freelancer sites, so many 'experts' learning at my cost.

As far as getting 10G, the adapters are HP FlexFabric (set to iSCSI), 10Gb, 554FLR-SFP.
HP part numbers 634026-001 629140-001.

If I can't find drivers for these, I can pull them out and replace them with any other 10G PCIe card that freebsd is known to support.

@John,

Yes, initially you suggested building a new vm and later suggested I move all to hardware only, no hypervisor/vm in play.

This was definitely the solution. I'm not sure how I'm going to pick a solution to close this because there are so many good comments and ideas on what to look for, what to try, etc. Many of those ideas are great troubleshooting steps, so it could be helpful if someone wanted to look at this question and its comments to write up an article. I wish I had time, as it would be an interesting one.

When we all have to deal with problems, we first look for the more obvious things but what to do once those are exhausted and you still have to dig. Yes, the database itself can be improved and it may have been partly the cause along with a vm not being able to keep up but getting to that point without absolutely clear data was quite a task.

Explain, as the word suggests, is a prefix to a query, and the result is how the database would process the request.

Analogy: you tell your employee "do the following task," but the employee is taking longer than you expected to complete it. You ask the employee:
Explain what goes into completing the task.

The suggestion re the VM DB is to try modifying the table schema such that these slow queries are resolved, without interfering with production.

On the VM DB only
I.e. query: select columns from table1 where column1 = ... and column3 = ...

Takes 10 seconds to complete
You add a non-unique index on column1 and/or column3, one at a time or at the same time, testing the query.
At the conclusion the query completes in 3 seconds.

The idea is to limit the universe of things being worked on.
The use of an app talking to this VM db is a way to make sure a change in the schema does not impact inserts/updates or trigger errors.

Working slow queries, non-index Joins, etc.

The scope and scale
Keep track of each schema change.

When a query commonly combines a few columns an index based on both columns might help at the expense of storage.
 Whether you update, adjust the production with every confirmed thing, or wait till the end.
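Using the hypothetical names from the example above, the change-and-compare loop looks roughly like this:

```sql
-- Before: note the "rows" estimate and access "type"
EXPLAIN SELECT * FROM mytable WHERE column1 = 42 AND column3 = 'x';

-- Composite index covering both columns used in the WHERE clause
ALTER TABLE mytable ADD INDEX idx_col1_col3 (column1, column3);

-- After: the estimate should drop sharply if the index is usable
EXPLAIN SELECT * FROM mytable WHERE column1 = 42 AND column3 = 'x';

-- Roll back if it did not help (indexes cost storage and write time)
ALTER TABLE mytable DROP INDEX idx_col1_col3;
```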




Yep, this was my 2nd recommendation. But since I am completely unaware of Proxmox, when I talk VM I am always referring to either VMware or Hyper-V; the others, no knowledge. To tell the truth, Proxmox was probably the first time I heard of it.
You took a step, but the issue you raised in your question remains. Even with the physical server, whose performance has improved for now, eliminating the lockup, the errors in the log about lockups and client disconnects remain. This suggests this solution is temporary, and potentially fixing the DB by adding indexes where needed may help further, improving both the DB performance and the client experience.
I.e. instead of 10-15 seconds to get a response, your user might get a response in 2.

Should I open a new question for troubleshooting the db now? Everything is getting mixed up :).

@John, I guess we could test it using esx some day but no idea when I would get around to that as this has caused a lot of work to get backed up.

@arnold,
>the errors in the log about lockups. and client disconnects.remain.

I thought we excluded locks other than the full out crash which of course would lock it up. The 'locks' we were talking about were resources no longer able to keep up likely being because the db was a vm. And yes, the steps are temporary but that was of course the first most critical thing, to get the services back to working right. Now we can breathe and start digging into the other matters.

As for 10G drivers, what's interesting is that proxmox sees them and are what I use with those systems so doesn't that mean that freebsd should have drivers or does it just mean that proxmox provides those?
Mike, On whether to open another question is completely up to you.
My point is, the reason this question was opened remains unresolved.
Consider it this way: a question is opened to address an issue; some suggestions try to get to the cause of the issue; a suggestion to reboot the system is taken; on reboot, the issue is no longer present, until, with the passage of time, it returns.

IMHO, the combination of the queries that are slow, non-indexed, etc. culminates in resource exhaustion, rendering the system unresponsive.
Potential misconfiguration between the client timeouts and the server does not help either. (One thing you can tell the dev to do is add a session directive, set session connect_timeout=30,
to extend it from the default of 10, and see if it abates the client disconnect errors. This however means the client may hang around longer, consuming connection resources.)





Should have checked first: connect_timeout is a global variable, and you can increase it dynamically, set GLOBAL connect_timeout=30;
this will increase the timeout (you can do it on the production system); then see whether the quantity of client disconnect errors is reduced without causing other issues.

Anything that can be dynamically set can be dynamically unset.
Keep track of what changes you make, so that when needed you can roll them back without the need to restart the service, MariaDB.
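That sequence, as a sketch, with the rollback value recorded up front:

```sql
-- Record the current value first (the default is 10 seconds)
SHOW GLOBAL VARIABLES LIKE 'connect_timeout';

-- Dynamic change, no restart needed; it does not survive a restart
-- unless it is also added to my.cnf
SET GLOBAL connect_timeout = 30;

-- Roll back dynamically if it makes no difference
SET GLOBAL connect_timeout = 10;
```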

The idea: the more DB stats you have (duration), the better data mysqltuner, phpmyadmin, and other tools have to make suggestions.

toad for mysql is an interesting tool; I used it a long time ago,
along with mysql workbench

Ok, fair enough, I have no problem keeping the question open and coming to a conclusion. My only concern is: will anyone ever read the entire thread to get to that conclusion? Maybe, if they are experiencing something similar, but my question title may not give them what they need to look further. Maybe I need to change it to something like 'Is virtualizing the db causing resources to run out?'
To answer the last question first: no, virtualization of a DB server does not cause resource exhaustion when configured for the actual need (memory, storage, etc.).

If you look at your current physical server with the DB, it is likely idle 80% of the time. What virtualization with the correct prioritization tries to achieve is use of the HW at around 80% (saving on hardware costs), while at the same time building a less expensive setup with redundancy.

You can choose which comment helped you resolve the question to address your immediate need (i.e. moving the DB to a physical system), plus any other comments that helped, etc.
Then proceed with what you intend to do next week.
If other comments helped in narrowing down issues while improving performance, add them as additional solutions as you go along.

Post back should a question come up in furtherance of ....

The point is there is no single solution that resolves an issue such as this; it is up to the individual which steps they start with. You had the resources to move the DB.
Others would have had to address the slow/unindexed joins/queries, based on which are most commonly used, to see whether each adjustment improved the situation.
said drivers probably exist in freebsd. approaching ones usually work, though this is no exact science. the bsd foundation is known to address such issues and provide decent help.

proxmox is linux based so the drivers are entirely unrelated.

--

regarding explain, you would simply run "explain select ..." rather than just the select. the output is rather easy to understand in most cases. the product of the rows column is more or less the number of examined rows overall. the ref or eq_ref types are the ones you want.
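a minimal sketch of that, using made-up tables (the point is reading the output, not the schema):

```sql
EXPLAIN SELECT o.id, c.name
FROM orders o
JOIN customers c ON c.id = o.customer_id
WHERE o.created_at > '2022-04-01';

-- in the output, check per row:
--   type: ALL        -> full table scan (bad on big tables)
--   type: ref/eq_ref -> index lookup (what you want)
--   rows: multiply these across output rows for the rough
--         total number of examined rows
```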

--

virtualising always slows things down; the debate would be how much. in your case, i do believe the old zfs implementation is the real reason. additionally, you experience useless double buffering which uses up ram and slows things down significantly.

--

the disconnect messages are quite harmless, but for a dedicated db there is no point in using ephemeral connections. use persistent connections and remove server side timeouts. this will also help make sure a sudden surge won't kill the platform.

--

i would ignore the remaining locks for now. sanitizing the queries is needed first. developers should never write queries without running adequate explains first. modern mysql versions have accounting and will allow you to prioritize.

you can also use the general query log, but beware of the size. i used to log to a fifo which you can read from when needed. mysql does not block when there is no reader, so this is rather safe.
I was under the impression that Debian and FreeBSD were just ports of BSD.
Can't look for drivers for ever so I'll just find cards that are supported by FreeBSD if I can find that info and swap out the adapters next time I'm on site.
SOLUTION
I'll just have to find PCIe cards that are supported by freebsd and swap the ones I have in there now. I've posted a message (somewhere?) asking for a list of supported cards.

Today, we went through this entire thread, making a lot of notes and started digging into things, slowly but surely.
The db dev is going to try some small changes and see how they affect things as I work through variables.

Using mysqltuner and tuning-primer shows everything is pretty good as a starting point. The one thing that caught my eye was the suggestion to raise join_buffer_size. Looking around at other folks' experiences, I came across this.

(join_buffer_size)
There is no gain from setting the buffer larger than required to hold each matching row, and all joins allocate at least the minimum size, so use caution in setting this variable to a large value globally. It is better to keep the global setting small and change to a larger setting only in sessions that are doing large joins. Memory allocation time can cause substantial performance drops if the global size is larger than needed by most queries that use it.

I'm not sure this is what it means, but it sounds like we could run some queries with a larger session-level join_buffer_size while keeping the global setting at its default.
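That reading matches the quoted passage: join_buffer_size can be set per session. A sketch (the SELECT is a stand-in for one of the real large joins):

```sql
-- raise the buffer only for this session's big join
SET SESSION join_buffer_size = 8 * 1024 * 1024;  -- 8M, example value

-- ... run the large join here ...

-- drop back to the server default for the rest of the session
SET SESSION join_buffer_size = DEFAULT;
```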

For those still following and interested, here is where things are at as of putting a bare metal server up.

 mysqltuner-1-9-8.pl

-------- Storage Engine Statistics -----------------------------------------------------------------
[--] Status: +Aria +CSV +InnoDB +MEMORY +MRG_MyISAM +MyISAM +PERFORMANCE_SCHEMA +SEQUENCE
[--] Data in InnoDB tables: 11.5G (Tables: 109)
[OK] Total fragmented tables: 0

-------- Performance Metrics -----------------------------------------------------------------------
[--] Up for: 4d 6h 42m 47s (579M q [1K qps], 29M conn, TX: 12505G, RX: 123G)
[--] Reads / Writes: 91% / 9%
[--] Binary logging is disabled
[--] Physical Memory     : 255.9G
[--] Max MySQL memory    : 106.0G
[--] Other process memory: 0B
[--] Total buffers: 96.8G global + 18.9M per thread (500 max threads)
[--] P_S Max memory usage: 0B
[--] Galera GCache Max memory usage: 0B
[OK] Maximum reached memory usage: 98.9G (38.66% of installed RAM)
[OK] Maximum possible memory usage: 106.0G (41.42% of installed RAM)
[OK] Overall possible memory usage with other process is compatible with memory available
[OK] Slow queries: 0% (736/579M)
[OK] Highest usage of available connections: 23% (118/500)
[OK] Aborted connections: 0.00%  (3/29471697)
[OK] Query cache is disabled by default due to mutex contention on multiprocessor machines.
[OK] Sorts requiring temporary tables: 1% (450K temp sorts / 30M sorts)
[!!] Joins performed without indexes: 77652387
[!!] Temporary tables created on disk: 33% (78M on disk / 234M total)
[OK] Thread cache hit rate: 99% (28K created / 29M connections)
[OK] Table cache hit rate: 99% (575M hits / 575M requests)
[OK] table_definition_cache(4096) is upper than number of tables(302)
[OK] Open file limit used: 0% (38/80K)
[OK] Table locks acquired immediately: 100% (15K immediate / 15K locks)

-------- InnoDB Metrics ----------------------------------------------------------------------------
[--] InnoDB is enabled.
[--] InnoDB Thread Concurrency: 0
[OK] InnoDB File per table is activated
[OK] InnoDB buffer pool / data size: 96.0G/11.5G
[!!] Ratio InnoDB log file size / InnoDB Buffer pool size (1.04166666666667 %): 1.0G * 1/96.0G should be equal to 25%
[--] Number of InnoDB Buffer Pool Chunk : 768 for 1 Buffer Pool Instance(s)
[OK] Innodb_buffer_pool_size aligned with Innodb_buffer_pool_chunk_size & Innodb_buffer_pool_instances
[OK] InnoDB Read buffer efficiency: 100.00% (888622438101 hits/ 888622768310 total)
[OK] InnoDB Write log efficiency: 98.08% (18256605 hits/ 18613479 total)
[OK] InnoDB log waits: 0.00% (0 waits / 356874 writes)

-------- Aria Metrics ------------------------------------------------------------------------------
[--] Aria Storage Engine is enabled.
[OK] Aria pagecache size / total Aria indexes: 128.0M/2.5M
[OK] Aria pagecache hit rate: 95.9% (1B cached / 78M reads)

-------- Recommendations ---------------------------------------------------------------------------
    We will suggest raising the 'join_buffer_size' until JOINs not using indexes are found.
    Temporary table size is already large - reduce result set size
    Reduce your SELECT DISTINCT queries without LIMIT clauses
    Performance schema should be activated for better diagnostics
Variables to adjust:
    join_buffer_size (> 256.0K, or always use indexes with JOINs)
    performance_schema=ON
    innodb_log_file_size should be (=24G) if possible, so InnoDB total log files size equals to 25% of buffer pool size.

tuning-primer.sh
Uptime = 4 days 6 hrs 59 min 32 sec
Avg. qps = 1566
Total Questions = 580782849
Threads Connected = 7

SLOW QUERIES (slow/long logging enabled via CLI for testing)
The slow query log is NOT enabled.
Current long_query_time = 10.000000 sec.
You have 736 out of 580782919 that take longer than 10.000000 sec. to complete
Your long_query_time seems to be fine

WORKER THREADS
Current thread_cache_size = 50
Current threads_cached = 47
Current threads_per_sec = 0
Historic threads_per_sec = 0
Your thread_cache_size is fine

MAX CONNECTIONS
Current max_connections = 500
Current threads_connected = 8
Historic max_used_connections = 118
The number of used connections is 23% of the configured maximum.
Your max_connections variable seems to be fine.

INNODB STATUS
Current InnoDB index space = 3.37 G
Current InnoDB data space = 8.77 G
Current InnoDB buffer pool free = 87 %
Current innodb_buffer_pool_size = 96.00 G
Depending on how much space your innodb indexes take up it may be safe
to increase this value to up to 2 / 3 of total system memory

MEMORY USAGE
Max Memory Ever Allocated : 96.47 G
Configured Max Per-thread Buffers : 1.42 G
Configured Max Global Buffers : 96.14 G
Configured Max Memory Limit : 97.56 G
Physical Memory : 256.00 G
Max memory limit seem to be within acceptable norms

KEY BUFFER
No key reads?!
Seriously look into using some indexes
Current MyISAM index space = 0 bytes
Current key_buffer_size = 128 M
Key cache miss rate is 1 : 0
Key buffer free ratio = 81 %
Your key_buffer_size seems to be fine

QUERY CACHE
Query cache is supported but not enabled
Perhaps you should set the query_cache_size

SORT OPERATIONS
Current sort_buffer_size = 2 M
Current read_rnd_buffer_size = 256 K
Sort buffer seems to be fine

JOINS (digging into this tomorrow)
Current join_buffer_size = 260.00 K
You have had 77878027 queries where a join could not use an index properly

OPEN FILES LIMIT
Current open_files_limit = 80579 files
The open_files_limit should typically be set to at least 2x-3x
that of table_cache if you have heavy MyISAM usage.
Your open_files_limit value seems to be fine

TABLE CACHE
Current table_open_cache = 5000 tables
Current table_definition_cache = 4096 tables
You have a total of 219 tables
You have 847 open tables.
The table_cache value seems to be fine

TEMP TABLES
Current max_heap_table_size = 512 M
Current tmp_table_size = 512 M
Of 235007339 temp tables, 25% were created on disk
Perhaps you should increase your tmp_table_size and/or max_heap_table_size
to reduce the number of disk-based temporary tables
Note! BLOB and TEXT columns are not allow in memory tables.
If you are using these columns raising these values might not impact your
ratio of on disk temp tables.

TABLE SCANS
Current read_buffer_size = 128 K
Current table scan ratio = 16 : 1
read_buffer_size seems to be fine

TABLE LOCKING
Current Lock Wait ratio = 0 : 580786125
Your table locking seems to be fine



Before you go 10G: do you have such demand on the network that you need 10G? Are you monitoring (cacti.net with SNMP polling of the network interface) whether you need such throughput?

Please update when you look at the 736 queries that take longer than 10 seconds to complete (~180 per day).
Then once these are solved, lower long_query_time to just above what these 736 take post-adjustment and see if it catches additional slow queries in between.
You have had 77878027 queries where a join could not use an index properly
This is another indication that indexing is not done correctly...
btw... if the index is on (pkey1, pkey2, pkey3, fkey1, col1, col2, col3, col4)
Then there should be an index (pkey1, pkey2, pkey3)
And another one on fkey1
Also, if there is a join where only pkey2, pkey3 are needed, then you also have a requirement for an index on (pkey2, pkey3).
There is no need for an index on (pkey1, pkey2), as the primary index will double for this; you would need an extra index for (pkey1, pkey3)
if those fields are used in a join... (but that should already be known by the developer).
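Using the hypothetical column names above, that advice translates to something like:

```sql
-- the primary key already covers (pkey1, pkey2, pkey3) and any
-- left-prefix of it, e.g. (pkey1) or (pkey1, pkey2)
ALTER TABLE t ADD PRIMARY KEY (pkey1, pkey2, pkey3);

-- separate index for the foreign key used in joins
CREATE INDEX ix_fkey1 ON t (fkey1);

-- needed only if joins filter on these without pkey1 leading
CREATE INDEX ix_pkey2_3 ON t (pkey2, pkey3);
CREATE INDEX ix_pkey1_3 ON t (pkey1, pkey3);
```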
i second the above totally: the joins without an index are your first and possibly only focus. since you have the performance schema, you can quite easily identify said queries.

start with the simple cases using on foreignCol = foreigntbl.col or similar clauses. just create those indexes asap. other cases might be more tricky and need to be taken care of on a case by case basis but you probably have a handful of them. possibly none.

also note that the number of queries is the overall number of executed queries, so you probably have a MUCH smaller number of query types. additionally, solving one issue often solves multiple ones.
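One way to pull those query types out of the performance schema (assuming performance_schema=ON; the digest table and columns below are standard in MariaDB/MySQL) is:

```sql
-- top statement digests that ran without using an index
SELECT DIGEST_TEXT,
       COUNT_STAR        AS executions,
       SUM_NO_INDEX_USED AS no_index,
       SUM_ROWS_EXAMINED AS rows_examined
FROM performance_schema.events_statements_summary_by_digest
WHERE SUM_NO_INDEX_USED > 0
ORDER BY SUM_ROWS_EXAMINED DESC
LIMIT 10;
```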
We hit 800Mbps at times to the DB when there is an inrush of traffic. I also prefer to have all of the servers with the same network speeds. Just a personal preference as hardware gets repurposed.

I changed the connect_timeout to 30 and still see aborted connections.

Yes, we'll be looking at those joins without index specifically today.
I raised connect_timeout to 60 from 10 on db server.

Still seeing the "[Warning] Aborted connection" "Got timeout reading communication packets"
I'm monitoring to see if there are fewer first.

We know there are some longer client connections and need to know if the connection is being killed on the db or the client side.

On the clients, most of them are php applications and some are a C application with sql client coded into the program.
What's interesting is that I see these aborted connections in the DB log for either one of the client types.

On the php apps, maybe php or the apache service is timing out the client connection which in turn generates the log on the db server?
On the C program side, that one is a bit less clear since there's no apache/php in play.

First test of this is to set a Timeout of 60 (httpd) on one of the clients to see if that raises the number of aborted connections.
Timeout was commented out, so it was using the default of 300.
It should not make any difference, but these are the kinds of tests I'll be doing until I start seeing some changes, and I'll go from there. I suspect php more than httpd on those clients.

Before I finished updating this, one came in already. I will let it run to see if fewer come in first.
I have to eliminate apache, because one of the servers/clients that is also getting seen as aborted isn't running httpd and is using php to run tasks on the database. That one is getting the most aborted connections, so I'll dive into that one.

Also, while running the last change, I got a timeout and I'm directly connected onto the server.

mysql> SET GLOBAL connect_timeout=60;
ERROR 2006 (HY000): MySQL server has gone away
No connection. Trying to reconnect...
Connection id:    32851552
Current database: *** NONE ***

Query OK, 0 rows affected (0.00 sec)


If it's ok, let me share some findings before adding any more comments for a bit.
This way, there will be some things eliminated and others become interesting to look at further or also eliminate.

aborted connections on the db side are most likely merely the client script ending without properly closing the connection. most developers never call mysql_close or whatever the lib you are using provides. use persistent connections and you should not see them any more. if the db complains, the clients are the ones responsible for closing the connection, and the reverse. obviously network glitches can cause the same issues as well.

EDIT : sorry. cross posted
To the message: when you connect to the MariaDB service, that is the server as well. To simplify things (that is how it always starts):
The physical machine is the server that hosts things.
The various applications like httpd, mysqld, mariadbd are also considered servers for their specific functions.
In that sense, while you were local to the physical server, your session was a client to the MariaDB server. To avoid exhausting MySQL/MariaDB server connections, there is an idle connection timeout window, which you set in the example to 60 seconds. The MariaDB server accepted your connection and updated its configuration to wait 60 seconds of inactivity before terminating the connection, which is what it did in the post.

The equivalent: you place a call, but something came up, so you tell the other party to give you a minute. Instead of the minute you thought it would take, the issue you went to address took longer. When you returned to the phone, the party had already hung up. This is what the prior post and message mean.
Having extended the window to 60 seconds, what you are looking for now is the max number of connections being used.
Ref Skullnobrains' comment: if your dev does not properly close the connections, you may run into an issue where all X connections mariadb can have open are now in use for a longer duration, meaning at some point a new connection attempt cannot be achieved.
(My suggestion to double the connect_timeout is because you have ~750 slow queries that take >10 seconds.)
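One note worth checking here (my reading, not confirmed against this config): connect_timeout only governs the initial handshake, while the idle-session cutoff behind a "server has gone away" on a quiet CLI session is governed by wait_timeout / interactive_timeout. A quick way to compare them:

```sql
-- list the relevant timeout variables side by side
SHOW GLOBAL VARIABLES
WHERE Variable_name IN
  ('connect_timeout', 'wait_timeout', 'interactive_timeout');

-- if idle sessions keep getting cut off, this is the knob
-- (example value; applies to connections made after the change)
SET GLOBAL wait_timeout = 600;
```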

Will await to see what you've discovered on the other matter.
Thanks for the info and yes, true, it's a client connection even if I'm on the server itself.
As mentioned, I do too many things and no expert at any one of them which means terminology might not be the greatest.

From this point on, I'll just keep re-editing this comment and post a new one when I'm done otherwise, we'll end up with 50 more short comments.

While our db dev works on the logging/monitoring the index issue, I'm working on the network/connections between servers side. I'd like to get rid of as many of those errors that I can so that we can see real problems when they show up. I'm curious to know why those aborted connections are there to begin with, even if they don't 'seem' to be a problem.

Ed (the dev) found one thing. Each client connection was asking for a schema update on every table. This is because a cache that is normally used was disabled while testing months back. This is now re-enabled which has already reduced some usage. This will be noted and applied to all client servers.

I see a possible correlation between a heavy process and the aborted connections mentioned for this client in the db logs, but it's not confirmed yet.
This very long process goes through thousands of records repeatedly, 24/7, looking at a start and end time to change a status value. There is a 2 second delay between each run. The process takes only a couple of seconds to our eyes, but there may be more happening in the background.
Quite often, as this process is running, I'm seeing an aborted connection, and while watching the log, it seems as if the process aborted the task.
I doubt sockets are running out, but perhaps something else, like threads, due to the number of clients connecting at the same time.
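If that suspicion holds, the classic fix is an index matching the process's filter. A sketch with hypothetical table and column names (the thread doesn't show the real query):

```sql
-- the recurring job presumably does something shaped like:
-- UPDATE events SET status = 'expired'
-- WHERE status = 'active' AND end_time < NOW();

-- a composite index lets it touch only the matching rows
-- instead of rescanning thousands of records every 2 seconds
CREATE INDEX ix_status_end ON events (status, end_time);
```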
Total buffers: 96.8G global + 18.9M per thread (500 max threads)


Maybe I have to do something with the configuration to get more threads, like checking the thread pool etc.
18M per thread? I'm only guessing that we're exceeding 500 threads? Have to look into that. I think it was mentioned in this question somewhere too.

Does looking at how many left over connections are on the DB help?
# netstat | grep -c TIME_WAIT
3421

# netstat | grep -c ESTABLISHED
16
Run again and again... results not too interesting or concerning...
28,50,13,30,15,49,13,11,48,13,15,14

Ok, I think I have to discount threads as something to keep digging into. It just seems like a LOT of memory per thread, 18.9M.

mysql> show status where variable_name = 'threads_connected';
+-------------------+-------+
| Variable_name     | Value |
+-------------------+-------+
| Threads_connected | 8     |
+-------------------+-------+
1 row in set (0.00 sec)



Taking a break so ending the above comment for now. Feel like I'm sort of on the right track and on the wrong one as well. Definitely it will help to work out the index problems and maybe that will solve most of the problems but to me, it does seem like Aborted connections are important enough to chase down.
Forget the TIME_WAIT... (those are closed & done with except for the IP stack).
Threads will be mostly the same as connections.
You may not want to disconnect between repeated runs, as that also cleans out buffers & intermediate results that are available in the sessions.
So keeping the connection alive probably saves resources on the repeated tasks.
Ensure there is an index on the columns & joins used in the search for that task.
Ok, I'll see if we can change that long process to a single session, not disconnecting repeatedly.
I assume there is a way to do this on a per session basis. I'll talk with Ed when he's back and we'll try that. It should help at least with the aborted connections.

The rest will take a little longer to get through.


On a side note, my question about looking to find which adapters would work on freebsd;

# man -k 10gb
bxe, if_bxe(4) - QLogic NetXtreme II Ethernet 10Gb PCIe adapter driver
cxgbe, cc, cxl, if_cc, if_cxgbe, if_cxl, if_vcc, if_vcxgbe, if_vcxl, vcc, vcxgbe, vcxl(4) - Chelsio T4-, T5-, and T6-based 100Gb, 40Gb, 25Gb, 10Gb, and 1Gb Ethernet adapter driver
cxgbev, ccv, cxlv, if_ccv, if_cxgbev, if_cxlv(4) - Chelsio T4-, T5-, and T6-based 100Gb, 40Gb, 25Gb, 10Gb, and 1Gb Ethernet VF driver
ixgbe, if_ix, if_ixgbe, ix(4) - Intel(R) 10Gb Ethernet driver for the FreeBSD operating system
liquidio(4) - Cavium 10Gb/25Gb Ethernet driver for the FreeBSD operating system
mlx4en(4) - Mellanox ConnectX-3 10GbE/40GbE network adapter driver
mlx4ib(4) - Mellanox ConnectX-3 10GbE/40GbE network adapter driver
mlx5en, mce(4) - Mellanox ConnectX-4, ConnectX-4 LX and ConnectX-5 based 100Gb, 50Gb, 40Gb, 25Gb and 10Gb Ethernet adapter driver
mlx5ib(4) - Mellanox ConnectX-4 and ConnectX-4 LX based 100Gb, 50Gb, 40Gb, 25Gb and 10Gb network adapter driver
oce(4) - Device driver for Emulex OneConnect 10Gb network adapters
sfxge, if_sfxge(4) - Solarflare 10Gb Ethernet adapter driver
sume, if_sume(4) - NetFPGA SUME 4x10Gb Ethernet driver


To test the hypothesis on the client connection,
see if your Dev can create one page that generates one of the slow queries.

Include error detection reporting within the page, and fire one of these queries. see whether the client gets the data or errors out.

What 10G is currently in the system?
does lspci work on your physical installed OS?
what does it report?

there have been so many different tools
that I am not sure which to suggest for you check what the current network status your system reflects.

ethtool, mii-tool, etc.
does the feed to which this system connect show that it is not connected at 10G?
They are Emulex, dual mode, FC/Eth.

Equivalent to lspci is 'pciconf -lv'
The os does see the darn things.

none2@pci0:7:0:0:       class=0x020000 rev=0x01 hdr=0x00 vendor=0x19a2 device=0x0710 subvendor=0x103c subdevice=0x3376
    vendor     = 'Emulex Corporation'
    device     = 'OneConnect 10Gb NIC (be3)'
    class      = network
    subclass   = ethernet
none3@pci0:7:0:1:       class=0x020000 rev=0x01 hdr=0x00 vendor=0x19a2 device=0x0710 subvendor=0x103c subdevice=0x3376
    vendor     = 'Emulex Corporation'
    device     = 'OneConnect 10Gb NIC (be3)'
    class      = network
    subclass   = ethernet

Sorry, yes, the link is up at 10G.
And yes, we're already working on the index stuff. I'll update once I have more info.

Done.
It was as simple as adding this and now have 10GbE.
loader.conf
      if_oce_load="YES"



also forget about raising the 500 threads limit unless you actually have that many connected clients. the message is not telling you the 500 threads are used. it rather warns you about the maximum memory that can be used if 500 clients do connect at once.

each connected client consumes memory due to a bunch of buffer allocations when the client connects. there is a documented formula. in your situation, better to use a reasonable number of well managed and seldom changing long term connections than reauthenticate and reallocate repetitively. you will end up with probably 50-100 stable connections. harmonize the max number of clients and fpm threads with what seems reasonable.

tweaking mysql config options such as the number of writers or key buffer sizes may be useful at some point, but that is AFTER you sanitize the queries, or you may end up tweaking stuff counterproductively: for example, wasting memory on unreasonable key buffers to cope with bad queries that won't exist once that job is done.
Understood.
The db dev found some things that were not so obvious and has made some changes. The changes are already showing better numbers, which I'll share once we have more information. It means we're in the right direction now.

We no longer see joins without indexes at all.

Set as follows for testing.
log_queries_not_using_indexes  = ON
long_query_time = 999.000000
slow_query_log  = ON
log_slow_verbosity = query_plan,explain

The goal first is to isolate the queries that are not using indexes.
The tables that are showing up in the slow log file, do have indexes.
They also have very low query times like this one;
# Query_time: 0.003953  Lock_time: 0.000043  Rows_sent: 0  Rows_examined: 3749

How can I read/understand these results, and what is it that I should fix, since there doesn't seem to be a problem?


probably nothing besides a slow query log being a little trigger happy. i suggest you set a more reasonable value such as 2s for the slow log.

do you still have frequent deadlocks ?

regarding further debugging, you can use the accounting db and/or explains: whatever is not ref or eq_ref in an explain might be worth working on, or legit, depending on the case.

i suggest you focus on full table scans and index scans, solve what can be solved, and then move on to more pressing matters.
btw, rows examined is the number of rows scanned. 3k is not much but will be worth an index if you read a lot and write seldom, or if you plan on having a bigger table at some point. we'd need to see the query to tell you more.

0 returned rows means the query is either not a select query or a select that returns nothing.
Yes, it's set to 2s now. We just wanted to start really high as explained.
No lockups at all since moving to hardware.

Continuing with testing.
My guess is that the slow query log entries are likely reflecting large un-indexed joins.



Mike, you might want to open a separate question dealing with what information is needed from the dev about the application/db in use: generally, what the application/db is about, whether it has open/close type events, and if it does, what happens to events that have been closed, etc.


IMHO, the slow query log from before should be used to improve the design and architecture of the DB.
At some point, the amount of data in the DB will reach the threshold where the physical server runs into the same resource consumption issue.


Ed the dev has his hand in a cast which is why I'm writing this otherwise, I'll often just take his notes and format them into a question. I'm sure he'll be able to ask more questions if needed once his hand heals.

Yes, this was mentioned earlier: if we don't actually fix or optimize, we'll basically be using hardware to keep up with the problem rather than getting better performance :).

Totally agree on that and is what we want to achieve.


feel free to post suspect queries and related explains. if we help with a handful of explains, you will probably gain enough knowledge to handle the rest.

one worthy optimisation would be the number of xtradb read and write threads which would reasonably match the number of disk pairs in a raid10 array but do not expect a huge performance boost.

hardware optimisations, if needed later on, would be to throw in a couple of small slc nvme drives, partition them 1/4 raid1 for the zfs intent log, and the rest in raid0 for the zfs ARC read cache. that will improve performance by orders of magnitude if needed. additionally, the lifetime of the other drives will improve significantly. likely double. sticking binlog and the likes on different drives is imho useless on zfs.
Sure, like the SLOG in my TrueNAS system :).
I would have to rebuild the two servers I already dedicated to the cluster that's coming however.

BTW, there are zero xtradb read/write according to dev.
And, there are zero un-indexed joins.
The only problem is with four tables that are generating slow queries (over 2 seconds) on small tables.
The slow query log file is only about 470k after running for several hours.
Log size does not really help; I would need to know how big the queries are.
Though, narrowing to 2 seconds before addressing the 736 that exceeded 10 seconds...
One table might be 10 rows, but the query might be aggregating much more data from multiple larger tables...

It seems you've got Murphy's Law come to visit.
Dev injured/broke hand, and performance issues hit ...
The number of rows is part of the issue. If there are fields like blobs of 1GB each, then the IO load is significant.
Then indexing will also only marginally help.
The query(s) need to be analyzed. Percona Toolkit provides scripts (not the agent or performance monitor) that help in analyzing.
No blobs in use.
No aggregations since no select with joins in the slow log.

can't you post the query and the output of the explain? anyone in this thread will be able to advise.

missing indexes do not only apply to joins, and properly indexed cols cannot be used if the query is poorly written

there are also perfectly legit queries that have an actual reason to scan many rows
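a common example of a poorly written query defeating an otherwise good index (hypothetical table; assume created_at is indexed):

```sql
-- wrapping the indexed column in a function forces a full scan
SELECT * FROM orders WHERE DATE(created_at) = '2022-05-01';

-- the sargable rewrite lets the index on created_at do the work
SELECT * FROM orders
WHERE created_at >= '2022-05-01'
  AND created_at <  '2022-05-02';
```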

Probably "irrelevant", but since we passed 150 comments on this question I guess all "ideas" are on the table.
You mentioned that you have 4 small tables with very slow querying... why not bite the bullet and test them on other database engines?
Pick a decent workstation and migrate these tables to other database engines, e.g. MySQL (not MariaDB), MS SQL, Oracle... you can find free converters (or, for a little extra money, more professional results).
So 4 tables... just test them against the other big guns... maybe the results will be different, or you will get some other information that leads you to the cause of your problem. E.g., while a longshot, there might be something in the engine of MariaDB that is "blocking" your performance... not a bug, just a different way of handling data.
not a chance with oracle mysql. mariadb and mysql are essentially one and the same, and mysql has been lagging behind mariadb and porting mariadb features ever since. same applies to innodb vs xtradb.

the required information is in the "explain".

it is highly unlikely that simple queries perform differently in dbmses with similar construction. the differences can mostly be found in the query parser/optimizer and apply to complex queries involving multiple tables.

besides that you might expect differences with significantly different software with sharding, column storage, in ram indexes, nosql...
While they are essentially the same, there is a slim chance that each engine might reveal a different piece of info... that will help resolve the issue.
If you want to test a different DB, try postgresql... again, that is another learning curve added (different tuning, management...).
For now, more important is to use the tools you know, trust, and are familiar with. Later you can experiment & test.

note that unless all your queries are written in plain ansi sql-92, you will probably need to rewrite a significant portion of them if you want to use other engines. and that is assuming your developers do not rely on some unique feature. additionally, moving data from one db to another is far from trivial. that is going to be looooong and probably produce nil.

that said

postgres has rather advanced indexing policies allowing you to index up to a certain length, ignore some rows in some cases, and index calculated fields, and it is much faster than most dbs when it comes to index merges (which should not occur in most properly crafted apps)... beyond that i personally dislike postgres

sql server has read committed snapshot or whatever they call that isolation level that allows processing multiple dirty analytical queries, such as the ones produced by business objects, in parallel without choking

oracle has a few unique features such as native xpath querying and a decent throughput

there are pros and cons everywhere but mysql can definitely hold for itself, totally qualifies as one of the big guns, and has its own set of unique features as well.

i honestly do not believe you should take that road...

what about the explains on those 4 tables ?
Definitely not going to be changing database from mariadb, at least, not now. It is something that's come up now and then, wondering if we might gain better performance but not now.

Sorry I've not updated, just letting it run for a while to see how things go. I find it's better to make changes, wait a day or two then see how that change affected things. Too many changes at once and it's not obvious what's going on usually.

>can t you post the query and the output of the explain ? anyone in this thread will be able to advise.

I had to spend the rest of the week catching up on things that fell behind due to this problem so my dev has been working on this part. I'll ask him to share some of these things.

I'll give a full update next week since the dev will be gone until Monday which will give things a good chance to run. Everything is stable however and has been since moving to bare metal so that change alone made a world of difference.

Here are a few results that may shed some light on how things are now. Keep in mind that the settings in the cnf are long term while the testing is all being done through the CLI.

mysqltuner-1-9-8

[--] Up for: 7d 7h 25m 50s (926M q [1K qps], 52M conn, TX: 21570G, RX: 182G)
[--] Reads / Writes: 91% / 9%
[--] Binary logging is disabled
[--] Physical Memory     : 255.9G
[--] Max MySQL memory    : 115.2G
[--] Other process memory: 0B
[--] Total buffers: 96.8G global + 18.9M per thread (1000 max threads)
[--] P_S Max memory usage: 0B
[--] Galera GCache Max memory usage: 0B
[OK] Maximum reached memory usage: 99.1G (38.74% of installed RAM)
[OK] Maximum possible memory usage: 115.2G (45.03% of installed RAM)
[OK] Overall possible memory usage with other process is compatible with memory available
[OK] Slow queries: 0% (339K/926M)
[OK] Highest usage of available connections: 12% (129/1000)
[OK] Aborted connections: 0.00%  (4/52855737)
[OK] Query cache is disabled by default due to mutex contention on multiprocessor machines.
[OK] Sorts requiring temporary tables: 1% (765K temp sorts / 53M sorts)
[!!] Joins performed without indexes: 89649292
[!!] Temporary tables created on disk: 33% (91M on disk / 270M total)
[OK] Thread cache hit rate: 99% (46K created / 52M connections)
[OK] Table cache hit rate: 99% (867M hits / 867M requests)
[OK] table_definition_cache(4096) is upper than number of tables(305)
[OK] Open file limit used: 0% (59/80K)
[OK] Table locks acquired immediately: 100% (16K immediate / 16K locks)

-------- InnoDB Metrics ----------------------------------------------------------------------------
[--] InnoDB is enabled.
[--] InnoDB Thread Concurrency: 0
[OK] InnoDB File per table is activated
[OK] InnoDB buffer pool / data size: 96.0G/12.4G
[!!] Ratio InnoDB log file size / InnoDB Buffer pool size (1.04166666666667 %): 1.0G * 1/96.0G should be equal to 25%
[--] Number of InnoDB Buffer Pool Chunk : 768 for 1 Buffer Pool Instance(s)
[OK] Innodb_buffer_pool_size aligned with Innodb_buffer_pool_chunk_size & Innodb_buffer_pool_instances
[OK] InnoDB Read buffer efficiency: 100.00% (1526151447460 hits/ 1526151777669 total)
[OK] InnoDB Write log efficiency: 98.26% (34430040 hits/ 35041321 total)
[OK] InnoDB log waits: 0.00% (0 waits / 611281 writes)

Variables to adjust:
    join_buffer_size (> 256.0K, or always use indexes with JOINs) (it was set to 512 the other day)
    performance_schema=ON


tuning-primer.sh
Uptime = 7 days 7 hrs 36 min 2 sec
Avg. qps = 1467
Total Questions = 927673293
Threads Connected = 13

SLOW QUERIES
Current long_query_time = 2.000000 sec.
You have 339769 out of 927673547 that take longer than 2.000000 sec. to complete
Your long_query_time seems to be fine

WORKER THREADS
Current thread_cache_size = 50
Current threads_cached = 44
Current threads_per_sec = 0
Historic threads_per_sec = 0
Your thread_cache_size is fine

MAX CONNECTIONS
Current max_connections = 1000
Current threads_connected = 11
Historic max_used_connections = 129
The number of used connections is 12% of the configured maximum.
Your max_connections variable seems to be fine.

INNODB STATUS
Current InnoDB index space = 3.43 G
Current InnoDB data space = 9.00 G
Current InnoDB buffer pool free = 86 %
Current innodb_buffer_pool_size = 96.00 G
Depending on how much space your innodb indexes take up it may be safe
to increase this value to up to 2 / 3 of total system memory

MEMORY USAGE
Max Memory Ever Allocated : 96.50 G
Configured Max Per-thread Buffers : 2.84 G
Configured Max Global Buffers : 96.14 G
Configured Max Memory Limit : 98.98 G
Physical Memory : 256.00 G
Max memory limit seem to be within acceptable norms

KEY BUFFER
Current MyISAM index space = 0 bytes
Current key_buffer_size = 128 M
Key cache miss rate is 1 : 0
Key buffer free ratio = 81 %
Your key_buffer_size seems to be fine

QUERY CACHE
Query cache is supported but not enabled
Perhaps you should set the query_cache_size

SORT OPERATIONS
Current sort_buffer_size = 2 M
Current read_rnd_buffer_size = 256 K
Sort buffer seems to be fine

JOINS
Current join_buffer_size = 260.00 K
You have had 89649292 queries where a join could not use an index properly

OPEN FILES LIMIT
Current open_files_limit = 80579 files
The open_files_limit should typically be set to at least 2x-3x
that of table_cache if you have heavy MyISAM usage.
Your open_files_limit value seems to be fine

TABLE CACHE
Current table_open_cache = 5000 tables
Current table_definition_cache = 4096 tables
You have a total of 222 tables
You have 866 open tables.
The table_cache value seems to be fine

TEMP TABLES
Current max_heap_table_size = 512 M
Current tmp_table_size = 512 M
Of 270641808 temp tables, 25% were created on disk

TABLE SCANS
Current read_buffer_size = 128 K
Current table scan ratio = 13 : 1
read_buffer_size seems to be fine

TABLE LOCKING
Current Lock Wait ratio = 0 : 927676855
Your table locking seems to be fine



So as we can see, things have gotten better and we'll make some additional changes next week.
I'll take notes with the dev and share what we've done to get to where we're at and what we'll do to improve as well.

I like the direction this is going because it will allow me to adjust some over-allocated resources and put them into other areas that could make better use of them.

There is one thing I'm wondering about.
Query cache.

I've read that query cache is being deprecated over time but I've continued seeing it for years now. Query cache is something we could take advantage of but is it worth using anymore?

BTW, anyone know the equivalent of mysqldumpslow for freebsd?


it should be the same thing; the issue is that the package you installed might have its own path, not included in your search path
echo $PATH

ps -ef | grep mysql

see whether you can spot the path from which it is running.
not sure which package you installed, but if it is based in /opt or /usr/local
and those directories are not reflected in the PATH above,
export PATH=$PATH:/usr/local/bin:/opt/bin:/opt/mariadb/bin
might be what you need.

pkg info | egrep -i maria
pkg info -l <mariadb client> | egrep -i mysqldumpslow
failing that
pkg info -l <mariadb server> | egrep -i mysqldumpslow

the command should be included.
it is good policy to make one change at a time and see how things go. +1

the main way to reallocate resources is probably lowering that crazily high max connections value and using a bigger buffer pool. but you have a rather sane setup which can cope with your data easily.

mysqldump should be included as mentioned above, but that is imho a rather dumb way to back up a db anyway. you can also use innodb hot backup or xtrabackup, which should be provided by percona.

better ways to back up databases include snapshotting a dedicated node (stop the node, take a snapshot, start the node), but even that is quite useless as mysql features ways to restore from any position that is still in the binlog using the mysqlbinlog command. on a cluster setup, frequent backups are quite useless.

i would suggest you set up regular replication to a distant node and run backups/snapshots there. that will allow disaster recovery plans for situations where the whole datacenter is burnt to the ground. any other situation should be properly handled by regular cluster operations. once the setup is correct, this is very reliable.

Skullnobrains initially thought he was looking at a backup tool, but he is looking for the slow log parser tool, mysqldumpslow.
this script should be part of the server version.

It is a Perl script; you might be able to take the one you have on one system and use it on the other.

The log is plain text.
try head -4 /var/run/mysqld/mysql_slow_query.log
as an example and see what is there.
oops, my bad.
anyway, given the fact the author has performance schema, i see little point to bother with the slow log too much.
>mysqldumpslow

Maybe I typo'd somewhere but yes, mysqldumpslow.

This program exists for centos for example. It simply goes through the raw slow query logs and provides a nice output with a bunch of options.

You can copy it as it is a Perl script:
file /usr/bin/mysqldumpslow

In centos it is packaged with the server components

Perhaps in freebsd it is bundled with mariadb utils as an example.
Etc.

Try scp'ing mysqldumpslow from centos to the freebsd box and see
I doubt it but let's try. Copied from centos to freebsd.

# ./mysqldumpslow
-bash: ./mysqldumpslow: /usr/bin/perl: bad interpreter: No such file or directory

Not sure I want to go down this path of messing with the server just to try and get this to work.
I don't much need it anyhow, we don't have many long queries anymore. As mentioned, I'll share our findings next week when the dev is back.

you are misreading the error.

the location of perl is at issue

what happens when you run perl mysqldumpslow?
where is perl on freebsd? it might not be in /usr/bin, and this is what the message says.


in a script,
the first line has the format
#!/path/to/where/the/application/can/be/found/perl
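To make that concrete, here is a hedged shell sketch (the /tmp path and the one-line script body are made up for the demo; the real mysqldumpslow is a much larger Perl script): you can either bypass a wrong shebang by invoking the interpreter explicitly, or rewrite the shebang to the portable `#!/usr/bin/env perl` form, which finds perl via $PATH (on FreeBSD that would be /usr/local/bin/perl).

```shell
# stand-in script with the Linux-style shebang (hypothetical path and body)
printf '#!/usr/bin/perl\nprint "ok\\n";\n' > /tmp/mysqldumpslow
chmod +x /tmp/mysqldumpslow

# option 1: bypass the shebang entirely by naming the interpreter
perl /tmp/mysqldumpslow            # prints: ok

# option 2: rewrite the shebang to search $PATH for perl instead
sed -i.bak '1s|^#!.*|#!/usr/bin/env perl|' /tmp/mysqldumpslow
/tmp/mysqldumpslow                 # prints: ok
```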

Since you lowered the slow query threshold from 10 to 2, you've been inundated with queries being added.

run the mysqldumpslow on the vm and see what it reports. then see whether you want to apply what you learn there on the production side.
Just grepping the output of the slow query log is showing long ones are under three seconds.
Quite a change from what we were seeing before.

Next week, we should have zero queries near 2 seconds.
Yup, that worked. Good to know if we ever need to use mysqldumpslow.
I'm not following arnold. We already found all of the problems and just letting things run until next week.
Next week, we'll make a few more changes and that should be the end of long queries which is what I'll sum up for this question.

in freebsd, perl and such programs are located in /usr/local/bin rather than /usr/bin, as perl is not part of the base system. but this is mostly off topic, and anyway perl should be in the path.

compared to linux there is quite a difference: every single thing you install ends up somewhere in /usr/local, while the base system commands do not come from regular packages. it is good practice to put that dir on a separate mount.
Yes, it's a bit of a change using freebsd. I've installed it before to run something here and there but never used it in production. So far, I kinda like it. I was wondering what we would be moving to from Centos at some point and I might have found it.


got to love the unknown giant ;) they made a few debatable choices lately but still provide one of the most robust stacks ever. scripts i wrote over 15 years ago, while we were between bsd 4 and 6, still run in production verbatim; i have nothing vaguely close in the linux world. stable versions of bsd, which are merely their test versions, tend to be superior to LTS linux releases, not to mention the crazy number of things that just work better, such as the tcp stack or the routing layer. welcome to a better world!
You solved it for the immediate situation; the stats reported by mysqltuner suggest additional adjustments can significantly improve performance.

Post move to physical, you lowered the threshold from 10 seconds to 2 .

I usually start by addressing the slower, longer-running queries, then lower the threshold from 10 to 5, solve those if possible, then go to 2.

Hopefully your dev is on the mend and will have an approach to further improve performance.




Sorry for no update, just don't have any yet, still working on the DB and even decided to restructure some tables while we're at it. That of course is leading to a bunch of development work but meant to optimize DB usage.

I'll try to sum it all up with the dev once we're done.

In the meantime, all is working perfectly. The only thing that caught my eye a few times (though I never really confirmed it was related) is this: the server had been sitting at around 10GB memory usage since it settled into its job.

I accidentally used mysqldump which locked some tables up (of course) which in turn caused memory usage to jump to 26GB. That was days ago but the server has never released the memory. 

Since I was doing something related to the DB, I suppose it's mariadb that took and is holding that memory, not the OS, but why would it not release it?
the memory used by connections is released as the users disconnect (except if you have a server side pool)
the memory used by the innodb_buffer_pool is never released. this is normal operation.

i fail to see the reason why locking tables would cause the memory to jump.
on the other hand, dumping the whole db caches the whole db if there is enough available ram space.
Locking the tables won't cause the memory to jump; I guess it got ALL the data into memory due to the backup.
So probably your regular use needs 10GB, whereas the temporary dump reads all the data.
And why should a database invest effort in removing data already fetched into buffers? Access from memory is always faster than getting it from disk.
(If you look at data IO stats you may notice the disk probably isn't read very often, and the database is mostly written to.)

>the memory used by the innodb_buffer_pool is never released. this is normal operation.

That's good to know, thank you.

>i fail to see the reason why locking tables would cause the memory to jump. on the other hand,
>dumping the whole db caches the whole db if there is enough available ram space.

I don't know either, only that I watched the memory stay around the 10GB mark from the time the server was started after it settled in then when I ran the backup, it jumped and now it seems to be around the 26GB mark since.

I haven't had the chance to set anything up for monitoring other than the two scripts I use and phpmyadmin. We've ended up re-designing a number of things which will help.

This is where we're at however.

# Query_time: 10.600640  Lock_time: 0.000077  Rows_sent: 1  Rows_examined: 2613263

The majority of queries are now on average a bit over 2 seconds.

# Query_time: 2.071785  Lock_time: 0.000075  Rows_sent: 0  Rows_examined: 18446744073709551610
# Query_time: 2.216178  Lock_time: 0.003760  Rows_sent: 0  Rows_examined: 18446744073709551612
# Query_time: 2.237429  Lock_time: 0.000074  Rows_sent: 1  Rows_examined: 7384670

# Query_time: 3.015370  Lock_time: 0.000125  Rows_sent: 0  Rows_examined: 18446744073709551612
# Query_time: 3.419093  Lock_time: 1.299152  Rows_sent: 0  Rows_examined: 0
# Query_time: 3.422189  Lock_time: 0.000084  Rows_sent: 0  Rows_examined: 18446744073709551612
# Query_time: 4.092854  Lock_time: 1.702691  Rows_sent: 0  Rows_examined: 18446744073709551610
# Query_time: 5.951408  Lock_time: 2.350782  Rows_sent: 0  Rows_examined: 18446744073709551612
# Query_time: 6.708663  Lock_time: 2.716330  Rows_sent: 0  Rows_examined: 18446744073709551612

# Query_time: 8.812781  Lock_time: 0.000079  Rows_sent: 1  Rows_examined: 2178772

Some long running processes are taking quite a while but we know what they are and those are the ones we're working on to change. 

# Query_time: 20.381255  Lock_time: 0.000092  Rows_sent: 0  Rows_examined: 18446744073709551606
# Query_time: 20.431876  Lock_time: 0.000102  Rows_sent: 0  Rows_examined: 18446744073709551606

This is the worst one since it locked up a table I guess for some 20 seconds.

# Query_time: 22.761607  Lock_time: 20.353293  Rows_sent: 0  Rows_examined: 18446744073709551612

This is the highest one left so far.

# Query_time: 23.157190  Lock_time: 2.700705  Rows_sent: 0  Rows_examined: 18446744073709551606

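For anyone who wants to eyeball entries like the above without mysqldumpslow, here is a small sketch (the regex only covers the header lines shown here; field layout in real logs can vary by server version) that extracts the timing fields and sorts them slowest-first:

```python
import re

# matches slow-log header lines like:
# "# Query_time: 2.07  Lock_time: 0.0  Rows_sent: 0  Rows_examined: 123"
HEADER = re.compile(
    r"# Query_time: (?P<query>[\d.]+)\s+Lock_time: (?P<lock>[\d.]+)\s+"
    r"Rows_sent: (?P<sent>\d+)\s+Rows_examined: (?P<examined>\d+)"
)

def parse_slow_log(lines):
    """Extract timing stats from slow-log header lines, slowest first."""
    entries = []
    for line in lines:
        m = HEADER.match(line)
        if m:
            entries.append({
                "query_time": float(m["query"]),
                "lock_time": float(m["lock"]),
                "rows_sent": int(m["sent"]),
                "rows_examined": int(m["examined"]),
            })
    return sorted(entries, key=lambda e: e["query_time"], reverse=True)

sample = [
    "# Query_time: 2.237429  Lock_time: 0.000074  Rows_sent: 1  Rows_examined: 7384670",
    "# Query_time: 22.761607  Lock_time: 20.353293  Rows_sent: 0  Rows_examined: 18446744073709551612",
]
worst = parse_slow_log(sample)[0]
print(worst["query_time"], worst["lock_time"])  # 22.761607 20.353293
```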


Again, we know about the long running ones and we're working on ideas and ways to improve them. I'm glad I posted this and that we got so much input because as it was mentioned in this question, my only fix would have been throwing more and more hardware at it until hardware didn't work anymore.

Having this conversation has led to improving the DB and learning a lot along the way.

If anyone is interested, I'm happy to share what the stats look like lately. Still working on the db structure/method changes however.
The stats at this point are not as relevant as the long-running queries that examine a huge number of rows.
How many tables are involved in those queries?
If memory serves, the DB is fairly small, so getting that many rows suggests these are outer left/right joins that increase the total number of rows on which the where clause is acting.
Many of your slower-running queries are scanning through about the same number of rows. Are they assembling about the same data set? Are they using the same tables?
There are potentially other options, i.e. using triggers that populate and maintain another table containing this data. Are views suitable? And how do you get to 18446744073709551xxx rows? The variance is insignificant at these numbers, where xxx is 606-612.

my best guess would be left joins in places where inner joins would do and possibly where clauses that trigger full table scans.

share the explain if you are unsure. this should be quite obvious. that many rows call for quick fixes
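About those 18446744073709551xxx values: they sit just below 2^64, which is typically the signature of a small negative number stored in an unsigned 64-bit counter, i.e. likely a logging artifact rather than rows actually scanned (my interpretation; worth verifying against your server version). A quick check:

```python
# values just below 2**64 are small negatives wrapped into an unsigned 64-bit field
U64 = 2 ** 64

def unwrap(counter):
    """Interpret an unsigned 64-bit counter as signed (two's complement)."""
    return counter - U64 if counter >= U64 // 2 else counter

print(unwrap(18446744073709551612))  # -4
print(unwrap(18446744073709551606))  # -10
print(unwrap(7384670))               # 7384670 (a genuine row count)
```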
This is Ed, still "on the mend" but definitely better than 3 weeks ago.

I will try to clarify as much as possible where we are...

The first goal was to understand why we were getting so many joins without indexes, and after digging a bit I found that some months ago a decision was made to avoid every kind of cache mechanism, and among them was a much-needed one that was in charge of checking the tables' structure against the information_schema for any change.
This schema cache removal was causing an overload of requests to the DB server, since for every single statement sent by the app there was a request checking for possible changes to the schema itself.
As you can imagine the simple change of enabling the schema cache made the whole environment more stable immediately.

The second goal was to identify all of the possible "slow queries" which mainly were related to 5 different tables.
In the last 5 days I refactored all the code related to 4 of these 5 tables, and as of today only one is left behind because the module that uses it has to be completely re-written (next week we will be focusing on the logic).
This last table is the one, referred to in Mike's latest comment, that shows an enormous number of rows examined (18446744073709551612): apart from the need to review that module completely, it is mainly an UPDATE statement with a JOIN (totally not useful, and dangerous!)

When this last table "disappears" from the slow query log, I think we will share the PHPMYADMIN stats again, plus the results of the scripts that Mike was referring to.

The only question mark I still have relates to some aborted connections (cannot find any clue in the DB/app so far) and some occasional messages at the PHP level saying the DB server has gone away, but I bet they are related to the procedure we are following now to update the application in its different modules...
...dreaming of a near future with BEAM the Erlang virtual machine!!!




The connection ending is either due to a handler on the local client side that expects the connection to remain present even when idle for longer than the server-side setting that terminates idle connections.
As for the mechanism within the PHP code: do you have a handler that checks whether a connection is present and, if it is, proceeds with the db activity, and if the connection is not present, reinitiates the connection?
When the PHP code ends, does the connection get closed/disconnected?
schema cache : i would suggest you update the schema cache whenever and only when you get a query error. and rerun the query. it is sane practice to have a query wrapper that does that.
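A minimal sketch of such a wrapper, assuming nothing about the framework (`run_query` and `refresh_schema_cache` are hypothetical callables standing in for whatever the app provides):

```python
def query_with_schema_retry(run_query, refresh_schema_cache, sql):
    """Run a query; on failure, refresh the schema cache once and retry.

    run_query and refresh_schema_cache are app-provided callables
    (hypothetical names, for illustration only).
    """
    try:
        return run_query(sql)
    except Exception:
        refresh_schema_cache()  # only on error, never on the hot path
        return run_query(sql)   # a second failure propagates to the caller

# demo with stubs: first attempt fails (stale schema), the retry succeeds
state = {"fresh": False, "calls": 0}

def fake_refresh():
    state["fresh"] = True

def fake_run(sql):
    state["calls"] += 1
    if not state["fresh"]:
        raise RuntimeError("unknown column (stale schema cache)")
    return "ok"

print(query_with_schema_retry(fake_run, fake_refresh, "SELECT 1"))  # ok
```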

disconnect messages are again quite harmless.

on the server side, they indicate a client script terminated without closing the connection nicely. even with persistent connections, process rotation will produce a few of them.

on the client side, they indicate the server decided to time out the connection, or a network glitch. just make sure you run mysql_ping when a query fails and rerun failed queries. you should monitor the frequency of that event, though.

kudos for the analysis work you did
Could there be a configuration setting in the DB server configuration that limits the number of queries in a session? That might be a reason for the DB server to close a session.

Ed: @noci this could make sense but I am not sure which variable (or combination of variables) in MariaDB

i do not believe such a setting exists, and it would not make much sense for a db imho. there surely is a timeout for inactive sessions. most of the time, it is actually the tcp session that times out, which both the server and client would see as a lost connection. there should be a setting to enable tcp keepalives on the server and most clients. but as long as the numbers are reasonable and there is no client impact, who cares
On those aborted connections, since making the changes in the db structure, so far, they seem to have slowed to next to none but we continue to monitor.

Two interesting points.

1: The ones we still see regularly are from a C program that communicates with the db server. That one is still being logged by the db server as aborted over and over again. Might have to dig into the C code.

2: I'm working on a galera cluster so am testing by using it with a light application that is updating a one table database.

[Warning] Aborted connection 66 to db: 'tests' user: 'test' host: 'localhost' (Got timeout reading communication packets)

Makes no sense. The cluster is doing nothing. This app is super light and there is no other traffic to the cluster.
The connection events are, as noted before, either a mismatch in the idle timeout between the client and server,
or, more likely, the client does not properly close the connection / does not gracefully disconnect.
Does the app have its own log?

test the event:
open a connection and put the session to sleep for 10 seconds (I think this was the timeout),

then on waking, try to query over the connection.

for the other test, open a connection and send a query while setting an alarm for 1 second that terminates the process.

see which event you get and how they are recorded.
the max timeout in mysql is a year. 10 seconds makes little sense in your case. you can use the highest possible setting once you have checked that low settings also work, which ensures your clients can reconnect if they need to.

the c program either terminates or calls fclose/shutdown without calling mysql_close beforehand. not much of an issue.

it is perfectly normal that connections time out when a timeout is set, whether the cluster is busy or not.

on the other hand, this message occurs when you try to connect, and probably indicates a network issue.

just in case: you cannot run connections against garbd
Hi,

Hoping we can update once and for all this coming week.

The load on the DB has been well under 2.00 constantly and it's still using 26GB, no change.
The longest running queries are 4 seconds now and we know what those are and will try to find a way to improve.
Ed will post a little more detail on how things have gotten this coming week.

Arnold, this new program is a bash script reading some data then updating a table on the new cluster. Mainly just testing. It's only running every 10 seconds, and there is nothing else happening on the server or the cluster.
There is no web service or php in the mix either. In fact, the script is only connecting to one of the servers, it's not even going through proxysql for example.

The client and the server are on the same LAN and switch which makes it hard to think it's a network issue since it was happening on the previous network too.

>just in case, you cannot run connection against garbd

If you mean because there's no mysql service on it, yes :). If you mean something else, what'cha mean?

garbd: yes, i just meant there is no mysql service. for some reason i was worried this was not clear.

if you do set up a galera cluster, i strongly advise you do not use a proxy or even a load balancer. some clients had better run locally on the db server and use unix socket authentication, which is arguably significantly faster and safer than sticking passwords in code. other clients will do an excellent failover job natively, and even better if you set them to reconnect to the next server should one become unavailable, or rather on ANY error. the proxies are slow, might become spofs themselves, and add no significant security since they do not do any actual protocol break, except for the mysql proxy, which is once again slow.

i concur it is totally weird that a bash script (likely using the unix socket) would produce such errors. one possibility would be a dns lookup that takes forever when the client connects. you might want to try adding skip-name-resolve to your config to either rule that out or correct the issue if i was spot-on.
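For reference, the suggestion above is a single line in the `[mysqld]` section of the server config (a hedged sketch; the my.cnf location varies by platform, e.g. /etc/my.cnf on CentOS or under /usr/local/etc on FreeBSD):

```ini
[mysqld]
# skip reverse DNS lookups on incoming connections;
# note: hostname-based grants stop matching, so use IPs or localhost instead
skip-name-resolve
```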

>if you do setup a galera cluster, i strongly advice you do not use a proxy or even a load balancer.

Oh, that's interesting to point out. Yes, I dedicated two bare metal servers to mysql and a small vm to garbd. Not using it in production yet, mainly because I hadn't gotten around to proxysql.

I read a lot about the overhead that proxysql would add but I decided it would be worth that cost in order to have redundancy. I've not gotten around to that yet because we're still working on the last little bits to close this question, and then I was going to do the move.

>some clients had better run locally on the db server and use the unix socket authentication
>which is arguably significantly faster and safer

None of the clients are on the db servers, they are all remote of it.

>other clients will do an excellent failover job natively and even much better if you set them to
>reconnect to the next server should one become unavailable or rather on ANY error. the proxies
>are slow

Hmm, what magic method might you know of that could accomplish this without a front end? When researching galera, I only ever came across using proxysql or haproxy either centrally or on each client. I'd love not to have to use either of those but not sure how to tell the various applications how they could get to another db server if one goes away?

>i concur it is totally weird a bash script (likely using the unix socket) would produce such errors.

Welcome to my world lately :).

>one possibility would be a dns lookup that takes forever when the client connects.
>you might want to try adding skip-name-resolve

Yes, I have that in all my cnf files, no need for resolve.

We touched on using an NVMe device to speed up mysql but I think I'll start a new question for that. Please, no one respond to what I just said and I'll post a new question. Thanks.

https://www.experts-exchange.com/questions/29240637/Best-use-of-NVMe-with-Mysql.html


most if not all libraries will fail over between multiple ip addresses resolved from a single domain name at connection time. the failover will be much better if you code it manually, or at least retry any failed query, which is common and good practice. in simple dead-server cases, a simple retry will trigger a reconnection. there are no tools in galera because no tool is needed.

beware that your proxies need to be redundant using a shared ip, or you will merely move the spof to a different layer. imho totally overkill. if needed, i would rather rely on a less smart lb on whichever firewall you already have, and i would limit its usage to clients that cannot recover, and deal with said clients asap.
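The client-side failover idea above can be sketched as follows (a hedged illustration: `connect` and `query` are placeholder callables, not a real driver API): try each candidate server in order and fall through to the next on any error.

```python
def failover_query(hosts, connect, query, sql):
    """Try each candidate host in order; return the first successful result.

    connect(host) and query(conn, sql) are placeholders for whatever
    client library is in use (hypothetical signatures).
    """
    last_error = None
    for host in hosts:
        try:
            conn = connect(host)
            return query(conn, sql)
        except Exception as exc:  # dead server, timeout, dropped connection...
            last_error = exc
            continue  # fail over to the next candidate
    raise last_error

# demo with stubs: the first node is down, the second answers
def fake_connect(host):
    if host == "db01":
        raise ConnectionError("db01 unreachable")
    return {"host": host}

def fake_query(conn, sql):
    return (conn["host"], "ok")

print(failover_query(["db01", "db02"], fake_connect, fake_query, "SELECT 1"))
# ('db02', 'ok')
```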
I might have to open yet another question just to better understand your last comment, skullnobrains, otherwise it will fill this question with unrelated stuff again. Do you mind if I do that? It's an interesting area that others could benefit from when considering how they would want to access a db cluster.

BTW, while checking things this morning, I came across this, innodb_buffer_pool_instances.
I noticed our current instance has this set to 1 on all servers which is the default.

Description: If innodb_buffer_pool_size is set to more than 1GB, innodb_buffer_pool_instances divides the InnoDB buffer pool into this many instances. The default was 1 in MariaDB 5.5, but for large systems with buffer pools of many gigabytes, many instances can help reduce contention. The default is 8 in MariaDB 10 (except on Windows 32-bit, where it varies according to innodb_buffer_pool_size, or from MariaDB 10.2.2, where it is set to 1 if innodb_buffer_pool_size < 1GB). Each instance manages its own data structures and takes an equal portion of the total buffer pool size, so for example if innodb_buffer_pool_size is 4GB and innodb_buffer_pool_instances is set to 4, each instance will be 1GB. Each instance should ideally be at least 1GB in size. Deprecated and ignored from MariaDB 10.5.1, as the original reasons for splitting the buffer pool have mostly gone away.


This seems like something I should be taking advantage of, considering the bare metal server is now handling everything with ease and resources to spare. I have a 98GB buffer pool so I could easily set the above to, say, 12 instances and see how that goes.

Wish there was a way of reloading a db server instead of having to fully shut down as some variables cannot be set in real time like this one.
feel free to open a new question.

multiple buffer pools seldom brought significant performance differences in my experience.

i would advise you skip that entirely.

then if you want to play, you probably want the same number of read and write threads as the number of pools (but that is REALLY not something i am sure of), and you would raise that setting progressively. the best setting is probably the lower of the number of cpus and the number of disk pairs in a raid10 array, but you may find out something different. in your specific case, i bet it won't make much of a difference.

you probably should nice the db to high priority on bsd hosts, remove swap if you still have one and focus on the queries. the performance schema is your next level debug tool.
Separate pools MAY be effective on NUMA architectures IF you can direct pools to different NUMA nodes AND have IO task affinity bound to the right CPU.
Quite a few IFs...
to build on that, specializing cpus will occur naturally when needed on any more or less dedicated system. dbs with a lot of cpus will likely perform a bit better without stuff like hyperthreading if you use intel. you can disable it at runtime on bsd systems with a sysctl
This is Ed giving some answers and wrapping up to close this question ...

Schema cache
The framework we use already handles schema updates efficiently; the only error was that no cache was used for them, which caused an overwhelming number of requests to the DB server.

Slow queries
Apart from the two tables belonging to a module we want to refactor completely (they are not affecting the normal behavior of the app), the longest query is a bit over 2 seconds.

Cluster
We now have a cluster in place and are getting used to the tricks needed to have full control of it in case of crash/recovery; then we will move the whole DB to it.

Conclusion
The main goal has been achieved, even if a few small items remain that can be considered fine-tuning.
And this is Mike.

It will take me a while to go through the question and point out the helpful comments and the ones that helped solve the issue. A lot of new things were learned from this as well.

There was a real human aspect to this that is sometimes lost in questions and answers. People don't always show their appreciation when receiving help and more importantly, sometimes, we don't realize how important we were to someone we don't even know. I'm talking about the experts :).

This was a very real problem that did in fact lose us customers and there was a lot of stress behind the scenes as it was happening.

I started writing a thanks but it grew too long so it's now a post. Thank you for all the help.

https://www.experts-exchange.com/articles/37192/The-Value-Of-Experts-Exchange-In-My-Daily-IT-Life.html
Love it, Mike!
Since things are working smoothly, it's time to take a few days away from technology.
Thanks again everyone that took part.


Glad it worked out. good luck with your operation, it has been a journey (or maybe a rollercoaster ride).
Thanks Noci.
Glad everything went well at the end...
Yes, happy to say that everything remains well at this point. Thank you again, John, for taking part.

Took a few days off without much Internet and back to work this morning to find everything humming along nicely.
Now to tackle things that got backed up as we got sidetracked.



kudos and good to know. hope you get along with the cluster as well. your feedback is much appreciated.

do not forget to set up some lightweight monitoring, or whatever allows you to figure out whether you are running into query/index-level issues or are close to running out of iops.
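A minimal version of that lightweight monitoring, using only built-in dynamic variables (no extra tooling), is to keep the slow query log on with a sensible threshold:

```sql
SET GLOBAL slow_query_log = ON;
SET GLOBAL long_query_time = 2;  -- seconds; tune to your workload
-- Optional and potentially noisy:
SET GLOBAL log_queries_not_using_indexes = ON;
```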

ps : i think this thread is probably indeed the longest on ee ; the longest i have seen, clearly, and the only one i really have a hard time dealing with on my mobile phone ;). a few more like this one and we might create a new disease called the scrolling index ! xD
What we need is a 'Kudos' button to acknowledge certain comments. The 'No, but it is helpful' doesn't always fit.
I didn't want to keep this question growing so let me only add, thanks for the additional tip and yes, we are now going to routinely check to see if there are any problems.

Since moving away from centos, I cannot use the percona client anymore since they don't have one for freebsd. I was getting to know it and enjoying using it, and it would be handy for the clusters as well.

I'm sure there are new diseases being named right now for the future problems we'll have with mobile only communications. I hope PC's don't die. I like my big screens and keyboard.


the percona client can probably be used from a separate host... ? i do not use it so i would not know.

it will most likely work quite easily on freebsd in linux compatibility mode, which is based on an fc core. you might be missing a few .so files to install manually using the relevant fc_something packages, but chances are it just works out of the box.
+1 for keeping screens that do not display your fingerprints all over the place