mariadb tuning for lots of small queries

skullnobrains asked:

i'm working on a mariadb cluster with high performance requirements in terms of query throughput.

i can work with read-uncommitted. actually, most of the app would not require more than eventual consistency. i'm well aware the app should probably be using cassandra or a similar tool, but moving away from mysql is simply not an option right now, and neither is mariadb with a cassandra storage engine. i'll probably be moving part of the load to memcache/redis for the parts that do not need to live in an SQL store. little by little, though.

i've worked quite extensively on mysql tuning in the old days, but not much in the past 5 to 10 years, so i feel i might be quite outdated, and on top of that i'm working with both an environment and a workload i'm not used to on mysql.

i can obviously add more NDB members, but i'd rather switch to mariadb if possible.

the overall quantity of data is rather small: around 10 GB, but we expect roughly 10× growth quite shortly. actually more, but i'll probably manage to optimise many things by then.



the current NDB cluster has 2 members

it performs about 10k select queries per second on each member, and about 200 inserts/s. the rest is comparatively negligible.

most of the selects and inserts touch a single row.

there are a few bigger queries, but performance on those queries is not critical as long as running them does not produce contention.


hardware and OS requirements. i have no control over most of these.

the os is ubuntu xenial, 64-bit. i'll probably be able to grab a bionic. nothing else.

bare-metal machines are unfortunately not an option, but running a de facto almost-dedicated esx is feasible.

current esx hosts are G8 machines with 4 disks in raid 10 for the datastore. i should be able to get better hardware, but that's not the point here: i need to beat NDB on similar hardware. this is quite the challenge because the workload heavily favors ndb.

the filesystem is XFS. no specific tuning. i have full control over this choice. i will consider dedicated drives if it is significantly faster on linux.

10 vcpus
8 GB RAM
no swap at all


current settings

6 GB buffer pool (innodb_buffer_pool_size)
innodb_log_file_size = 512M
innodb_log_buffer_size = 64M
doublewrite is active
innodb_flush_log_at_trx_commit = 0
8 read threads
8 write threads
8 buffer pool instances.

i used the number of cpus for the last 3 settings. not sure what is best.
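for reference, here is what the settings above look like as a my.cnf fragment (same values as the list above):

```ini
# current settings, written out as a my.cnf fragment
[mysqld]
innodb_buffer_pool_size        = 6G
innodb_log_file_size           = 512M
innodb_log_buffer_size         = 64M
innodb_doublewrite             = 1
innodb_flush_log_at_trx_commit = 0
innodb_read_io_threads         = 8
innodb_write_io_threads        = 8
innodb_buffer_pool_instances   = 8
```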


i'm benchmarking by replaying about 10 minutes of load from the cluster.
i'm using 3 members and 10 to 100 parallel connections per member.

currently, i manage to replay it in about 3 to 4 minutes.
i feel the machines are not fully loaded and i'm not giving innodb its full power. or galera throttles it.

i'd like to be under 2 minutes if possible.

ideas ?
settings i have not considered ? things to check during the bench ?
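one thing i already check during the bench: per-second deltas of the status counters between two SHOW GLOBAL STATUS snapshots. a minimal sketch of the arithmetic (the counter values below are made up to match the workload figures above):

```python
# Compute per-second rates from two snapshots of SHOW GLOBAL STATUS
# counters taken N seconds apart. The sample values are illustrative.

def rates(before: dict, after: dict, seconds: float) -> dict:
    """Per-second delta for each counter present in both snapshots."""
    return {k: (after[k] - before[k]) / seconds
            for k in before if k in after}

# snapshot at t=0 and t=60s (fabricated numbers for illustration)
before = {"Questions": 1_000_000, "Com_select": 900_000, "Com_insert": 20_000}
after  = {"Questions": 1_600_000, "Com_select": 1_500_000, "Com_insert": 32_000}

r = rates(before, after, 60.0)
print(r)  # Com_select: 10000.0/s, Com_insert: 200.0/s
```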

note : i want to achieve the best possible on the current hardware. please do not suggest new servers. this topic is about mysql tuning.

thanks all

David Favor, Fractional CTO (Distinguished Expert 2018):

Biggest consideration is amount of RAM for buffering.

Use mysqltuner to get a ballpark RAM size, then double or triple that number, as RAM is cheap.

mysqltuner will also suggest fairly good config settings, and it only takes a few seconds to run.
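If it helps, mysqltuner is a single Perl script; a typical invocation looks something like this (the download URL and credentials are just the usual pattern, adjust to your setup):

```shell
# fetch the script (a single self-contained Perl file)
wget -O mysqltuner.pl https://raw.githubusercontent.com/major/MySQLTuner-perl/master/mysqltuner.pl
# run it against the local server; it reads SHOW VARIABLES / SHOW STATUS
perl mysqltuner.pl --host 127.0.0.1 --user root --pass 'secret'
```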
I need to tune as best I can on the existing hardware.
I am voluntarily using a pool size slightly smaller than the dataset in order to tune OTHER parameters first.
My pool size is ramsize - 2 GB.
I will be using at least 32 GB in production.
More would be pointless, as this could already hold the whole dataset, indexes, and tmp buffers for about 100 concurrent clients.
I am looking into the best settings for parallel threads, buffer pool instances, etc. in order to avoid mutex contention.
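To see whether mutex contention is actually what I am hitting, I plan to watch the semaphore waits during the bench, e.g. (the Innodb_mutex% counters are the xtradb ones, assuming they exist in this build):

```sql
-- the SEMAPHORES section lists OS waits / spin rounds on
-- buffer pool and index rw-locks and mutexes
SHOW ENGINE INNODB STATUS\G

-- xtradb also exposes aggregate counters; per-second deltas are what matter
SHOW GLOBAL STATUS LIKE 'Innodb_mutex%';
```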
Most of these settings either did not exist back in the day, or needed to be set based on the number of disks, on bare metal.

I have little to no experience with virtualisation of sql servers without direct disk access, except in cases where performance was not an issue... and there are many new settings in xtradb...

Additionally, i need to show that a galera cluster can outperform ndb in this case and allow datasets bigger than the available ram. Unfortunately, the app is not designed to split reads in a cache-efficient manner and cannot be significantly improved in this regard.

noci, Software Engineer (Distinguished Expert 2018):

IMHO database servers and file servers should not be virtualized.
In virtualized environments IO will ALWAYS take one more step per IO: besides the actual transfer (which might be in one go, although i doubt that), at least the setup is done twice.
Once for the virtual environment [ VM -> Host ], plus once for the host [ Host -> storage ].

If you connect using iSCSI then the network IO still has this flow...
So for the best performance, try to avoid running IO-intensive applications in VMs.
i'm totally aware of that... and fighting my way to get a few bare-metal servers for this platform.

but that's not really the current topic : when i test, i do not saturate either the virtual drive's I/O throughput or the CPUs.

given the workload, all of the data pages should be buffered at some point during the test. the dataset exceeds the pool size by about a factor of 2, which means the machine should literally be dying from I/O shortage during the test.

it is actually responsive and only using a small fraction of the available tps and throughput. so i'm missing something that produces contention. i'm expecting some tuning ideas i may not have had yet, or possibly some tuning specific to virtualized envs.
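since galera flow control is one of my suspects for the stalls, i'll also be sampling these standard wsrep status variables during the bench:

```sql
-- fraction of time replication was paused by flow control since
-- the counters were last reset (0.0 = never, 1.0 = always)
SHOW GLOBAL STATUS LIKE 'wsrep_flow_control_paused';

-- queue lengths; a sustained non-zero recv queue triggers flow control
SHOW GLOBAL STATUS LIKE 'wsrep_local_recv_queue';
SHOW GLOBAL STATUS LIKE 'wsrep_local_send_queue';
```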

noci, Software Engineer (Distinguished Expert 2018):

latency? if the replacement of blocks is high you will have latency at all transfer points: buffer copying, unpacking of DB blocks, repacking them, waiting to get blocks from disk, waiting for blocks being evicted but not yet gone (locked cache blocks). Lack of CPU capacity to handle the load?
Cache saturation: if SSDs are used, saturated caches on the SSD drives due to the volume of writes (more or less the same for SANs), and in the OS.

I have noticed huge problems when multiple independent transactional programs (doing fsync()/sync()) severely impact each other when they share one filesystem.
(Ex. MQ & MySQL on one disk: the MQ expected only fsyncs of a single block [ no persistent data ] for the transaction counter, but was away for several seconds doing the sync of MySQL as well.
During this fsync [ which should have been short ], the MQ was blocked, causing a pileup of work.
Splitting them into their own filesystems, mounted on separate mount points, helped a LOT.)
This would also hold if multiple instances of databases are running... i.e. a single-disk design may be killing you.

Those are hard to diagnose... Latency will show where there is a lot of queueing of requests.
David Favor, Fractional CTO (Distinguished Expert 2018):

Lots of RAM + mysqltuner are still your friends.

After running for a while, use mysqltuner to determine if you've guessed correctly about your settings.

Normally (there are exceptions) mysqltuner provides good diagnostics for config modifications.
Thanks noci.

I already use a dedicated disk (well, a virtual one) and mysql is using directio, or is supposed to. I managed to improve performance slightly by using bigger replication buffers. For some reason, a bigger gcache seems to improve perf as well.

I am starting to believe the messy indexing is also responsible for part of the load: even a single-row write produces a number of locks, some of which are useless.

more buffer_pool instances did help.
I am unsure whether the number of read threads makes a difference, as i have always used the same number of threads and pool instances. Which may or may not be a good idea.

my read threads now manage to saturate 4 cpus, but not all of them. I'm still unsure why. i have reasons to believe the linux scheduler does not properly spread the threads among the available cpus, and i also believe there is either a mutex or something galera-related that stalls the reads at times.

Write performance to the drive is not the issue. I am using rather unsafe settings that allow losing about 1s worth of writes, and i have few of them.
i need to test sticking the binlog on a separate drive. Anyone has experience with that?

On bare metal and freebsd ufs, the difference was nil with the settings i am using, but quite huge with safer transactions, and even more with an ssd. With zfs using the ssd for the ZIL, there was no gain at all.

edit : binlog should be read as innodb log files, though in this case both would probably apply
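what i intend to test is simply pointing the innodb logs (and optionally the binlog) at a separate mount; a sketch, with hypothetical paths:

```ini
# my.cnf fragment: put the InnoDB redo logs (and optionally the binlog)
# on a dedicated drive; the mount points here are examples only
[mysqld]
innodb_log_group_home_dir = /var/lib/mysql-logs
# log_bin = /var/lib/mysql-binlog/mysql-bin
```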
