marrowyung

asked on

Docker Swarm and how it is used with MariaDB MaxScale

hi,

I am reading this:

https://severalnines.com/blog/mariadb-maxscale-load-balancing-docker-deployment-part-1?utm_campaign=MariaDB_Campaign_JUN19&utm_content=maxscale_docker_1&utm_medium=Social_Media&utm_source=Facebook&fbclid=IwAR3pMjAyEA7qt4x9CKRHQdJah1feHuwIF_OhhRb-K6Bc_MjTDSRMJyfJzNA

What is Docker Swarm?

It says:

"MaxScale Clustering with Docker Swarm
With Docker Swarm, we can create a group of MaxScale instances via Swarm service with more than one replica together with Swarm Configs."

So what is that?


Also, in here:

https://severalnines.com/blog/mariadb-maxscale-load-balancing-docker-management-part-2

it says:

Query Rewriting
Query rewrite is a feature that, depending on the queries running against the database server, quickly allows to isolate and correct problematic queries and improve performance.



I don't understand what it means.

and this one :

Query rewriting can be done via regexfilter. This filter can match or exclude incoming statements using regular expressions and replace them with another statement. Every rule is defined in its own section and include the section name in the corresponding service to activate it.



Does this mean that if we see a bad query, we use MaxScale to replace any string?
noci

Docker Swarm: a set of Docker host computers that have been joined into a cluster. A Docker swarm provides a single management interface to the cluster; there is no sharing of data or other resources. Swarm can be used to provide better continuity, for example: if you need to stop one Docker host for maintenance, you can migrate all tasks to another node and then service the node that has been freed of duties.
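
To give a rough idea (the IP address, port and image name below are just examples, not taken from your article), the "Swarm service with more than one replica together with Swarm Configs" boils down to something like:

    # on the first Docker host: create the swarm, this node becomes a manager
    docker swarm init --advertise-addr 192.168.1.10

    # on every other Docker host: join the swarm using the token printed by "swarm init"
    docker swarm join --token <worker-token> 192.168.1.10:2377

    # store the MaxScale configuration file as a Swarm config object
    docker config create maxscale_config maxscale.cnf

    # run MaxScale as a Swarm service with 3 replicas spread over the nodes
    docker service create \
      --name maxscale \
      --replicas 3 \
      --config source=maxscale_config,target=/etc/maxscale.cnf \
      --publish 4006:4006 \
      mariadb/maxscale

Swarm decides which host each replica runs on and restarts a replica elsewhere if its host goes down; that is what the single management interface buys you.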

Query rewrite:
Please read the examples... You can build a filter to block certain SQL queries. The example given is: if you want to block SHOW SQL commands, you replace them with something else, such as returning the string 'Not allowed'.
With regex (regular expressions, or string matching) you can match ANY query, so you need to take care in how you write the expressions.

show will match anything containing "show", for example also:
    select * from hosted_shows where event = 'whatever music festival';
^show will only match "show" at the start of a string, such as an SQL statement; see the example on your referenced site.

See also: https://en.wikipedia.org/wiki/Regular_expression
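
As a rough sketch of how such a rule ends up in maxscale.cnf (the section name, the replacement text and the service name below are just examples):

    [block-show]
    type=filter
    module=regexfilter
    # match any statement that starts with "show"
    match=^show
    # statement that is sent to the server instead
    replace=SELECT 'Not allowed'

and in the service section that should use the rule you add:

    filters=block-show
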
To build on what noci stated above: MaxScale clustering with Docker Swarm is a collection of Docker containers, each running a MariaDB instance.

Running many database instances tends to be overkill for most applications.

Most applications run much faster if one instance is used + the one instance is tuned correctly.

Open another question describing the problem you're trying to solve + likely you'll have some great implementation suggestions.
marrowyung

ASKER

David Favor,

"Open another question describing the problem you're trying to solve + likely you'll have some great implementation suggestions."

No. As the post said, I just read an article and do not understand something, and I am not running Docker Swarm at this moment.

"Most applications run much faster if one instance is used + the one instance is tuned correctly."

What do you mean by "the one instance is tuned correctly"? Do you mean that if ONLY ONE DB is tuned correctly, THEN the application runs faster than in a cluster environment?

noci,

"If you need to stop one docker host for maintenance you can migrate all tasks to another node. and start servicing the node that has been freed of duties."

So everything, including data and configuration, will migrate to another Docker Swarm node and keep going? So everything is replicated automatically well before the failure starts?

So the pool of Docker hosts is a Swarm, of course with failover capability?
Data will not migrate; you will need NFS mounts or some other data-sharing method for that.
Swarm will take care of the starting of application on the right platform.

So SWARM will not replicate storage; it is a task supervisor that can run in a cluster transparently.
(Your Docker instances won't know on what host they run; your external applications won't, and don't need to, know on what host the apps run.)
Networking & CPU management is what Docker is about.
Docker will mount the needed persistent storage for you. It will need to be reachable through NFS or some other method you provide.
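
A rough example of what "reachable through NFS" can look like (the NFS server address, export path and password are placeholders):

    # named volume backed by an NFS export
    docker volume create \
      --driver local \
      --opt type=nfs \
      --opt o=addr=192.168.1.50,rw \
      --opt device=:/exports/mysql-data \
      mysql_data

    # MariaDB container that uses that volume as its data directory
    docker run -d --name mariadb \
      -v mysql_data:/var/lib/mysql \
      -e MYSQL_ROOT_PASSWORD=secret \
      mariadb:10.4

Whichever host the container ends up on, the data lives on the NFS server, not on the Docker host.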

A well-tuned MySQL server can probably deal with a mixed load best.
If you have a MySQL database that is mostly read and only has a few updates, then a cluster having many read-only nodes might work.
(In the proposed setup only one node can do updates; the others follow with a certain lag.)
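
In MaxScale terms that split is what the readwritesplit router does: writes go to the master, reads are spread over the slaves. A minimal sketch (addresses, credentials and section names are placeholders):

    [server1]
    type=server
    address=10.0.0.11
    port=3306
    protocol=MariaDBBackend

    [server2]
    type=server
    address=10.0.0.12
    port=3306
    protocol=MariaDBBackend

    [RW-Service]
    type=service
    router=readwritesplit
    servers=server1,server2
    user=maxscale
    password=maxscalepw

    [RW-Listener]
    type=listener
    service=RW-Service
    protocol=MariaDBClient
    port=4006

Applications connect to MaxScale on port 4006 and never talk to the database nodes directly.
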
1) I personally use LXD for this type of setup.

With Docker, managing all the Docker volumes (one /var/lib/mysql per container) becomes massively complex.

With Docker all these data volumes live in some ad-hoc location you have to manage, and each Docker container must be provided the correct data volume containing the correct /var/lib/mysql when the container is instantiated (starts).

With LXD all software packages + /var/lib/mysql data files live in a file collection all owned by LXD.

2) You still haven't described the actual problem you're trying to solve. Running a Docker Swarm is a solution to some problem, not the actual problem. Keep in mind, if you open another question about resolution approaches to your actual problem you'll get some very smart answers, as there are many smart people on EE.

3) You asked, "what is the one instance is tuned correctly ? you mean that ONLY ONE DB is tuned correctly THEN application running faster than in cluster environment ?"

No way to know if this is true for your use case till #2 is answered... and...

a) Tuned correctly - the App (data consumer) is designed well. The database is designed well. Various tools like mysqltuner are run to optimize database performance. The correct storage engine is selected for each table. Many factors.

b) Single/Multi Instance Performance - When moving to a multi-instance database, normally speed slows to a crawl because of the replication speed. I saw this problem repeated when working with HACMP for a decade at IBM. Most multi-instance database problems were a result of using a multi-instance database where a single-instance database worked far better.

4) You asked, "so everything including data and configuration will migrate to another Docker Swarm nodes and keep going ? so everything is replicated automatically well before the failure started ?"

a) If you stop one instance + migrate that instance... Remember, using Docker rather than LXD, migration is a highly human/manual process, whereas LXD can be automated. With Docker, you will have to stop + destroy the Docker instance (however you've manually determined to manage your instances), then rsync the related Docker /var/lib/mysql elsewhere, then start up Docker on the new server.

Depending on how you've set up your replication, there may be work to do on every Docker instance to restart every single instance.

For example, if you use fixed IPs (not recommended), then you'll have to change the old IP to the new IP + restart all Docker instances, to integrate the instance by the new IP. If you use host names, then when the IP begins to answer (new Docker container starts) the new IP will integrate with all other instances eventually.

b) Automatic replication... wow... to answer this will require far more information.

If each of your Docker containers is referenced by hostname, likely this will be true. If IPs are used instead, then manual intervention is required each time any IP in the swarm changes.

5) You asked, "So the pool of docker is a  Swarm? of course with failover capability ?"

Whether this is true or not is highly dependent on your entire design.

Said another way, to have HA-like failover capability (if this is what you mean by failover), you must design this from the beginning. So, since you asked this question, the answer is likely no, as this type of design tends to be very complex.

Again, having #2 answered will move this discussion out of theoretical answers into far more practical answers.
noci,

"A well tuned Mysql server can probably deal with a mixed load best."

Load test, right? Or what do you mean?

"If you have a MYSQL database that is mostly read and only has a few updates then a cluster having many readonly nodes might work."

You mean MySQL is good for writes, and the cluster is good for read operations?

"So SWARM will not replicate storage,  it is a Task supervisor that can run in a cluster transparantly."

So SWARM is the thing that makes a Docker cluster possible?

So it only controls and manages the Docker primary and secondary, failover, and monitoring?

David Favor,

" I personally use LXD for this type of setup."

Sorry, what is LXD? A Linux container hypervisor? How good is it compared to a container? It is also a container, right?

"With LXD all software packages + /var/lib/mysql data files live in a file collection all owned by LXD."

On shared storage?

"2) You still haven't described the actual problem you're trying to solve. "

I have mentioned that:

No. As the post said, I just read an article and do not understand something, and I am not running Docker Swarm at this moment.



I do not have any problem with Docker; I just want to know what Swarm does and how it helps with Docker.

"Most multi-instance database problems were a result of using a multi-instance database where a single-instance database worked far better."

Thanks, good point, but if we accept failover/HA/DR, this kind of slowdown can be accepted.

We can tune queries for it, but DB load-balanced read-only operation is also a good trade-off.

", you will have to stop + destroy the Docker instance (however you've manually determined to manage your instances), then rsync the related Docker /var/lib/mysql else where, then startup up Docker on the new server.

I thought it syncs data to the other Docker first and then destroys the old one?

"If each of your Docker containers is referenced by hostname, likely this will be true."

Why? The other Docker does not have the data of the old Docker, so a resync is needed anyway?
Mixed load: updates/inserts (writes) as well as selects (reads).
With MySQL having only one updateable node and many read-only nodes, you need a load that fits this description.
(And updates may lag, as they need to be replicated to the read-only nodes.) So your queries MAY show "old" data.
Anything requiring consistent transactions cannot use this clustered setup....

For environments where you need consistent transactions this is nog usable, for the average webshop (excluding the payments) this can work.
"So your queries MAY show "old" data.
Anything requiring consistent transactions cannot use this clustered setup...."

Oh, I think it will be a read-committed operation. If that is the problem, I think all replication clusters have it, and I think it applies only to async replication, right?

MySQL is said to do replication at the transaction-commit level and synchronously, so when data changes it has to change on all nodes before control returns to the application, right?

"For environments where you need consistent transactions this is nog usable,"

what is  "nog" ? not usable , right?

If I need consistent transactions, what other solution with HA and DR can I use?
When using full 2-phase commit, all systems are consistent and there is no need for master & slaves.
Master & slaves implies commits go to the master, and transaction sets are later copied to the slaves.
(Another example of this setup is e.g. the DNS server system.)
Master/slave is much easier to implement; only complete transactions need to be copied, and it suffices for most solutions
(think WordPress-like websites... no need for transactional safety there).

"nog" is a typo, the t is slightly above the g on my keyboard, i meant "not"  but missed the t and hit the g.

Any clustering that has no dedicated write/read server, but which you can connect to any which way, and that uses a consistency mechanism, should be doable.
In most cases this is done using "2-phase commit". https://en.wikipedia.org/wiki/Two-phase_commit_protocol
1) What is LXD.

https://linuxcontainers.org/lxd/introduction/ provides details.

The difference between Docker + LXD is that LXD has the concept of persistent data + allows setting up an entire bootable distro + packages, just like a physical machine.

Docker provides a runtime environment with no persistent data, so data must be managed through some user determined directories. This makes Docker management cumbersome. Especially if you have many containers.

2) Running a Docker swarm for MariaDB/MySQL rarely makes sense.

a) If you run many database instances on one machine replicating with each other, this will run far slower than a single instance. As the same CPU, memory, disk i/o channels are all used.

b) Using many instances connected over a network is even slower, because now your system speed is limited by network speed, which will always be orders of magnitude slower than CPU/memory speed on a local machine.

This is why most applications work best on a single machine, with plenty of memory + continuous tuning of your database instance.

3) You said, "I do not have any problem with Docker; I just want to know what Swarm does and how it helps with Docker."

A swarm simply means you're running many containers.

Swarm == Many

Whether this helps or hurts an application depends on the type of application running.

In the case of databases (see #2) using a Docker swarm will require database replication (per #2) + will run much slower than using a single database instance.

If you think through this you can get the idea quickly.

a) If you have one database instance, then that instance arranges i/o correctly to optimize i/o throughput.

b) If you have many instances (swarm) you have much more context switching.

c) With many instances the worst problem is disk i/o contention. Each instance will be fighting for memory for its buffers, then fighting for i/o priority.

d) Memory buffering gets very ugly with this. If one instance requires 2G to optimize reads + you run 10 instances, you now require 20G of memory instead of 2G of memory, just to optimize reads.
Suggestion: After you close out this question, consider opening another question describing your specific application, asking people how they might design + implement your specific App (database + code).
noci,

"Master slave is much easier to implement, only complete transaction need to be copied. And suffices for most solutions"

Then consistent transactions are doable with this, right?

"In most cases this is done using "2-phase commit". https://en.wikipedia.org/wiki/Two-phase_commit_protocol"

Yeah, usually it is! The master commits and ALL other slaves commit too, or no one commits the transaction.

"Any clustering that has no dedicated write/read server but you can connec to to any which way. And uses a consistency mechanism should be doable.  "

You mean no dedicated master or slave can do consistent transactions, is that right? It should be the other way around.

David Favor,

"Swarm == Many"


So Swarm is the cluster of Dockers, but if we want data to be shared between Dockers, then we need SHARED storage between the Dockers, and Swarm only controls the failover of Dockers and nothing else?

Or is it not?

If it just means many Dockers, then I don't think that word is important.

So if it means many, then I am not sure why MaxScale can cluster with Docker Swarm!
SOLUTION by noci (member-only content not shown)
ASKER CERTIFIED SOLUTION (member-only content not shown)
noci,

"Maybe look into a Galera Cluster there are no master & slaves... or it is mentioned as a "Synchronous  Multi Master" cluster:   "

In a Galera cluster there are masters and slaves! That one is synchronous multi-master, but usually the default installation does not enable multi-master mode.

Actually, we do not prefer that, as a single WRITE master is always safest!

We should consider this when we have more than one office and each office writes to its own local MariaDB server.

However, what are you trying to say by this?

David Favor,

"You can't share raw /var/lib/mysql data files between database instances.

1) There's no point.

2) All your data will instantly be corrupted."

That's why you said all data must be moved/migrated manually?

"To do a failover you'll have exactly one slave, which will be replicated from exactly one master."

Yeah, Docker slave, but we have to set up replication between the Dockers manually?
Thanks all,

David, please also update me:

"yeah, docker slave, but we have to setup replication between dockers manually ?"
Yes, you must come up with your own... hand rolled (custom) data organization to run replication between 2x or more Docker containers.
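
As a sketch of that hand-rolled part (host names, user and password are placeholders, and it assumes each container already has a distinct server_id and the master has log_bin enabled in its my.cnf), classic MariaDB master/slave replication between two containers looks roughly like:

    -- on the master container:
    CREATE USER 'repl'@'%' IDENTIFIED BY 'replpass';
    GRANT REPLICATION SLAVE ON *.* TO 'repl'@'%';
    SHOW MASTER STATUS;  -- note the binlog file name and position

    -- on the slave container:
    CHANGE MASTER TO
      MASTER_HOST='mariadb-master',
      MASTER_USER='repl',
      MASTER_PASSWORD='replpass',
      MASTER_LOG_FILE='mysql-bin.000001',
      MASTER_LOG_POS=4;
    START SLAVE;
    SHOW SLAVE STATUS\G

And you have to repeat or adjust this every time a container is rebuilt or its address changes, which is exactly the manual work mentioned above.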

LXD is much easier for this.

And, independent of container mechanism you use, there's no point in running replicated databases with all instances on the same machine.

The only result of this will be much slower database access, with no redundancy, because if any hardware fails then likely both instances will either be completely dead or data will be corrupted.

Likely good to open a new question describing your App.

Replication is rarely required.