<

Go Premium for a chance to win a PS4. Enter to Win

x

Master your MySQL data - Table partitioning approach for better performance.

Published on
4,149 Points
1,049 Views
1 Endorsement
Last Modified:
When table data gets too large to manage or queries take too long to execute the solution is often to buy bigger hardware or assign more CPUs and memory resources to the machine to solve the problem. However, the best, cheapest and most effective solution is to partition the data into more manageable parts.
Modern databases (including MySQL and MariaDB) have and offer a feature called table partitioning which reduces query execution times and makes the data more manageable. In this article I assume you have MySQL 5.6 or newer or MariaDB 10.0.4 or newer.

What is Table Partitioning?


Table partitioning is a data organization scheme to divide large quantity of data into multiple data storage (data partitions). The whole purpose of table partitioning is to divide the table data into smaller and more manageable data units, shorten reorganization needs/ -time and speed up inserts/updates/deletes and queries execution time that on regular tables becomes slower as the data in the table grows.


Partitioning types


MySQL and MariaDB (as does most modern databases) offer several types of table partitioning. These types are


Range and range columns partitioning (the most commonly used),

create table mytable ( id int(11) NOT NULL, ….. )

partition by range ( id ) (

partition p0 values less than ( 10000),

partition p1 values less than ( 20000), ….

);

Open in new window


List and list columns partitions,

create table mydepartment_table  ( department int(11) NOT NULL, ….. )   
partition by list ( department ) ( 
	partition headquarters values in  ( 100,200),
        partition eastcost values in  ( 300,400),
	partition westcost values in ( 500, 600),….
);

Open in new window


Both range and list partitioning are suitable for queries that uses range predicates (columnA between .. and ... , =<,>=, <, > ) and in predicates ( columnA IN ( ... , ... ) ).

Hash and Key partitioning

Note that in order of using key partitioning your table must have either a primary key or a unique key as the hashing function is supplied by the server.

create table mytable  ( id int(11) NOT NULL,mydate DATE NOT NULL, ….. )
PARTITION BY HASH( MONTH(mydate) )
PARTITIONS 12;

create table mytable  ( id int(11) NOT NULL PRIMARY KEY ,mydate DATE NOT NULL, ….. )
PARTITION BY KEY()
PARTITIONS 12;

Open in new window


Partition with sub-partitions (composite partitioning which can be any combination of the above)

create table mytable  ( id int(11) NOT NULL,mydate DATE NOT NULL, ….. ) 
partition by range ( id ) 
PARTITION BY HASH( MONTH(mydate) )
( 
	partition p0 values less than ( 10000)
	(
		subpartition s0,
		subpartition s1
	),
            partition p1 values less than ( 20000)
	(
		subpartition s2,
		subpartition s3
	)
	, …. 
);

Open in new window


In addition to having columns in a partitioning key, you can have partitioning key with functions (partitioning expressions as shown in above examples), usually aggregate functions, on those columns. In fact some partitioning types requires such setup. Such function would return an integer or temporal value to determine which partition contains or should contain a given row. Using functions in your partitioning should however only be done if your SQL statements have those functions as well to filter out the data and those partitions that are involved.


When does data in a table become a candidate for partitioning?


In my opinion and many others, you should look at all tables that have 5 million or more records and make them partitioned. You should also consider partitioning if the statistical distribution of the data shows a good distribution, even though, total number of records are under 5 million records. Why ? This is all about balancing the workload. By partitioning your data you divide it into smaller more manageable parts for the whole database-system to work with whether it is to optimize (reorg), analyze (runstats), insert/update/delete or SQL queries. And what is most important is that you can reduce ( in some cases ) backup and recovery time which is vital part of your SLA's.
 

What do we have to look at when designing a partitioned table ?


The most important factor to look at when designing a partition table (or to an existing table) and choosing the partitioning key is the table statistics and the statistical distribution of your data. The statistical distribution of the data gives you a clear vision on how you can divide the data. Look at the column statistics of the table, execute several group by SQL queries to determine what column(s) give you the most evenly distribution of data. Each partition should hold as close to a similar amount of data to evenly distribute the data across partitions for optimal performance and maintenance. Note that you must also make sure to avoid creating hot spots partitions as much as possible.

You will also need to carefully look at how your workload and your most used DML statements to determine and choose the correct partitioning type. However, the most commonly used partitioning type, as I mentioned before, is range partitioning. That does not mean that it is optimal for your workload.

So choose your partitioning key carefully and most of your queries will benefit from the partitioning from the moment the partitioning is applied. Other queries would need to be tuned.

Let's take a look at these two examples that produces the same result but the SQL is written in slightly different ways.querysample1.jpg
Explain of these two queries gives you this:
Query 1) explain_query1.jpgQuery 2)
explain_query2.jpg

In the salaries table I have roughly 2.8 million rows and employees I have roughly 300 thousand rows.

The different execution times are :


  1. With partitioning 8.68 sec
    Without partitioning 10.98 sec

  2. With partitioning 35.92 sec
    Without partitioning 33.33 sec
Even though my tables are small you can clearly see on the explain and the execution time of query 1 that you gain lower execution time as smaller amount of data is searched using the partitioning feature.
However, query 2 does not benefit of the partitioning as the “dependent subquery” needs to search all partitions. This is something that needs to be fixed in the optimizer as I believe that the limit on the year in the query is not sent and/or used in the subquery.
 
MySQL and MariaDB that have table partitioning capabilities have a feature called partition pruning (as many other modern databases like Oracle and DB2). This feature allows the RDBMS system to look at the query and its where clause column values to determine what partitions needs to be read and what partitions can be skipped. This is done by looking at the partitioning keys max and min values for each partition and partitioned indexes, This feature obviously reduces the query time tremendously for very large to huge tables.
 

Storage Engines Supporting Partitioning and distribution across filesystem/storage devices


Storage Engines that I find to be most suitable for and support partitioning are InnoDB and MyISAM (Aria in MariaDB). For InnoDB make sure that the system variable innodb_file_per_table = 1 for the tables to be stored in different tablespaces which can be moved around as needed.

To determine if partitioning is supported issue the command in the mysql console
 

SHOW PLUGINS;

Open in new window


and you should see this line
 

| partition                  | ACTIVE   | STORAGE ENGINE     | NULL        | GPL     |

Open in new window


meaning that partitioning is supported. You can also issue this sql statement for the same purpose :
 

SELECT PLUGIN_NAME as Name,
       PLUGIN_VERSION as Version,
       PLUGIN_STATUS as Status
FROM INFORMATION_SCHEMA.PLUGINS
WHERE PLUGIN_TYPE='STORAGE ENGINE' and PLUGIN_NAME = 'partition';

Open in new window



Partitioning allows you to place and/or move data in the same table across multiple storage devices and/or filesystems. Partitioning are specially useful when you are dealing with historical data where you put the most current data on SSD disks while older data are on slower and larger disks but available through query on the same table.
 

create table mytable ( id int(11) NOT NULL, createddate DATE NOT NULL,….. )  
 partition by range ( year(createddate) ) ( 
 partition p0 values less than ( 2015) DATA DIRECTORY = '/slowdiskpath' INDEX DIRECTORY = '/slowdiskpath',
 partition p1 values less than ( 2017) DATA DIRECTORY = '/fastdiskpath' INDEX DIRECTORY = '/fastdiskpath', …. 
);

Open in new window



Useful commands for managing partitions


When it comes to managing your partitioned tables the first thing to look at is the distribution of data between partitions. To see the data-distribution you query the INFORMATION_SCHEMA like this.
 

SELECT PARTITION_NAME, TABLE_ROWS
FROM INFORMATION_SCHEMA.PARTITIONS
WHERE TABLE_NAME = 'salaries';

Open in new window


infoschema1.jpgFurther info and useful columns to queryfor the partitions are shown below
 
infoschema2.jpg

This will give you clear idea if your partitioning has a good distribution or not.
 

The ALTER TABLE statement is used to manage the partitions and you can


  • ADD or DROP partitions,

  • REORGANIZE partition_names INTO single partition or other partition definitions of your choice.
    This is used for RANGE and LIST partitions. This is useful when you need to combine partitions to even the distribution of the data. The min and max range of the new partitions must account for the same range values including the minimum and maximum range values of the partitions that are listed in partition_names.

  • COALESCE PARTITION <number>.
    This is useful when you need to reduce or increase HASH and KEY partitions to even the distribution of the data.

  • ANALYSE / OPTIMIZE PARTITION <partition_names>
    To optimize (reorganize) and run statistics on partitions so the data in the partitions are in perfect search order and statistics are current.

  • CHECK / REBUILD / REPAIR < partition_names >
    These commands are necessary in recovering the partitions after a system crash or other failures to the table or it's partitions.

  • EXCHANGE PARTITION <partition_name> WITH <another table_name>
    This comes in handy when you need to move partitions between two tables. Especially convenient when you have an ONLINE_TABLE and an ARCHIVE_TABLE strategy in your system.

  • REMOVE PARTITIONING
    This is useful if you don't want to use partitioning or when you change from one partitioning type to another.

 

Conclusion


Here I have talked briefly about some main issues in regarding to table partitioning in MySQL/MariaDB that makes your application perform better as well as it makes your data management a little more manageable.
When designing and/or tuning applications that query tables with very large amount of data table partitioning can make a huge difference in speed of the queries as well as other maintenance. Choose your table partition type and strategy carefully.
 
1
Comment
0 Comments

Featured Post

NEW Veeam Agent for Microsoft Windows

Backup and recover physical and cloud-based servers and workstations, as well as endpoint devices that belong to remote users. Avoid downtime and data loss quickly and easily for Windows-based physical or public cloud-based workloads!

Join & Write a Comment

This is a high-level webinar that covers the history of enterprise open source database use. It addresses both the advantages companies see in using open source database technologies, as well as the fears and reservations they might have. In this…
This lesson discusses how to use a Mainform + Subforms in Microsoft Access to find and enter data for payments on orders. The sample data comes from a custom shop that builds and sells movable storage structures that are delivered to your property. …
Suggested Courses

Keep in touch with Experts Exchange

Tech news and trends delivered to your inbox every month