Power of the Recursive SQL arrives in MySQL and MariaDB

Recursive SQL is one of the most fascinating and powerful and yet dangerous feature offered in many modern databases today using a Common Table Expression (CTE) first introduced in the ANSI SQL 99 standard. The first implementations of CTE began appearing in 2006-7 and as of MySQL version 8.0 and MariaDB version 10.2.2.

Common Table Expression ( the WITH RECURSIVE … clause )

Recursive SQL is achieved by using Common Table Expression or CTE which is a temporary named resultset, derived from a simple query and defined within the execution scope of SELECT, INSERT, UPDATE or DELETE statements. CTE is often used to improve readability of a SQL code by simplifying and/or prise the code into more readable sections. One “feature” of CTE is the ability to reference recursively the named resultset, hence recursive SQL.

With recursive SQL queries you can achieve things you would not have imagined being possible with this type of SQL and at the speed it executes. You can solve many types of business problems and even rewrite some complex SQL/application logic down to a simple recursiveSQL-call to the database.

Some example of useful usage of recursive CTE is that you can use it to find gaps in data, create organization charts and create test data.

What I like most of using recursive SQL query is the ability to produce large amount of test-data in amount of seconds or few minutes, depending on complexity of the data and the available database/systems resources. Using recursive CTE we can produce many hundreds, thousands or millions of records in very short period of time which is then only limited to available database memory and other database/system resources. Recursive query produces test data way faster compared to other test-data procedures that I have seen and experienced.
Also, it has been proven that recursive queries outperforms other queries that take days to execute on huge amount of data by running in several minutes.

The word recursive says it all. You have a query that repeatedly calls itself with some starting point and that which is EXTREMELY IMPORTANT an ending point (a fail-safe exit as I call it). If you don't have a fail-safe exit or your recursive formula goes beyond it you are in deep trouble. Your query will go into aninfinite loop resulting in very high CPU and very high LOG utilization which will lead to memory and/or storage exhaustion. If your query goes haywire you must think very fast and stop it. If you are unable to do so then alert your DBA immediately so he/she can prevent the database system of choking by killing the runnaway thread.

Examples

Let's look at some simple examples that shows how easy and effective it can be.

with RECURSIVE mydates (level,nextdate) as (
select 1 level, FROM_UNIXTIME(RAND()*2147483647) nextdate from DUAL
union all 
select level+1, FROM_UNIXTIME(RAND()*2147483647) nextdate
from mydates
where level < 100000
)
SELECT nextdate from mydates
);

Here we produce 100k of random dates. The level is our counter and fail-safe exit to safely exit the recursive query. The line ( no 2 ) is our starting point and the line (no 4-5 ) is the recursive call with the ending point in the where clause ( no 6) . Then the last lines ( no 8 - 9 ) is the call to execute the recursive query and retrieve the data.

The above query executed in 0.84 sec and outputted 100k of rows. Note that even though you put a LIMIT 10 on the query as shown below it will first produce the 100k rows and then limit the result to the first 10 rows in the same amount of time or 0.84 sec.

with RECURSIVE mydates (level,nextdate) as (
select 1 level, FROM_UNIXTIME(RAND()*2147483647) nextdate from DUAL
union all 
select level+1, FROM_UNIXTIME(RAND()*2147483647) nextdate
from mydates
where level < 100000
)
SELECT nextdate from mydates
) LIMIT 10;

You can also create a table and by that physically store the data by surrounding the CTE query like this.

create table somedates as (
with RECURSIVE mynumbers (level,nextdate) as (
select 1 level, FROM_UNIXTIME(RAND()*2147483647) nextdate from DUAL
union all 
select level+1, FROM_UNIXTIME(RAND()*2147483647) nextdate
from mynumbers
where level < 100000
)
SELECT nextdate from mynumbers
);

This query ran on my laptop in 2.93 seconds storing 100k rows of dates in a table called somedates.

Another simple example, a “gaps in data” solution, which produces last 1000 dates to current date and compares it to the order table to find all dates that don't have any orders showing the newest date first.

For this example I created 1 million rows with random orderdates using recursive SQL in 32.95 seconds. Created index on (orderdate,orderid) on the the table in 9.12 seconds, deleted thousands of random rows to create a gap and ran the below query

with RECURSIVE mydates (level,nextdate) as (
select 1 level, ADDDATE(CURRENT_DATE,INTERVAL -1000 DAY) nextdate from DUAL
union all 
select level+1, ADDDATE(nextdate,INTERVAL 1 DAY) nextdate
from mydates
where level < 1000
)
SELECT md.nextdate from mydates md left join myorders od on md.nextdate = od.order_date
where od.order_date is null order by md.nextdate desc;

This query ran on my test data and looked for gaps in the last 1000 days from current date and returned 68 dates in 0.05 seconds.

Conclusion

Here I have talked briefly about recursive queries and showed how useful and powerful it can be to solve problems in very short and effective way. However, when writing the recursive query be VERY careful NOT to forget the fail-safe exit and make sure you double check twice that the calculations used in the fail-safe exit produces a stop signal.

Power of the Recursive SQL arrives in MySQL and MariaDB

Conclusion

Comments (0)