Recursive SQL in UDB/LUW (you can use 'recursive' and 'SQL' in the same sentence)
A growing number of database queries lend themselves to recursive solutions. It's not always easy to spot when recursion is called for, especially for people unaccustomed to the technique, but it's an extremely powerful tool. When used properly it can solve problems that can't otherwise be easily solved.
There are many common definitions for recursion. The one that perhaps describes it best with regards to SQL is, "defining an infinite statement using finite components".
Of course, understanding that statement is only slightly less difficult than understanding a recursive query. (In plain English, the statement simply means "joining an arbitrary number of rows into a single result".)
The Problem (why would I want to use recursive SQL?)
It's often desirable to take data in consecutive rows and display it on a single row. That's an integral part of modern SQL and happens all the time when two tables are joined. Throw in a concatenation and items that were once in separate rows are in a single column in one row. If you want to compress three rows to a single row, join twice. If there is doubt that there will always be three rows a NULL result must be handled. Use the COALESCE function to make sure that the optional data is handled correctly.
But what happens when there can be a large number of rows that need to be compressed to a single row? The query has to be written to join each item. If the query could return up to 20 rows, 19 joins need to be written into the query to put all of the results in a single row.
Even worse is the case where there is no limit on the number of possible rows. Imagine a database that records documents by saving each word in the document in a separate row. To reconstruct any sentence in the document the query must concatenate the contents of an arbitrary number of consecutive rows. The available solutions are 1) writing a stored procedure or function that loops through the rows and concatenates the contents into a single string; 2) write a query with a lot of joins and hope that no sentence is longer than the query allows; and 3) recursive SQL.
A more common (and practical) application would be to process a transaction file, concatenating the different status codes into a single string. But practical application is your responsibility. This article shows you how to get there.
The Solution (can you really do that?)
Recursive SQL solves the problem of "how many joins are necessary to produce the full result?" by repeatedly joining related items until there is no need for further joins. The syntax takes a bit of study, but with a basic understanding of recursion, it's not difficult.
The problem of reconstructing a sentence from the separate words is an easy one to demonstrate. First we need a table to contain each of the words.
CREATE TABLE rec ( snum INTEGER, -- sentence number wordnum INTEGER, -- word number in the sentence word VARCHAR(100) -- word being saved);
It's ugly, and from a technical standpoint it's incomplete as it will only report the first 7 words of a sentence. Note that a couple more joins were done than necessary to produce the sample sentence to demonstrate that the query must contain at least enough joins to rebuild the sentence and that the query must gracefully handle the NULL values that are generated when the join is meaningless.
A recursive solution is shorter and isn't limited by the number of joins that are coded.
WITH rquery (snum, wordnum, sentence)AS( SELECT base.snum, base.wordnum, base.word FROM rec base WHERE wordnum = 1 UNION ALL SELECT t1.snum, t1.wordnum, sentence || ' ' || t1.word FROM rquery t0, rec t1 WHERE t0. snum = t1. snum AND t0.wordnum + 1 = t1.wordnum)SELECT * FROM rquery;
Whether there is 1 word or 1,000 words in the sentence, that one recursive query will handle it. (There is one caveat -- it is explained below.)
How it works (I feel a headache coming on)
The query has two parts. The upper subquery (lines 4-6) defines the starting point. Because the goal of this query is to reconstruct a sentence from the separate words, we start by selecting the first word of the sentence (WHERE wordnum = 1). Note that just as with all queries that use the WITH construct, the item list returned by the subquery must match the item list in the table definition. rquery returns snum (sentence number), wordnum (word number), and sentence (the constructed string); therefore, the subquery returns the sentence number, word number (always 1), and the first word of the sentence.
The lower subquery (lines 10-13) is the recursive part of the solution. It takes the results of the previous subquery and joins the next row to it. The actual values returned from the subquery are the sentence number, word number, and the sentence with the next word appended to it. The previously generated result (returned from rquery) is joined with the row in rec that contains the next word.
The diagram below shows how this works. The upper subquery and each recursive call to the lower subquery generate 1 row.
SNUM WORDNUM SENTENCE1 1 This1 2 This is1 3 This is a1 4 This is a fine1 5 This is a fine example
The upper subquery returns the first word (This). The lower subquery joins the next word (is) and appends it to the sentence and returns the newly created row. The recursion occurs when the lower subquery joins to rquery (the newly created row) and repeats the lower subquery. Every time that the lower subquery executes, it appends the next word to the sentence.
The Results (it really does work!)
This recursive query returns all of the generated rows, not just the final one. For a lot of applications this will be just fine, but for our purpose, we only want the complete sentence, not the partial sentences that were constructed during the recursion. That's easily solved by selecting only the last row.
WITH rquery (snum, wordnum, sentence)AS( SELECT base.snum, base.wordnum, base.word FROM rec base WHERE wordnum = 1 UNION ALL SELECT t1.snum, t1.wordnum, sentence || ' ' || t1.word FROM rquery t0, rec t1 WHERE t0. snum = t1. snum AND t0.wordnum + 1 = t1.wordnum)SELECT * FROM rqueryWHERE wordnum = (SELECT max(wordnum) FROM rquery);
Expanding the Results (I've got lots of data to deal with!)
The example that we've used so far is very simple. We have one sentence with a single starting value. In the real world the table would likely contain a lot of sentences, so a small change to the SQL is required. To see what happens when the table contains 2 sentences we need to add more data.
We also need a slight change to the SQL to select the last line generated for each sentence.
WITH rquery (snum, wordnum, sentence)AS( SELECT base.snum, base.wordnum, base.word FROM rec base WHERE wordnum = 1 UNION ALL SELECT t1.snum, t1.wordnum, sentence || ' ' || t1.word FROM rquery t0, rec t1 WHERE t0. snum = t1. snum AND t0.wordnum + 1 = t1.wordnum)SELECT * FROM rquery rqWHERE rq.wordnum = (SELECT max(wordnum) FROM rquery WHERE snum = rq.snum);
There are several places where it's easy to get tripped up using recursive SQL. Stay mindful of the items mentioned here and you'll avoid most of the major pitfalls.
Select the correct starting row(s). If the upper subquery doesn't select the correct initial value(s), the rest of the query is meaningless. Always select the base item(s) in the upper subquery that will be built upon by the recursive queries.
Avoid infinite recursion. If you structure your query so that there is no logical end point, DB2 will go along merrily joining the additional rows until it detects a stack overflow and throws an exception. This is almost always caused by applying the wrong filter in the lower subquery.
Join items in the proper sequence whenever possible. In our sentence example we joined words 1 and 2, then 2 and 3, then 3 and 4, etc. But the data may not contain an incrementing counter. The processing of a transaction log is a fine example. The transaction's timestamp puts the transaction in order, but there isn't a convenient column that readily identifies the next item. With a modest data volume you may be able to ignore the performance impact of allowing the Cartesian join and filtering to the correct items in the final query. This may be impractical with a large amount of data. Remember that the result of a join is a derived table that has no index(es). The lower subquery will always join a derived table to a user table so the query may result in a lot of full table scans over a large amount of data.
Object inheritance defines the data types of the result. (This is the caveat mentioned above.) DB2 uses the data types of the objects in the query to establish the data types of the returned columns. In our example table "rec" the column "word" is defined as a VARCHAR(100), so DB2 creates a result set where the third column is a VARCHAR(100). If the resulting string is longer than the declaration DB2 will throw an error. Try it yourself by running the INSERT statement below and repeating the query.
Simply recasting the string in the upper subquery solves the problem.
WITH rquery (snum, wordnum, sentence)AS( SELECT base.snum, base.wordnum, CAST(base.word AS VARCHAR (200)) FROM rec base WHERE wordnum = 1 UNION ALL SELECT t1.snum, t1.wordnum, sentence || ' ' || t1.word FROM rquery t0, rec t1 WHERE t0. snum = t1. snum AND t0.wordnum + 1 = t1.wordnum)SELECT * FROM rquery rqWHERE rq.wordnum = (SELECT max(wordnum) FROM rquery WHERE snum = rq.snum);
Recursive SQL isn't for the faint of heart. It requires a good understanding of the user tables and data, as well as a comfort level with recursion. A programming background in any of the recursive languages (e.g., C++, PASCAL, etc.) is a tremendous help.
But it is extremely powerful. Master it and you can produce a recursive query in a few minutes that would take all day to write and debug in a non-recursive style.
Thank you for putting it together.
Big "Yes" vote above.
A very good article on DB2 recursive query. I also need to write a recursive query (in DB2) and while searching in Google I got this web page. This concept is very new to me. I got a requirement where I need to convert below rows to columns using recursive query..
Query result should be.. (rows will become columns)
Alabama Delaware Alaska
Just wanted to know your suggestion if this can be achieved using recursive query?