Community Pick: Many members of our community have endorsed this article.
Editor's Choice: This article has been selected by our editors as an exceptional contribution.

Recursive SQL in DB2 (Converting rows to columns)

Kent OlsenData Warehouse / Database Architect
CERTIFIED EXPERT
Published:
Recursive SQL in UDB/LUW (you can use 'recursive' and 'SQL' in the same sentence)


A growing number of database queries lend themselves to recursive solutions.  It's not always easy to spot when recursion is called for, especially for people unaccustomed to the technique, but it's an extremely powerful tool.  When used properly it can solve problems that can't otherwise be easily solved.

There are many common definitions for recursion.  The one that perhaps describes it best with regards to SQL is, "defining an infinite statement using finite components".

Of course, understanding that statement is only slightly less difficult than understanding a recursive query.  (In plain English, the statement simply means "joining an arbitrary number of rows into a single result".)


The Problem (why would I want to use recursive SQL?)

It's often desirable to take data in consecutive rows and display it on a single row.  That's an integral part of modern SQL and happens all the time when two tables are joined.  Throw in a concatenation and items that were once in separate rows are in a single column in one row.  If you want to compress three rows to a single row, join twice.  If there is doubt that there will always be three rows a NULL result must be handled.  Use the COALESCE function to make sure that the optional data is handled correctly.

But what happens when there can be a large number of rows that need to be compressed to a single row?  The query has to be written to join each item.  If the query could return up to 20 rows, 19 joins need to be written into the query to put all of the results in a single row.

Even worse is the case where there is no limit on the number of possible rows.  Imagine a database that records documents by saving each word in the document in a separate row.  To reconstruct any sentence in the document the query must concatenate the contents of an arbitrary number of consecutive rows.  The available solutions are 1)  writing a stored procedure or function that loops through the rows and concatenates the contents into a single string;  2) write a query with a lot of joins and hope that no sentence is longer than the query allows; and 3) recursive SQL.

A more common (and practical) application would be to process a transaction file, concatenating the different status codes into a single string.  But practical application is your responsibility.  This article shows you how to get there.


The Solution (can you really do that?)

Recursive SQL solves the problem of "how many joins are necessary to produce the full result?" by repeatedly joining related items until there is no need for further joins.  The syntax takes a bit of study, but with a basic understanding of recursion, it's not difficult.

The problem of reconstructing a sentence from the separate words is an easy one to demonstrate.  First we need a table to contain each of the words.
CREATE TABLE rec 
                      (
                        snum      INTEGER,        -- sentence number
                        wordnum   INTEGER,        -- word number in the sentence
                        word      VARCHAR(100)    -- word being saved
                      );

Open in new window


Next, we'll need some data.
INSERT INTO rec 
                      VALUES (1, 1, 'This'), (1, 2, 'is'), (1, 3, 'a'), (1, 4, 'fine'), (1, 5, 'example');

Open in new window


A quick check of the data shows a sentence, stored as words:
SELECT * FROM rec;
                      
                      SNUM  WORDNUM    WORD
                      1     1          This
                      1     2          is
                      1     3          a
                      1     4          fine
                      1     5          example

Open in new window


Using traditional SQL, we could construct the sentence like this:
SELECT t0.word ||
                        COALESCE (' ' || t1.word, '') ||
                        COALESCE (' ' || t2.word, '') ||
                        COALESCE (' ' || t3.word, '') ||
                        COALESCE (' ' || t4.word, '') ||
                        COALESCE (' ' || t5.word, '') ||
                        COALESCE (' ' || t6.word, '')
                      FROM rec t0
                      LEFT JOIN rec t1
                        ON t0.snum = t1.snum
                       AND t1.wordnum = 2
                      LEFT JOIN rec t2
                        ON t0.snum = t2.snum
                       AND t2.wordnum = 3
                      LEFT JOIN rec t3
                        ON t0.snum = t3.snum
                       AND t3.wordnum = 4
                      LEFT JOIN rec t4
                        ON t0.snum = t4.snum
                       AND t4.wordnum = 5
                      LEFT JOIN rec t5
                        ON t0.snum = t5.snum
                       AND t5.wordnum = 6
                      LEFT JOIN rec t6
                        ON t0.snum = t6.snum
                       AND t6.wordnum = 7
                      WHERE t0.wordnum = 1;

Open in new window


It's ugly, and from a technical standpoint it's incomplete as it will only report the first 7 words of a sentence.  Note that a couple more joins were done than necessary to produce the sample sentence to demonstrate that the query must contain at least enough joins to rebuild the sentence and that the query must gracefully handle the NULL values that are generated when the join is meaningless.

A recursive solution is shorter and isn't limited by the number of joins that are coded.
WITH rquery (snum, wordnum, sentence)
                      AS
                      (
                        SELECT base.snum, base.wordnum, base.word 
                        FROM rec base 
                        WHERE wordnum = 1
                        
                        UNION ALL
                        
                        SELECT t1.snum, t1.wordnum, sentence || ' ' || t1.word
                        FROM rquery t0, rec t1
                        WHERE t0. snum = t1. snum
                          AND t0.wordnum + 1 = t1.wordnum
                      )
                      SELECT * 
                      FROM rquery;

Open in new window


Whether there is 1 word or 1,000 words in the sentence, that one recursive query will handle it. (There is one caveat -- it is explained below.)


How it works (I feel a headache coming on)

The query has two parts.  The upper subquery (lines 4-6) defines the starting point.  Because the goal of this query is to reconstruct a sentence from the separate words, we start by selecting the first word of the sentence (WHERE wordnum = 1).  Note that just as with all queries that use the WITH construct, the item list returned by the subquery must match the item list in the table definition.  rquery returns snum (sentence number), wordnum (word number), and sentence (the constructed string); therefore, the subquery returns the sentence number, word number (always 1), and the first word of the sentence.

The lower subquery (lines 10-13) is the recursive part of the solution.  It takes the results of the previous subquery and joins the next row to it.  The actual values returned from the subquery are the sentence number, word number, and the sentence with the next word appended to it.  The previously generated result (returned from rquery) is joined with the row in rec that contains the next word.

The diagram below shows how this works.  The upper subquery and each recursive call to the lower subquery generate 1 row.
SNUM  WORDNUM    SENTENCE
                      1     1          This
                      1     2          This is
                      1     3          This is a
                      1     4          This is a fine
                      1     5          This is a fine example

Open in new window


The upper subquery returns the first word (This).  The lower subquery joins the next word (is) and appends it to the sentence and returns the newly created row.  The recursion occurs when the lower subquery joins to rquery (the newly created row) and repeats the lower subquery.  Every time that the lower subquery executes, it appends the next word to the sentence.


The Results (it really does work!)

This recursive query returns all of the generated rows, not just the final one.  For a lot of applications this will be just fine, but for our purpose, we only want the complete sentence, not the partial sentences that were constructed during the recursion.  That's easily solved by selecting only the last row.
WITH rquery (snum, wordnum, sentence)
                      AS
                      (
                        SELECT base.snum, base.wordnum, base.word 
                        FROM rec base 
                        WHERE wordnum = 1
                        
                        UNION ALL
                        
                        SELECT t1.snum, t1.wordnum, sentence || ' ' || t1.word
                        FROM rquery t0, rec t1
                        WHERE t0. snum = t1. snum
                          AND t0.wordnum + 1 = t1.wordnum
                      )
                      SELECT * 
                      FROM rquery
                      WHERE wordnum = (SELECT max(wordnum) FROM rquery);

Open in new window


This returns only the desired row:
SNUM  WORDNUM    SENTENCE
                      1     5          This is a fine example

Open in new window



Expanding the Results (I've got lots of data to deal with!)

The example that we've used so far is very simple.  We have one sentence with a single starting value.  In the real world the table would likely contain a lot of sentences, so a small change to the SQL is required.  To see what happens when the table contains 2 sentences we need to add more data.
INSERT INTO rec 
                      VALUES (2, 1, 'This'), (2, 2, 'is'), (2, 3, 'another'), (2, 4, 'example');

Open in new window


We also need a slight change to the SQL to select the last line generated for each sentence.
WITH rquery (snum, wordnum, sentence)
                      AS
                      (
                        SELECT base.snum, base.wordnum, base.word 
                        FROM rec base 
                        WHERE wordnum = 1
                        
                        UNION ALL
                        
                        SELECT t1.snum, t1.wordnum, sentence || ' ' || t1.word
                        FROM rquery t0, rec t1
                        WHERE t0. snum = t1. snum
                          AND t0.wordnum + 1 = t1.wordnum
                      )
                      SELECT * 
                      FROM rquery rq
                      WHERE rq.wordnum = (SELECT max(wordnum) FROM rquery WHERE snum = rq.snum);

Open in new window


The result is that we now generate and select all of the sentences that are represented in the table.
SNUM  WORDNUM    SENTENCE
                      1     5          This is a fine example
                      2     4          This is another example

Open in new window




Gotchas (I didn't mean to do that!)

There are several places where it's easy to get tripped up using recursive SQL.  Stay mindful of the items mentioned here and you'll avoid most of the major pitfalls.

Select the correct starting row(s).  If the upper subquery doesn't select the correct initial value(s), the rest of the query is meaningless.  Always select the base item(s) in the upper subquery that will be built upon by the recursive queries.

Avoid infinite recursion.  If you structure your query so that there is no logical end point, DB2 will go along merrily joining the additional rows until it detects a stack overflow and throws an exception.  This is almost always caused by applying the wrong filter in the lower subquery.

Join items in the proper sequence whenever possible.  In our sentence example we joined words 1 and 2, then 2 and 3, then 3 and 4, etc.  But the data may not contain an incrementing counter.  The processing of a transaction log is a fine example.  The transaction's timestamp puts the transaction in order, but there isn't a convenient column that readily identifies the next item.  With a modest data volume you may be able to ignore the performance impact of allowing the Cartesian join and filtering to the correct items in the final query.  This may be impractical with a large amount of data.  Remember that the result of a join is a derived table that has no index(es).  The lower subquery will always join a derived table to a user table so the query may result in a lot of full table scans over a large amount of data.

Object inheritance defines the data types of the result. (This is the caveat mentioned above.)  DB2 uses the data types of the objects in the query to establish the data types of the returned columns.   In our example table "rec" the column "word" is defined as a VARCHAR(100), so DB2 creates a result set where the third column is a VARCHAR(100).  If the resulting string is longer than the declaration DB2 will throw an error.  Try it yourself by running the INSERT statement below and repeating the query.
INSERT INTO rec 
                      VALUES (3, 1, 'This'), (3, 2, 'example '), (3, 3, 'shows'), (3, 4, 'what'),
                             (3, 5, 'can'), (3, 6, 'happen'), (3, 7, 'when'), (3, 8, 'the'),
                             (3, 9, 'data'), (3, 10, 'represents'), (3, 11, 'a'), (3, 12, 'really'),
                             (3, 13, 'long'), (3, 14, 'line'), (3, 15, 'that'), (3, 16, 'overflows'),
                             (3, 17, 'the'), (3, 18, 'length'), (3, 19, 'of'), (3, 20, 'an'),
                             (3, 21, 'object');

Open in new window


Simply recasting the string in the upper subquery solves the problem.
WITH rquery (snum, wordnum, sentence)
                      AS
                      (
                        SELECT base.snum, base.wordnum, CAST(base.word AS VARCHAR (200)) 
                        FROM rec base 
                        WHERE wordnum = 1
                        
                        UNION ALL
                        
                        SELECT t1.snum, t1.wordnum, sentence || ' ' || t1.word
                        FROM rquery t0, rec t1
                        WHERE t0. snum = t1. snum
                          AND t0.wordnum + 1 = t1.wordnum
                      )
                      SELECT * 
                      FROM rquery rq
                      WHERE rq.wordnum = (SELECT max(wordnum) FROM rquery WHERE snum = rq.snum);

Open in new window



Conclusion (I thought that he'd never shut up!)

Recursive SQL isn't for the faint of heart.  It requires a good understanding of the user tables and data, as well as a comfort level with recursion.  A programming background in any of the recursive languages (e.g., C++, PASCAL, etc.) is a tremendous help.

But it is extremely powerful.  Master it and you can produce a recursive query in a few minutes that would take all day to write and debug in a non-recursive style.


Good Luck...
15
42,926 Views
Kent OlsenData Warehouse / Database Architect
CERTIFIED EXPERT

Comments (4)

CERTIFIED EXPERT
Author of the Year 2011
Top Expert 2006

Commented:
Excellent presentation.
Thank you for putting it together.

Big "Yes" vote above.
I will definitely be using this! Thanks!

Commented:
Very helpful. Thanks a lot!!!

Commented:
Hi,

A very good article on DB2 recursive query. I also need to write a recursive query (in DB2) and while searching in Google I got this web page. This concept is very new to me. I got a requirement where I need to convert below rows to columns using recursive query..

City_Name          State_Name
Montgomery       Alabama
Wilmington         Delaware
Birmingham        Alabama
Anchorage           Alaska
Dover                   Delaware

Query result should be.. (rows will become columns)

Alabama           Delaware      Alaska
Montgomery
Birmingham    
                           Wilmington
                           Dover
                                                  Anchorage          

Just wanted to know your suggestion if this can be achieved using recursive query?
Thanks

Have a question about something in this article? You can receive help directly from the article author. Sign up for a free trial to get started.