Recursive SQL in DB2 (Converting rows to columns)

AID: 3618
  • Status: Published

16010 points

  • ByKdo
  • TypeTutorial
  • Posted on2010-08-25 at 12:19:19
Awards
  • Community Pick
  • Experts Exchange Approved
  • Editor's Choice
Recursive SQL in UDB/LUW (you can use 'recursive' and 'SQL' in the same sentence)


A growing number of database queries lend themselves to recursive solutions.  It's not always easy to spot when recursion is called for, especially for people unaccustomed to the technique, but it's an extremely powerful tool.  When used properly it can solve problems that can't otherwise be easily solved.

There are many common definitions for recursion.  The one that perhaps describes it best with regards to SQL is, "defining an infinite statement using finite components".

Of course, understanding that statement is only slightly less difficult than understanding a recursive query.  (In plain English, the statement simply means "joining an arbitrary number of rows into a single result".)


The Problem (why would I want to use recursive SQL?)

It's often desirable to take data in consecutive rows and display it on a single row.  That's an integral part of modern SQL and happens all the time when two tables are joined.  Throw in a concatenation and items that were once in separate rows are in a single column in one row.  If you want to compress three rows to a single row, join twice.  If there is doubt that there will always be three rows a NULL result must be handled.  Use the COALESCE function to make sure that the optional data is handled correctly.

But what happens when there can be a large number of rows that need to be compressed to a single row?  The query has to be written to join each item.  If the query could return up to 20 rows, 19 joins need to be written into the query to put all of the results in a single row.

Even worse is the case where there is no limit on the number of possible rows.  Imagine a database that records documents by saving each word in the document in a separate row.  To reconstruct any sentence in the document the query must concatenate the contents of an arbitrary number of consecutive rows.  The available solutions are 1)  writing a stored procedure or function that loops through the rows and concatenates the contents into a single string;  2) write a query with a lot of joins and hope that no sentence is longer than the query allows; and 3) recursive SQL.

A more common (and practical) application would be to process a transaction file, concatenating the different status codes into a single string.  But practical application is your responsibility.  This article shows you how to get there.


The Solution (can you really do that?)

Recursive SQL solves the problem of "how many joins are necessary to produce the full result?" by repeatedly joining related items until there is no need for further joins.  The syntax takes a bit of study, but with a basic understanding of recursion, it's not difficult.

The problem of reconstructing a sentence from the separate words is an easy one to demonstrate.  First we need a table to contain each of the words.
CREATE TABLE rec 
(
  snum      INTEGER,        -- sentence number
  wordnum   INTEGER,        -- word number in the sentence
  word      VARCHAR(100)    -- word being saved
);
                                    
1:
2:
3:
4:
5:
6:

Select allOpen in new window



Next, we'll need some data.
INSERT INTO rec 
VALUES (1, 1, 'This'), (1, 2, 'is'), (1, 3, 'a'), (1, 4, 'fine'), (1, 5, 'example');
                                    
1:
2:

Select allOpen in new window



A quick check of the data shows a sentence, stored as words:
SELECT * FROM rec;

SNUM  WORDNUM    WORD
1     1          This
1     2          is
1     3          a
1     4          fine
1     5          example
                                    
1:
2:
3:
4:
5:
6:
7:
8:

Select allOpen in new window



Using traditional SQL, we could construct the sentence like this:
SELECT t0.word ||
  COALESCE (' ' || t1.word, '') ||
  COALESCE (' ' || t2.word, '') ||
  COALESCE (' ' || t3.word, '') ||
  COALESCE (' ' || t4.word, '') ||
  COALESCE (' ' || t5.word, '') ||
  COALESCE (' ' || t6.word, '')
FROM rec t0
LEFT JOIN rec t1
  ON t0.snum = t1.snum
 AND t1.wordnum = 2
LEFT JOIN rec t2
  ON t0.snum = t2.snum
 AND t2.wordnum = 3
LEFT JOIN rec t3
  ON t0.snum = t3.snum
 AND t3.wordnum = 4
LEFT JOIN rec t4
  ON t0.snum = t4.snum
 AND t4.wordnum = 5
LEFT JOIN rec t5
  ON t0.snum = t5.snum
 AND t5.wordnum = 6
LEFT JOIN rec t6
  ON t0.snum = t6.snum
 AND t6.wordnum = 7
WHERE t0.wordnum = 1;
                                    
1:
2:
3:
4:
5:
6:
7:
8:
9:
10:
11:
12:
13:
14:
15:
16:
17:
18:
19:
20:
21:
22:
23:
24:
25:
26:
27:

Select allOpen in new window



It's ugly, and from a technical standpoint it's incomplete as it will only report the first 7 words of a sentence.  Note that a couple more joins were done than necessary to produce the sample sentence to demonstrate that the query must contain at least enough joins to rebuild the sentence and that the query must gracefully handle the NULL values that are generated when the join is meaningless.

A recursive solution is shorter and isn't limited by the number of joins that are coded.
WITH rquery (snum, wordnum, sentence)
AS
(
  SELECT base.snum, base.wordnum, base.word 
  FROM rec base 
  WHERE wordnum = 1
  
  UNION ALL
  
  SELECT t1.snum, t1.wordnum, sentence || ' ' || t1.word
  FROM rquery t0, rec t1
  WHERE t0. snum = t1. snum
    AND t0.wordnum + 1 = t1.wordnum
)
SELECT * 
FROM rquery;
                                    
1:
2:
3:
4:
5:
6:
7:
8:
9:
10:
11:
12:
13:
14:
15:
16:

Select allOpen in new window



Whether there is 1 word or 1,000 words in the sentence, that one recursive query will handle it. (There is one caveat -- it is explained below.)


How it works (I feel a headache coming on)

The query has two parts.  The upper subquery (lines 4-6) defines the starting point.  Because the goal of this query is to reconstruct a sentence from the separate words, we start by selecting the first word of the sentence (WHERE wordnum = 1).  Note that just as with all queries that use the WITH construct, the item list returned by the subquery must match the item list in the table definition.  rquery returns snum (sentence number), wordnum (word number), and sentence (the constructed string); therefore, the subquery returns the sentence number, word number (always 1), and the first word of the sentence.

The lower subquery (lines 10-13) is the recursive part of the solution.  It takes the results of the previous subquery and joins the next row to it.  The actual values returned from the subquery are the sentence number, word number, and the sentence with the next word appended to it.  The previously generated result (returned from rquery) is joined with the row in rec that contains the next word.

The diagram below shows how this works.  The upper subquery and each recursive call to the lower subquery generate 1 row.
SNUM  WORDNUM    SENTENCE
1     1          This
1     2          This is
1     3          This is a
1     4          This is a fine
1     5          This is a fine example
                                    
1:
2:
3:
4:
5:
6:

Select allOpen in new window



The upper subquery returns the first word (This).  The lower subquery joins the next word (is) and appends it to the sentence and returns the newly created row.  The recursion occurs when the lower subquery joins to rquery (the newly created row) and repeats the lower subquery.  Every time that the lower subquery executes, it appends the next word to the sentence.


The Results (it really does work!)

This recursive query returns all of the generated rows, not just the final one.  For a lot of applications this will be just fine, but for our purpose, we only want the complete sentence, not the partial sentences that were constructed during the recursion.  That's easily solved by selecting only the last row.
WITH rquery (snum, wordnum, sentence)
AS
(
  SELECT base.snum, base.wordnum, base.word 
  FROM rec base 
  WHERE wordnum = 1
  
  UNION ALL
  
  SELECT t1.snum, t1.wordnum, sentence || ' ' || t1.word
  FROM rquery t0, rec t1
  WHERE t0. snum = t1. snum
    AND t0.wordnum + 1 = t1.wordnum
)
SELECT * 
FROM rquery
WHERE wordnum = (SELECT max(wordnum) FROM rquery);
                                    
1:
2:
3:
4:
5:
6:
7:
8:
9:
10:
11:
12:
13:
14:
15:
16:
17:

Select allOpen in new window



This returns only the desired row:
SNUM  WORDNUM    SENTENCE
1     5          This is a fine example
                                    
1:
2:

Select allOpen in new window




Expanding the Results (I've got lots of data to deal with!)

The example that we've used so far is very simple.  We have one sentence with a single starting value.  In the real world the table would likely contain a lot of sentences, so a small change to the SQL is required.  To see what happens when the table contains 2 sentences we need to add more data.
INSERT INTO rec 
VALUES (2, 1, 'This'), (2, 2, 'is'), (2, 3, 'another'), (2, 4, 'example');
                                    
1:
2:

Select allOpen in new window



We also need a slight change to the SQL to select the last line generated for each sentence.
WITH rquery (snum, wordnum, sentence)
AS
(
  SELECT base.snum, base.wordnum, base.word 
  FROM rec base 
  WHERE wordnum = 1
  
  UNION ALL
  
  SELECT t1.snum, t1.wordnum, sentence || ' ' || t1.word
  FROM rquery t0, rec t1
  WHERE t0. snum = t1. snum
    AND t0.wordnum + 1 = t1.wordnum
)
SELECT * 
FROM rquery rq
WHERE rq.wordnum = (SELECT max(wordnum) FROM rquery WHERE snum = rq.snum);
                                    
1:
2:
3:
4:
5:
6:
7:
8:
9:
10:
11:
12:
13:
14:
15:
16:
17:

Select allOpen in new window



The result is that we now generate and select all of the sentences that are represented in the table.
SNUM  WORDNUM    SENTENCE
1     5          This is a fine example
2     4          This is another example
                                    
1:
2:
3:

Select allOpen in new window




Gotchas (I didn't mean to do that!)

There are several places where it's easy to get tripped up using recursive SQL.  Stay mindful of the items mentioned here and you'll avoid most of the major pitfalls.

Select the correct starting row(s).  If the upper subquery doesn't select the correct initial value(s), the rest of the query is meaningless.  Always select the base item(s) in the upper subquery that will be built upon by the recursive queries.

Avoid infinite recursion.  If you structure your query so that there is no logical end point, DB2 will go along merrily joining the additional rows until it detects a stack overflow and throws an exception.  This is almost always caused by applying the wrong filter in the lower subquery.

Join items in the proper sequence whenever possible.  In our sentence example we joined words 1 and 2, then 2 and 3, then 3 and 4, etc.  But the data may not contain an incrementing counter.  The processing of a transaction log is a fine example.  The transaction's timestamp puts the transaction in order, but there isn't a convenient column that readily identifies the next item.  With a modest data volume you may be able to ignore the performance impact of allowing the Cartesian join and filtering to the correct items in the final query.  This may be impractical with a large amount of data.  Remember that the result of a join is a derived table that has no index(es).  The lower subquery will always join a derived table to a user table so the query may result in a lot of full table scans over a large amount of data.

Object inheritance defines the data types of the result. (This is the caveat mentioned above.)  DB2 uses the data types of the objects in the query to establish the data types of the returned columns.   In our example table "rec" the column "word" is defined as a VARCHAR(100), so DB2 creates a result set where the third column is a VARCHAR(100).  If the resulting string is longer than the declaration DB2 will throw an error.  Try it yourself by running the INSERT statement below and repeating the query.
INSERT INTO rec 
VALUES (3, 1, 'This'), (3, 2, 'example '), (3, 3, 'shows'), (3, 4, 'what'),
       (3, 5, 'can'), (3, 6, 'happen'), (3, 7, 'when'), (3, 8, 'the'),
       (3, 9, 'data'), (3, 10, 'represents'), (3, 11, 'a'), (3, 12, 'really'),
       (3, 13, 'long'), (3, 14, 'line'), (3, 15, 'that'), (3, 16, 'overflows'),
       (3, 17, 'the'), (3, 18, 'length'), (3, 19, 'of'), (3, 20, 'an'),
       (3, 21, 'object');
                                    
1:
2:
3:
4:
5:
6:
7:

Select allOpen in new window



Simply recasting the string in the upper subquery solves the problem.
WITH rquery (snum, wordnum, sentence)
AS
(
  SELECT base.snum, base.wordnum, CAST(base.word AS VARCHAR (200)) 
  FROM rec base 
  WHERE wordnum = 1
  
  UNION ALL
  
  SELECT t1.snum, t1.wordnum, sentence || ' ' || t1.word
  FROM rquery t0, rec t1
  WHERE t0. snum = t1. snum
    AND t0.wordnum + 1 = t1.wordnum
)
SELECT * 
FROM rquery rq
WHERE rq.wordnum = (SELECT max(wordnum) FROM rquery WHERE snum = rq.snum);
                                    
1:
2:
3:
4:
5:
6:
7:
8:
9:
10:
11:
12:
13:
14:
15:
16:
17:

Select allOpen in new window




Conclusion (I thought that he'd never shut up!)

Recursive SQL isn't for the faint of heart.  It requires a good understanding of the user tables and data, as well as a comfort level with recursion.  A programming background in any of the recursive languages (e.g., C++, PASCAL, etc.) is a tremendous help.

But it is extremely powerful.  Master it and you can produce a recursive query in a few minutes that would take all day to write and debug in a non-recursive style.


Good Luck...
Asked On
2010-08-25 at 12:19:19ID3618
Tags

DB2

,

UDB

,

databases

,

recursion

,

SQL

Topic

DB2 Database

Views
4068

Comments

Expert Comment

by: younghv on 2010-09-18 at 14:36:20ID: 19601

Excellent presentation.
Thank you for putting it together.

Big "Yes" vote above.

Expert Comment

by: MilburnDrysdale on 2010-10-25 at 09:26:57ID: 20782

I will definitely be using this! Thanks!

Add your Comment

Please Sign up or Log in to comment on this article.

Join Experts Exchange Today

Gain Access to all our Tech Resources

Get personalized answers

Ask unlimited questions

Access Proven Solutions

Search 3.2 million solutions

Read In-Depth How-To Guides

1000+ articles, demos, & tips

Watch Step by Step Tutorials

Learn direct from top tech pros

And Much More!

Your complete tech resource

See Plans and Pricing

30-day free trial. Register in 60 seconds.

Loading Advertisement...

Top DB2 Experts

  1. Kdo

    88,178

    Master

    0 points yesterday

    Profile
    Rank: Genius
  2. momi_sabag

    55,709

    Master

    0 points yesterday

    Profile
    Rank: Genius
  3. sathyaram_s

    31,168

    0 points yesterday

    Profile
  4. Gary_The_IT_Pro

    23,040

    0 points yesterday

    Profile
    Rank: Genius
  5. daveslash

    19,618

    0 points yesterday

    Profile
    Rank: Sage
  6. mustaccio

    11,996

    0 points yesterday

    Profile
    Rank: Master
  7. tliotta

    9,718

    0 points yesterday

    Profile
    Rank: Genius
  8. murphey2

    8,650

    0 points yesterday

    Profile
    Rank: Wizard
  9. TomasHelgi

    6,550

    0 points yesterday

    Profile
    Rank: Sage
  10. _b_h

    4,400

    0 points yesterday

    Profile
    Rank: Wizard
  11. sdstuber

    4,000

    0 points yesterday

    Profile
    Rank: Genius
  12. woolmilkporc

    3,850

    0 points yesterday

    Profile
    Rank: Genius
  13. mwvisa1

    2,800

    0 points yesterday

    Profile
    Rank: Genius
  14. sjef_bosman

    2,664

    0 points yesterday

    Profile
    Rank: Genius
  15. Cenjoy100

    2,600

    0 points yesterday

    Profile
    Rank: Master
  16. dvz

    2,000

    0 points yesterday

    Profile
    Rank: Sage
  17. dtodd

    2,000

    0 points yesterday

    Profile
    Rank: Genius
  18. ggzfab

    2,000

    0 points yesterday

    Profile
  19. James0628

    1,853

    0 points yesterday

    Profile
    Rank: Genius
  20. sammySeltzer

    1,600

    0 points yesterday

    Profile
    Rank: Genius
  21. BCUNNEY

    1,500

    0 points yesterday

    Profile
    Rank: Guru
  22. DaveBaldwin

    1,500

    0 points yesterday

    Profile
    Rank: Genius
  23. tapanpattanaik

    1,400

    0 points yesterday

    Profile
    Rank: Sage
  24. AngocA

    1,400

    0 points yesterday

    Profile
  25. sjm_ee

    1,350

    0 points yesterday

    Profile
    Rank: Wizard

Hall Of Fame