[Okta Webinar] Learn how to a build a cloud-first strategyRegister Now

x
  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 462
  • Last Modified:

Concatenation with CTE Recursion

Hello all,
I am having a bit of trouble with a code that I have tailored.  The code runs fine when the table data that it pull back is less than 2500 records but I have a few tables that I need to grab multiple rows based on ssno, varid and then concatenate them with a ','.  The table rows with just a simple select * from table where condition will pull back around 480,000 records.  Now from those records found my statement should pull together what I need.  I let the query run for 1 plus hours and it still ran until I had to stop the execution.  I need help to one understand why this does not run on tables with higher records, two what query should I run to get my results.


 
WITH CTE ( SSNO, ODESC, VARID, VARTEXT, length )
     AS (SELECT DISTINCT SSNO,
                         'Accidents' AS ODESC,
                         VARID,
                         Cast(vartext AS VARCHAR(max)),
                         0
         FROM   DICTIONARY
         WHERE  STATUS = 'Y'
                AND SECTION = 'PAST'
                AND VARCODE = 'ACCIDENTS'
         GROUP  BY SSNO,
                   VARID,
                   VARTEXT
         UNION ALL
         SELECT P.SSNO,
                'Accidents' AS ODESC,
                P.Varid,
                Cast(C.VARTEXT + CASE
                                   WHEN length = 0 THEN ', '
                                   ELSE ', '
                                 END + P.VARTEXT AS VARCHAR(max)),
                length + 1
         FROM   CTE c
                INNER JOIN DICTIONARY p
                  ON p.SSNO = c.SSNO
         WHERE  p.varid < c.varid
                AND STATUS = 'Y'
                AND SECTION = 'PAST'
                AND VARCODE = 'ACCIDENTS')

SELECT SSNO,
       ODESC,
       VARTEXT
FROM   (SELECT SSNO,
               ODESC,
               VARTEXT,
               Rank() OVER ( PARTITION BY SSNO ORDER BY length DESC )
        FROM   CTE) D ( SSNO, ODESC, VARTEXT, rank )
WHERE  rank = 1  OPTION (MAXRECURSION 110)

Open in new window


Thanks
0
Sqlspider
Asked:
Sqlspider
  • 10
  • 7
  • 5
  • +2
2 Solutions
 
lluddenCommented:
Run the query in MSSM with Show Actual Execution Plan on a smaller, then progressively larger dataset.

Look for table scans, and any suspiciously large numbers of rewinds and rows returned.

It my just be an indexing issue - going from 2500 rows to 480K really magnifies indexing problems.
0
 
SqlspiderAuthor Commented:
Sorry if I mis-informed.  The cte works if the table data contains 2500 rows or less.  If the table has rows larger than 2500 the query just runs.  I will run query in mssm.
0
 
lluddenCommented:
It probably works if the table is larger, just that run time can scale geometrically, which, when you increase data size by 250x, can be a long time.
0
Concerto's Cloud Advisory Services

Want to avoid the missteps to gaining all the benefits of the cloud? Learn more about the different assessment options from our Cloud Advisory team.

 
LowfatspreadCommented:
;WITH dic as (SELECT  SSNO,                
                 VARID,
                 Cast(vartext AS VARCHAR(max)) as vartext
         FROM   DICTIONARY
         WHERE  STATUS = 'Y'
                AND SECTION = 'PAST'
                AND VARCODE = 'ACCIDENTS'
         )
    ,CTE
     AS (SELECT  SSNO,                
                 VARID,
                 vartext,
                 0 as length
         FROM   DIC
         GROUP  BY SSNO,
                   VARID,
                   VARTEXT
         UNION ALL
         SELECT P.SSNO,
                P.Varid,
                C.VARTEXT + ', ' + P.VARTEXT ,
                length + 1
         FROM   CTE c
         INNER JOIN DIC p
            ON p.SSNO = c.SSNO
         WHERE  p.varid < c.varid
                )

SELECT SSNO, 'Accidents' AS ODESC,
       VARTEXT
FROM   (SELECT SSNO,
               VARTEXT,
               Rank() OVER ( PARTITION BY SSNO ORDER BY length DESC )
        FROM   CTE) D
WHERE  rank = 1
order by ssno
 OPTION (MAXRECURSION 110)

does this help?
do you have an index on dictionary of

ssn,varcode,section,status,varid include vartext

or preferably
a filtered index
ssn,varid include vartext
WHERE  STATUS = 'Y'
                AND SECTION = 'PAST'
                AND VARCODE = 'ACCIDENTS'
0
 
SqlspiderAuthor Commented:
I will try what you have and let you know.  Give me about an hour to run this.  Also by the way the only ix is on ssno and section.  I would have to speak to the dba to see if I could get an ix as you specified.
0
 
Scott PletcherSenior DBACommented:
I think you're building way more result rows than you need.

;WITH CTE ( SSNO, ODESC, VARID, VARTEXT, length )
     AS (SELECT DISTINCT SSNO,
                         'Accidents' AS ODESC,
                         MAX(VARID),
                         Cast(vartext AS VARCHAR(max)),
                         0
         FROM   DICTIONARY
         WHERE  STATUS = 'Y'
                AND SECTION = 'PAST'
                AND VARCODE = 'ACCIDENTS'
         GROUP  BY SSNO,
                   VARTEXT
         UNION ALL
         SELECT P.SSNO,
                'Accidents' AS ODESC,
                P.Varid,
                Cast(C.VARTEXT + CASE
                                   WHEN length = 0 THEN ', '
                                   ELSE ', '
                                 END + P.VARTEXT AS VARCHAR(max)),
                length + 1
         FROM   CTE c
                INNER JOIN DICTIONARY p
                  ON p.SSNO = c.SSNO AND
                     p.varid < c.varid)             

SELECT SSNO,
       ODESC,
       VARTEXT
FROM   (SELECT SSNO,
               ODESC,
               VARTEXT,
               Rank() OVER ( PARTITION BY SSNO ORDER BY length DESC )
        FROM   CTE) D ( SSNO, ODESC, VARTEXT, rank )
WHERE  rank = 1  OPTION (MAXRECURSION 110)

Open in new window

0
 
Scott PletcherSenior DBACommented:
Btw, I STRONGLY urge you to alias EVERY column in a query with multiple tables.  It may be clear to you which column is intended, but not for anyone not familiar with the code ... and it would likely help you out if you went back to the code six months from now :-)
SELECT P.SSNO,
                'Accidents' AS ODESC,
                P.Varid,
                Cast(C.VARTEXT + CASE
                                   WHEN length = 0 THEN ', ' --<<-- add alias to length
                                   ELSE ', '
                                 END + P.VARTEXT AS VARCHAR(max)),
                length + 1 --<<-- add alias to length
         FROM   CTE c
                INNER JOIN DICTIONARY p
                  ON p.SSNO = c.SSNO AND
                     p.varid < c.varid)

Open in new window

0
 
Scott PletcherSenior DBACommented:
CORRECTION:
D'OH, sorry, when I adjusted the join clause, I left off some conditions you might still need:

...
UNION ALL
         SELECT P.SSNO,
                'Accidents' AS ODESC,
                P.Varid,
                Cast(C.VARTEXT + CASE
                                   WHEN length = 0 THEN ', '
                                   ELSE ', '
                                 END + P.VARTEXT AS VARCHAR(max)),
                length + 1
         FROM   CTE c
                INNER JOIN DICTIONARY p
                  ON p.SSNO = c.SSNO AND
                     p.varid < c.varid         
           WHERE  
                          p.STATUS = 'Y'
                 AND p.SECTION = 'PAST'
                 AND p.VARCODE = 'ACCIDENTS')
...

Open in new window

0
 
LowfatspreadCommented:
@scott isn't the caveat with yours that you would loose rows if multiple  max(varid) exist for an ssno ?


@sqlspider can you clarify what the key to the dictionary table is ?
 also was that a combine ssno section index or 2 indexes?

0
 
Scott PletcherSenior DBACommented:
UPDATED VERSION.
Sorry, didn't see any sample data, so no way for me to test it:

The basic idea is to avoid concatenating once for every varid, but instead to do it once per ssno.
;WITH CTE ( SSNO, ODESC, VARID, VARTEXT )
     AS (SELECT SSNO,
                'Accidents' AS ODESC,
                MAX(VARID),
                Cast('' AS VARCHAR(max)),
                0
         FROM   DICTIONARY
         WHERE  STATUS = 'Y'
                AND SECTION = 'PAST'
                AND VARCODE = 'ACCIDENTS'
         GROUP  BY SSNO
         UNION ALL
         SELECT P.SSNO,
                'Accidents' AS ODESC,
                P.Varid,
                Cast(C.VARTEXT + CASE WHEN C.VARTEXT = '' THEN '' ELSE ', ' END +
                     P.VARTEXT AS VARCHAR(max))
         FROM   CTE c
         INNER JOIN DICTIONARY p
            ON p.SSNO = c.SSNO
         WHERE  p.STATUS = 'Y'
                AND p.SECTION = 'PAST'
                AND p.VARCODE = 'ACCIDENTS' 
         )             

SELECT SSNO,
       ODESC,
       VARTEXT
FROM   (SELECT SSNO,
               ODESC,
               VARTEXT,
               ROW_NUMBER() OVER ( PARTITION BY SSNO ORDER BY LEN(VARTEXT) DESC ) AS rank
        FROM   CTE) D ( SSNO, ODESC, VARTEXT, rank )
WHERE  rank = 1  OPTION (MAXRECURSION 110)

Open in new window

0
 
Scott PletcherSenior DBACommented:
>> @scott isn't the caveat with yours that you would loose rows if multiple  max(varid) exist for an ssno ? <<

Yes, with the prior code that would be true ... have to admit, I didn't think there would be duplicate ids.

I think the last version I posted takes care of that, and the GROUP BY issue as well.

Again, it's very hard to tell for sure, without being able to run the code on some sample data.
0
 
SqlspiderAuthor Commented:
Sorry about that ScottPletcher, I was not able to make dummy data for you guys to test.  Let me put something together for you.  Also Lowfatspread that is a combined ix non clustered on ssno-section.
0
 
Scott PletcherSenior DBACommented:
>> The basic idea is to avoid concatenating once for every varid, but instead to do it once per ssno. <<

That is, if a given SSNO has 2500 varids, I *think* the original code would do something like this:

for the first varid, varid = 1, output:  varid vartext1.  (no concat needed)
for 2, output:  text2 + text1; text1. (2 rows output)
for 3, output:  3 + 2 + 1; 2 + 1; 1. (3 rows output)
...(nnnn rows output)
for 2499: output 2499 + 2498 +2497 + ... + 1; ... (2499 rows output)
for 2500: output 2500 + 2549 + 2548 + ... + 1; /* this is the only one you need */, ... (2500 rows output)

My code attempts to output only the last set of values, since that's the only one you need.

Overall, hopefully this can speed it up quite a bit.
0
 
Anthony PerkinsCommented:
>>I need help to one understand why this does not run on tables with higher records<<
But to answer your original question, I suspect this article should tell you why using recursion with CTE's is a bad idea:
http://www.sqlservercentral.com/articles/T-SQL/74118/
0
 
Scott PletcherSenior DBACommented:
That article states that a recursive CTE is not a good method to *count*.  

I certainly agree with that, but don't see any relevance to *concatenation*.
0
 
LowfatspreadCommented:
@sqlspider can we have clarity on the key of the dictionary table...

to me your initial use of rank() rather than row_number() implies that you could (and wish to see) have multiple rows per ssno...

@scott its not clear to me how you can use len(vartext) in the row_number() ... wouldn't that be dependant on the presence of nulls in the data? thus not guaranteeing the same result as rank() (and multiple rank()=1 are possible)


@acperkins i assume the point is counting is "the same" as concatenation cf length+1


0
 
Scott PletcherSenior DBACommented:
If there can be NULLs, they need to be dealt with in:

 Cast(C.VARTEXT + CASE
                                   WHEN length = 0 THEN ', '
                                   ELSE ', '
                                 END + P.VARTEXT AS VARCHAR(max)),

Otherwise any NULL will force the entire string to be NULL from then on.

The desired result is the longest one, i.e., the one with *all* the concatenations in it.
0
 
SqlspiderAuthor Commented:
Sorry gang, I have been out sick.  Thanks for the continued support on this topic.  

@sqlspider can we have clarity on the key of the dictionary table...

"The primary key is on Varid which is numeric(18,0),not null"

to me your initial use of rank() rather than row_number() implies that you could (and wish to see) have multiple rows per ssno...

"to answer this question it is my lack of knowledge I would say on writting the sql to get the proper results.  I did another cte that uses row_number() to show me the ssno,vartext and then put in proper order the varid. This would then ensure that the concatenation would come out in order.
What I truly want to see is just one iteration of ssno with all the varids associated to the ssno.

@scott its not clear to me how you can use len(vartext) in the row_number() ... wouldn't that be dependant on the presence of nulls in the data? thus not guaranteeing the same result as rank() (and multiple rank()=1 are possible)

"ssno is not null, varid is not null but vartext can have nulls"


@acperkins i assume the point is counting is "the same" as concatenation cf length+1Ok first the primary key is
0
 
Scott PletcherSenior DBACommented:
If vartext can have NULLs,  you need an adjustment to your concat code:
...
Cast(C.VARTEXT + CASE WHEN C.VARTEXT = '' THEN '' ELSE ', ' END +
                     COALESCE(P.VARTEXT, '') AS VARCHAR(max))
...
0
 
LowfatspreadCommented:
more like

cast(COALESCE(C.VARTEXT ,'')+coalesce(CASE WHEN coalesce(C.VARTEXT,'') = '' THEN '' ELSE ', ' END+p.vartext,'') AS VARCHAR(max))

?


0
 
Scott PletcherSenior DBACommented:
Not sure that would be necessary if the initial concat value was set to '' -- as I did in my code -- and you never allowed the concat value to become NULL.

But other than some slight overhead, it doesn't hurt anything.
0
 
SqlspiderAuthor Commented:
Ok .  I need to know this.  ScottPletcher I tried the sql you updated.  It ran faster than mine but the result was 10 hours versus my 12 hours.  So I tried this sql.
 
; WITH [CON] 
    AS (SELECT DISTINCT varid,SSNO,
						VARTEXT,
						'ACCIDENTS' ODESC,
						ROW_NUMBER() OVER ( PARTITION BY SSNO ORDER BY varid ) AS ROWNUM
           FROM DICTIONARY
           WHERE STATUS='Y' AND SECTION='PAST' AND VARCODE='ACCIDENTS'
           )
                   
    SELECT SSNO,
		   ODESC,
		   (MAX( CASE ROWNUM WHEN 1 THEN VARTEXT ELSE ', ' END ) + 
           MAX( CASE ROWNUM WHEN 2 THEN VARTEXT ELSE ', ' END ) + '' +
           MAX( CASE ROWNUM WHEN 3 THEN VARTEXT ELSE ', ' END ) + '' +
           MAX( CASE ROWNUM WHEN 4 THEN VARTEXT ELSE ', ' END )) 
       FROM [CON] 
       GROUP BY SSNO,ODESC
       order by ssno

Open in new window


This script ran a total of 1min 55 secs for 57645 records.  The only problem is I can not get the comma's to act correctly.

Ex.  SSNO      ODESC      VARTEXT
1001604      Accidents      10/06 MVA: rt femoral shaft fracture,bilateral pulmonary contusion,pneumothoraces2007 benign abd mass,

I can not get the comma not to show when there is only one vartext .  Any ideas.

The other query is my ideal way to run but it takes way to long to pull back results.
0
 
LowfatspreadCommented:
you mean this ?

; WITH [CON]
    AS (SELECT DISTINCT varid,SSNO,vartext from dictionary
           WHERE STATUS='Y' AND SECTION='PAST' AND VARCODE='ACCIDENTS')
 ,con1 as (select x.*
             , ROW_NUMBER() OVER ( PARTITION BY SSNO ORDER BY varid ) AS ROWNUM
           from con as x
          )
 
                   
    SELECT SSNO,'Accidents' as odesc
      .coalesce([1],'')
        +coalesce(', '+[2].'')
+coalesce(', '+[3].'')
+coalesce(', '+[4].'')

       FROM con1 as x
       pivot (max(vartext) for rownum in ([1],[2],[3],[4])) as pvt
       
       order by ssno
0
 
SqlspiderAuthor Commented:
Pretty cool lowfatspread did not even think of that way.  Ok I will try this tomorrow.  I need to split this solution scoring between you and Scott.  Your ideas have helped greatly.  I really hate having to do it the second way because I have to know how many row numbers and make the adjustments accordingly since this script will run against several varcodes in that table. My dba was ready to hang me for the other scripts I was running due to the cost on the server.  But again thanks for sticking with me on this one guys.  I will do the scoring in the morning after I run your script Lowfatspread.  Hope to speak with you guys again and i maybe able to help.

Peace to you both
0
 
SqlspiderAuthor Commented:
Thanks to both Lowfatspread and ScottPletcher.  I ended up doing the suggestion from Lowfatspread since it ran the quickest.  The other query worked but ran for a total of 12 hours.  Thank you for all the help
0

Featured Post

Technology Partners: We Want Your Opinion!

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

  • 10
  • 7
  • 5
  • +2
Tackle projects and never again get stuck behind a technical roadblock.
Join Now