Link to home
Start Free TrialLog in
Avatar of BAM
BAMFlag for United States of America

asked on

I need to merge several rows into one

I have a table that is like this:

Job Code Line Note
111 aaa  1    Short note
111 aaa  2    More note
111 aaa  3    and more
111 bbb  1    Note
111 bbb  2    text
111 bbb  3    more stuff
111 bbb  4    and some more
222 aaa  1    Tiny field
222 aaa  2    won't hold
222 aaa  3    all the data
222 aaa  4    so I need to
222 aaa  5    concatenate

Open in new window


I need to create a single row  for each group by concatenating the note field. The result should be:
Job Code Note
111 aaa  Short note More note and more
111 bbb  Note text more stuff and more stuff
222 aaa  Tiny field won't hold all the data so I need to concatenate

Open in new window


The solution should be obvious, but for me today it is not. Any help would be greatly appreciated.
Avatar of Éric Moreau
Éric Moreau
Flag of Canada image

You can use STUFF and FOR XML in TSQL and you will get your result. I have a sample of that at http://emoreau.com/Entries/Blogs/2009/10/SQL-Concatenate-values-multiple-from-multiple-lines.aspx
ASKER CERTIFIED SOLUTION
Avatar of Anthony Perkins
Anthony Perkins
Flag of United States of America image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
the problem with CLR is that it very often kills the performance
Eric,

the problem with CLR is that it very often kills the performance
I would encourage you to read the link I posted earlier.  The author has given a very balanced comparison of the different options and when you should use them.

My experience is actually quite the opposite.  T-SQL is lousy with string manipulation and CLR presents a very good alternative.  In this specific case, CLR proved to be a lot faster than using STUFF with FOR XML, which you have to admit is a tad contrived to say the least: Xml was never intended for this and to be quite frank you are relying on a cute trick.  Which brings me to my second point.  Compare these two pieces of code:

This is the code using STUFF() ... FOR XML PATH
SELECT  Job,
        Code,
        STUFF((SELECT   ',' + Note
               FROM     @YourTable i
               WHERE    i.Job = o.Job AND i.Code = o.Code
              FOR
               XML PATH('')
              ), 1, 1, '') AS Note
FROM    @YourTable o
GROUP BY
	Job,
        Code
ORDER BY
	Job,
	Code

Open in new window


Here is the code using GROUP_CONCAT():
SELECT  Job,
        Code,
        dbo.GROUP_CONCAT(Note) Note
FROM    @YourTable
GROUP BY
	Job,
        Code
ORDER BY
	Job, 
	code

Open in new window


Here is the rhetorical question I have why would you have to rethink/rewrite  all the code using STUFF() ... FOR XML PATH when you already have a canned function that does it all?

But I think we should let the author and future readers judge for themselves and pick whatever is best for them.  That in the end is the only part that counts.
I just made a test comparing the code from CodePlex (Anthony's link) and my pure SQL solution (using STUFF).

I created a table of 1,000,000 random entries.

To my great surprise, the CLR function took 3 seconds to run compared to a big 9 seconds for the pure SQL method. I will have to revisit that tomorrow when I will get more time.
I would not consider STUFF(  <subquery> FOR XML PATH  ) "pure" sql. Yes it's valid T-SQL but not "pure SQL"

It is a "contrivance" as suggested albeit a very handy one. I'm not a great fan of external code however as maintaining a valid configuration for every sql server gets more complicated.

Personally I would trade some speed (the CLR) for ease of maintenance (stuff et al).
Just my 2 cents worth.
Avatar of BAM

ASKER

I posted this just before leaving work on Friday, then came in Monday morning to a dead C: drive. I'll have a chance to read through this later today and try out the suggestions.
I would not consider STUFF(  <subquery> FOR XML PATH  ) "pure" sql. Yes it's valid T-SQL but not "pure SQL"
I disagree, it is quite "pure" SQL just simply not the right tool.

It is a "contrivance" as suggested albeit a very handy one.
And no different from using  something like this to retrieve the third sentence:
SELECT PARSENAME('This is the first sentence.This is the second sentence.This is the third sentence.This is the fourth sentence', 2)
And if you look on this site you will find some solutions that I posted and were accepted that used precisely this approach.  Were they the right solution, not really,  Just a cute trick.

Personally I would trade some speed (the CLR) for ease of maintenance (stuff et al).
But that is just it:  Using CLR you get speed and code that does not have to be re-invented every time and easier to maintain.  That to me means that it is the right tool for the job.
SQL as opposed to T-SQL it isn't "pure"  in that sense.

And by configuration/maintenance I mean making certain that the CLR code is installed and working (and is installed and working whenever you rebuild, upgrade hardware, move instances etc etc); I suggest that this aspect is not overlooked.
I agree.  But that is just one more of dozens of check boxes that need to be checked.
Avatar of BAM

ASKER

I went with this solution, which worked perfectly, for two reasons. It was very easy to implement and it was allegedly the faster way. My table had 9.8 million records so speed was a consideration in this case. Thanks to everyone who contributed. It was an interesting discussion as well as solving my problem.