Job Code Line Note111 aaa 1 Short note111 aaa 2 More note111 aaa 3 and more111 bbb 1 Note111 bbb 2 text111 bbb 3 more stuff111 bbb 4 and some more222 aaa 1 Tiny field222 aaa 2 won't hold222 aaa 3 all the data222 aaa 4 so I need to222 aaa 5 concatenate
I need to create a single row for each group by concatenating the note field. The result should be:
Job Code Note111 aaa Short note More note and more111 bbb Note text more stuff and more stuff222 aaa Tiny field won't hold all the data so I need to concatenate
the problem with CLR is that it very often kills the performance
Anthony Perkins
Eric,
the problem with CLR is that it very often kills the performance
I would encourage you to read the link I posted earlier. The author has given a very balanced comparison of the different options and when you should use them.
My experience is actually quite the opposite. T-SQL is lousy with string manipulation and CLR presents a very good alternative. In this specific case, CLR proved to be a lot faster than using STUFF with FOR XML, which you have to admit is a tad contrived to say the least: Xml was never intended for this and to be quite frank you are relying on a cute trick. Which brings me to my second point. Compare these two pieces of code:
This is the code using STUFF() ... FOR XML PATH
SELECT Job, Code, STUFF((SELECT ',' + Note FROM @YourTable i WHERE i.Job = o.Job AND i.Code = o.Code FOR XML PATH('') ), 1, 1, '') AS NoteFROM @YourTable oGROUP BY Job, CodeORDER BY Job, Code
Here is the rhetorical question I have why would you have to rethink/rewrite all the code using STUFF() ... FOR XML PATH when you already have a canned function that does it all?
But I think we should let the author and future readers judge for themselves and pick whatever is best for them. That in the end is the only part that counts.
I just made a test comparing the code from CodePlex (Anthony's link) and my pure SQL solution (using STUFF).
I created a table of 1,000,000 random entries.
To my great surprise, the CLR function took 3 seconds to run compared to a big 9 seconds for the pure SQL method. I will have to revisit that tomorrow when I will get more time.
PortletPaul
I would not consider STUFF( <subquery> FOR XML PATH ) "pure" sql. Yes it's valid T-SQL but not "pure SQL"
It is a "contrivance" as suggested albeit a very handy one. I'm not a great fan of external code however as maintaining a valid configuration for every sql server gets more complicated.
Personally I would trade some speed (the CLR) for ease of maintenance (stuff et al).
Just my 2 cents worth.
BAM
ASKER
I posted this just before leaving work on Friday, then came in Monday morning to a dead C: drive. I'll have a chance to read through this later today and try out the suggestions.
I would not consider STUFF( <subquery> FOR XML PATH ) "pure" sql. Yes it's valid T-SQL but not "pure SQL"
I disagree, it is quite "pure" SQL just simply not the right tool.
It is a "contrivance" as suggested albeit a very handy one.
And no different from using something like this to retrieve the third sentence:
SELECT PARSENAME('This is the first sentence.This is the second sentence.This is the third sentence.This is the fourth sentence', 2)
And if you look on this site you will find some solutions that I posted and were accepted that used precisely this approach. Were they the right solution, not really, Just a cute trick.
Personally I would trade some speed (the CLR) for ease of maintenance (stuff et al).
But that is just it: Using CLR you get speed and code that does not have to be re-invented every time and easier to maintain. That to me means that it is the right tool for the job.
PortletPaul
SQL as opposed to T-SQL it isn't "pure" in that sense.
And by configuration/maintenance I mean making certain that the CLR code is installed and working (and is installed and working whenever you rebuild, upgrade hardware, move instances etc etc); I suggest that this aspect is not overlooked.
Anthony Perkins
I agree. But that is just one more of dozens of check boxes that need to be checked.
I went with this solution, which worked perfectly, for two reasons. It was very easy to implement and it was allegedly the faster way. My table had 9.8 million records so speed was a consideration in this case. Thanks to everyone who contributed. It was an interesting discussion as well as solving my problem.