Breaking out a comma delimited field in SQL Server

I'm trying to Query an existing SQL server DB that contains voting information for a small polling app. The app is a classic pick up to x of 10 items from the checkboxlist. So there is no fixed number of items.

ie: Pick up to 3 colors you like

Red
Green
Blue
Orange
etc

The applications writes the choices in a comma delimited field ie
Date | Name | Choices
01/27/1999 | John Smith | Red,Blue
01/28/1999 | Jane Doe | Green, Orange, Pink
1/29/1999 | Bob Jones| Blue

I'm Trying to write a query that will sum the each of the options.


Choice | Total
Red | 22
Blue | 44
Pink | 77
etc.

Any help would be appreciated.

Michael


mcunnAsked:
Who is Participating?
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

Patrick MatthewsCommented:
If you have a table with the colors...


SELECT c.ColorName, SUM(CASE WHEN v.Choices LIKE '%' + c.ColorName + '%' THEN 1 ELSE 0 END) AS Total
FROM Colors c, Votes v
GROUP BY c.ColorName
ORDER BY SUM(CASE WHEN v.Choices LIKE '%' + c.ColorName + '%' THEN 1 ELSE 0 END) DESC
0
Patrick MatthewsCommented:
ANother approach:

SELECT c.ColorName, COUNT(v.Choices) AS Total
FROM Colors c INNER JOIN
      Votes v ON v.Choices Like '%' + c.ColorName + '%'
GROUP BY c.ColorName
ORDER BY COUNT(v.Choices) DESC
0
mcunnAuthor Commented:
These solutions half solve the issue.  The problem is that I don't really have a color lookup table. The form is being generated by an XML template that I can't access from the SQL server, so all I have to work with to generate the color table that you propose are the comma delimited values themselves. I essentially need to split the items from each of the combined fields and generated the color table and thats where I'm getting stuck.

Michael



0
10 Tips to Protect Your Business from Ransomware

Did you know that ransomware is the most widespread, destructive malware in the world today? It accounts for 39% of all security breaches, with ransomware gangsters projected to make $11.5B in profits from online extortion by 2019.

SharathData EngineerCommented:
>> I'm Trying to write a query that will sum the each of the options.
Do you have another table? How did you get your expected result? I don't understand the Total value for each Choice. Can you explain?

0
mcunnAuthor Commented:
I'm hoping to generate a color table at runtime  by combining all distinct values from the Choices column above. Once I can do that I can use "Like" statements as suggested by matthewspatrick.

0
HainKurtSr. System AnalystCommented:
instead of generating a color table, you can try to create choices table with one column, choice and put uniqe values so you can run (just another approach, if data is not huge)

select ChoiceName, count(*) as total
from choices
group by ChoiceName
order by Choicename
0
HainKurtSr. System AnalystCommented:
or  you can try similar to this

select ChoiceName, count(*) as total
from
(
  select GetChoice(Choices,1) as Choicename from Choicetable
UNION
  select GetChoice(Choices,2) from Choicetable
...
UNION
  select GetChoice(Choices,9) from Choicetable
UNION
  select GetChoice(Choices,10) from Choicetable
)
where ChoiceName<>""
group by ChoiceName
order by Choicename

where GetChoice (str, n) will be a matching function in SQL (like, split, Substr etc)
0
SharathData EngineerCommented:
create a function like this and call this function in your query to get your expected result.

select entry as Choices,COUNT(entry) as Total
from YourTable
cross apply dbo.split(Choices,',')
group by entry

CREATE FUNCTION [dbo].[split] 
       (@csv nvarchar(max), 
        @delim varchar(1))
        RETURNS @entries TABLE (entry nvarchar(100))
    AS BEGIN    
       DECLARE @commaindex int    
        SELECT @commaindex = CHARINDEX(@delim, @csv)    
            IF @commaindex > 0        
               BEGIN            
               INSERT INTO @entries -- insert left side            
               SELECT LTrim(RTrim(LEFT(@csv, @commaindex-1)))            -- pass right side recursively            
               UNION ALL            
               SELECT entry           
               FROM dbo.split(RIGHT(@csv, LEN(@csv) - @commaindex), @delim)           
               END    
          ELSE BEGIN            
               INSERT INTO @entries            
               SELECT LTrim(RTrim(@csv))       
               END    RETURN
           END
GO

Open in new window

0
brejkCommented:
There is an XML alternative for traditional splitting - check the snippet. Personally, I prefer CLR for tasks like splitting.
CREATE FUNCTION dbo.ufn_Split(@Input nvarchar(max), @Separator nchar(1))
RETURNS TABLE
AS
RETURN (
  WITH CTE AS 
  (
      SELECT
          CAST(
            N'<Value>' + 
            REPLACE(@Input, @Separator, N'</Value><Value>') + 
            N'</Value>' AS XML
          ) AS Items
  )
  SELECT
      Split.a.value('.', 'nvarchar(max)') AS Item
  FROM CTE
  CROSS APPLY Items.nodes('/Value') Split(a)
);
GO
 
SELECT * FROM dbo.ufn_Split('1,2,3,4',',')

Open in new window

0
Patrick MatthewsCommented:
mcunn,

If you know ahead of time what the possible colors are, you can modify my approach to the following:

SELECT c.ColorName, COUNT(v.Choices) AS Total
FROM Votes v INNER JOIN
      (SELECT 'Red' AS ColorName UNION
      SELECT 'Blue' AS ColorName UNION
      SELECT 'Green' AS ColorName UNION
      SELECT 'Orange' AS ColorName UNION
      SELECT 'Pink' AS ColorName UNION
      SELECT 'Yellow' AS ColorName UNION
      SELECT 'Black' AS ColorName UNION
      SELECT 'Blue' AS ColorName UNION
      SELECT 'Purple' AS ColorName UNION
      SELECT 'Mauve' AS ColorName UNION
      ) Colors c ON v.Choices Like '%' + c.ColorName + '%'
GROUP BY c.ColorName
ORDER BY COUNT(v.Choices) DESC



Regards,

Patrick
0
Mark WillsTopic AdvisorCommented:
OK, pretty cool stuff being posted, so have to dig deep into my box of tricks...

I created a #tmp table just to use for the purposes of a test as the data source posted above. It is not needed and you can replace #tmp with the real table name.





-- step 1 create a sample table used as a datasource...
 
create table #tmp (Date datetime, Name varchar(60), Choices varchar(100))
insert #tmp
select '01/27/1999','John Smith','Red,Blue' union all
select '01/28/1999','Jane Doe','Green, Orange, Pink' union all
select '1/29/1999','Bob Jones','Blue'
 
 
-- now lets run a query on that and return the results.
 
select colours,count(*)
from
(
   SELECT ltrim(SUBSTRING(choices, n, CHARINDEX(',', choices + ',',n) - n)) AS colours
   FROM #tmp
   CROSS JOIN (SELECT number FROM master..spt_values WHERE type = 'P')  AS Numbers(n)
   WHERE SUBSTRING(',' + choices, n, 1) = ','
   AND n < LEN(choices) + 1
) c
group by colours

Open in new window

0

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
Mark WillsTopic AdvisorCommented:
oh, and to make it a bit better should really put a range (being min and max length of choices) on those numbers from the system / master..spt_values table :

CROSS JOIN (SELECT number FROM master..spt_values WHERE type = 'P' and number between 1 and 200)  AS Numbers(n)
   
0
Chris LuttrellSenior Database ArchitectCommented:
Cool solution Mark, I have not run accross that spt_values table before.  I now have a new tool in my toolbelt.
Thanks, Chris
0
Mark WillsTopic AdvisorCommented:
Thanks Chris, glad you found it useful. If you do want sequential numbers, you need to choose the 'P' types. There are a few more little gems locked away in SQL Server.
0
mcunnAuthor Commented:
Wow,

I see at least 2 good approaches guys. I think Mark's solution is exactly what I'm looking for but I'll need to plug it in to be sure. I'm on the road today but I'll try implementing them tomorrow.

Michael



0
SharathData EngineerCommented:

mcunn - You have many good solutions at hand. But a masterpiece from genius dominates all other solutions.
Personally I prefer to achieve the expected result in one query and if it is required, then only will go for UDF or loops or cursors.
Well done Mark for the good work to achieve this without any UDF.
0
Patrick MatthewsCommented:
mark,

That was, in a word, awesome!  I am still trying to wrap my head around it.  I would really appreciate it if you
could spare a few minutes to explain how it works :)

Regards,

Patrick
0
SharathData EngineerCommented:
for the benefit of all, here is the explanation.
select number from master..spt_values
-- 2506 records
-- Mark is interested only in numbers, so choose the filter condition type = 'P'
select number from master..spt_values where type = 'P'
-- 2048 records
-- If you cross join this result set with your table, you will get as many records as r * 2048 where r is the number of records in your table.
 
-- create this table for testing purpose and insert 3 records
create table #tmp (Date datetime, Name varchar(60), Choices varchar(100))
insert #tmp
select '01/27/1999','John Smith','Red,Blue' union all
select '01/28/1999','Jane Doe','Green, Orange, Pink' union all
select '1/29/1999','Bob Jones','Blue'
 
select * 
  from #tmp 
 cross join (select number from master..spt_values where type = 'P') t1
-- 6144 records
 
-- if you run the below query, you will get each character of choices column in a row
select *,SUBSTRING(',' + choices, n, 1) as sub_string
  from #tmp 
 cross join (select number from master..spt_values where type = 'P') Numbers(n)
-- Mark interested in records having comma in the sub_string column in the above query
-- so the below query will display only those records having n value as the comma position in the Choices column
select *
  from #tmp 
 cross join (select number from master..spt_values where type = 'P') Numbers(n)
 where SUBSTRING(',' + choices, n, 1) = ','
-- 6 records
-- with the below query, we can get the poisition of the next comma in the Choices string from the starting position "n"
select *,CHARINDEX(',', choices + ',',n) as char_index
  from #tmp 
 cross join (select number from master..spt_values where type = 'P') Numbers(n)
 where SUBSTRING(',' + choices, n, 1) = ','
 
-- if you delete n from char_index in the above query, you will get how many characters you want to pick after each comma
select *,CHARINDEX(',', choices + ',',n) - n how_many
  from #tmp 
 cross join (select number from master..spt_values where type = 'P') Numbers(n)
 where SUBSTRING(',' + choices, n, 1) = ','
 
-- Now, from the starting position as "n", pick the how_many characters in the above query
select *,substring(Choices,n,CHARINDEX(',', choices + ',',n) - n)
  from #tmp 
 cross join (select number from master..spt_values where type = 'P') Numbers(n)
 where SUBSTRING(',' + choices, n, 1) = ','
 
-- Now you have all the Choices seperated in each row.
 
-- I think, to avoid any error, Mark included the addiitonal filter "AND n < LEN(choices) + 1". Mark - correct me if i am wrong.

Open in new window

0
Patrick MatthewsCommented:
Sharath,

Thank you for the explanation.  I really do appreciate it!

Mark,

I am astounded.  This one gets bookmarked :)

Regards,

Patrick
0
mcunnAuthor Commented:
Mark,

It is indeed a work of art. I had to read Sharath's explaination to fully understand how it worked but it worked perfectly.

Thanks again.

Michael



0
mcunnAuthor Commented:
Perfect Solution To A Tricky Problem
0
Mark WillsTopic AdvisorCommented:
Gosh guys, many glowing comments, very much appreciated. And more importantly, very happy to have been of help.

Sharath has explained it all pretty well, and would be happy to explain any of the bits.

The check of the number value n needs to be within the data length of ','+choices because the value n is used in the substring, and what we are really doing is using the substring index essentially as a pointer to the delimiter so it can be extracted in the select statement.

If your strings are large than 2048, then the magic little table master..spt_values is not good enough, and you then need to create a numbers table, or start to look at other methods.

It is not overly efficient, but for this type of application it is extremely effective, and simple once you get your head around what it is actually doing...

I thought I had better start sharing some tricks because of all the excellent positings and interactions in this thread - it is great to see.

0
Mark WillsTopic AdvisorCommented:
Ummm... you do know that the "trick" with spt_values has been around for a while ?
0
SharathData EngineerCommented:

Yes Mark, today morning when i was thinking about this, I observed one limitation with this approach in the number of characters in that column. As you told, if the string exceeds 2048 characters, then you need to go for alternate approach.
0
mcunnAuthor Commented:
Ok this is referring to the choices contained in a single row,  correct?

0
SharathData EngineerCommented:
yes
0
Mark WillsTopic AdvisorCommented:
mcunn, if your "choices" column is bigger than 2048, please let me know - I can show you how to create a "numbers" table and simply use that instead. Actually looks even simpler with it.
0
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
Microsoft SQL Server 2008

From novice to tech pro — start learning today.