T-SQL:  Normalized data to a single comma delineated string and back

Jim HornSQL Server Data Dude
CERTIFIED EXPERT
Microsoft SQL Server database developer, architect, and author specializing in Business Intelligence, ETL, and Data Warehousing.
Published:
Many times as a report developer I've been asked to display normalized data such as three rows with values Jack, Joe, and Bob as a single comma-separated string such as 'Jack, Joe, Bob', and vice versa.  Here's how to do it. 
Have you ever built reports and had major real-estate issues where business owners wanted as much information on a single page as possible, with a minimum font size so you don't need a microscope to read it?  I've been in this situation a number of times, and a common battle is parent-child situations where the child is a sub-report (Table object, tablix control, etc) that can have multiple values, which causes sizing problems for the entire report.  A more cosmetically pleasing solution is to have a column for the parent and a column for the children that is one value with values separated by commas.   The code for this is not easy, but here's a working solution. 

Our starting point will be some sample data of movie characters, rock stars, and cartoon characters. 
 
-- Create and populate tables used in this demo
                      IF EXISTS(SELECT name FROM sys.tables WHERE name='customer')
                         DROP TABLE customer
                      GO
                      
                      IF EXISTS(SELECT name FROM sys.tables WHERE name='customer_type')
                         DROP TABLE customer_type
                      GO
                      
                      -- Create the tables
                      CREATE TABLE customer_type(
                         id int PRIMARY KEY NOT NULL, 
                         type varchar(50)) 
                      CREATE TABLE customer (
                         id int identity(1,1) PRIMARY KEY NOT NULL, 
                         ct_id int REFERENCES customer_type(id), 
                         name varchar(50)) 
                      
                      ALTER TABLE customer
                      ADD CONSTRAINT fk_customer_customer_type
                      FOREIGN KEY (ct_id) 
                      REFERENCES customer_type(id) 
                      
                      INSERT INTO customer_type (id, type) 
                      VALUES (1, 'Movie Characters'), (2, 'Rock Stars'), (3, 'Cartoon Characters')
                      
                      INSERT INTO customer (ct_id, name ) 
                      VALUES 
                         (1, 'Johnny B. Goode'), (1,'Buckaroo Banzai'), (1,'Marty McFly'),
                         (2, 'Rick Springfield'), (2,'Bono'), (2,'Sammy Hagar'),
                         (3, 'Wile E. Coyote'), (3,'Officer Barbrady'), (3,'Sweet Polly Purebred'),  (3,'Bart Simpson') 
                      
                      -- Show the tables
                      SELECT * FROM customer
                      SELECT * FROM customer_type

Open in new window



first-starting-point.jpgNormalized data to comma delineated string


For starters, here's a query that uses FOR XML PATH that takes all column values in a table and returns a single string separated with a comma-space, but this throws in a comma space in the beginning.  

xml-path.jpg
Below is the same query as above, with STUFF used to remove the first comma space.

xml-path-with-stuff.jpg
Below is the completed query, which places the above FOR XML PATH into a subquery that is correlated on a main query of DISTINCT ct.type in order to return only the names for that type. 

comma-delineated-values-string.jpgUsing this methodology you can now include single comma-separated values in your reports main data source, and not be dependent on a parent-child situation that includes multiple data sources that will cause display problems. 
 

Comma delineated string to normalized data


I have rarely seen this situation happen, the few times involved ETL packages where the source data was stored this way and the destination was a normalized database, but just for kicks and giggles I'll demonstrate how to handle this situation.  

Using the above tables and data let's create a new table with the above results, and just for kicks and giggles add a few more rows.
 
-- XML PATH returns a single row with all values, separated by comma space
                      SELECT ', ' + convert(varchar(50), name)
                      FROM customer
                      FOR XML PATH('')
                      
                      -- STUFF removes the first comma space and formats this with ,
                      SELECT DISTINCT STUFF((
                         SELECT ', ' + convert(varchar(50), c2.name)
                         FROM customer c2
                         FOR XML PATH('')), 1, 1, '')
                      FROM customer c1
                      
                      -- THIS WORKS
                      SELECT DISTINCT ct.type,
                         STUFF(
                            (SELECT ', ' + convert(varchar(50), c2.name)
                            FROM customer c2
                            WHERE c1.ct_id = c2.ct_id
                            FOR XML PATH('') ), 1, 2, '') as names
                      FROM customer c1
                      JOIN customer_type ct ON c1.ct_id = ct.id
                      ORDER BY ct.type
                      
                      -- Throw the above results into a temp table
                      IF OBJECT_ID('tempdb..#c') IS NOT NULL
                         DROP TABLE #c 
                      GO
                      
                      CREATE TABLE #c (type varchar(100), names varchar(1000)) 
                      INSERT INTO #c (type, names) 
                      SELECT DISTINCT ct.type,
                         STUFF(
                            (SELECT ', ' + convert(varchar(50), c2.name)
                            FROM customer c2
                            WHERE c1.ct_id = c2.ct_id
                            FOR XML PATH('')  -- SELECT it as XML
                      ), 1, 2, '')   -- This is done to remove the first comma-space (,) FROM the result
                      as names
                      FROM customer c1
                         JOIN customer_type ct ON c1.ct_id = ct.id
                      ORDER BY ct.type
                      
                      -- Add a few more just for kicks and giggles
                      INSERT INTO #c (type, Names) 
                      VALUES 
                         ('Actors', 'Kevin Costner, Meg Ryan, Peter Weller'), 
                         ('Athletes', 'Michael Jordan, Bo Jackson'), 
                         ('Fictional Characters', 'Santa Clause, The Easter Bunny, The Tooth Fairy') 
                      
                      -- Show the results
                      SELECT * FROM #c

Open in new window

So here's the new starting point

select-all-from-comma-delineated-tab.jpg
The below query takes the above and produces a normalized data set.  This uses a Common Table Expression (CTE) that contains a correlated UNION ALL to return a single row for each type and name.

Name is determined by..

First CTE Select Statement (orange)
uses the Names column and CHARINDEX to get the LEFT-most characters from the beginning of the string to just before the comma. 

Second CTE Select Statement (blue)
Same actions as the first, but uses the CTE table tmp, making it recursive so that the tmp table is now populated with a row for each name in table #c.

Outer Query (Purple)
Simply calls the type and names of the CTE. 

normalized-table.jpg
  The entire T-SQL script is below.  

-- Create and populate tables used in this demo
                      IF EXISTS(SELECT name FROM sys.tables WHERE name='customer')
                         DROP TABLE customer
                      GO
                      
                      IF EXISTS(SELECT name FROM sys.tables WHERE name='customer_type')
                         DROP TABLE customer_type
                      GO
                      
                      -- Create the tables
                      CREATE TABLE customer_type(
                         id int PRIMARY KEY NOT NULL, 
                         type varchar(50)) 
                      CREATE TABLE customer (
                         id int identity(1,1) PRIMARY KEY NOT NULL, 
                         ct_id int REFERENCES customer_type(id), 
                         name varchar(50)) 
                      
                      ALTER TABLE customer
                      ADD CONSTRAINT fk_customer_customer_type
                      FOREIGN KEY (ct_id) 
                      REFERENCES customer_type(id) 
                      
                      INSERT INTO customer_type (id, type) 
                      VALUES (1, 'Movie Characters'), (2, 'Rock Stars'), (3, 'Cartoon Characters')
                      
                      INSERT INTO customer (ct_id, name ) 
                      VALUES 
                         (1, 'Johnny B. Goode'), (1,'Buckaroo Banzai'), (1,'Marty McFly'),
                         (2, 'Rick Springfield'), (2,'Bono'), (2,'Sammy Hagar'),
                         (3, 'Wile E. Coyote'), (3,'Officer Barbrady'), (3,'Sweet Polly Purebred'),  (3,'Bart Simpson') 
                      
                      -- Show the tables
                      SELECT * FROM customer
                      SELECT * FROM customer_type
                      
                      -- XML PATH returns a single row with all values, separated by comma space
                      SELECT ', ' + convert(varchar(50), name)
                      FROM customer
                      FOR XML PATH('')
                      
                      -- STUFF removes the first comma space and formats this with ,
                      SELECT DISTINCT STUFF((
                         SELECT ', ' + convert(varchar(50), c2.name)
                         FROM customer c2
                         FOR XML PATH('')), 1, 1, '')
                      FROM customer c1
                      
                      -- THIS WORKS
                      SELECT DISTINCT ct.type,
                         STUFF(
                            (SELECT ', ' + convert(varchar(50), c2.name)
                            FROM customer c2
                            WHERE c1.ct_id = c2.ct_id
                            FOR XML PATH('') ), 1, 2, '') as names
                      FROM customer c1
                      JOIN customer_type ct ON c1.ct_id = ct.id
                      ORDER BY ct.type
                      
                      -- Throw the above results into a temp table
                      IF OBJECT_ID('tempdb..#c') IS NOT NULL
                         DROP TABLE #c 
                      GO
                      
                      CREATE TABLE #c (type varchar(100), names varchar(1000)) 
                      
                      INSERT INTO #c (type, names) 
                      SELECT DISTINCT ct.type,
                         STUFF(
                            (SELECT ', ' + convert(varchar(50), c2.name)
                            FROM customer c2
                            WHERE c1.ct_id = c2.ct_id
                            FOR XML PATH('')  -- SELECT it as XML
                      ), 1, 2, '')   -- This is done to remove the first comma-space (,) FROM the result
                      as names
                      FROM customer c1
                         JOIN customer_type ct ON c1.ct_id = ct.id
                      ORDER BY ct.type
                      
                      -- Add a few more just for kicks and giggles
                      INSERT INTO #c (type, Names) 
                      VALUES 
                         ('Actors', 'Kevin Costner, Meg Ryan, Peter Weller'), 
                         ('Athletes', 'Michael Jordan, Bo Jackson'), 
                         ('Fictional Characters', 'Santa Clause, The Easter Bunny, The Tooth Fairy') 
                      
                      -- Show the results
                      SELECT * FROM #c
                      
                      -- THIS WORKS convert the set back to normalized
                      ;with tmp(type, names1, names2) as (
                      SELECT 
                         type, 
                         CAST(LTRIM(LEFT(Names, CHARINDEX(',', Names + ',') -1 )) as varchar(100)),
                         CAST(STUFF(Names, 1, CHARINDEX(',', Names + ','), '') as varchar(1000))
                      FROM #c
                      UNION ALL
                      SELECT 
                         type, 
                         CAST(LTRIM(LEFT(names2, CHARINDEX(',', names2 + ',') -1)) as varchar(100)),
                         CAST(STUFF(names2, 1, CHARINDEX(',', names2 + ','), '') as varchar(1000))
                      FROM tmp
                      where names2 > ''
                      )
                      SELECT type, names1 as names
                      FROM tmp
                      ORDER BY type

Open in new window


The end.  Thank you for reading my article, and please click the 'Good Article' button if it helped you. 
6
22,957 Views
Jim HornSQL Server Data Dude
CERTIFIED EXPERT
Microsoft SQL Server database developer, architect, and author specializing in Business Intelligence, ETL, and Data Warehousing.

Comments (1)

Jim HornSQL Server Data Dude
CERTIFIED EXPERT
Most Valuable Expert 2013
Author of the Year 2015

Author

Commented:
Done.  This is a republication from My Website which is allowed with the NoIndex checkbox checked per recent rules change.

Have a question about something in this article? You can receive help directly from the article author. Sign up for a free trial to get started.