<

T-SQL:  Normalized data to a single comma delineated string and back

Published on
24,074 Points
14,474 Views
6 Endorsements
Last Modified:
Jim Horn
Microsoft SQL Server database developer, architect, and author specializing in Business Intelligence, ETL, and Data Warehousing.
Many times as a report developer I've been asked to display normalized data such as three rows with values Jack, Joe, and Bob as a single comma-separated string such as 'Jack, Joe, Bob', and vice versa.  Here's how to do it. 
Have you ever built reports and had major real-estate issues where business owners wanted as much information on a single page as possible, with a minimum font size so you don't need a microscope to read it?  I've been in this situation a number of times, and a common battle is parent-child situations where the child is a sub-report (Table object, tablix control, etc) that can have multiple values, which causes sizing problems for the entire report.  A more cosmetically pleasing solution is to have a column for the parent and a column for the children that is one value with values separated by commas.   The code for this is not easy, but here's a working solution. 

Our starting point will be some sample data of movie characters, rock stars, and cartoon characters. 
 
-- Create and populate tables used in this demo
IF EXISTS(SELECT name FROM sys.tables WHERE name='customer')
   DROP TABLE customer
GO

IF EXISTS(SELECT name FROM sys.tables WHERE name='customer_type')
   DROP TABLE customer_type
GO

-- Create the tables
CREATE TABLE customer_type(
   id int PRIMARY KEY NOT NULL, 
   type varchar(50)) 
CREATE TABLE customer (
   id int identity(1,1) PRIMARY KEY NOT NULL, 
   ct_id int REFERENCES customer_type(id), 
   name varchar(50)) 

ALTER TABLE customer
ADD CONSTRAINT fk_customer_customer_type
FOREIGN KEY (ct_id) 
REFERENCES customer_type(id) 

INSERT INTO customer_type (id, type) 
VALUES (1, 'Movie Characters'), (2, 'Rock Stars'), (3, 'Cartoon Characters')

INSERT INTO customer (ct_id, name ) 
VALUES 
   (1, 'Johnny B. Goode'), (1,'Buckaroo Banzai'), (1,'Marty McFly'),
   (2, 'Rick Springfield'), (2,'Bono'), (2,'Sammy Hagar'),
   (3, 'Wile E. Coyote'), (3,'Officer Barbrady'), (3,'Sweet Polly Purebred'),  (3,'Bart Simpson') 

-- Show the tables
SELECT * FROM customer
SELECT * FROM customer_type

Open in new window



first-starting-point.jpgNormalized data to comma delineated string


For starters, here's a query that uses FOR XML PATH that takes all column values in a table and returns a single string separated with a comma-space, but this throws in a comma space in the beginning.  

xml-path.jpg
Below is the same query as above, with STUFF used to remove the first comma space.

xml-path-with-stuff.jpg
Below is the completed query, which places the above FOR XML PATH into a subquery that is correlated on a main query of DISTINCT ct.type in order to return only the names for that type. 

comma-delineated-values-string.jpgUsing this methodology you can now include single comma-separated values in your reports main data source, and not be dependent on a parent-child situation that includes multiple data sources that will cause display problems. 
 

Comma delineated string to normalized data


I have rarely seen this situation happen, the few times involved ETL packages where the source data was stored this way and the destination was a normalized database, but just for kicks and giggles I'll demonstrate how to handle this situation.  

Using the above tables and data let's create a new table with the above results, and just for kicks and giggles add a few more rows.
 
-- XML PATH returns a single row with all values, separated by comma space
SELECT ', ' + convert(varchar(50), name)
FROM customer
FOR XML PATH('')

-- STUFF removes the first comma space and formats this with ,
SELECT DISTINCT STUFF((
   SELECT ', ' + convert(varchar(50), c2.name)
   FROM customer c2
   FOR XML PATH('')), 1, 1, '')
FROM customer c1

-- THIS WORKS
SELECT DISTINCT ct.type,
   STUFF(
      (SELECT ', ' + convert(varchar(50), c2.name)
      FROM customer c2
      WHERE c1.ct_id = c2.ct_id
      FOR XML PATH('') ), 1, 2, '') as names
FROM customer c1
JOIN customer_type ct ON c1.ct_id = ct.id
ORDER BY ct.type

-- Throw the above results into a temp table
IF OBJECT_ID('tempdb..#c') IS NOT NULL
   DROP TABLE #c 
GO

CREATE TABLE #c (type varchar(100), names varchar(1000)) 
INSERT INTO #c (type, names) 
SELECT DISTINCT ct.type,
   STUFF(
      (SELECT ', ' + convert(varchar(50), c2.name)
      FROM customer c2
      WHERE c1.ct_id = c2.ct_id
      FOR XML PATH('')  -- SELECT it as XML
), 1, 2, '')   -- This is done to remove the first comma-space (,) FROM the result
as names
FROM customer c1
   JOIN customer_type ct ON c1.ct_id = ct.id
ORDER BY ct.type

-- Add a few more just for kicks and giggles
INSERT INTO #c (type, Names) 
VALUES 
   ('Actors', 'Kevin Costner, Meg Ryan, Peter Weller'), 
   ('Athletes', 'Michael Jordan, Bo Jackson'), 
   ('Fictional Characters', 'Santa Clause, The Easter Bunny, The Tooth Fairy') 

-- Show the results
SELECT * FROM #c

Open in new window

So here's the new starting point

select-all-from-comma-delineated-tab.jpg
The below query takes the above and produces a normalized data set.  This uses a Common Table Expression (CTE) that contains a correlated UNION ALL to return a single row for each type and name.

Name is determined by..

First CTE Select Statement (orange)
uses the Names column and CHARINDEX to get the LEFT-most characters from the beginning of the string to just before the comma. 

Second CTE Select Statement (blue)
Same actions as the first, but uses the CTE table tmp, making it recursive so that the tmp table is now populated with a row for each name in table #c.

Outer Query (Purple)
Simply calls the type and names of the CTE. 

normalized-table.jpg
  The entire T-SQL script is below.  

-- Create and populate tables used in this demo
IF EXISTS(SELECT name FROM sys.tables WHERE name='customer')
   DROP TABLE customer
GO

IF EXISTS(SELECT name FROM sys.tables WHERE name='customer_type')
   DROP TABLE customer_type
GO

-- Create the tables
CREATE TABLE customer_type(
   id int PRIMARY KEY NOT NULL, 
   type varchar(50)) 
CREATE TABLE customer (
   id int identity(1,1) PRIMARY KEY NOT NULL, 
   ct_id int REFERENCES customer_type(id), 
   name varchar(50)) 

ALTER TABLE customer
ADD CONSTRAINT fk_customer_customer_type
FOREIGN KEY (ct_id) 
REFERENCES customer_type(id) 

INSERT INTO customer_type (id, type) 
VALUES (1, 'Movie Characters'), (2, 'Rock Stars'), (3, 'Cartoon Characters')

INSERT INTO customer (ct_id, name ) 
VALUES 
   (1, 'Johnny B. Goode'), (1,'Buckaroo Banzai'), (1,'Marty McFly'),
   (2, 'Rick Springfield'), (2,'Bono'), (2,'Sammy Hagar'),
   (3, 'Wile E. Coyote'), (3,'Officer Barbrady'), (3,'Sweet Polly Purebred'),  (3,'Bart Simpson') 

-- Show the tables
SELECT * FROM customer
SELECT * FROM customer_type

-- XML PATH returns a single row with all values, separated by comma space
SELECT ', ' + convert(varchar(50), name)
FROM customer
FOR XML PATH('')

-- STUFF removes the first comma space and formats this with ,
SELECT DISTINCT STUFF((
   SELECT ', ' + convert(varchar(50), c2.name)
   FROM customer c2
   FOR XML PATH('')), 1, 1, '')
FROM customer c1

-- THIS WORKS
SELECT DISTINCT ct.type,
   STUFF(
      (SELECT ', ' + convert(varchar(50), c2.name)
      FROM customer c2
      WHERE c1.ct_id = c2.ct_id
      FOR XML PATH('') ), 1, 2, '') as names
FROM customer c1
JOIN customer_type ct ON c1.ct_id = ct.id
ORDER BY ct.type

-- Throw the above results into a temp table
IF OBJECT_ID('tempdb..#c') IS NOT NULL
   DROP TABLE #c 
GO

CREATE TABLE #c (type varchar(100), names varchar(1000)) 

INSERT INTO #c (type, names) 
SELECT DISTINCT ct.type,
   STUFF(
      (SELECT ', ' + convert(varchar(50), c2.name)
      FROM customer c2
      WHERE c1.ct_id = c2.ct_id
      FOR XML PATH('')  -- SELECT it as XML
), 1, 2, '')   -- This is done to remove the first comma-space (,) FROM the result
as names
FROM customer c1
   JOIN customer_type ct ON c1.ct_id = ct.id
ORDER BY ct.type

-- Add a few more just for kicks and giggles
INSERT INTO #c (type, Names) 
VALUES 
   ('Actors', 'Kevin Costner, Meg Ryan, Peter Weller'), 
   ('Athletes', 'Michael Jordan, Bo Jackson'), 
   ('Fictional Characters', 'Santa Clause, The Easter Bunny, The Tooth Fairy') 

-- Show the results
SELECT * FROM #c

-- THIS WORKS convert the set back to normalized
;with tmp(type, names1, names2) as (
SELECT 
   type, 
   CAST(LTRIM(LEFT(Names, CHARINDEX(',', Names + ',') -1 )) as varchar(100)),
   CAST(STUFF(Names, 1, CHARINDEX(',', Names + ','), '') as varchar(1000))
FROM #c
UNION ALL
SELECT 
   type, 
   CAST(LTRIM(LEFT(names2, CHARINDEX(',', names2 + ',') -1)) as varchar(100)),
   CAST(STUFF(names2, 1, CHARINDEX(',', names2 + ','), '') as varchar(1000))
FROM tmp
where names2 > ''
)
SELECT type, names1 as names
FROM tmp
ORDER BY type

Open in new window


The end.  Thank you for reading my article, and please click the 'Good Article' button if it helped you. 
6
Comment
Author:Jim Horn
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
1 Comment
 
LVL 66

Author Comment

by:Jim Horn
Done.  This is a republication from My Website which is allowed with the NoIndex checkbox checked per recent rules change.
0

Featured Post

Concerto Cloud for Software Providers & ISVs

Can Concerto Cloud Services help you focus on evolving your application offerings, while delivering the best cloud experience to your customers? From DevOps to revenue models and customer support, the answer is yes!

Learn how Concerto can help you.

Join & Write a Comment

This videos aims to give the viewer a basic demonstration of how a user can query current session information by using the SYS_CONTEXT function
Via a live example, show how to backup a database, simulate a failure backup the tail of the database transaction log and perform the restore.

Keep in touch with Experts Exchange

Tech news and trends delivered to your inbox every month