SPLIT a varchar into rows using CTE

Instead of using a function, I'm hoping to use a CTE to split a string into individual records. I have most of it, except that the last record is the word-before last. (see sample results below for reference). You will notice that "delimiter" is the new rowdata, and "a" is the itemdata. I want another row that has "" as row data, and "delimiter" as itemdata.


DECLARE @RowData varchar(2000)
DECLARE @SplitOn varchar(1)

SELECT
      @RowData = 'This is a test for splitting using a CTE to break a varchar into records based on a delimiter',
      @SplitOn = ' '

;WITH CTE (RowData, SplitOn, ItemData, Cnt) AS
(
      SELECT
            SUBSTRING(@RowData, CHARINDEX(@SplitOn, @RowData) + 1, LEN(@RowData)),
            @SplitOn,
            RTRIM(LTRIM(SUBSTRING(@RowData, 1, CHARINDEX(@SplitOn, @RowData) - 1))),
            1
      UNION ALL
      
      SELECT
            SUBSTRING(RowData, CHARINDEX(SplitOn, RowData) + 1, LEN(RowData)),            
            SplitOn,
            RTRIM(LTRIM(SUBSTRING(RowData, 1, CHARINDEX(SplitOn, RowData) - 1))),
            Cnt + 1
      FROM CTE B
      WHERE
            CHARINDEX(SplitOn, RowData) > 0
)

SELECT
      *
FROM CTE

Results.JPG
LVL 10
bchoorAsked:
Who is Participating?
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

bchoorAuthor Commented:
I figured it out, but this is a little dirty. Basically I start by counting the number of delimiters, and iterate until there aren't any more delimiters in the string -count of delimiters should then be one. I then do a final iteration (in the 3rd Union) and add a new row by using the last record (there's no identity, so I find it by count = 1)

I'm open to suggestions to clean it up/optimize it.

DECLARE @RowData varchar(2000)
DECLARE @SplitOn varchar(1)

SELECT
      @RowData = 'This is a test for splitting using a CTE to break a varchar into records based on a delimiter',
      @SplitOn = ' '

;WITH CTE (RowData, SplitOn, ItemData, Cnt) AS
(
      SELECT
            SUBSTRING(@RowData, CHARINDEX(@SplitOn, @RowData) + 1, LEN(@RowData)),
            @SplitOn,
            RTRIM(LTRIM(SUBSTRING(@RowData, 1, CHARINDEX(@SplitOn, @RowData) - 1))),
            LEN(@RowData) - LEN(REPLACE(@RowData, @SplitOn, ''))
      UNION ALL
      
      SELECT
            SUBSTRING(RowData, CHARINDEX(SplitOn, RowData) + 1, LEN(RowData)),            
            SplitOn,
            RTRIM(LTRIM(SUBSTRING(RowData, 1, CHARINDEX(SplitOn, RowData) - 1))),
            Cnt - 1
      FROM CTE B
      WHERE
            CHARINDEX(SplitOn, RowData) > 0

      UNION ALL
      
      SELECT
            '',
            SplitOn,
            RowData,
            Cnt - 1
      FROM
            CTE C
      WHERE
            Cnt = 1
      
)

SELECT
      *
FROM CTE
0
Aneesh RetnakaranDatabase AdministratorCommented:

;WITH Res(s, r)
AS
(
SELECT
SUBSTRING(@RowData,1, CHARINDEX(@SplitOn, @RowData)-1) s,
SUBSTRING(@RowData,CHARINDEX(@SplitOn, @RowData)+1, len(@RowData)) r
UNION ALL
SELECT
SUBSTRING(r,1, CHARINDEX(@SplitOn, r)-1) s,
SUBSTRING(r,CHARINDEX(@SplitOn, r)+1, len(r)) r
FROM Res
WHERE
CHARINDEX(@SplitOn, r) > 0
)
SELECT r,s FROM Res
0
Chris LuttrellSenior Database ArchitectCommented:
Try this one, it is based on your original without the extra union to pick up the last Item.  I also noticed your query failed if it was a 1 word string, so this will work on that case also, hince the commented out "Test" to test that scenario.
HTH,
Chris
DECLARE @RowData varchar(2000)
DECLARE @SplitOn varchar(1)

SELECT 
      @RowData = 'This is a test for splitting using a CTE to break a varchar into records based on a delimiter',
      --@RowData = 'Test',
      @SplitOn = ' '


;WITH CTE (RowData, SplitOn, ItemData, Cnt) AS
(
      SELECT 
            SUBSTRING(@RowData, CHARINDEX(@SplitOn, @RowData) + 1, LEN(@RowData)),
            @SplitOn,
            CASE CHARINDEX(@SplitOn, @RowData) WHEN 0 THEN @RowData ELSE RTRIM(LTRIM(SUBSTRING(@RowData, 1, CHARINDEX(@SplitOn, @RowData) - 1))) END,
            LEN(@RowData) - LEN(REPLACE(@RowData, @SplitOn, ''))
      UNION ALL
      
      SELECT
            CASE cnt WHEN 1 THEN '' ELSE SUBSTRING(RowData, CHARINDEX(SplitOn, RowData) + 1, LEN(RowData)) END,            
            SplitOn,
            CASE cnt WHEN 1 THEN RowData WHEN 0 THEN NULL ELSE RTRIM(LTRIM(SUBSTRING(RowData, 1, CHARINDEX(SplitOn, RowData) - 1))) END,
            Cnt - 1
      FROM CTE B
      WHERE Cnt > 0
)
SELECT 
      *
FROM CTE

Open in new window

0
Ultimate Tool Kit for Technology Solution Provider

Broken down into practical pointers and step-by-step instructions, the IT Service Excellence Tool Kit delivers expert advice for technology solution providers. Get your free copy now.

LowfatspreadCommented:
i believe this is an improvement

removes the extra union

allows for multiple character delimiter

has an element number (itemno)

gets rid of the rtrim...

basically start off by appending the delimiter to the input string that allows for an initially empty string
and gets rid of the "third" union

the one-off calculation of the delimiter position may also speed things up?

hth
     @RowData = 'This is a test for  @*#splitting using a CTE@*#@*# to break a varchar into records based on a delimiter',
      @SplitOn = ' '--'@*#'

;WITH CTE (RowData, SplitOn, ItemData, Cnt,ItemNo) AS
(
      SELECT 
            SUBSTRING(indata, pos+datalength(@spliton), LEN(indata)),
            @SplitOn,
            LTRIM(SUBSTRING(indata, 1, pos - 1)),
            (LEN(indata) - LEN(REPLACE(indata, @SplitOn, '')))/datalength(@spliton)
            ,1
            from ( select indata,CHARINDEX(@SplitOn, indata) as pos
                     from (select @RowData+@SplitOn as InData) as x
                  ) as y  
      UNION ALL
      
      SELECT
            SUBSTRING(RowData, pos + datalength(@spliton), LEN(RowData)),            
            SplitOn,
            LTRIM(SUBSTRING(RowData, 1, pos - 1)),
            Cnt - 1
            ,ItemNo+1
      FROM (select cte.*,charindex(spliton,rowdata) as pos from CTE) as B
      WHERE
            pos > 0

      
)

SELECT 
      *
FROM CTE
order by itemno

Open in new window

0
ralmadaCommented:
If your string is not longer than 2047 characters then you don't need to use CTE you can try like this:

DECLARE @RowData varchar(2000)
DECLARE @SplitOn varchar(1)
SELECT
@RowData = 'This is a test for splitting using a CTE to break a varchar into records based on a delimiter',
@SplitOn = ' '

SELECT SUBSTRING(RowData, n, CHARINDEX(@spliton, RowData + @spliton,n) - n) AS seq
FROM (select @Rowdata as RowData) a
CROSS JOIN (SELECT number FROM master..spt_values WHERE type = 'P' and number >0) AS Numbers(n)
WHERE SUBSTRING(@Spliton + RowData, n, 1) = @spliton
AND n < LEN(Rowdata) + 1
If it's longer then I would suggest the following approach in the CTE format:

;with CTE as (
	select 1 as n
	union all
	select n+1 from CTE
	where n < len(@RowData) 
)
select n, ltrim(substring(RowData, n, charindex(@spliton, Rowdata + @spliton, n) - n)) as seq
from (select @RowData as RowData) a
cross join CTE 
where substring(@SplitOn+RowData, n, 1) = @SplitOn
and n < len(RowData)+1
option (maxrecursion 0)

Open in new window

0
ralmadaCommented:
Another alternative, but it could be slower is using XML
DECLARE @RowData varchar(2000)
DECLARE @SplitOn varchar(1)

SELECT 
      @RowData = 'This is a test for splitting using a CTE to break a varchar into records based on a delimiter',
      @SplitOn = ' '

declare @xml as xml
SET @XML  = '<t><r>' +  Replace(@RowData , @spliton, '</r><r>') + '</r></t>'  

select t.r.value('.', 'VARCHAR(8000)') as val
from @xml.nodes('/t/r') as t(r)

Open in new window

0
Chris LuttrellSenior Database ArchitectCommented:
@ralmada,
Those are all neat alternative, especially the XML one, but when comparing query to query using the Actual Execution Plan and the Query cost (relative to the batch), my query is still better, although your CTE is a close second at 58% compared to 42% for the one I proposed.  For some reason, even though the overall time is still = 0, on the XML and  your cross join example, both claim 100% in the comparison against either CTE query.
0
Chris LuttrellSenior Database ArchitectCommented:
P.S.   Lowfatspread's example also claims 100% Query cost compared to both ralmada's and my CTE queries.
0
ThomasianCommented:
Here's another way
DECLARE @RowData varchar(2000)
DECLARE @SplitOn varchar(1)

SELECT @RowData = 'This is a test for splitting using a CTE to break a varchar into records based on a delimiter'
     , @SplitOn = ' '

;WITH CTE (RowData, SplitOn, ItemData, Cnt) AS
(
	SELECT STUFF(@RowData, 1, CHARINDEX(@SplitOn,@RowData+@SplitOn), '')
	     , @SplitOn
	     , CAST(SUBSTRING(@RowData, 1, CHARINDEX(@SplitOn,@RowData+@SplitOn)-1) as varchar(max))
	     , 1

	UNION ALL
	
	SELECT STUFF(RowData, 1, CHARINDEX(SplitOn,RowData+@SplitOn), '')
	     , SplitOn
	     , CAST(SUBSTRING(RowData, 1, CHARINDEX(SplitOn,RowData+@SplitOn)-1) as varchar(max))
	     , Cnt+1
	FROM CTE
	WHERE RowData <> ''
)
SELECT * FROM CTE
OPTION (MAXRECURSION 0)

Open in new window

0
ThomasianCommented:
Forgot to remove the "@" on the recursive part of the query.
DECLARE @RowData varchar(2000)
DECLARE @SplitOn varchar(1)

SELECT @RowData = 'This is a test for splitting using a CTE to break a varchar into records based on a delimiter'
     , @SplitOn = ' '

;WITH CTE (RowData, SplitOn, ItemData, Cnt) AS
(
	SELECT STUFF(@RowData, 1, CHARINDEX(@SplitOn,@RowData+@SplitOn), '')
	     , @SplitOn
	     , CAST(SUBSTRING(@RowData, 1, CHARINDEX(@SplitOn,@RowData+@SplitOn)-1) as varchar(max))
	     , 1

	UNION ALL
	
	SELECT STUFF(RowData, 1, CHARINDEX(SplitOn,RowData+SplitOn), '')
	     , SplitOn
	     , CAST(SUBSTRING(RowData, 1, CHARINDEX(SplitOn,RowData+SplitOn)-1) as varchar(max))
	     , Cnt+1
	FROM CTE
	WHERE RowData <> ''
)
SELECT * FROM CTE
OPTION (MAXRECURSION 0)

Open in new window

0
LowfatspreadCommented:
>>cgluttrell
>>P.S.   Lowfatspread's example also claims 100% Query cost compared to both ralmada's and my CTE queries.

the 100% cost is because i have included an order by statement in my query... if you remove that most of the difference goes away...  (if you include an order by cnt desc on cg's we both get 50:50)

i'd argue that you should include an order by clause to ensure that you get the data in the correct sequence ... but sequence may not be important..

0
ThomasianCommented:
I noticed that Lowfatspread's example allows multicharacter delimiter which could be useful.

I modified my query to also have this feature and fix the issue when @SplitOn is set to an empty string.
DECLARE @RowData varchar(2000)
DECLARE @SplitOn varchar(10)

SELECT @RowData = 'This is a test for  @*#splitting using a CTE@*#@*# to break a varchar into records based on a delimiter'
     , @SplitOn =' ' 
--     , @SplitOn ='@*#' 
--     , @SplitOn ='' 
 
;WITH CTE (RowData, SplitOn, ItemData, Cnt) AS
(
	SELECT COALESCE(STUFF(@RowData, 1, CHARINDEX(@SplitOn,@RowData+@SplitOn) + DATALENGTH(@SplitOn) - 1 , ''),'')
	     , @SplitOn
	     , CASE WHEN @SplitOn<> ''
	            THEN CAST(SUBSTRING(@RowData, 1, CHARINDEX(@SplitOn,@RowData+@SplitOn)-1) as varchar(max))
	            ELSE @RowData 
	       END
	     , 1

	UNION ALL
	
	SELECT STUFF(RowData, 1, CHARINDEX(SplitOn,RowData+SplitOn) + DATALENGTH(SplitOn) - 1, '')
	     , SplitOn
	     , CAST(SUBSTRING(RowData, 1, CHARINDEX(SplitOn,RowData+SplitOn)-1) as varchar(max))
	     , Cnt+1
	FROM CTE
	WHERE RowData <> ''
)
SELECT * FROM CTE
OPTION (MAXRECURSION 0)

Open in new window

0
cyberkiwiCommented:
31896529 doesn't handle 1 word, and has a bug when trailing space (1 more field)
31898489 does not produce the last word
31902479 counts backwards and in the one word scenario itemdata=rowdata, may be easily fixed but I already have an alternative
31902780 does not handle trailing space, subquery that is hard to follow (for me, sorry I like KISS)
31904704 awesome string splitting code! most work is done recombining outside the CTE. one catch only, doesn't handle single space, which
31914848 leading space breaks it completely
XML - not even going to comment!

Obviously I'm interpreting these rules:
1 delimiter = 1 record, therefore consecutive delimiters=multiple records
trailing space/leading space = 1 record
single space = 2 records (1st and 2nd split by space)

Main consideration in my approach
Strings are one of the most expensive resources - so having to keep copying it is a high expense in all the approaches so far (sans XML)
DECLARE @RowData varchar(2000)
DECLARE @SplitOn varchar(1)
SELECT
      @RowData = ' This is a test for splitting  using a CTE to break a varchar into records based on a delimiter ',
      --@RowData = ' ',
      @SplitOn = ' '
;WITH CTE (RowData, /*SplitOn,*/ ItemData, Cnt, I, J) AS
(
	SELECT
		convert(varchar(2000),null),
		convert(varchar(2000),null),
		0,
		CharIndex(@SplitOn, @RowData)+1,
		1--case left(@RowData,1) when ' ' then 2 else 1 end
	UNION ALL
	SELECT SubString(@RowData, I, 2000), SubString(@RowData, J, I-J-1), Cnt+1, CharIndex(@SplitOn, @RowData, I)+1, I
	FROM CTE
	WHERE I > 1 AND J > 0
	UNION ALL
	SELECT '', SubString(@RowData, J, 2000), Cnt+1, CharIndex(@SplitOn, @RowData, I)+1, 0
	FROM CTE
	WHERE I = 1 AND J > 1
)
SELECT RowData, SplitOn=@SplitOn, ItemData, Cnt
FROM CTE
WHERE Cnt>0

Open in new window

0

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
bchoorAuthor Commented:
Thanks guys. Great options.

@ramlada - nice one on the XML, didn't even occur to me.

0
wam-itCommented:
Hello, I realize this is an older post...but I've found it very helpful! I have used the XML suggestion as posted by @Ramlada.  I need some assistance with including this in a query, as I'm not too experienced with CTE/recursive queries. When I run the query below, I only return one record. How would I adjust this to return all rows in my table?

DECLARE @RowData varchar(2000)
DECLARE @SplitOn varchar(1)
DECLARE @ObjectID int

SELECT 
@ObjectID = ObjectID, @RowData = ObjectName, @SplitOn = ';' from Objects 

declare @xml as xml
SET @XML  = '<t><r>' +  Replace(@RowData , @spliton, '</r><r>') + '</r></t>'  

select @objectid as objectid, rtrim(ltrim(t.r.value('.', 'VARCHAR(8000)'))) as splitvalue
from @xml.nodes('/t/r') as t(r) 

Open in new window

0
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
Microsoft SQL Server

From novice to tech pro — start learning today.

Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.