Link to home
Start Free TrialLog in
Avatar of bchoor
bchoorFlag for United States of America

asked on

SPLIT a varchar into rows using CTE

Instead of using a function, I'm hoping to use a CTE to split a string into individual records. I have most of it, except that the last record is the word-before last. (see sample results below for reference). You will notice that "delimiter" is the new rowdata, and "a" is the itemdata. I want another row that has "" as row data, and "delimiter" as itemdata.


DECLARE @RowData varchar(2000)
DECLARE @SplitOn varchar(1)

SELECT
      @RowData = 'This is a test for splitting using a CTE to break a varchar into records based on a delimiter',
      @SplitOn = ' '

;WITH CTE (RowData, SplitOn, ItemData, Cnt) AS
(
      SELECT
            SUBSTRING(@RowData, CHARINDEX(@SplitOn, @RowData) + 1, LEN(@RowData)),
            @SplitOn,
            RTRIM(LTRIM(SUBSTRING(@RowData, 1, CHARINDEX(@SplitOn, @RowData) - 1))),
            1
      UNION ALL
      
      SELECT
            SUBSTRING(RowData, CHARINDEX(SplitOn, RowData) + 1, LEN(RowData)),            
            SplitOn,
            RTRIM(LTRIM(SUBSTRING(RowData, 1, CHARINDEX(SplitOn, RowData) - 1))),
            Cnt + 1
      FROM CTE B
      WHERE
            CHARINDEX(SplitOn, RowData) > 0
)

SELECT
      *
FROM CTE

Results.JPG
Avatar of bchoor
bchoor
Flag of United States of America image

ASKER

I figured it out, but this is a little dirty. Basically I start by counting the number of delimiters, and iterate until there aren't any more delimiters in the string -count of delimiters should then be one. I then do a final iteration (in the 3rd Union) and add a new row by using the last record (there's no identity, so I find it by count = 1)

I'm open to suggestions to clean it up/optimize it.

DECLARE @RowData varchar(2000)
DECLARE @SplitOn varchar(1)

SELECT
      @RowData = 'This is a test for splitting using a CTE to break a varchar into records based on a delimiter',
      @SplitOn = ' '

;WITH CTE (RowData, SplitOn, ItemData, Cnt) AS
(
      SELECT
            SUBSTRING(@RowData, CHARINDEX(@SplitOn, @RowData) + 1, LEN(@RowData)),
            @SplitOn,
            RTRIM(LTRIM(SUBSTRING(@RowData, 1, CHARINDEX(@SplitOn, @RowData) - 1))),
            LEN(@RowData) - LEN(REPLACE(@RowData, @SplitOn, ''))
      UNION ALL
      
      SELECT
            SUBSTRING(RowData, CHARINDEX(SplitOn, RowData) + 1, LEN(RowData)),            
            SplitOn,
            RTRIM(LTRIM(SUBSTRING(RowData, 1, CHARINDEX(SplitOn, RowData) - 1))),
            Cnt - 1
      FROM CTE B
      WHERE
            CHARINDEX(SplitOn, RowData) > 0

      UNION ALL
      
      SELECT
            '',
            SplitOn,
            RowData,
            Cnt - 1
      FROM
            CTE C
      WHERE
            Cnt = 1
      
)

SELECT
      *
FROM CTE
Avatar of Aneesh

;WITH Res(s, r)
AS
(
SELECT
SUBSTRING(@RowData,1, CHARINDEX(@SplitOn, @RowData)-1) s,
SUBSTRING(@RowData,CHARINDEX(@SplitOn, @RowData)+1, len(@RowData)) r
UNION ALL
SELECT
SUBSTRING(r,1, CHARINDEX(@SplitOn, r)-1) s,
SUBSTRING(r,CHARINDEX(@SplitOn, r)+1, len(r)) r
FROM Res
WHERE
CHARINDEX(@SplitOn, r) > 0
)
SELECT r,s FROM Res
Try this one, it is based on your original without the extra union to pick up the last Item.  I also noticed your query failed if it was a 1 word string, so this will work on that case also, hince the commented out "Test" to test that scenario.
HTH,
Chris
DECLARE @RowData varchar(2000)
DECLARE @SplitOn varchar(1)

SELECT 
      @RowData = 'This is a test for splitting using a CTE to break a varchar into records based on a delimiter',
      --@RowData = 'Test',
      @SplitOn = ' '


;WITH CTE (RowData, SplitOn, ItemData, Cnt) AS
(
      SELECT 
            SUBSTRING(@RowData, CHARINDEX(@SplitOn, @RowData) + 1, LEN(@RowData)),
            @SplitOn,
            CASE CHARINDEX(@SplitOn, @RowData) WHEN 0 THEN @RowData ELSE RTRIM(LTRIM(SUBSTRING(@RowData, 1, CHARINDEX(@SplitOn, @RowData) - 1))) END,
            LEN(@RowData) - LEN(REPLACE(@RowData, @SplitOn, ''))
      UNION ALL
      
      SELECT
            CASE cnt WHEN 1 THEN '' ELSE SUBSTRING(RowData, CHARINDEX(SplitOn, RowData) + 1, LEN(RowData)) END,            
            SplitOn,
            CASE cnt WHEN 1 THEN RowData WHEN 0 THEN NULL ELSE RTRIM(LTRIM(SUBSTRING(RowData, 1, CHARINDEX(SplitOn, RowData) - 1))) END,
            Cnt - 1
      FROM CTE B
      WHERE Cnt > 0
)
SELECT 
      *
FROM CTE

Open in new window

SOLUTION
Avatar of Lowfatspread
Lowfatspread
Flag of United Kingdom of Great Britain and Northern Ireland image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
If your string is not longer than 2047 characters then you don't need to use CTE you can try like this:

DECLARE @RowData varchar(2000)
DECLARE @SplitOn varchar(1)
SELECT
@RowData = 'This is a test for splitting using a CTE to break a varchar into records based on a delimiter',
@SplitOn = ' '

SELECT SUBSTRING(RowData, n, CHARINDEX(@spliton, RowData + @spliton,n) - n) AS seq
FROM (select @Rowdata as RowData) a
CROSS JOIN (SELECT number FROM master..spt_values WHERE type = 'P' and number >0) AS Numbers(n)
WHERE SUBSTRING(@Spliton + RowData, n, 1) = @spliton
AND n < LEN(Rowdata) + 1
If it's longer then I would suggest the following approach in the CTE format:

;with CTE as (
	select 1 as n
	union all
	select n+1 from CTE
	where n < len(@RowData) 
)
select n, ltrim(substring(RowData, n, charindex(@spliton, Rowdata + @spliton, n) - n)) as seq
from (select @RowData as RowData) a
cross join CTE 
where substring(@SplitOn+RowData, n, 1) = @SplitOn
and n < len(RowData)+1
option (maxrecursion 0)

Open in new window

SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
@ralmada,
Those are all neat alternative, especially the XML one, but when comparing query to query using the Actual Execution Plan and the Query cost (relative to the batch), my query is still better, although your CTE is a close second at 58% compared to 42% for the one I proposed.  For some reason, even though the overall time is still = 0, on the XML and  your cross join example, both claim 100% in the comparison against either CTE query.
P.S.   Lowfatspread's example also claims 100% Query cost compared to both ralmada's and my CTE queries.
Here's another way
DECLARE @RowData varchar(2000)
DECLARE @SplitOn varchar(1)

SELECT @RowData = 'This is a test for splitting using a CTE to break a varchar into records based on a delimiter'
     , @SplitOn = ' '

;WITH CTE (RowData, SplitOn, ItemData, Cnt) AS
(
	SELECT STUFF(@RowData, 1, CHARINDEX(@SplitOn,@RowData+@SplitOn), '')
	     , @SplitOn
	     , CAST(SUBSTRING(@RowData, 1, CHARINDEX(@SplitOn,@RowData+@SplitOn)-1) as varchar(max))
	     , 1

	UNION ALL
	
	SELECT STUFF(RowData, 1, CHARINDEX(SplitOn,RowData+@SplitOn), '')
	     , SplitOn
	     , CAST(SUBSTRING(RowData, 1, CHARINDEX(SplitOn,RowData+@SplitOn)-1) as varchar(max))
	     , Cnt+1
	FROM CTE
	WHERE RowData <> ''
)
SELECT * FROM CTE
OPTION (MAXRECURSION 0)

Open in new window

Forgot to remove the "@" on the recursive part of the query.
DECLARE @RowData varchar(2000)
DECLARE @SplitOn varchar(1)

SELECT @RowData = 'This is a test for splitting using a CTE to break a varchar into records based on a delimiter'
     , @SplitOn = ' '

;WITH CTE (RowData, SplitOn, ItemData, Cnt) AS
(
	SELECT STUFF(@RowData, 1, CHARINDEX(@SplitOn,@RowData+@SplitOn), '')
	     , @SplitOn
	     , CAST(SUBSTRING(@RowData, 1, CHARINDEX(@SplitOn,@RowData+@SplitOn)-1) as varchar(max))
	     , 1

	UNION ALL
	
	SELECT STUFF(RowData, 1, CHARINDEX(SplitOn,RowData+SplitOn), '')
	     , SplitOn
	     , CAST(SUBSTRING(RowData, 1, CHARINDEX(SplitOn,RowData+SplitOn)-1) as varchar(max))
	     , Cnt+1
	FROM CTE
	WHERE RowData <> ''
)
SELECT * FROM CTE
OPTION (MAXRECURSION 0)

Open in new window

>>cgluttrell
>>P.S.   Lowfatspread's example also claims 100% Query cost compared to both ralmada's and my CTE queries.

the 100% cost is because i have included an order by statement in my query... if you remove that most of the difference goes away...  (if you include an order by cnt desc on cg's we both get 50:50)

i'd argue that you should include an order by clause to ensure that you get the data in the correct sequence ... but sequence may not be important..

I noticed that Lowfatspread's example allows multicharacter delimiter which could be useful.

I modified my query to also have this feature and fix the issue when @SplitOn is set to an empty string.
DECLARE @RowData varchar(2000)
DECLARE @SplitOn varchar(10)

SELECT @RowData = 'This is a test for  @*#splitting using a CTE@*#@*# to break a varchar into records based on a delimiter'
     , @SplitOn =' ' 
--     , @SplitOn ='@*#' 
--     , @SplitOn ='' 
 
;WITH CTE (RowData, SplitOn, ItemData, Cnt) AS
(
	SELECT COALESCE(STUFF(@RowData, 1, CHARINDEX(@SplitOn,@RowData+@SplitOn) + DATALENGTH(@SplitOn) - 1 , ''),'')
	     , @SplitOn
	     , CASE WHEN @SplitOn<> ''
	            THEN CAST(SUBSTRING(@RowData, 1, CHARINDEX(@SplitOn,@RowData+@SplitOn)-1) as varchar(max))
	            ELSE @RowData 
	       END
	     , 1

	UNION ALL
	
	SELECT STUFF(RowData, 1, CHARINDEX(SplitOn,RowData+SplitOn) + DATALENGTH(SplitOn) - 1, '')
	     , SplitOn
	     , CAST(SUBSTRING(RowData, 1, CHARINDEX(SplitOn,RowData+SplitOn)-1) as varchar(max))
	     , Cnt+1
	FROM CTE
	WHERE RowData <> ''
)
SELECT * FROM CTE
OPTION (MAXRECURSION 0)

Open in new window

ASKER CERTIFIED SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of bchoor

ASKER

Thanks guys. Great options.

@ramlada - nice one on the XML, didn't even occur to me.

Hello, I realize this is an older post...but I've found it very helpful! I have used the XML suggestion as posted by @Ramlada.  I need some assistance with including this in a query, as I'm not too experienced with CTE/recursive queries. When I run the query below, I only return one record. How would I adjust this to return all rows in my table?

DECLARE @RowData varchar(2000)
DECLARE @SplitOn varchar(1)
DECLARE @ObjectID int

SELECT 
@ObjectID = ObjectID, @RowData = ObjectName, @SplitOn = ';' from Objects 

declare @xml as xml
SET @XML  = '<t><r>' +  Replace(@RowData , @spliton, '</r><r>') + '</r></t>'  

select @objectid as objectid, rtrim(ltrim(t.r.value('.', 'VARCHAR(8000)'))) as splitvalue
from @xml.nodes('/t/r') as t(r) 

Open in new window