Parsing delimited data into their own fields

MariaHalt
MariaHalt used Ask the Experts™
on
I have a table with a text field that contains data that has lot of dots in it.  The number of dots vary.  I parse the field into several fields by altering the table based on the number of dots I find plus 1.  CharacterCount is just a scalar function.  There must be a better, simpler way than the dynamic sql I'm using, if there is, please share it with me.  I've included a sample set of data to test with.  Thanks.
use tempdb

go

CREATE FUNCTION [dbo].[CharacterCount] ( @Input VARCHAR(1000), @Character CHAR(1) )
RETURNS INT
BEGIN

RETURN (LEN(@Input) - LEN(REPLACE(@Input, @Character, '')))

END

go

IF EXISTS (SELECT * FROM tempdb..sysobjects WHERE name = 'mh_table' AND type = 'U' AND uid = 1)
DROP TABLE tempdb.dbo.mh_table

go

CREATE TABLE tempdb.dbo.mh_table (id int not null, descr varchar(255) null, copy_of_descr varchar(255))

go

INSERT INTO tempdb.dbo.mh_table VALUES (1000, 'abc', 'abc')
INSERT INTO tempdb.dbo.mh_table VALUES (1001, 'def', 'def')
INSERT INTO tempdb.dbo.mh_table VALUES (1002, 'ghi', 'ghi')
INSERT INTO tempdb.dbo.mh_table VALUES (9, 'abc.level8', 'abc.level8')
INSERT INTO tempdb.dbo.mh_table VALUES (10, 'abc.level9', 'abc.level9')
INSERT INTO tempdb.dbo.mh_table VALUES (11, 'abc.level10', 'abc.level10')
INSERT INTO tempdb.dbo.mh_table VALUES (12, 'abc.level11', 'abc.level11')
INSERT INTO tempdb.dbo.mh_table VALUES (13, 'abc.level12', 'abc.level12')

INSERT INTO tempdb.dbo.mh_table VALUES (61, 'def.grade1', 'def.grade1')
INSERT INTO tempdb.dbo.mh_table VALUES (62, 'def.grade3', 'def.grade3')
INSERT INTO tempdb.dbo.mh_table VALUES (63, 'def.grade5', 'def.grade5')
INSERT INTO tempdb.dbo.mh_table VALUES (644, 'def.grade5', 'def.grade5')

INSERT INTO tempdb.dbo.mh_table VALUES (789, 'ghi.topicA.level10.a', 'ghi.topicA.level10.a')
INSERT INTO tempdb.dbo.mh_table VALUES (792, 'ghi.topicA.level10.b', 'ghi.topicA.level10.b')
INSERT INTO tempdb.dbo.mh_table VALUES (799, 'ghi.topicA.level10.c', 'ghi.topicA.level10.c')
INSERT INTO tempdb.dbo.mh_table VALUES (800, 'ghi.topicA.level10.x', 'ghi.topicA.level10.x')

go

SET QUOTED_IDENTIFIER OFF

DECLARE @SQL VARCHAR(500), 
		@FIELDS_TO_ADD INT,
		@LOOP INT

--Determine how many fields will need to be added to the table 
SELECT @FIELDS_TO_ADD = MAX(tempdb.dbo.CharacterCount(descr, '.')) + 1 FROM tempdb.dbo.mh_table

--Loop 1/3
--Alter table, add fields to parse complete number out
SELECT @LOOP = 1

WHILE @LOOP <= @FIELDS_TO_ADD
	BEGIN	
		SELECT @SQL = ""	
		SELECT @SQL = "ALTER TABLE tempdb.dbo.mh_table ADD CN" + CONVERT(VARCHAR, @LOOP) + " NVARCHAR(25) NULL " 
		
		EXEC (@SQL)
		
		--Try 1 of 2
		SELECT @SQL = ""
		SELECT @SQL = "UPDATE tempdb.dbo.mh_table " + 
                      "SET copy_of_descr = SUBSTRING(copy_of_descr, CHARINDEX('.', copy_of_descr, 1) + 1, LEN(copy_of_descr) - CHARINDEX('.', copy_of_descr, 1)), " + 
                      "    CN" + CONVERT(VARCHAR, @LOOP) + " = SUBSTRING(copy_of_descr, 1, CHARINDEX('.', copy_of_descr, 1) - 1) " +
                      "WHERE CHARINDEX('.', copy_of_descr, 1) - 1 >= 0 " 
                      
		--PRINT @SQL
        EXEC (@SQL)
        
        --Try 2 0f 2
        SELECT @SQL = ""
		SELECT @SQL = "UPDATE tempdb.dbo.mh_table " + 
                      "SET copy_of_descr = '', " +
                      "    CN" + CONVERT(VARCHAR, @LOOP) + " = copy_of_descr " +
                      "WHERE CHARINDEX('.', copy_of_descr, 1) = 0 " + 
                      "AND CN" + CONVERT(VARCHAR, @LOOP) + " IS NULL "
                      
		--PRINT @SQL
        EXEC (@SQL)
		
		SELECT @LOOP = @LOOP + 1
	END	

SELECT * FROM tempdb.dbo.mh_table

Open in new window

Comment
Watch Question

Do more with

Expert Office
EXPERT OFFICE® is a registered trademark of EXPERTS EXCHANGE®
Expert of the Quarter 2010
Expert of the Year 2010
Commented:
I was going to suggest dynamic pivot columns, but since you are actually modifying the base table and housing the data permanently, there is no escaping the ALTER table commands.

Next the operation of how many to add, the operation there would be to count it and either loop or build a single string using a select statement over a number table, but that is not significantly faster for adding x number of columns.

Then, the actual splitting to columns and populating.  What you have done is progressively consume and shorten the source string and populating rightwards.  Another was may have been to use a split-into-rows function and pivoting back into columns.  Yet I cannot really see that being much faster.

In summary, and I could be wrong here, I don't really see any problems with what you already have.

Author

Commented:
What's the split into rows function?
Expert of the Quarter 2010
Expert of the Year 2010

Commented:
This question is something similar to what you are doing, see my comment there.

http://www.experts-exchange.com/Microsoft/Development/MS-SQL-Server/SQL-Server-2005/Q_26559811.html#33956075
Should you be charging more for IT Services?

Do you wonder if your IT business is truly profitable or if you should raise your prices? Learn how to calculate your overhead burden using our free interactive tool and use it to determine the right price for your IT services. Start calculating Now!

Author

Commented:
And how do the rows become fields, sorry, I've never used pivot before.
Expert of the Quarter 2010
Expert of the Year 2010

Commented:
Is your data to be stored permanently into fields?
Do you want to go through splitting each time data is queried?

Have a read of the article first and let me know if you have specific problems with it.

Author

Commented:
No...the splitting is really there so that I can fix the pieces of data that contain both letters and numbers, so it can later be sorted alphabetically.  For example, abc.G5level10, comes before abc.G5level8.  So I'm splitting them up into 'abc' and 'G5level8', then using a function, inserting a ~ between the letters and numbers to yield 'G~5~level~8', just so that I can split them up again into their own fields, 'G', '5', 'level', '8' and then find the max character lengh on the field so that I can stick a leading zero in front of the numbers so they will sort correctly.  Once I got the sort right, I insert the records into a table that uses an identity field to keep the original data in order.

Author

Commented:
For example, abc.G05level010, will then come after, abc.G05level008, alphabetically.
Mark WillsTopic Advisor, Page Editor
Distinguished Expert 2018

Commented:
Well, how about using "magic numbers" to delimit dynamically in a single query...


select id, descr, substring(descr,n,charindex('.',descr+'.',n+1)-n) as component
from tempdb.dbo.mh_table
cross join (select number n from master..spt_values where type = 'P') numbers
where substring('.'+descr,n,1) = '.'
and n < len(descr)


have a read of the article : http://www.experts-exchange.com/Microsoft/Development/MS-SQL-Server/A_1221-Fun-with-MS-SQL-spt-values-for-delimited-strings-and-virtual-calendars.html 
Mark WillsTopic Advisor, Page Editor
Distinguished Expert 2018

Commented:
As for replacing the numerics that will be fun...

How big is the table ? If you can give a good indication of size (ie count of distinct descriptions) then maybe a translation table might be the go.

Author

Commented:
Time ran out, I just stuck with the dynamic sql I originally had.  Thanks for responding.
Mark WillsTopic Advisor, Page Editor
Distinguished Expert 2018

Commented:
Glad I didnt post my number conversion routine then *laughing*

Author

Commented:
Sorry Mark, If it comes up again, I'll give your solution a try.

Do more with

Expert Office
Submit tech questions to Ask the Experts™ at any time to receive solutions, advice, and new ideas from leading industry professionals.

Start 7-Day Free Trial