Split and parse long text field in SQL server database

Louis Capece
Louis Capece used Ask the Experts™
on
I have a field that needs to be parsed and split into ordered rows. The sequence of parsing is important.

Sample Data:
CaseID           WorkLog
--------            ----------------------------------------------------------------------------------------------
123001          1263158105~Application Service~Email sent by: Anna Swapswire~1263205523~qiradasu~Status Changed from to Pending~1263892875~qiradasu~The Status field has changed from Pending  to Resolved~

Desired Output:

Case ID            Date                  Owner                  LogAction                              SequeceNumber      
-------------      ---------------            -------------            -------------------------------------------------------            -------------------      
123001            1/10/2010 9:15:05 PM       Application Service            Email sent by: Anna Swapswire                  1
123001            1/11/2010 10:25:23 AM       qiradasu                  Status Changed from to Pending                  2
123001            1/19/2010 9:21:15 AM       qiradasu                  The Status field has changed from Pending  to Resolved      3      


Notes on Sequence of events in the Worklog file:

* The 10 digit numbers are the epoch timestamp, or the number of seconds elapsed since 1/1/1970. This number (date) will always signify a new row.
* The data is delimited by the "~" tilde character.
* The sequence of data in the Worklog field will always be   DATESTAMP-OWNER-LOGACTION, this sequece may repeat as many as 100 times for a sigle Case ID.
* The last character of the Worklog field will always be the Tilde.
* The sequence number must be derived the natural order of the data from left to right. The sequnce is internal just for the particular Case ID. The next Case ID, the sequence will reset to 1.

ParseSeq.txt
Comment
Watch Question

Do more with

Expert Office
EXPERT OFFICE® is a registered trademark of EXPERTS EXCHANGE®
Kevin CrossChief Technology Officer
Most Valuable Expert 2011

Commented:
Take a look here -- http://www.experts-exchange.com/Q_23786715.html

I would suggest using that "as-is" to split the text by the row delimiter on occurrence and then you have a finite pattern (secondary delimiter) by which to split the row into each column of data; however, since you are using the same delimiter in essence for column and row that would be difficult since timestamps are different and so really the ~ is the delimiter.

You can take the code approach (at least mine - as that is the one I can give permission for) and modify it to your own needs here and go through string checking for PATINDEX() of the numerical timestamp and increment occurrence / sequence number as needed, splitting up the columns in the process and building a table containing your required fields.
Commented:
attached is the solution, it contains: split function, x table, and sample script


--sameple data and script

declare @sampledata table
(caseid       int
,worklog      varchar(max))

insert into @sampledata (caseid, worklog)
values (123001, '1263158105~Application Service~Email sent by: Anna Swapswire~1263205523~qiradasu~Status Changed from to Pending~1263892875~qiradasu~The Status field has changed from Pending  to Resolved~')


select s.caseid, g.*
from @sampledata s
cross apply dbo.fSplitWorkLog(s.worklog) g
splitfunction-solution.sql
splitfunction-solution-result.jpg

Author

Commented:
WOW!
Thank you so much for your accurate and rapid response. This has saved me many hours of work!
G GodwinDatabase Administrator

Commented:
Attached, I have worked through your example.
Take a look at it.  You could turn this into a stored proc or a function.
Best Regards,
-G

declare @Case_id int
declare @TXT varchar(max)

set @Case_id = 123001
set @txt = '1263158105~Application Service~Email sent by: Anna Swapswire~1263205523~qiradasu~Status Changed from to Pending~1263892875~qiradasu~The Status field has changed from Pending  to Resolved~'


declare @Result Table(Case_ID int, EvendDate datetime, OwnerName Varchar(255), LogAction varchar(255), SEQ int)

declare @From int
Declare @to int
declare @seq int
declare @TS datetime
declare @OWNER varchar(255)
declare @LOGACTION varchar(255)

set @seq = 0
set @from = 1
set @to = 1

while @from < len(@txt)
	begin
		set @to = charindex('~', @txt)
		if substring(@txt, @from, @to -1) like '%[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]%' 
			begin
				set @seq = @seq+1
				set @ts = dateadd(ss, convert(int, substring(@txt, @from, @to -1)), '1/1/1970')
 		 		set @txt = substring(@txt, @to+1, len(@txt)-@to)
			end
-- OWNER
		set @to = charindex('~', @txt)
		set @owner = substring(@txt, @from, @to -1)
		set @txt = substring(@txt, @to+1, len(@txt)-@to)
-- LogAction
		set @to = charindex('~', @txt)
		set @LOGACTION = substring(@txt, @from, @to -1)
		set @txt = substring(@txt, @to+1, len(@txt)-@to)

		insert into @result	
		select @Case_id, @tS, @owner, @LOGACTION, @SEQ
		
	end
	
select * from @result

Open in new window

Do more with

Expert Office
Submit tech questions to Ask the Experts™ at any time to receive solutions, advice, and new ideas from leading industry professionals.

Start 7-Day Free Trial