# Function for FAST String Parse and Summation

We've got a table that stores monthly cost balances in a semi-colon delimited field, where each delimiter represents a month of the year. The data in the field looks like this:

100;100;100;100;100;100;100;100;100;100;0;50

We need a fast function to sum the balances for all months up to the specified period.  For example, when month 6 is specified it should sum to \$600, when month 12 is specified it should sum to \$1050.

We currently sum this up by running a basic split function that parses each part of the string and then adds up those parsed numbers.  The table that stores these values contains many many records and the result is becoming very slow.  The function we're currently running is below.

Does anyone have ideas for a FASTER method for splitting and summing the string values?  Any suggestions are appreciated.

``````CREATE FUNCTION [dbo].[Split] ( @CharacterExpression VARCHAR(8000), @Delimiter CHAR(1), @Position INTEGER)

RETURNS VARCHAR(8000)
AS
BEGIN

If @Position<1 return null
if len(@Delimiter)<>1 return null
declare @Start integer
set @Start=1
while @Position>1

BEGIN

Set @Start=ISNULL(CHARINDEX(@Delimiter, @CharacterExpression, @Start),0)
IF @Start=0 return null
set @position= @position-1
set @Start=@Start+1

END

Declare @End INTEGER
Set @End= ISNULL(CHARINDEX(@Delimiter, @CharacterExpression, @Start),0)
If @End=0 Set @End=LEN(@CharacterExpression)+1
RETURN SUBSTRING(@CharacterExpression, @Start, @End-@Start)

END
``````
###### Who is Participating?

Commented:
Here's a little thing I put together and it works very well...

select all
select * from [dbo].[fn_DelimitedToTable]('100;100;100;100;100;100;100;100;100;100;0;50',';')

sum all
select sum(cast(theValue as int)) from [dbo].[fn_DelimitedToTable]('100;100;100;100;100;100;100;100;100;100;0;50',';')

sum First X (6 in this case)
select sum(cast(theValue as int)) from [dbo].[fn_DelimitedToTable]('100;100;100;100;100;100;100;100;100;100;0;50',';')
where ident <=6
``````if object_id('[dbo].[fn_DelimitedToTable]') is not null
drop function [dbo].[fn_DelimitedToTable]
go
create function [dbo].[fn_DelimitedToTable](@DelimitedString nvarchar(max), @Delimiter nvarchar(32))
returns @Values TABLE
(ident         int not null identity primary key clustered
,thePosition   int not null
,theValue      nvarchar(max)
)
as
begin

insert into @Values (thePosition,theValue)
select n, substring(@delimiter + @DelimitedString + @delimiter, n + (datalength(@delimiter)/2), charindex(@delimiter, @delimiter + @DelimitedString + @delimiter, n + len(@delimiter)) - n - len(@delimiter)) as string_value
from	dbo.vw_Nums
where
n <= (datalength(@delimiter + @DelimitedString + @delimiter)/2) - (datalength(@delimiter)/2)
and substring(@delimiter + @DelimitedString + @delimiter, n, (datalength(@delimiter)/2)) = @delimiter

return
end
/*

Requires:
create view vw_Nums
as
with
cte0 as (select 1 as c union all select 1), -- 2
cte1 as (select 1 as c from cte0 a, cte0 b), -- 4
cte2 as (select 1 as c from cte1 a, cte1 b), -- 16
cte3 as (select 1 as c from cte2 a, cte2 b), -- 256
cte4 as (select 1 as c from cte3 a, cte3 b), -- 65,536
cte5 as (select 1 as c from cte4 a, cte4 b), -- 4,294,967,296 --four BILLION, not million
nums as (select row_number() over (order by c) as n from cte5)
select n from nums

select * from [dbo].[fn_DelimitedToTable]('a|%25basdf|%25c|%25d','|%25')

select theValue from [dbo].[fn_DelimitedToTable]('a','|')
*/
GO
``````
0

Chief Technology OfficerCommented:
You can go with a simplier split modified from your above like http://crossedlogic.blogspot.com/2008/09/converting-delimited-string-to-separate.html -- can use while loop instead of recursion -- and would add an column in table that signified the order in the original string which wll translate to month like 1, 2, 3 ... 12.  I saw an example in a previous question that I will post to shorten development time.

You can then do sum outside of function.
``````SELECT SUM(CAST(entry AS INT))
FROM dbo.split('100;100;100;100;100;100;100;100;100;100;0;50', ';')
WHERE indexColumn <= 6
``````
0

Chief Technology OfficerCommented:
I couldn't find the question I saw with an example, but had a rough one in my own database that may give you the jist of what I mean.
``````ALTER FUNCTION [dbo].[split]
(
@String VARCHAR(200),
@Delimiter VARCHAR(5)
)
RETURNS @SplittedValues TABLE
(
OccurenceId SMALLINT IDENTITY(1,1),
SplitValue VARCHAR(200)
)
AS
BEGIN
DECLARE @SplitLength INT

WHILE LEN(@String) > 0
BEGIN
SELECT @SplitLength = (CASE CHARINDEX(@Delimiter,@String) WHEN 0 THEN
LEN(@String) ELSE CHARINDEX(@Delimiter,@String) -1  END)

INSERT INTO @SplittedValues
SELECT ltrim(rtrim(replace(SUBSTRING(@String,1,@SplitLength),'''','')))

SELECT @String = (CASE (LEN(@String) - @SplitLength) WHEN 0 THEN ''
ELSE RIGHT(@String, LEN(@String) - @SplitLength -1 ) END)
END
RETURN
END
``````
0

Commented:
Yes, the bottome line is that there really isn't any "good" way to get at that data.  You can see how much code is required.  If you want fast your schema should allow the data to be accessed without any procedural code.  I don't know if you are able to make a schema change but that would be the way to go if you are having performance problems.

0

Commented:
My split function is REALLY fast.  To speed it up, create a physical table in place of vw_nums with a clustered index on the number.  That's what we have in our system.
0

Chief Technology OfficerCommented:
I agree with Brandon as that is what I was showing user in my comments.  The point of the question I think was creating code that allows calculation/split to occur fast during processing NOT what is fastest way to code it if that makes sense.
0

Commented:
create table #t
(
id int identity,
val int
)
insert  #t
SELECT CAST(data AS INT)
FROM dbo.split(@CharacterExpression , @Delimiter )

select sum(val) from #t
where id < @Position
0

Commented:
jindalankush:

I'm not saying that yours won't work in the same way, but yours uses looping to find all the individual members and that is not an efficient way to do it.
0

Commented:
Trust me I know because my first delimited string parser that I wrote worked exactly the same way :).
0

Commented:
>>The point of the question I think was creating code that allows calculation/split to occur fast during processing NOT what is fastest way to code it if that makes sense.

Yes, but seeing that we like to refer to ourselves as the "Experts" around here, I think we have an obligation to point things like that out.  To simply provide a solution and ignore an obvious problem is doing a disservice to the person asking the question.  Now granted, the OP has not stated whether he/she has the ability (or desire) to make a change to the schema.  They could be working with a 3rd party product and not have that luxury.  Nonetheless, the point should still be made.

That said, I find Brandon's solution very interesting.  I haven't dug into it yet to totally understand what it is doing but I like the fact that it does not use any looping.  For those times when we do need to parse data out of a column, it looks like a very clean solution.  I am definitely going to build that table and give it a try.  Very nice!
0

Chief Technology OfficerCommented:
Yes, common table expressions are very cool.  The link in my first post doesn't use loops either but uses recursion within function.  Also very fast and low amount of code for what it is doing; however, it doesn't match up to the coolness that is CTE. :)
0

Commented:
WOOHOOO!  I LOVE IT!!  I just built the numbers table and tested the function.  Very nice the way the output table gives the the ident, position of the value and the value itself.  This is going to be a handy function to have in the arsenal.  Thanks Brandon!
0

Commented:
np.

It has gone through several iterations.  From single character wildcards to multi.  from varchar to nvarchar.  I don't know if this is the most recent that supports space(s) as a delimiter but I have a rendition somewhere that does.  It was used a while back when someone needed to could the number of times each word appeared in a body of text.  I replaced all ,.() with spaces, stripped out double spaces, then used that function as a way to split each body of text into a table of words, sum each and store a counter.  I called it wordify.sql :)

The performance of a static table vs. a cte is big.  While the CTE to generate the #'s is fast, nothing compared to a real table.  But, the CTE I have above is a great way to populate that table if #s.

as a side not adlink_la.  If you do stash that goodie away in case you intend to re-post it here, can you throw my name in somewhere as the author.  I'd appreciate the little bit of advertisement for writing something useful.
0

Commented:
hi BrandonGalderisi:
you are right,
i hv read ur solution before my post,
i hv tried to simplfy ur process,
sorry for inconvience
0

Commented:
Brandon,
If it ever comes up, rather than reposting the code, I will likely just refer back to this thread.

At any rate, I have been digging into the code a bit to see exactly what it is doing.  It looks to me like you might want to replace the two occurances of:

len(@delimiter)

with:

(datalength(@delimiter)/2)

The reason I mention it is the LEN() function ignores trailing blanks, so if the delimiter is like *   then the function returns incorrect results.  However, the above change fixes that problem.

Do you agree?

0

Commented:
I guess my quotation marks got replaced by those little boxes.  I was trying to show astrisk, space, space as the delimiter.
0

Commented:
You're right.  At first it was all len().  I thought I replaced them all.
0

Author Commented:
You guys rock.  Thanks.
0

Commented:

Useful?  You deserve an award for this little nugget my friend.  One thing that I try to impress upon my fellow developers is to try to approach database problems from a database standpoint rather than from a programmer standpoint.  Often the database can do things much more efficiently.  This is a perfect example of that.  What better way to get at the values of a delimited string than to just select them out in one shot!  Absolutely brilliant!  Very nicely done!
0

Commented:
Just know that the CTE in the view I use I found online, I just don't recall where.  But that is easily replaced with a real numbers table which is what we have in our system.

The numbers is very useful for displaying date ranges as well.
0
Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.