?
Solved

Function for FAST String Parse and Summation

Posted on 2008-10-03
20
Medium Priority
?
1,252 Views
Last Modified: 2012-05-05
We've got a table that stores monthly cost balances in a semi-colon delimited field, where each delimiter represents a month of the year. The data in the field looks like this:

100;100;100;100;100;100;100;100;100;100;0;50

We need a fast function to sum the balances for all months up to the specified period.  For example, when month 6 is specified it should sum to $600, when month 12 is specified it should sum to $1050.  

We currently sum this up by running a basic split function that parses each part of the string and then adds up those parsed numbers.  The table that stores these values contains many many records and the result is becoming very slow.  The function we're currently running is below.

Does anyone have ideas for a FASTER method for splitting and summing the string values?  Any suggestions are appreciated.

CREATE FUNCTION [dbo].[Split] ( @CharacterExpression VARCHAR(8000), @Delimiter CHAR(1), @Position INTEGER)
 
RETURNS VARCHAR(8000)
AS
BEGIN
 
If @Position<1 return null
if len(@Delimiter)<>1 return null
declare @Start integer
set @Start=1
while @Position>1
 
BEGIN
 
Set @Start=ISNULL(CHARINDEX(@Delimiter, @CharacterExpression, @Start),0)
IF @Start=0 return null
set @position= @position-1
set @Start=@Start+1
 
END
 
Declare @End INTEGER
Set @End= ISNULL(CHARINDEX(@Delimiter, @CharacterExpression, @Start),0)
If @End=0 Set @End=LEN(@CharacterExpression)+1
RETURN SUBSTRING(@CharacterExpression, @Start, @End-@Start)
 
END

Open in new window

0
Comment
Question by:jobprojn
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 7
  • 6
  • 4
  • +2
20 Comments
 
LVL 60

Expert Comment

by:Kevin Cross
ID: 22638850
You can go with a simplier split modified from your above like http://crossedlogic.blogspot.com/2008/09/converting-delimited-string-to-separate.html -- can use while loop instead of recursion -- and would add an column in table that signified the order in the original string which wll translate to month like 1, 2, 3 ... 12.  I saw an example in a previous question that I will post to shorten development time.

You can then do sum outside of function.
SELECT SUM(CAST(entry AS INT))
FROM dbo.split('100;100;100;100;100;100;100;100;100;100;0;50', ';')
WHERE indexColumn <= 6

Open in new window

0
 
LVL 60

Assisted Solution

by:Kevin Cross
Kevin Cross earned 400 total points
ID: 22638870
I couldn't find the question I saw with an example, but had a rough one in my own database that may give you the jist of what I mean.
ALTER FUNCTION [dbo].[split]
(
        @String VARCHAR(200),
        @Delimiter VARCHAR(5)
)
RETURNS @SplittedValues TABLE
(
  OccurenceId SMALLINT IDENTITY(1,1),
  SplitValue VARCHAR(200)
)
AS
BEGIN
DECLARE @SplitLength INT
 
WHILE LEN(@String) > 0
BEGIN
        SELECT @SplitLength = (CASE CHARINDEX(@Delimiter,@String) WHEN 0 THEN
LEN(@String) ELSE CHARINDEX(@Delimiter,@String) -1  END)
 
        INSERT INTO @SplittedValues
        SELECT ltrim(rtrim(replace(SUBSTRING(@String,1,@SplitLength),'''','')))
 
        SELECT @String = (CASE (LEN(@String) - @SplitLength) WHEN 0 THEN ''
ELSE RIGHT(@String, LEN(@String) - @SplitLength -1 ) END)
END
RETURN
END

Open in new window

0
 
LVL 39

Accepted Solution

by:
BrandonGalderisi earned 600 total points
ID: 22639391
Here's a little thing I put together and it works very well...


select all
select * from [dbo].[fn_DelimitedToTable]('100;100;100;100;100;100;100;100;100;100;0;50',';')

sum all
select sum(cast(theValue as int)) from [dbo].[fn_DelimitedToTable]('100;100;100;100;100;100;100;100;100;100;0;50',';')

sum First X (6 in this case)
select sum(cast(theValue as int)) from [dbo].[fn_DelimitedToTable]('100;100;100;100;100;100;100;100;100;100;0;50',';')
where ident <=6
if object_id('[dbo].[fn_DelimitedToTable]') is not null
     drop function [dbo].[fn_DelimitedToTable]
go
create function [dbo].[fn_DelimitedToTable](@DelimitedString nvarchar(max), @Delimiter nvarchar(32))
returns @Values TABLE
     (ident         int not null identity primary key clustered
     ,thePosition   int not null
     ,theValue      nvarchar(max)
     )
as
begin
 
insert into @Values (thePosition,theValue)
		select n, substring(@delimiter + @DelimitedString + @delimiter, n + (datalength(@delimiter)/2), charindex(@delimiter, @delimiter + @DelimitedString + @delimiter, n + len(@delimiter)) - n - len(@delimiter)) as string_value
		from	dbo.vw_Nums
		where
			n <= (datalength(@delimiter + @DelimitedString + @delimiter)/2) - (datalength(@delimiter)/2)
			and substring(@delimiter + @DelimitedString + @delimiter, n, (datalength(@delimiter)/2)) = @delimiter
 
 
 
return
end
/*
 
Requires:
create view vw_Nums
as
with
       cte0 as (select 1 as c union all select 1), -- 2
       cte1 as (select 1 as c from cte0 a, cte0 b), -- 4
       cte2 as (select 1 as c from cte1 a, cte1 b), -- 16
       cte3 as (select 1 as c from cte2 a, cte2 b), -- 256
       cte4 as (select 1 as c from cte3 a, cte3 b), -- 65,536
       cte5 as (select 1 as c from cte4 a, cte4 b), -- 4,294,967,296 --four BILLION, not million
       nums as (select row_number() over (order by c) as n from cte5)
       select n from nums 
 
 
select * from [dbo].[fn_DelimitedToTable]('a|%25basdf|%25c|%25d','|%25')
 
select theValue from [dbo].[fn_DelimitedToTable]('a','|')
*/
GO

Open in new window

0
Veeam Disaster Recovery in Microsoft Azure

Veeam PN for Microsoft Azure is a FREE solution designed to simplify and automate the setup of a DR site in Microsoft Azure using lightweight software-defined networking. It reduces the complexity of VPN deployments and is designed for businesses of ALL sizes.

 
LVL 5

Expert Comment

by:adlink_la
ID: 22639506
Yes, the bottome line is that there really isn't any "good" way to get at that data.  You can see how much code is required.  If you want fast your schema should allow the data to be accessed without any procedural code.  I don't know if you are able to make a schema change but that would be the way to go if you are having performance problems.

0
 
LVL 39

Expert Comment

by:BrandonGalderisi
ID: 22639513
My split function is REALLY fast.  To speed it up, create a physical table in place of vw_nums with a clustered index on the number.  That's what we have in our system.
0
 
LVL 60

Expert Comment

by:Kevin Cross
ID: 22639619
I agree with Brandon as that is what I was showing user in my comments.  The point of the question I think was creating code that allows calculation/split to occur fast during processing NOT what is fastest way to code it if that makes sense.
0
 
LVL 4

Expert Comment

by:jindalankush
ID: 22639966
create table #t
(
      id int identity,
      val int
)
insert  #t
SELECT CAST(data AS INT)
FROM dbo.split(@CharacterExpression , @Delimiter )

select sum(val) from #t
where id < @Position
0
 
LVL 39

Expert Comment

by:BrandonGalderisi
ID: 22641138
jindalankush:

I'm not saying that yours won't work in the same way, but yours uses looping to find all the individual members and that is not an efficient way to do it.
0
 
LVL 39

Expert Comment

by:BrandonGalderisi
ID: 22641143
Trust me I know because my first delimited string parser that I wrote worked exactly the same way :).
0
 
LVL 5

Expert Comment

by:adlink_la
ID: 22641773
>>The point of the question I think was creating code that allows calculation/split to occur fast during processing NOT what is fastest way to code it if that makes sense.

Yes, but seeing that we like to refer to ourselves as the "Experts" around here, I think we have an obligation to point things like that out.  To simply provide a solution and ignore an obvious problem is doing a disservice to the person asking the question.  Now granted, the OP has not stated whether he/she has the ability (or desire) to make a change to the schema.  They could be working with a 3rd party product and not have that luxury.  Nonetheless, the point should still be made.

That said, I find Brandon's solution very interesting.  I haven't dug into it yet to totally understand what it is doing but I like the fact that it does not use any looping.  For those times when we do need to parse data out of a column, it looks like a very clean solution.  I am definitely going to build that table and give it a try.  Very nice!
0
 
LVL 60

Expert Comment

by:Kevin Cross
ID: 22641803
Yes, common table expressions are very cool.  The link in my first post doesn't use loops either but uses recursion within function.  Also very fast and low amount of code for what it is doing; however, it doesn't match up to the coolness that is CTE. :)
0
 
LVL 5

Expert Comment

by:adlink_la
ID: 22642490
WOOHOOO!  I LOVE IT!!  I just built the numbers table and tested the function.  Very nice the way the output table gives the the ident, position of the value and the value itself.  This is going to be a handy function to have in the arsenal.  Thanks Brandon!  
0
 
LVL 39

Expert Comment

by:BrandonGalderisi
ID: 22643333
np.

It has gone through several iterations.  From single character wildcards to multi.  from varchar to nvarchar.  I don't know if this is the most recent that supports space(s) as a delimiter but I have a rendition somewhere that does.  It was used a while back when someone needed to could the number of times each word appeared in a body of text.  I replaced all ,.() with spaces, stripped out double spaces, then used that function as a way to split each body of text into a table of words, sum each and store a counter.  I called it wordify.sql :)

The performance of a static table vs. a cte is big.  While the CTE to generate the #'s is fast, nothing compared to a real table.  But, the CTE I have above is a great way to populate that table if #s.


as a side not adlink_la.  If you do stash that goodie away in case you intend to re-post it here, can you throw my name in somewhere as the author.  I'd appreciate the little bit of advertisement for writing something useful.
0
 
LVL 4

Expert Comment

by:jindalankush
ID: 22648766
hi BrandonGalderisi:
you are right,
i hv read ur solution before my post,
i hv tried to simplfy ur process,
sorry for inconvience
0
 
LVL 5

Expert Comment

by:adlink_la
ID: 22651822
Brandon,
If it ever comes up, rather than reposting the code, I will likely just refer back to this thread.  

At any rate, I have been digging into the code a bit to see exactly what it is doing.  It looks to me like you might want to replace the two occurances of:

len(@delimiter)

with:

(datalength(@delimiter)/2)

The reason I mention it is the LEN() function ignores trailing blanks, so if the delimiter is like *   then the function returns incorrect results.  However, the above change fixes that problem.  

Do you agree?

0
 
LVL 5

Expert Comment

by:adlink_la
ID: 22651859
I guess my quotation marks got replaced by those little boxes.  I was trying to show astrisk, space, space as the delimiter.
0
 
LVL 39

Expert Comment

by:BrandonGalderisi
ID: 22651959
You're right.  At first it was all len().  I thought I replaced them all.
0
 

Author Closing Comment

by:jobprojn
ID: 31502951
You guys rock.  Thanks.
0
 
LVL 5

Expert Comment

by:adlink_la
ID: 22654237
>>I'd appreciate the little bit of advertisement for writing something useful.

Useful?  You deserve an award for this little nugget my friend.  One thing that I try to impress upon my fellow developers is to try to approach database problems from a database standpoint rather than from a programmer standpoint.  Often the database can do things much more efficiently.  This is a perfect example of that.  What better way to get at the values of a delimited string than to just select them out in one shot!  Absolutely brilliant!  Very nicely done!
0
 
LVL 39

Expert Comment

by:BrandonGalderisi
ID: 22654256
Just know that the CTE in the view I use I found online, I just don't recall where.  But that is easily replaced with a real numbers table which is what we have in our system.

The numbers is very useful for displaying date ranges as well.  
0

Featured Post

Free Tool: Site Down Detector

Helpful to verify reports of your own downtime, or to double check a downed website you are trying to access.

One of a set of tools we are providing to everyone as a way of saying thank you for being a part of the community.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

PL/SQL can be a very powerful tool for working directly with database tables. Being able to loop will allow you to perform more complex operations, but can be a little tricky to write correctly. This article will provide examples of basic loops alon…
If you have heard of RFC822 date formats, they can be quite a challenge in SQL Server. RFC822 is an Internet standard format for email message headers, including all dates within those headers. The RFC822 protocols are available in detail at:   ht…
This tutorial will teach you the special effect of super speed similar to the fictional character Wally West aka "The Flash" After Shake : http://www.videocopilot.net/presets/after_shake/ All lightning effects with instructions : http://www.mediaf…
In this video, Percona Director of Solution Engineering Jon Tobin discusses the function and features of Percona Server for MongoDB. How Percona can help Percona can help you determine if Percona Server for MongoDB is the right solution for …

719 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question