Solved

Function for FAST String Parse and Summation

Posted on 2008-10-03
20
1,224 Views
Last Modified: 2012-05-05
We've got a table that stores monthly cost balances in a semi-colon delimited field, where each delimiter represents a month of the year. The data in the field looks like this:

100;100;100;100;100;100;100;100;100;100;0;50

We need a fast function to sum the balances for all months up to the specified period.  For example, when month 6 is specified it should sum to $600, when month 12 is specified it should sum to $1050.  

We currently sum this up by running a basic split function that parses each part of the string and then adds up those parsed numbers.  The table that stores these values contains many many records and the result is becoming very slow.  The function we're currently running is below.

Does anyone have ideas for a FASTER method for splitting and summing the string values?  Any suggestions are appreciated.

CREATE FUNCTION [dbo].[Split] ( @CharacterExpression VARCHAR(8000), @Delimiter CHAR(1), @Position INTEGER)
 

RETURNS VARCHAR(8000)

AS

BEGIN
 

If @Position<1 return null

if len(@Delimiter)<>1 return null

declare @Start integer

set @Start=1

while @Position>1
 

BEGIN
 

Set @Start=ISNULL(CHARINDEX(@Delimiter, @CharacterExpression, @Start),0)

IF @Start=0 return null

set @position= @position-1

set @Start=@Start+1
 

END
 

Declare @End INTEGER

Set @End= ISNULL(CHARINDEX(@Delimiter, @CharacterExpression, @Start),0)

If @End=0 Set @End=LEN(@CharacterExpression)+1

RETURN SUBSTRING(@CharacterExpression, @Start, @End-@Start)
 

END

Open in new window

0
Comment
Question by:jobprojn
  • 7
  • 6
  • 4
  • +2
20 Comments
 
LVL 59

Expert Comment

by:Kevin Cross
Comment Utility
You can go with a simplier split modified from your above like http://crossedlogic.blogspot.com/2008/09/converting-delimited-string-to-separate.html -- can use while loop instead of recursion -- and would add an column in table that signified the order in the original string which wll translate to month like 1, 2, 3 ... 12.  I saw an example in a previous question that I will post to shorten development time.

You can then do sum outside of function.
SELECT SUM(CAST(entry AS INT))

FROM dbo.split('100;100;100;100;100;100;100;100;100;100;0;50', ';')

WHERE indexColumn <= 6

Open in new window

0
 
LVL 59

Assisted Solution

by:Kevin Cross
Kevin Cross earned 100 total points
Comment Utility
I couldn't find the question I saw with an example, but had a rough one in my own database that may give you the jist of what I mean.
ALTER FUNCTION [dbo].[split]

(

        @String VARCHAR(200),

        @Delimiter VARCHAR(5)

)

RETURNS @SplittedValues TABLE

(

  OccurenceId SMALLINT IDENTITY(1,1),

  SplitValue VARCHAR(200)

)

AS

BEGIN

DECLARE @SplitLength INT
 

WHILE LEN(@String) > 0

BEGIN

        SELECT @SplitLength = (CASE CHARINDEX(@Delimiter,@String) WHEN 0 THEN

LEN(@String) ELSE CHARINDEX(@Delimiter,@String) -1  END)
 

        INSERT INTO @SplittedValues

        SELECT ltrim(rtrim(replace(SUBSTRING(@String,1,@SplitLength),'''','')))
 

        SELECT @String = (CASE (LEN(@String) - @SplitLength) WHEN 0 THEN ''

ELSE RIGHT(@String, LEN(@String) - @SplitLength -1 ) END)

END

RETURN

END

Open in new window

0
 
LVL 39

Accepted Solution

by:
BrandonGalderisi earned 150 total points
Comment Utility
Here's a little thing I put together and it works very well...


select all
select * from [dbo].[fn_DelimitedToTable]('100;100;100;100;100;100;100;100;100;100;0;50',';')

sum all
select sum(cast(theValue as int)) from [dbo].[fn_DelimitedToTable]('100;100;100;100;100;100;100;100;100;100;0;50',';')

sum First X (6 in this case)
select sum(cast(theValue as int)) from [dbo].[fn_DelimitedToTable]('100;100;100;100;100;100;100;100;100;100;0;50',';')
where ident <=6
if object_id('[dbo].[fn_DelimitedToTable]') is not null

     drop function [dbo].[fn_DelimitedToTable]

go

create function [dbo].[fn_DelimitedToTable](@DelimitedString nvarchar(max), @Delimiter nvarchar(32))

returns @Values TABLE

     (ident         int not null identity primary key clustered

     ,thePosition   int not null

     ,theValue      nvarchar(max)

     )

as

begin
 

insert into @Values (thePosition,theValue)

		select n, substring(@delimiter + @DelimitedString + @delimiter, n + (datalength(@delimiter)/2), charindex(@delimiter, @delimiter + @DelimitedString + @delimiter, n + len(@delimiter)) - n - len(@delimiter)) as string_value

		from	dbo.vw_Nums

		where

			n <= (datalength(@delimiter + @DelimitedString + @delimiter)/2) - (datalength(@delimiter)/2)

			and substring(@delimiter + @DelimitedString + @delimiter, n, (datalength(@delimiter)/2)) = @delimiter
 
 
 

return

end

/*
 

Requires:

create view vw_Nums

as

with

       cte0 as (select 1 as c union all select 1), -- 2

       cte1 as (select 1 as c from cte0 a, cte0 b), -- 4

       cte2 as (select 1 as c from cte1 a, cte1 b), -- 16

       cte3 as (select 1 as c from cte2 a, cte2 b), -- 256

       cte4 as (select 1 as c from cte3 a, cte3 b), -- 65,536

       cte5 as (select 1 as c from cte4 a, cte4 b), -- 4,294,967,296 --four BILLION, not million

       nums as (select row_number() over (order by c) as n from cte5)

       select n from nums 
 
 

select * from [dbo].[fn_DelimitedToTable]('a|%25basdf|%25c|%25d','|%25')
 

select theValue from [dbo].[fn_DelimitedToTable]('a','|')

*/

GO

Open in new window

0
 
LVL 5

Expert Comment

by:adlink_la
Comment Utility
Yes, the bottome line is that there really isn't any "good" way to get at that data.  You can see how much code is required.  If you want fast your schema should allow the data to be accessed without any procedural code.  I don't know if you are able to make a schema change but that would be the way to go if you are having performance problems.

0
 
LVL 39

Expert Comment

by:BrandonGalderisi
Comment Utility
My split function is REALLY fast.  To speed it up, create a physical table in place of vw_nums with a clustered index on the number.  That's what we have in our system.
0
 
LVL 59

Expert Comment

by:Kevin Cross
Comment Utility
I agree with Brandon as that is what I was showing user in my comments.  The point of the question I think was creating code that allows calculation/split to occur fast during processing NOT what is fastest way to code it if that makes sense.
0
 
LVL 4

Expert Comment

by:jindalankush
Comment Utility
create table #t
(
      id int identity,
      val int
)
insert  #t
SELECT CAST(data AS INT)
FROM dbo.split(@CharacterExpression , @Delimiter )

select sum(val) from #t
where id < @Position
0
 
LVL 39

Expert Comment

by:BrandonGalderisi
Comment Utility
jindalankush:

I'm not saying that yours won't work in the same way, but yours uses looping to find all the individual members and that is not an efficient way to do it.
0
 
LVL 39

Expert Comment

by:BrandonGalderisi
Comment Utility
Trust me I know because my first delimited string parser that I wrote worked exactly the same way :).
0
 
LVL 5

Expert Comment

by:adlink_la
Comment Utility
>>The point of the question I think was creating code that allows calculation/split to occur fast during processing NOT what is fastest way to code it if that makes sense.

Yes, but seeing that we like to refer to ourselves as the "Experts" around here, I think we have an obligation to point things like that out.  To simply provide a solution and ignore an obvious problem is doing a disservice to the person asking the question.  Now granted, the OP has not stated whether he/she has the ability (or desire) to make a change to the schema.  They could be working with a 3rd party product and not have that luxury.  Nonetheless, the point should still be made.

That said, I find Brandon's solution very interesting.  I haven't dug into it yet to totally understand what it is doing but I like the fact that it does not use any looping.  For those times when we do need to parse data out of a column, it looks like a very clean solution.  I am definitely going to build that table and give it a try.  Very nice!
0
Complete Microsoft Windows PC® & Mac Backup

Backup and recovery solutions to protect all your PCs & Mac– on-premises or in remote locations. Acronis backs up entire PC or Mac with patented reliable disk imaging technology and you will be able to restore workstations to a new, dissimilar hardware in minutes.

 
LVL 59

Expert Comment

by:Kevin Cross
Comment Utility
Yes, common table expressions are very cool.  The link in my first post doesn't use loops either but uses recursion within function.  Also very fast and low amount of code for what it is doing; however, it doesn't match up to the coolness that is CTE. :)
0
 
LVL 5

Expert Comment

by:adlink_la
Comment Utility
WOOHOOO!  I LOVE IT!!  I just built the numbers table and tested the function.  Very nice the way the output table gives the the ident, position of the value and the value itself.  This is going to be a handy function to have in the arsenal.  Thanks Brandon!  
0
 
LVL 39

Expert Comment

by:BrandonGalderisi
Comment Utility
np.

It has gone through several iterations.  From single character wildcards to multi.  from varchar to nvarchar.  I don't know if this is the most recent that supports space(s) as a delimiter but I have a rendition somewhere that does.  It was used a while back when someone needed to could the number of times each word appeared in a body of text.  I replaced all ,.() with spaces, stripped out double spaces, then used that function as a way to split each body of text into a table of words, sum each and store a counter.  I called it wordify.sql :)

The performance of a static table vs. a cte is big.  While the CTE to generate the #'s is fast, nothing compared to a real table.  But, the CTE I have above is a great way to populate that table if #s.


as a side not adlink_la.  If you do stash that goodie away in case you intend to re-post it here, can you throw my name in somewhere as the author.  I'd appreciate the little bit of advertisement for writing something useful.
0
 
LVL 4

Expert Comment

by:jindalankush
Comment Utility
hi BrandonGalderisi:
you are right,
i hv read ur solution before my post,
i hv tried to simplfy ur process,
sorry for inconvience
0
 
LVL 5

Expert Comment

by:adlink_la
Comment Utility
Brandon,
If it ever comes up, rather than reposting the code, I will likely just refer back to this thread.  

At any rate, I have been digging into the code a bit to see exactly what it is doing.  It looks to me like you might want to replace the two occurances of:

len(@delimiter)

with:

(datalength(@delimiter)/2)

The reason I mention it is the LEN() function ignores trailing blanks, so if the delimiter is like *   then the function returns incorrect results.  However, the above change fixes that problem.  

Do you agree?

0
 
LVL 5

Expert Comment

by:adlink_la
Comment Utility
I guess my quotation marks got replaced by those little boxes.  I was trying to show astrisk, space, space as the delimiter.
0
 
LVL 39

Expert Comment

by:BrandonGalderisi
Comment Utility
You're right.  At first it was all len().  I thought I replaced them all.
0
 

Author Closing Comment

by:jobprojn
Comment Utility
You guys rock.  Thanks.
0
 
LVL 5

Expert Comment

by:adlink_la
Comment Utility
>>I'd appreciate the little bit of advertisement for writing something useful.

Useful?  You deserve an award for this little nugget my friend.  One thing that I try to impress upon my fellow developers is to try to approach database problems from a database standpoint rather than from a programmer standpoint.  Often the database can do things much more efficiently.  This is a perfect example of that.  What better way to get at the values of a delimited string than to just select them out in one shot!  Absolutely brilliant!  Very nicely done!
0
 
LVL 39

Expert Comment

by:BrandonGalderisi
Comment Utility
Just know that the CTE in the view I use I found online, I just don't recall where.  But that is easily replaced with a real numbers table which is what we have in our system.

The numbers is very useful for displaying date ranges as well.  
0

Featured Post

How to run any project with ease

Manage projects of all sizes how you want. Great for personal to-do lists, project milestones, team priorities and launch plans.
- Combine task lists, docs, spreadsheets, and chat in one
- View and edit from mobile/offline
- Cut down on emails

Join & Write a Comment

INTRODUCTION: While tying your database objects into builds and your enterprise source control system takes a third-party product (like Visual Studio Database Edition or Red-Gate's SQL Source Control), you can achieve some protection using a sing…
In this article I will describe the Backup & Restore method as one possible migration process and I will add the extra tasks needed for an upgrade when and where is applied so it will cover all.
Illustrator's Shape Builder tool will let you combine shapes visually and interactively. This video shows the Mac version, but the tool works the same way in Windows. To follow along with this video, you can draw your own shapes or download the file…
This video shows how to remove a single email address from the Outlook 2010 Auto Suggestion memory. NOTE: For Outlook 2016 and 2013 perform the exact same steps. Open a new email: Click the New email button in Outlook. Start typing the address: …

771 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

11 Experts available now in Live!

Get 1:1 Help Now