Function for FAST String Parse and Summation

You can go with a simplier split modified from your above like http://crossedlogic.blogspot.com/2008/09/converting-delimited-string-to-separate.html -- can use while loop instead of recursion -- and would add an column in table that signified the order in the original string which wll translate to month like 1, 2, 3 ... 12. I saw an example in a previous question that I will post to shorten development time.

You can then do sum outside of function.

SELECT SUM(CAST(entry AS INT))
FROM dbo.split('100;100;100;100;100;100;100;100;100;100;0;50', ';')
WHERE indexColumn <= 6

Open in new window

SOLUTION

membership

This solution is only available to members.

To access this solution, you must be a member of Experts Exchange.

Start Free Trial

ASKER CERTIFIED SOLUTION

membership

This solution is only available to members.

To access this solution, you must be a member of Experts Exchange.

Start Free Trial

Yes, the bottome line is that there really isn't any "good" way to get at that data. You can see how much code is required. If you want fast your schema should allow the data to be accessed without any procedural code. I don't know if you are able to make a schema change but that would be the way to go if you are having performance problems.

My split function is REALLY fast. To speed it up, create a physical table in place of vw_nums with a clustered index on the number. That's what we have in our system.

I agree with Brandon as that is what I was showing user in my comments. The point of the question I think was creating code that allows calculation/split to occur fast during processing NOT what is fastest way to code it if that makes sense.

jindalankush

create table #t
(
id int identity,
val int
)
insert #t
SELECT CAST(data AS INT)
FROM dbo.split(@CharacterExpression , @Delimiter )

select sum(val) from #t
where id < @Position

jindalankush:

I'm not saying that yours won't work in the same way, but yours uses looping to find all the individual members and that is not an efficient way to do it.

Trust me I know because my first delimited string parser that I wrote worked exactly the same way :).

>>The point of the question I think was creating code that allows calculation/split to occur fast during processing NOT what is fastest way to code it if that makes sense.

Yes, but seeing that we like to refer to ourselves as the "Experts" around here, I think we have an obligation to point things like that out. To simply provide a solution and ignore an obvious problem is doing a disservice to the person asking the question. Now granted, the OP has not stated whether he/she has the ability (or desire) to make a change to the schema. They could be working with a 3rd party product and not have that luxury. Nonetheless, the point should still be made.

That said, I find Brandon's solution very interesting. I haven't dug into it yet to totally understand what it is doing but I like the fact that it does not use any looping. For those times when we do need to parse data out of a column, it looks like a very clean solution. I am definitely going to build that table and give it a try. Very nice!

Yes, common table expressions are very cool. The link in my first post doesn't use loops either but uses recursion within function. Also very fast and low amount of code for what it is doing; however, it doesn't match up to the coolness that is CTE. :)

WOOHOOO! I LOVE IT!! I just built the numbers table and tested the function. Very nice the way the output table gives the the ident, position of the value and the value itself. This is going to be a handy function to have in the arsenal. Thanks Brandon!

np.

It has gone through several iterations. From single character wildcards to multi. from varchar to nvarchar. I don't know if this is the most recent that supports space(s) as a delimiter but I have a rendition somewhere that does. It was used a while back when someone needed to could the number of times each word appeared in a body of text. I replaced all ,.() with spaces, stripped out double spaces, then used that function as a way to split each body of text into a table of words, sum each and store a counter. I called it wordify.sql :)

The performance of a static table vs. a cte is big. While the CTE to generate the #'s is fast, nothing compared to a real table. But, the CTE I have above is a great way to populate that table if #s.

as a side not adlink_la. If you do stash that goodie away in case you intend to re-post it here, can you throw my name in somewhere as the author. I'd appreciate the little bit of advertisement for writing something useful.

jindalankush

hi BrandonGalderisi:
you are right,
i hv read ur solution before my post,
i hv tried to simplfy ur process,
sorry for inconvience

Brandon,
If it ever comes up, rather than reposting the code, I will likely just refer back to this thread.

At any rate, I have been digging into the code a bit to see exactly what it is doing. It looks to me like you might want to replace the two occurances of:

len(@delimiter)

with:

(datalength(@delimiter)/2)

The reason I mention it is the LEN() function ignores trailing blanks, so if the delimiter is like * then the function returns incorrect results. However, the above change fixes that problem.

Do you agree?

I guess my quotation marks got replaced by those little boxes. I was trying to show astrisk, space, space as the delimiter.

You're right. At first it was all len(). I thought I replaced them all.

jobprojn

ASKER

You guys rock. Thanks.

>>I'd appreciate the little bit of advertisement for writing something useful.

Useful? You deserve an award for this little nugget my friend. One thing that I try to impress upon my fellow developers is to try to approach database problems from a database standpoint rather than from a programmer standpoint. Often the database can do things much more efficiently. This is a perfect example of that. What better way to get at the values of a delimited string than to just select them out in one shot! Absolutely brilliant! Very nicely done!