Solved

Scalar Functions Caching

Posted on 2013-06-08
10
334 Views
Last Modified: 2013-06-12
Hello my peers,

A question for Top Experts I always wandered but never dared to ask:  Apart from rewriting the scalar functions as table-valued function, how can one guarantee that an inline function is not row per row processed but processed only once for all set ?

In other words, how can one properly cache inline functions ?  I wander if any of you has found a way around this inherent limitation ?  

PS: Please do not suggest use of anything else than functions.  The entire point of this question is about optimal caching.

Thanks for your input.
0
Comment
Question by:Racim BOUDJAKDJI
  • 6
  • 4
10 Comments
 
LVL 48

Expert Comment

by:PortletPaul
ID: 39232444
You want to know how a scalar function can be forced to NOT work row-by-row?
(and not re-write this/these as table-valued functions)

You can't, a scalar function returns one value only, and will be executed once per row in a resultset. (For caching to be relevant you would need to persist the result somehow.)

Hope I interpreted the question correctly. The heading is"Scalar Functions Caching", but then you go on to ask "how can one guarantee that an inline function is not row per row processed ...".
Inline user-defined functions are a subset of user-defined functions that return a table. Inline functions can be used to achieve the functionality of parameterized views.
http://msdn.microsoft.com/en-us/library/aa214762(v=sql.80).aspx
0
 
LVL 48

Expert Comment

by:PortletPaul
ID: 39232448
>>The entire point of this question is about optimal caching.

If I have misinterpreted, and the question is really about caching inline functions, then one method would be to employ a common table expression (CTE) of the inline function result (or of a result that involves the inline function). Not sure if this suggestion however breaks with your request to stay strictly within functions.
0
 
LVL 23

Author Comment

by:Racim BOUDJAKDJI
ID: 39232583
If I have misinterpreted, and the question is really about caching inline functions
The exact question is :

how can one guarantee that an inline function is not row per row processed but processed only once for all set ?

To my knowledge, use of CTE do not prevent row-per-row processing even though it promotes reuse of data being cached.  But I'd be happy to be wrong about that.
0
 
LVL 48

Expert Comment

by:PortletPaul
ID: 39232612
If you call a function per row then it will behave that way - per row. CTEs don't prevent that behavior and as you say they simply promote re-use and was mentioned as a method to optimize caching.

this example from the msdn url given before:

CREATE FUNCTION fn_QuarterlySalesByStore
     (
      @StoreID int
     )
RETURNS table
AS
RETURN (
        SELECT *
        FROM SalesDB.dbo.vw_QuarterlySales
        WHERE StoreID = @StoreID
       )

/* this inline function is not operating row-by-row  and returns a (virtual) "table" */      
SELECT *
FROM fn_QuarterlySalesByStore( 14432 )
0
 
LVL 23

Author Comment

by:Racim BOUDJAKDJI
ID: 39232800
I see.  When I mentioned not row by row processing, I was thinking more about a function who uses a column as an input parameter.  I apologize for not being specific enough.  I will retake your example and mention what I am looking for.  Consider the following query

SELECT dbo.fn_QuarterlySalesByStore(column1)
FROM sometable or some view

Now let's suppose some sometable or someview has 2000 rows.  The typical behavior of functions is to prepare and execute the function 2000 times, one for each row returned.  What I want is to have this prepared/cached only once as table valued functions do.  I suspect there maybe important CPU gains from this but I am looking for another way than by using a table valued function.  

I hope this clarifies.
0
IT, Stop Being Called Into Every Meeting

Highfive is so simple that setting up every meeting room takes just minutes and every employee will be able to start or join a call from any room with ease. Never be called into a meeting just to get it started again. This is how video conferencing should work!

 
LVL 23

Author Comment

by:Racim BOUDJAKDJI
ID: 39232803
0
 
LVL 48

Accepted Solution

by:
PortletPaul earned 500 total points
ID: 39232849
I would draw the conclusion that the inline function isn't being used effectively, and that although there is a conceptual "neatness" in a function returning a maximum it simply MUST perform more work because it is being called for each row than could be performed by a simple - single - query. Recall that the rows made available to the select clause are provided after both the from and where clauses are completed, and that in effect a correlated subquery in those rows is like looping through a cursor after the bulk of the set operations have done their job.

Placing a subquery in the select clause is the least desirable (i.e. subqueries in the WHERE clause are better, and subqueries in the FROM clause better still - rule of thumb only).

Caching is not the solution I'm afraid - the solution lies in not calling functions under the select clause.

/* equivalent to the scalar function, an explicit correlated subquery performed in the select clause */
select
       productid
    , (
        SELECT MAX(sod.OrderQty)
        FROM Sales.SalesOrderDetail sod
        WHERE sod.ProductId = pod.ProductId
        )
FROM Production.Product pod
    
/* the 'inline function example' as provided, an implict correlated subquery in the select clause */
SELECT
    ProductId,
    (
        SELECT MaxQty
        FROM dbo.GetMaxProductQty_Inline(ProductId)
    ) MaxQty
FROM Production.Product
ORDER BY ProductId

/* how to do it (in my view), without any function at all */
select
  pod.productid
, max(sod.OrderQty)
FROM Production.Product pod
INNER JOIN Sales.SalesOrderDetail sod ON sod.ProductId = pod.ProductId
GROUP BY
  pod.productid

Open in new window

I know there may be other far more valid examples where scalar or inline functions are needed or at least a good idea - but if you call a function per row - it will be performed per row.
0
 
LVL 23

Author Comment

by:Racim BOUDJAKDJI
ID: 39232871
Thanks for bringing perspective.  

You made me realize that I definitely should not have referred to caching but rather optimizer evaluation.  

In normal behavior, each UDF scalar function calls internally behave like a new query issued for each row in terms of IO and CPU.  Rewriting the UDF as table-valued function OTOH, allows the optimizer to physically do less IO's and CPU operations by doing physical operations (IO logical and physical reads) a set level rather than tuple level.  

but if you call a function per row - it will be performed per row.

Logically you are correct.  But the underlying point of this thread is that a function does not have to do that at physical level.  I was looking for alternative to table-value functions to wrap up this possibility.  I will leave this thread open for a few days to give an opportunity to other experts to participate then I will close.

I look at this as a tricky question that can give us a chance to learn new stuff.

Regards.
0
 
LVL 23

Author Comment

by:Racim BOUDJAKDJI
ID: 39240523
Since nobody seems interested in this question I am closing it down.  Thanks for contributing.
0
 
LVL 23

Author Closing Comment

by:Racim BOUDJAKDJI
ID: 39240527
I hope for more input but thanks for the help.
0

Featured Post

What is SQL Server and how does it work?

The purpose of this paper is to provide you background on SQL Server. It’s your self-study guide for learning fundamentals. It includes both the history of SQL and its technical basics. Concepts and definitions will form the solid foundation of your future DBA expertise.

Join & Write a Comment

Suggested Solutions

Title # Comments Views Activity
Truncate vs Delete 63 88
Getting certain data from a string 1 24
Powershell SMO script not working. 18 101
Stored procedure 23 0
When you hear the word proxy, you may become apprehensive. This article will help you to understand Proxy and when it is useful. Let's talk Proxy for SQL Server. (Not in terms of Internet access.) Typically, you'll run into this type of problem w…
JSON is being used more and more, besides XML, and you surely wanted to parse the data out into SQL instead of doing it in some Javascript. The below function in SQL Server can do the job for you, returning a quick table with the parsed data.
This videos aims to give the viewer a basic demonstration of how a user can query current session information by using the SYS_CONTEXT function
Viewers will learn how to use the SELECT statement in SQL to return specific rows and columns, with various degrees of sorting and limits in place.

706 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

15 Experts available now in Live!

Get 1:1 Help Now