Go Premium for a chance to win a PS4. Enter to Win

x
  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 362
  • Last Modified:

Scalar Functions Caching

Hello my peers,

A question for Top Experts I always wandered but never dared to ask:  Apart from rewriting the scalar functions as table-valued function, how can one guarantee that an inline function is not row per row processed but processed only once for all set ?

In other words, how can one properly cache inline functions ?  I wander if any of you has found a way around this inherent limitation ?  

PS: Please do not suggest use of anything else than functions.  The entire point of this question is about optimal caching.

Thanks for your input.
0
Racim BOUDJAKDJI
Asked:
Racim BOUDJAKDJI
  • 6
  • 4
1 Solution
 
PortletPaulCommented:
You want to know how a scalar function can be forced to NOT work row-by-row?
(and not re-write this/these as table-valued functions)

You can't, a scalar function returns one value only, and will be executed once per row in a resultset. (For caching to be relevant you would need to persist the result somehow.)

Hope I interpreted the question correctly. The heading is"Scalar Functions Caching", but then you go on to ask "how can one guarantee that an inline function is not row per row processed ...".
Inline user-defined functions are a subset of user-defined functions that return a table. Inline functions can be used to achieve the functionality of parameterized views.
http://msdn.microsoft.com/en-us/library/aa214762(v=sql.80).aspx
0
 
PortletPaulCommented:
>>The entire point of this question is about optimal caching.

If I have misinterpreted, and the question is really about caching inline functions, then one method would be to employ a common table expression (CTE) of the inline function result (or of a result that involves the inline function). Not sure if this suggestion however breaks with your request to stay strictly within functions.
0
 
Racim BOUDJAKDJIAuthor Commented:
If I have misinterpreted, and the question is really about caching inline functions
The exact question is :

how can one guarantee that an inline function is not row per row processed but processed only once for all set ?

To my knowledge, use of CTE do not prevent row-per-row processing even though it promotes reuse of data being cached.  But I'd be happy to be wrong about that.
0
NFR key for Veeam Agent for Linux

Veeam is happy to provide a free NFR license for one year.  It allows for the non‑production use and valid for five workstations and two servers. Veeam Agent for Linux is a simple backup tool for your Linux installations, both on‑premises and in the public cloud.

 
PortletPaulCommented:
If you call a function per row then it will behave that way - per row. CTEs don't prevent that behavior and as you say they simply promote re-use and was mentioned as a method to optimize caching.

this example from the msdn url given before:

CREATE FUNCTION fn_QuarterlySalesByStore
     (
      @StoreID int
     )
RETURNS table
AS
RETURN (
        SELECT *
        FROM SalesDB.dbo.vw_QuarterlySales
        WHERE StoreID = @StoreID
       )

/* this inline function is not operating row-by-row  and returns a (virtual) "table" */      
SELECT *
FROM fn_QuarterlySalesByStore( 14432 )
0
 
Racim BOUDJAKDJIAuthor Commented:
I see.  When I mentioned not row by row processing, I was thinking more about a function who uses a column as an input parameter.  I apologize for not being specific enough.  I will retake your example and mention what I am looking for.  Consider the following query

SELECT dbo.fn_QuarterlySalesByStore(column1)
FROM sometable or some view

Now let's suppose some sometable or someview has 2000 rows.  The typical behavior of functions is to prepare and execute the function 2000 times, one for each row returned.  What I want is to have this prepared/cached only once as table valued functions do.  I suspect there maybe important CPU gains from this but I am looking for another way than by using a table valued function.  

I hope this clarifies.
0
 
Racim BOUDJAKDJIAuthor Commented:
0
 
PortletPaulCommented:
I would draw the conclusion that the inline function isn't being used effectively, and that although there is a conceptual "neatness" in a function returning a maximum it simply MUST perform more work because it is being called for each row than could be performed by a simple - single - query. Recall that the rows made available to the select clause are provided after both the from and where clauses are completed, and that in effect a correlated subquery in those rows is like looping through a cursor after the bulk of the set operations have done their job.

Placing a subquery in the select clause is the least desirable (i.e. subqueries in the WHERE clause are better, and subqueries in the FROM clause better still - rule of thumb only).

Caching is not the solution I'm afraid - the solution lies in not calling functions under the select clause.

/* equivalent to the scalar function, an explicit correlated subquery performed in the select clause */
select
       productid
    , (
        SELECT MAX(sod.OrderQty)
        FROM Sales.SalesOrderDetail sod
        WHERE sod.ProductId = pod.ProductId
        )
FROM Production.Product pod
    
/* the 'inline function example' as provided, an implict correlated subquery in the select clause */
SELECT
    ProductId,
    (
        SELECT MaxQty
        FROM dbo.GetMaxProductQty_Inline(ProductId)
    ) MaxQty
FROM Production.Product
ORDER BY ProductId

/* how to do it (in my view), without any function at all */
select
  pod.productid
, max(sod.OrderQty)
FROM Production.Product pod
INNER JOIN Sales.SalesOrderDetail sod ON sod.ProductId = pod.ProductId
GROUP BY
  pod.productid

Open in new window

I know there may be other far more valid examples where scalar or inline functions are needed or at least a good idea - but if you call a function per row - it will be performed per row.
0
 
Racim BOUDJAKDJIAuthor Commented:
Thanks for bringing perspective.  

You made me realize that I definitely should not have referred to caching but rather optimizer evaluation.  

In normal behavior, each UDF scalar function calls internally behave like a new query issued for each row in terms of IO and CPU.  Rewriting the UDF as table-valued function OTOH, allows the optimizer to physically do less IO's and CPU operations by doing physical operations (IO logical and physical reads) a set level rather than tuple level.  

but if you call a function per row - it will be performed per row.

Logically you are correct.  But the underlying point of this thread is that a function does not have to do that at physical level.  I was looking for alternative to table-value functions to wrap up this possibility.  I will leave this thread open for a few days to give an opportunity to other experts to participate then I will close.

I look at this as a tricky question that can give us a chance to learn new stuff.

Regards.
0
 
Racim BOUDJAKDJIAuthor Commented:
Since nobody seems interested in this question I am closing it down.  Thanks for contributing.
0
 
Racim BOUDJAKDJIAuthor Commented:
I hope for more input but thanks for the help.
0

Featured Post

Free learning courses: Active Directory Deep Dive

Get a firm grasp on your IT environment when you learn Active Directory best practices with Veeam! Watch all, or choose any amount, of this three-part webinar series to improve your skills. From the basics to virtualization and backup, we got you covered.

  • 6
  • 4
Tackle projects and never again get stuck behind a technical roadblock.
Join Now