?
Solved

Simple UDF Slowing Query Massively - SQL Server 2000

Posted on 2006-06-08
10
Medium Priority
?
454 Views
Last Modified: 2012-06-27
I'm not a database novice, but this problem has me very confused.

I had a sp with query something like this:
Note that this is a large table, but the subset_code significantly limits the rows returned.

SELECT * FROM mytable a
WHERE a.subset_code = 301451
AND ( LTRIM(a.filterstring) like 'AAA B%'
   OR LTRIM(a.filterstring) like 'ABB A%'
   OR LTRIM(a.filterstring) like 'DFF%' )

I need to do this same test on the 'filterstring' in many different SPs and these filterstrings often change which is obviously a maintenance nightmare so I thought I would encapsulate that in a nice simple UDF:
(I have reduced the list of strings significantly for illustrative purposes)

CREATE FUNCTION [dbo].[fn_check_filter_string]
(
      @filter_string varchar(20)
)
RETURNS int
AS
BEGIN
DECLARE @return int
SET @return = 0

IF ( LTRIM(@filter_string) like 'AAA B%'
   OR LTRIM(@filter_string) like 'ABB A%'
   OR LTRIM(@filter_string) like 'DFF%')
BEGIN
      SET @return = 1 -- Yes, it's a match
END
return @return
END

And so my query changes slightly to:

SELECT * FROM mytable a
WHERE a.subset_code = 301451
AND 1 = dbo.fn_check_filter_string(a.filterstring)

Here comes the problem. Even though the UDF is mind-numbingly simple and it's essentially entirely inlinable, on /some/ databases the second (new) version of the query takes massively longer. Where the original query is sub-second, the new version could be in the order of 10 seconds. On other databases they run at the same speed.

Perhaps SQL Server is evaluating my where clause in a stupid order such that the function is being called to do string matching on the entire table before the limiting factor of the subset_code is evaluated? Why is that the case when the code is in a function instead of directly inlined? Is SQL Server really that stupid? How can I force it to evaluate the set limiting criteria in the correct order (i.e. subset_code first, then the function)?
Updating stats seems to have no effect on SQL Servers choice of execution plan.

As a software developer I like the encapsulation and re-use that UDFs offer but such big problems with simple UDFs but I can't use them unless I can prevent such unexpected performace impacts.
0
Comment
Question by:eeriksson
  • 4
  • 3
  • 2
  • +1
10 Comments
 
LVL 143

Expert Comment

by:Guy Hengel [angelIII / a3]
ID: 16863045
the main difference is that when using the function, the funcation is called again and again, and has to initialize the list of strings you compare to again and again.

now, when you say that on other databases the speed with/without the udf is the same, then there is something on your server diffent. The first thing I would check is if SQL Server was not using all RAM, and potentially being swapping to the pagefile...
0
 

Author Comment

by:eeriksson
ID: 16863155
angelIII,
The databases are on the same server. To be honest I don't think resources are the issue here. The difference is too significant and is much more likely to be to do with the way SQL Server generates an execution plan to optimise the query.
0
 
LVL 143

Expert Comment

by:Guy Hengel [angelIII / a3]
ID: 16863179
>To be honest I don't think resources are the issue here.

what about the transaction log files of the 2 databases? are they "full", resp both backed up regulary, or are the 2 databases in simple recovery mode?

There MUST be a difference.
0
What does it mean to be "Always On"?

Is your cloud always on? With an Always On cloud you won't have to worry about downtime for maintenance or software application code updates, ensuring that your bottom line isn't affected.

 
LVL 50

Expert Comment

by:Lowfatspread
ID: 16863457
you are right to like encapsulation ...

however perhaps the correct place for this encapsulation is in a view ... rather than a udf...

you could also look to create a table of these conditions and do a joined search on those...

i'd also suggest that rather than Ltriming  (if it can be avoided) a straight  like '%xxx%' may perform better..
0
 

Author Comment

by:eeriksson
ID: 16864338
I think the 2 databases is a bit of a red herring.
The real question is why, on the same database, does the UDF version of the query take up to 100 times longer to run than the non-UDF version?

I had thought of a view-based solution but that doesn't feel right because it's introducing an extra set merge where there really isn't one - it should be a simple selection criteria on the one table.

I think theoretically the UDF is the correct solution but practically speaking I may have to abandon it and go back to duplicated (but fast) SQL code or an encapsulated (but unnecessary) view/secondary table unless someone can get to the real heart of this question and explain why the UDF makes the query SOOO much slower.
0
 
LVL 7

Expert Comment

by:LandyJ
ID: 16864436
Angel is correct.  The reason why using the function is slower is because you are running the code:

SELECT * FROM mytable a
WHERE a.subset_code = 301451
AND ( LTRIM(a.filterstring) like 'AAA B%'
   OR LTRIM(a.filterstring) like 'ABB A%'
   OR LTRIM(a.filterstring) like 'DFF%' )

for EVERY record in the table.  Not sure if this will work for your situation, but you can try having your function return a table.  Something like (I don't have QA available at the moment, so syntax may be off):

CREATE FUNCTION [dbo].[fn_check_filter_string]
(
     @subset_code int,
    @filter_string varchar(20)
)
RETURNS @tbl TABLE (recID int)
AS
BEGIN

declare @sql varchar(1000)
SET @sql = 'INSERT INTO @tbl
   SELECT recID FROM mytable a
   WHERE a.subset_code = ' + cast(@subset_code as varchar(10)) +  
   'AND ( LTRIM(a.filterstring) like ''' + @filter_string + '%')'

EXEC @sql @subset_code, @filter_string, @tbl = @tbl OUTPUT

RETURN
END


Then your query becomes:

SELECT * FROM mytable a
   inner join dbo.fn_check_filter_string(a.subset_code,'AAA B') t1 on t1.recID = a.recID
   inner join dbo.fn_check_filter_string(a.subset_code, 'ABB A') t2 on t2.recID = a.recID
   inner join dbo.fn_check_filter_string(a.subset_code, 'DFF') t3 on t3.recID = a.recID
   

hth,
Landy
0
 

Author Comment

by:eeriksson
ID: 16864517
Ok. Perhaps In trying to provide lots of detail I wasn't clear enough..

This query runs *very* quickly:

SELECT * FROM mytable a
WHERE a.subset_code = 301451
AND ( LTRIM(a.filterstring) like 'AAA B%'
   OR LTRIM(a.filterstring) like 'ABB A%'
   OR LTRIM(a.filterstring) like 'DFF%' )

This query runs *very* slowly:

SELECT * FROM mytable a
WHERE a.subset_code = 301451
AND 1 = dbo.fn_check_filter_string(a.filterstring)

Why?

The idea of replacing my WHERE clause with n joins fills me with horror.
0
 
LVL 7

Expert Comment

by:LandyJ
ID: 16864997
Are the databases identical in data?  The function runs once for each record, so if in one db there are only 10 records with that subset_code, the function will only run 10 times, but if you have 10,000 records, you're executing that function 10,000 times. If the db's are identical or similar, have you checked indexes?

And looking closer, the inner joins won't work in this instance.  You are looking for OR, not AND on the filters. Sorry about that.

Do you know the list of filters you need to check against? Are they in a table, or can you build a delimited string of them all?  Hopefully this might work or point you in the right direction:

declare @filter_str varchar(1000)
select @filter_str = ''
Select @filter_str = @filter_str + ',' + filter_string from tblFilterStrings
--Strip the leading ','
Select @filter_str = Right(@filter_str, len(@fileter_str) - 1)

SELECT * FROM mytable a
WHERE a.subset_code = 301451
AND
ltrim(a.filterstring) in (Select val from dbo.udf_strSplit(@filter_str, ',') )


Which uses:
CREATE FUNCTION  dbo.udf_strSplit (
                  @string varchar(8000),
                  @splitter char( 1)
                  )
RETURNS @res TABLE (val varchar( 8000))
AS
Begin
      IF SUBSTRING (@string, len ( @string), 1) <> @splitter
            SET @string= @string+@splitter

      Declare @start bigint,
            @word varchar( 8000),
            @charindex bigint,
            @i bigint
      SELECT @i=1, @start=1, @charindex= charindex( @splitter, @string, @start)

      WHILE (@charindex <> 0)
        Begin
            SET @word= substring( @string, @start, @charindex - @start)
            SET @start= @charindex +1
            SET @charindex= charindex( @splitter, @string, @start)
            INSERT INTO @res  VALUES (@word)
            SET @i=@i+1
      End

      RETURN
end



hth,
Landy






0
 
LVL 7

Accepted Solution

by:
LandyJ earned 500 total points
ID: 16865144
Sounds like a good bluff: The execution plan for the first query is probably selecting the set of subset_code = 301451 first, then filters the stings from that.  The query with the function probably generates the set of filter_strings, then finds all the subset_code = 301451 second.  

Going down that road, try to force the subset_code to execute first (I don't have anything similar to test this with, but it sounds plausible):
 
SELECT a.* FROM mytable a
   inner join (Select subset_code = 301451) t  on t.subset_code = a.subset_code
WHERE
  1 = dbo.fn_check_filter_string(a.filterstring)
 
???,
Landy
0
 

Author Comment

by:eeriksson
ID: 16886451
Landy,
I think you are right about the explanation and it is what I suspected in the first place.
The workaround you suggested does indeed work for simple queries such as the one I posted although I have more complex versions where it is very difficult to write it like this. In the end I had to use a HINT to force SQL server to use the index on subset_code to force it to evaluate that first. This worked but I'm unhappy that I ended up having to do it that way.

It seems to me that the fundamental problem is SQL Server's inability to analyse the UDF into it's execution plan. Obviously it has to balance the cost of evaluating the UDF vs the gain to be made by doing so. However, if SQL Server cannot evaluate the UDF I would have thought it sensible to actually ignore it completely in the execution planning in which case it would have focussed on the index that already exists for subset_code and the query would run quickly.
The optimizer should be smart enough to realise that I'm only using 2 fields from the table - subset_code (an int with an index) and filterstring (a string with an index). All other things being equal any sensible optimiser that is unsure what you are doing with filterstring, should evaluate the subset_code first.
Instead, for some reason it is deciding that it must be more efficient to evaluate the UDF predicate on filterstring first. The only way to make it behave sensibly is to either re-write your query as suggested or use an index HINT which is what I chose to do in the end.

I'm not sure that UDFs are ready for the big time. Not in SQL 2000 anyway.
0

Featured Post

What is SQL Server and how does it work?

The purpose of this paper is to provide you background on SQL Server. It’s your self-study guide for learning fundamentals. It includes both the history of SQL and its technical basics. Concepts and definitions will form the solid foundation of your future DBA expertise.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Recently we ran in to an issue while running some SQL jobs where we were trying to process the cubes.  We got an error saying failure stating 'NT SERVICE\SQLSERVERAGENT does not have access to Analysis Services. So this is a way to automate that wit…
An alternative to the "For XML" way of pivoting and concatenating result sets into strings, and an easy introduction to "common table expressions" (CTEs). Being someone who is always looking for alternatives to "work your data", I came across this …
Using examples as well as descriptions, and references to Books Online, show the documentation available for date manipulation functions and by using a select few of these functions, show how date based data can be manipulated with these functions.
Viewers will learn how to use the SELECT statement in SQL to return specific rows and columns, with various degrees of sorting and limits in place.
Suggested Courses

850 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question