eeriksson asked:
Simple UDF Slowing Query Massively - SQL Server 2000

I'm not a database novice, but this problem has me very confused.

I had a stored procedure with a query something like this (note that this is a large table, but the subset_code significantly limits the rows returned):

SELECT * FROM mytable a
WHERE a.subset_code = 301451
AND ( LTRIM(a.filterstring) like 'AAA B%'
   OR LTRIM(a.filterstring) like 'ABB A%'
   OR LTRIM(a.filterstring) like 'DFF%' )

I need to do this same test on the filterstring in many different SPs, and these filter strings often change, which is obviously a maintenance nightmare. So I thought I would encapsulate it in a nice simple UDF:
(I have reduced the list of strings significantly for illustrative purposes)

CREATE FUNCTION [dbo].[fn_check_filter_string]
(
      @filter_string varchar(20)
)
RETURNS int
AS
BEGIN
DECLARE @return int
SET @return = 0

IF ( LTRIM(@filter_string) like 'AAA B%'
   OR LTRIM(@filter_string) like 'ABB A%'
   OR LTRIM(@filter_string) like 'DFF%')
BEGIN
      SET @return = 1 -- Yes, it's a match
END
return @return
END

And so my query changes slightly to:

SELECT * FROM mytable a
WHERE a.subset_code = 301451
AND 1 = dbo.fn_check_filter_string(a.filterstring)

Here comes the problem. Even though the UDF is mind-numbingly simple and it's essentially entirely inlinable, on /some/ databases the second (new) version of the query takes massively longer. Where the original query is sub-second, the new version could be in the order of 10 seconds. On other databases they run at the same speed.

Perhaps SQL Server is evaluating my where clause in a stupid order such that the function is being called to do string matching on the entire table before the limiting factor of the subset_code is evaluated? Why is that the case when the code is in a function instead of directly inlined? Is SQL Server really that stupid? How can I force it to evaluate the set limiting criteria in the correct order (i.e. subset_code first, then the function)?
Updating statistics seems to have no effect on SQL Server's choice of execution plan.

As a software developer I like the encapsulation and re-use that UDFs offer, but I can't use them unless I can prevent such unexpected performance impacts.
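One way to put numbers on the difference described above is to run both forms back to back with the timing and I/O counters switched on (the table and column names come from the question; this is a measurement sketch, not a fix):

```sql
SET STATISTICS TIME ON
SET STATISTICS IO ON

-- Inline predicates: the optimizer can see the LIKEs and order them
SELECT * FROM mytable a
WHERE a.subset_code = 301451
AND ( LTRIM(a.filterstring) LIKE 'AAA B%'
   OR LTRIM(a.filterstring) LIKE 'ABB A%'
   OR LTRIM(a.filterstring) LIKE 'DFF%' )

-- UDF predicate: the optimizer sees an opaque scalar function call
SELECT * FROM mytable a
WHERE a.subset_code = 301451
AND 1 = dbo.fn_check_filter_string(a.filterstring)

SET STATISTICS TIME OFF
SET STATISTICS IO OFF
```

Comparing the logical reads and CPU time between the two, and looking at the plans with SET SHOWPLAN_TEXT ON, should show whether the subset_code index is being used in the UDF version.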
Guy Hengel [angelIII / a3] (Luxembourg) replied:

The main difference is that when using the function, the function is called again and again, and has to initialize the list of strings you compare against again and again.

Now, when you say that on other databases the speed with/without the UDF is the same, then something on your server is different. The first thing I would check is whether SQL Server is using all available RAM and potentially swapping to the pagefile...
eeriksson (Asker) replied:
angelIII,
The databases are on the same server. To be honest I don't think resources are the issue here. The difference is too significant and is much more likely to be to do with the way SQL Server generates an execution plan to optimise the query.
>To be honest I don't think resources are the issue here.

What about the transaction log files of the two databases? Are they "full", or are both backed up regularly, or are the two databases in simple recovery mode?

There MUST be a difference.
you are right to like encapsulation ...

however perhaps the correct place for this encapsulation is in a view ... rather than a udf...

you could also look to create a table of these conditions and do a joined search on those...

I'd also suggest that, rather than LTRIMing (if it can be avoided), a straight LIKE 'xxx%' may perform better.
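The "table of these conditions" idea above could be sketched like this (the filter_patterns table name is hypothetical; LIKE is valid in a join predicate, so the patterns live in data rather than being duplicated in every SP):

```sql
-- Hypothetical table holding the prefix patterns
CREATE TABLE filter_patterns (pattern varchar(20) NOT NULL)

INSERT INTO filter_patterns VALUES ('AAA B%')
INSERT INTO filter_patterns VALUES ('ABB A%')
INSERT INTO filter_patterns VALUES ('DFF%')

-- Each SP then joins to the pattern table instead of hard-coding the LIKEs.
-- DISTINCT guards against a row matching more than one pattern.
SELECT DISTINCT a.*
FROM mytable a
JOIN filter_patterns p ON LTRIM(a.filterstring) LIKE p.pattern
WHERE a.subset_code = 301451
```

Changing the filters then becomes an INSERT/DELETE on filter_patterns rather than an edit to every stored procedure.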
I think the 2 databases is a bit of a red herring.
The real question is why, on the same database, does the UDF version of the query take up to 100 times longer to run than the non-UDF version?

I had thought of a view-based solution but that doesn't feel right because it's introducing an extra set merge where there really isn't one - it should be a simple selection criteria on the one table.

I think theoretically the UDF is the correct solution but practically speaking I may have to abandon it and go back to duplicated (but fast) SQL code or an encapsulated (but unnecessary) view/secondary table unless someone can get to the real heart of this question and explain why the UDF makes the query SOOO much slower.
Angel is correct.  The reason why using the function is slower is because you are running the code:

SELECT * FROM mytable a
WHERE a.subset_code = 301451
AND ( LTRIM(a.filterstring) like 'AAA B%'
   OR LTRIM(a.filterstring) like 'ABB A%'
   OR LTRIM(a.filterstring) like 'DFF%' )

for EVERY record in the table.  Not sure if this will work for your situation, but you can try having your function return a table.  Something like (I don't have QA available at the moment, so syntax may be off):

CREATE FUNCTION [dbo].[fn_check_filter_string]
(
      @subset_code int,
      @filter_string varchar(20)
)
RETURNS @tbl TABLE (recID int)
AS
BEGIN
      -- Dynamic SQL is not allowed inside a function, and is not needed here:
      -- a parameterised static query does the same job.
      INSERT INTO @tbl
      SELECT recID FROM mytable a
      WHERE a.subset_code = @subset_code
      AND LTRIM(a.filterstring) LIKE @filter_string + '%'

      RETURN
END


Then your query becomes:

SELECT * FROM mytable a
   inner join dbo.fn_check_filter_string(301451, 'AAA B') t1 on t1.recID = a.recID
   inner join dbo.fn_check_filter_string(301451, 'ABB A') t2 on t2.recID = a.recID
   inner join dbo.fn_check_filter_string(301451, 'DFF') t3 on t3.recID = a.recID
   

hth,
Landy
OK. Perhaps in trying to provide lots of detail I wasn't clear enough.

This query runs *very* quickly:

SELECT * FROM mytable a
WHERE a.subset_code = 301451
AND ( LTRIM(a.filterstring) like 'AAA B%'
   OR LTRIM(a.filterstring) like 'ABB A%'
   OR LTRIM(a.filterstring) like 'DFF%' )

This query runs *very* slowly:

SELECT * FROM mytable a
WHERE a.subset_code = 301451
AND 1 = dbo.fn_check_filter_string(a.filterstring)

Why?

The idea of replacing my WHERE clause with n joins fills me with horror.
Are the databases identical in data?  The function runs once for each record, so if in one db there are only 10 records with that subset_code, the function will only run 10 times, but if you have 10,000 records, you're executing that function 10,000 times. If the db's are identical or similar, have you checked indexes?

And looking closer, the inner joins won't work in this instance.  You are looking for OR, not AND on the filters. Sorry about that.

Do you know the list of filters you need to check against? Are they in a table, or can you build a delimited string of them all?  Hopefully this might work or point you in the right direction:

declare @filter_str varchar(1000)
select @filter_str = ''
Select @filter_str = @filter_str + ',' + filter_string from tblFilterStrings
--Strip the leading ','
Select @filter_str = Right(@filter_str, len(@filter_str) - 1)

SELECT * FROM mytable a
WHERE a.subset_code = 301451
AND
ltrim(a.filterstring) in (Select val from dbo.udf_strSplit(@filter_str, ',') )


Which uses:
CREATE FUNCTION dbo.udf_strSplit (
      @string varchar(8000),
      @splitter char(1)
)
RETURNS @res TABLE (val varchar(8000))
AS
BEGIN
      -- Ensure the string ends with the splitter so the last word is captured
      IF SUBSTRING(@string, LEN(@string), 1) <> @splitter
            SET @string = @string + @splitter

      DECLARE @start bigint,
            @word varchar(8000),
            @charindex bigint
      SELECT @start = 1, @charindex = CHARINDEX(@splitter, @string, @start)

      WHILE (@charindex <> 0)
      BEGIN
            SET @word = SUBSTRING(@string, @start, @charindex - @start)
            SET @start = @charindex + 1
            SET @charindex = CHARINDEX(@splitter, @string, @start)
            INSERT INTO @res VALUES (@word)
      END

      RETURN
END



hth,
Landy
ASKER CERTIFIED SOLUTION: LandyJ (United States)
Landy,
I think you are right about the explanation and it is what I suspected in the first place.
The workaround you suggested does indeed work for simple queries such as the one I posted, although I have more complex versions where it is very difficult to write them like this. In the end I had to use a HINT to force SQL Server to use the index on subset_code, so that it evaluates that first. This worked, but I'm unhappy that I ended up having to do it that way.

It seems to me that the fundamental problem is SQL Server's inability to analyse the UDF into its execution plan. Obviously it has to balance the cost of evaluating the UDF against the gain to be made by doing so. However, if SQL Server cannot evaluate the UDF, I would have thought it sensible to ignore it completely in execution planning, in which case it would have focussed on the index that already exists for subset_code and the query would run quickly.
The optimizer should be smart enough to realise that I'm only using two fields from the table: subset_code (an int with an index) and filterstring (a string with an index). All other things being equal, any sensible optimiser that is unsure what you are doing with filterstring should evaluate subset_code first.
Instead, for some reason it is deciding that it must be more efficient to evaluate the UDF predicate on filterstring first. The only way to make it behave sensibly is either to re-write the query as suggested or to use an index HINT, which is what I chose to do in the end.
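The index hint mentioned could look something like the following (the index name ix_mytable_subset_code is an assumption; substitute whatever the real index on subset_code is called):

```sql
-- Force the seek on subset_code so the UDF is only evaluated
-- on the rows that survive the index filter
SELECT * FROM mytable a WITH (INDEX(ix_mytable_subset_code))
WHERE a.subset_code = 301451
AND 1 = dbo.fn_check_filter_string(a.filterstring)
```

The trade-off is that the hint pins the plan: if the data distribution changes, or the index is renamed or dropped, the hint can hurt performance or break the query, which is why it feels like a last resort.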

I'm not sure that UDFs are ready for the big time. Not in SQL 2000 anyway.