Solved

CALLING A FUNCTION FROM WHERE CLAUSE ... HOW TO REPLACE IT!

Posted on 2013-01-29
11
292 Views
Last Modified: 2013-02-15
The following function gets called from a query  - from the WHERE CLAUSE - (which is/must be generated dynamically by web page).
 As expected, the function is called millions of times by the app and is causing significant performance degradation.

I want to design the query so that it uses a join to a table (that I want to create ... how?  based on the function ...) rather than expect the result from the UDF.

How do I do so?  
The function basically just returns the max(student_app_id) which is the PKey (clustered) of the main table the UDF queries.

The UDF, the table structure, and the query that calls the UDF are included below, for completness.

---- UDF CODE
CREATE FUNCTION [dbo].[fnStAppId_Prospect](@iuserId int,@iappId int)
RETURNS int
AS
BEGIN
      DECLARE @rv int,@SID int
      Set @rv = 0

      --get latest student_app_id which has been modified.
      select @SID = max(student_app_id)
      from application_status
      where user_id = @iuserId and confirmation_number is null
      AND (date_last_modified = (SELECT MAX(date_last_modified)
                                   FROM dbo.application_status
                                   WHERE user_id = @iuserId))
      AND application_id = @iappId

      IF @SID > 0
            BEGIN
                  IF EXISTS(select student_app_id from personal_info where student_app_id = @SID)
                        return @SID
                  ELSE IF EXISTS(select student_app_id from people_names where student_app_id = @SID)
                        return @SID
                  ELSE IF EXISTS(select student_app_id from special_cccapply_personal_info where student_app_id = @SID)
                        return @SID
            END
      
      RETURN 0

--*********** query
select distinct prospect_info.prospect_status_id as field_1_prospect_status_id, prospect_info.date_stamp as field_5_date_stamp, prospect_info.slots_downloaded
as field_6_slots_downloaded, table_1_people_names.last_name as field_2_last_name, table_1_people_names.first_name
as field_3_first_name, table_2_personal_info.ssn as field_4_ssn
into #org
from prospect_info left outer join system_xap.dbo.people_names as table_1_people_names
on table_1_people_names.user_id=prospect_info.user_id and table_1_people_names.application_id=0 and table_1_people_names.question_number=3
and (((dbo.fnStAppId_Prospect(table_1_people_names.user_id,5))>0 and table_1_people_names.student_app_id = dbo.fnStAppId_Prospect(table_1_people_names.user_id,5))
or ((dbo.fnStAppId_Prospect(table_1_people_names.user_id,5) = 0 and table_1_people_names.student_app_id is null)) )
left outer join system_xap.dbo.personal_info as table_2_personal_info on table_2_personal_info.user_id=prospect_info.user_id
and table_2_personal_info.application_id=0 and (((dbo.fnStAppId_Prospect(table_2_personal_info.user_id,5))>0
and table_2_personal_info.student_app_id = dbo.fnStAppId_Prospect(table_2_personal_info.user_id,5))
or ((dbo.fnStAppId_Prospect(table_2_personal_info.user_id,5) = 0 and table_2_personal_info.student_app_id is null)) )
where 1=1 and prospect_info.application_id=5 and prospect_info.prospect_status_id in
(SELECT prospect_info.prospect_status_id
FROM prospect_info LEFT OUTER JOIN system_xap.dbo.application_status AS application_status
ON prospect_info.user_id=application_status.user_id and prospect_info.application_id=application_status.application_id
WHERE (prospect_info.user_id NOT IN
(SELECT user_id FROM application_status aps WHERE (aps.application_id = 5) AND (confirmation_number IS NOT NULL)))
AND (prospect_info.application_id = 5)) and CAST(prospect_info.date_stamp AS DATETIME) between '1/1/2013' and '1/31/2013 23:59:59'

--- *********** TABLE STRUCTURE
CREATE TABLE [dbo].[application_status](
      [student_app_id] [int] IDENTITY(1,1) NOT FOR REPLICATION NOT NULL,
      [user_id] [int] NOT NULL,
      [application_id] [int] NOT NULL,
      [date_started] [smalldatetime] NULL,
      [date_last_modified] [smalldatetime] NULL,
      [application_type] [char](1) NULL,
      [on_page] [int] NULL,
      [confirmation_number] [char](40) NULL,
      [order_id] [int] NULL,
      [payment_type_id] [int] NULL,
      [date_completed] [smalldatetime] NULL,
      [date_batched] [smalldatetime] NULL,
      [date_downloaded] [smalldatetime] NULL,
      [certification_info] [varchar](64) NULL,
      [date_submitted] [smalldatetime] NULL,
      [submitted_term_year] [varchar](12) NULL,
      [app_method] [char](3) NULL,
      [intended_term] [varchar](12) NULL,
      [date_replica_downloaded] [smalldatetime] NULL,
      [slots_downloaded] [int] NULL,
      [submitted_major] [varchar](96) NULL,
      [date_downloaded2] [smalldatetime] NULL,
 CONSTRAINT [PK_StudAppID] PRIMARY KEY CLUSTERED
(
      [student_app_id] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]

GO
0
Comment
Question by:DBA2000
  • 5
  • 4
  • 2
11 Comments
 
LVL 51

Expert Comment

by:Ted Bouskill
ID: 38833487
The overhead for calling the function is minimal.  For example, in this article the overhead averages out to an additional 18 milliseconds.

For one, you are calling a MAX() function in the query and that is an expensive operation especially if you don't have the right indexes created.  I think you need to use SQL's analyzer tools to find the true performance bottlenecks.
0
 

Author Comment

by:DBA2000
ID: 38833937
Thank you, TedBilly

... I forgot to mention that the function gets called over 40 million times in two hour! ... since the function is being called from a WHERE clause of the query, which means that is called for every record that the query returns, thus, the tens of millions of calls.
0
 
LVL 51

Expert Comment

by:Ted Bouskill
ID: 38834062
However, my answer is still valid.  If you move that query within the function out into the WHERE clause it will still execute millions of times using the MAX function.  Trust me, I've been down this road.
0
PRTG Network Monitor: Intuitive Network Monitoring

Network Monitoring is essential to ensure that computer systems and network devices are running. Use PRTG to monitor LANs, servers, websites, applications and devices, bandwidth, virtual environments, remote systems, IoT, and many more. PRTG is easy to set up & use.

 
LVL 75

Expert Comment

by:Anthony Perkins
ID: 38836525
The following function gets called from a query  - from the WHERE CLAUSE -
What makes you think the function is called from the WHERE clause?  The way I see it fnStAppId_Prospect() is in the JOIN.  

Mind you I am not saying that is good in fact I think it could be re-written, just wanted to get the facts straight.
0
 
LVL 75

Expert Comment

by:Anthony Perkins
ID: 38836549
For example one thing you could do is re-write this:
      select @SID = max(student_app_id) 
      from application_status 
      where user_id = @iuserId and confirmation_number is null 
      AND (date_last_modified = (SELECT MAX(date_last_modified)
                                   FROM dbo.application_status
                                   WHERE user_id = @iuserId))
      AND application_id = @iappId

Open in new window

Like this:
SELECT  @SID = MAX(student_app_id)
FROM    application_status
WHERE   user_id = @iuserId
        AND confirmation_number IS NULL
        AND application_id = @iappId
HAVING  date_last_modified = MAX(date_last_modified)

Open in new window

0
 
LVL 75

Expert Comment

by:Anthony Perkins
ID: 38836624
You may actually be able to change the Multi-Statement UDF to a more efficient inline Function as in something like this:
CREATE FUNCTION [dbo].[fnStAppId_Prospect] (
		@iuserId int,
		@iappId int)

RETURNS int

AS 
BEGIN 
    RETURN (
    SELECT  CASE 
		WHEN a.[SID] <= 0 THEN 0
		ELSE COALESCE(i.student_app_id, n.student_app_id, s.student_app_id)
	     END
    FROM	(
	    SELECT  MAX(student_app_id) [SID]
	    FROM    application_status
	    WHERE   user_id = @iuserId
		    AND confirmation_number IS NULL
		    AND application_id = @iappId
	    HAVING  date_last_modified = MAX(date_last_modified)
	    ) a
	    LEFT JOIN personal_info i ON a.[SID] = i.student_app_id
	    LEFT JOIN people_names n ON a.[SID] = n.student_app_id
	    LEFT JOIN special_cccapply_personal_info s ON a.[SID] = s.student_app_id

	)   
END

Open in new window


But I agree with you that there are far better ways of skinning this cat other than using a UDF in a WHERE clause or even a JOIN.
0
 

Author Comment

by:DBA2000
ID: 38839315
Hi, acperkins

one more question,
... given that SQL does set operations better than function calls, do you think it makes sense to replace the call to the UDF by the body of the modified UDF that you came up with?  (... I, of course, have to workout the syntax when I replace the call to UDF)

as follows ...
 
... and (((dbo.fnStAppId_Prospect(table_1_people_names.user_id,5))>0 and table_1_people_names.student_app_id = dbo.fnStAppId_Prospect(table_1_people_names.user_id,5)) or ((dbo.fnStAppId_Prospect(table_1_people_names.user_id,5) = 0 ...

... with the following ...

... and (((SELECT  CASE WHEN a.[SID] <= 0 THEN 0       ELSE COALESCE(i.student_app_id, n.student_app_id, s.student_app_id)      END      FROM      (     SELECT  MAX(student_app_id) [SID]     FROM    application_status     WHERE   user_id = @iuserId
                AND confirmation_number IS NULL
                AND application_id = @iappId
          HAVING  date_last_modified = MAX(date_last_modified)
          ) a
          LEFT JOIN personal_info i ON a.[SID] = i.student_app_id
          LEFT JOIN people_names n ON a.[SID] = n.student_app_id
          LEFT JOIN special_cccapply_personal_info s ON a.[SID] = s.student_app_id )   > 0
and table_1_people_names.student_app_id = (SELECT  CASE WHEN a.[SID] <= 0 THEN 0       ELSE COALESCE(i.student_app_id, n.student_app_id, s.student_app_id)      END      FROM      (     SELECT  MAX(student_app_id) [SID]     FROM    application_status     WHERE   user_id = @iuserId
                AND confirmation_number IS NULL
                AND application_id = @iappId
          HAVING  date_last_modified = MAX(date_last_modified)
          ) a
          LEFT JOIN personal_info i ON a.[SID] = i.student_app_id
          LEFT JOIN people_names n ON a.[SID] = n.student_app_id
          LEFT JOIN special_cccapply_personal_info s ON a.[SID] = s.student_app_id )
0
 
LVL 75

Expert Comment

by:Anthony Perkins
ID: 38839367
do you think it makes sense to replace the call to the UDF by the body of the modified UDF that you came up with?
In general, yes.  However, I am not necessarily advocating that.  I believe a better approach may be to insert all the values into a temp table and join against that.  That way you will not have to join against the function (or derived table) six times.

Simply put the entire query and not just the function part needs to be re-evaluated and re-written.  It is a mess as it is.
0
 

Author Comment

by:DBA2000
ID: 38842039
Hi, acperkin

The UDF you wrote,  gave me the following error when I execute it.  I tried a number of changes, but none worked.  

Please help.
Thanks,

Msg 8121, Level 16, State 1, Procedure fnStAppId_Prospect_Optim_V1.1, Line 21

Column 'application_status.date_last_modified' is invalid in the HAVING clause because

it is not contained in either an aggregate function or the GROUP BY clause.
0
 
LVL 75

Accepted Solution

by:
Anthony Perkins earned 500 total points
ID: 38845721
You are right, not sure what I was thinking.  And now I am not exactly sure what you are trying to achieve.  So I would first try getting rid of the HAVING clause entirely as in:
CREATE FUNCTION [dbo].[fnStAppId_Prospect]
    (
     @iuserId int,
     @iappId int
    )
RETURNS int
AS 
    BEGIN 
        RETURN (
            SELECT  CASE WHEN a.[SID] <= 0 THEN 0
                         ELSE COALESCE(i.student_app_id, n.student_app_id, s.student_app_id)
                    END
            FROM    (SELECT MAX(student_app_id) [SID]
                     FROM   application_status
                     WHERE  user_id = @iuserId
                            AND confirmation_number IS NULL
                            AND application_id = @iappId
                    ) a
                    LEFT JOIN personal_info i ON a.[SID] = i.student_app_id
                    LEFT JOIN people_names n ON a.[SID] = n.student_app_id
                    LEFT JOIN special_cccapply_personal_info s ON a.[SID] = s.student_app_id

			)   
    END

Open in new window


If that does not produce the right results, try this instead:
CREATE FUNCTION [dbo].[fnStAppId_Prospect]
    (
     @iuserId int,
     @iappId int
    )
RETURNS int
AS 
BEGIN 
        RETURN (
        SELECT  CASE WHEN a.[SID] <= 0 THEN 0
                     ELSE COALESCE(i.student_app_id, n.student_app_id, s.student_app_id)
                END
        FROM    application_status s
                INNER JOIN (SELECT  MAX(student_app_id) [SID],
                                    MAX(date_last_modified) MAXdate_last_modified
                            FROM    application_status
                            WHERE   user_id = @iuserId
                                    AND confirmation_number IS NULL
                                    AND application_id = @iappId
                           ) a ON s.date_last_modified = a.MAXdate_last_modified
                LEFT JOIN personal_info i ON a.[SID] = i.student_app_id
                LEFT JOIN people_names n ON a.[SID] = n.student_app_id
                LEFT JOIN special_cccapply_personal_info s ON a.[SID] = s.student_app_id
        WHERE   [user_id] = @iuserId
                AND confirmation_number IS NULL
                AND application_id = @iappId

	)   
    END

Open in new window

0
 

Author Closing Comment

by:DBA2000
ID: 38895249
Thank you so much.  
The later version of the UDF works!
0

Featured Post

Free Tool: Site Down Detector

Helpful to verify reports of your own downtime, or to double check a downed website you are trying to access.

One of a set of tools we are providing to everyone as a way of saying thank you for being a part of the community.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

Title # Comments Views Activity
SQL server 2008 and after encryption method 32 61
Text file into sql server 5 32
SSAS Hierarchy with columns with folder names 10 19
get_systemdrive info from tsql? 1 18
Hi all, It is important and often overlooked to understand “Database properties”. Often we see questions about "log files" or "where is the database" and one of the easiest ways to get general information about your database is to use “Database p…
How to leverage one TLS certificate to encrypt Microsoft SQL traffic and Remote Desktop Services, versus creating multiple tickets for the same server.
Finds all prime numbers in a range requested and places them in a public primes() array. I've demostrated a template size of 30 (2 * 3 * 5) but larger templates can be built such 210  (2 * 3 * 5 * 7) or 2310  (2 * 3 * 5 * 7 * 11). The larger templa…

789 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question