Link to home
Start Free TrialLog in
Avatar of Greg_L_WER
Greg_L_WERFlag for Canada

asked on

SQL Store Procedure - Slow Performance

Hi all,

I've got a SQL stored procedure that is working but I'm hoping that the performance could be improved.  Note the 2 WHERE clause sections... there's about a 20 second difference in response times when the IN is being drawn from the declared table that is populated from the XML compared to being hardcoded with the same values.  It is returning about 50 records (out of tables containing over 1.5M records)  Any help improving the procedure would be appreciated.

Thanks,
Greg

ALTER PROC [dbo].[ViewByDH]
	@ParameterXML xml

as set nocount on

-- STEP 1
DECLARE @StatusesIDs TABLE (StatusID int)
INSERT INTO @StatusesIDs (StatusID) SELECT ParamValues.StatusID.value('(.)[1]', 'int') FROM @ParameterXML.nodes('/Parameters/StatusesIDs/StatusID') as ParamValues(StatusID)
OPTION (OPTIMIZE FOR (@ParameterXML = NULL))

-- STEP 2
DECLARE @CSStatusesIDs TABLE (CSStatusID int)
INSERT INTO @CSStatusesIDs (CSStatusID) SELECT ParamValues.CSStatusID.value('.','int') FROM @ParameterXML.nodes('/Parameters/CSStatusesIDs/CSStatusID') as ParamValues(CSStatusID)
OPTION (OPTIMIZE FOR (@ParameterXML = NULL))

-- STEP 3
DECLARE @StartDate DateTime
Set @StartDate = (SELECT ParamValues.StartDate.value('(StartDate/text())[1]','DateTime') FROM @ParameterXML.nodes('/Parameters') as ParamValues(StartDate))

-- STEP 4
DECLARE @EndDate DateTime
Set @EndDate = (SELECT ParamValues.EndDate.value('(EndDate/text())[1]','DateTime') FROM @ParameterXML.nodes('/Parameters') as ParamValues(EndDate))

-- STEP 5
DECLARE @QueuedDate DateTime
Set @QueuedDate = (SELECT ParamValues.QueuedDate.value('(QueuedDate/text())[1]','DateTime') FROM @ParameterXML.nodes('/Parameters') as ParamValues(QueuedDate))

SELECT 
	r.Received,
	r.ReferenceNumber,
	rdh.CSStatus as CSStatus,
	rc.Name as RcName,
	rcrt.Name as RcrtName,
	r.Status,
	r.OrderNumber,
	r.Rid,
	r.ServiceDetails_Exist,
	r.Requests_Files_Count,
	r.Requests_Notes_Count_Client,
	r.Requests_Notes_Count_Internal,
	r.SignedOut_Uid

FROM Requests r
JOIN Requests_DH rdh on r.Rid = rdh.Rid
JOIN Request_Category_Request_Type rcrt on r.Rcrtid = rcrt.Rcrtid
JOIN Request_Category rc on rcrt.Rcid = rc.Rcid


-- USING THIS WHERE SECTION IT EXECUTES IN ABOUT 20 SECONDS
WHERE ((r.Received <= @QueuedDate and r.StatusID = 3) OR r.StatusID in (Select s.StatusID from @StatusesIDs s))
AND   (rdh.CSStatusID in (Select cs.CSStatusID from @CSStatusesIDs cs)) 
AND   (r.Received >= @StartDate and r.Received <= @EndDate)


-- USING THIS WHERE SECTION IT EXECUTES ALMOST IMMEDIATELY (BOTH USING THE XML BELOW)
WHERE ((r.Received <= @QueuedDate and r.StatusID = 3) OR r.StatusID in (1,2,6,7))
AND   (rdh.CSStatusID in (1,2,3,4,5)) 
AND   (r.Received >= @StartDate and r.Received <= @EndDate)



-- RUNNING PROCEDURE HERE...
[ViewByDH2] 
'<Parameters>
  <StatusesIDs>
    <StatusID>1</StatusID><StatusID>2</StatusID><StatusID>6</StatusID><StatusID>7</StatusID>
  </StatusesIDs>
  <CSStatusesIDs>
<CSStatusID>1</CSStatusID><CSStatusID>2</CSStatusID><CSStatusID>3</CSStatusID><CSStatusID>4</CSStatusID><CSStatusID>5</CSStatusID>
  </CSStatusesIDs>
  <StartDate>1/1/1900 12:00:00 AM</StartDate>
  <EndDate>4/27/2017 12:00:00 AM</EndDate>
  <QueuedDate>4/26/2017 8:35:38 AM</QueuedDate>
</Parameters>'

Open in new window

Avatar of HainKurt
HainKurt
Flag of Canada image

what about this
SELECT ...
FROM (select * from Requests 
where ((r.Received <= @QueuedDate and r.StatusID = 3) OR r.StatusID in (Select s.StatusID from @StatusesIDs s)
AND   (r.Received >= @StartDate and r.Received <= @EndDate)
) r
JOIN Requests_DH rdh on r.Rid = rdh.Rid and (rdh.CSStatusID in (Select cs.CSStatusID from @CSStatusesIDs cs))
JOIN Request_Category_Request_Type rcrt on r.Rcrtid = rcrt.Rcrtid
JOIN Request_Category rc on rcrt.Rcid = rc.Rcid

Open in new window


where ... is line 29-41
Avatar of Vitor Montalvão
It's always more faster if you don't need to query tables, even a table variable.
I've made some changes in your SELECT. Can you try this version and let me know if the performance improved?
SELECT 
	r.Received,
	r.ReferenceNumber,
	rdh.CSStatus as CSStatus,
	rc.Name as RcName,
	rcrt.Name as RcrtName,
	r.Status,
	r.OrderNumber,
	r.Rid,
	r.ServiceDetails_Exist,
	r.Requests_Files_Count,
	r.Requests_Notes_Count_Client,
	r.Requests_Notes_Count_Internal,
	r.SignedOut_Uid
FROM Requests r
	JOIN Requests_DH rdh on r.Rid = rdh.Rid
	JOIN Request_Category_Request_Type rcrt on r.Rcrtid = rcrt.Rcrtid
	JOIN Request_Category rc on rcrt.Rcid = rc.Rcid
	JOIN @CSStatusesIDs cs on rdh.CSStatusID = cs.CSStatusID
WHERE ((r.Received <= @QueuedDate and r.StatusID = 3) 
	OR EXISTS (Select 1 
		from @StatusesIDs s
		where s.StatusID = r.StatusID))
	AND (r.Received >= @StartDate and r.Received <= @EndDate)

Open in new window

@Vitor Montalvão
WHERE ((r.Received <= @QueuedDate and r.StatusID = 3) 
	OR EXISTS (Select 1 
		from @StatusesIDs s
		where s.StatusID = r.StatusID))

Open in new window


the exists part will run for every and each records after join, which will be bad imo
Huseyin, how that can be worst compared to your proposed solution?

SELECT ...
FROM (select * from Requests
where ((r.Received <= @QueuedDate and r.StatusID = 3) OR r.StatusID in (Select s.StatusID from @StatusesIDs s)
AND   (r.Received >= @StartDate and r.Received <= @EndDate)
    JOIN Requests_DH rdh on r.Rid = rdh.Rid and (rdh.CSStatusID in (Select cs.CSStatusID from @CSStatusesIDs cs))
    JOIN Request_Category_Request_Type rcrt on r.Rcrtid = rcrt.Rcrtid
    JOIN Request_Category rc on rcrt.Rcid = rc.Rcid
one is even before join, limiting the joined records
the other is after join and will run for every/each record... which will be very slow...

need to test...
Don't get fooled by the JOIN statements. Internally SQL Server will perform the same operations. You can test and check it by yourself. Don't forget to enable the query plan so you can see it better.
@Vitor Montalvão

do you think all will perform same?

select *
from a inner join b on a.i=b.id
where a.id in (select id from c)

select *
from a inner join b on a.i=b.id and a.id in (select id from c)

select *
from (select * from a where id in select id from c) a inner join b on a.i=b.id

with t as ( select * from a where id in select id from c) 
select *
from t inner join b on t.id=b.id

Open in new window

I would say the first 2 should provide the same query execution plan and the last 2 also with the 1st pair of selects with better performance than the 2nd pair of selects.
Avatar of Greg_L_WER

ASKER

Good morning all,

Thanks so much for your input.  I probably should have prefaced my question with a bit more information.  I’m all self-taught in SQL so there’s a good possibility that there could be a completely different/better way to get the result that I am needing.  As for testing the 2 options...


This option returned 32 records in 19 seconds.  I did have to modify it slightly as it was giving me an “Incorrect syntax near 'r'.” error initially so I switched ‘r’ to ‘r2’ in the initial select.

SELECT ...
FROM (select * from Requests r2
where ((r2.Received <= @QueuedDate and r2.StatusID = 3) OR r2.StatusID in (Select s.StatusID from @StatusesIDs s))
AND   (r2.Received >= @StartDate and r2.Received <= @EndDate)
) r
JOIN Requests_DH rdh on r.Rid = rdh.Rid and (rdh.CSStatusID in (Select cs.CSStatusID from @CSStatusesIDs cs))
JOIN Request_Category_Request_Type rcrt on r.Rcrtid = rcrt.Rcrtid
JOIN Request_Category rc on rcrt.Rcid = rc.Rcid

Open in new window


The other option is giving me very similar results.  It returned 32 records in 22 seconds.

SELECT ...
FROM Requests r
	JOIN Requests_DH rdh on r.Rid = rdh.Rid
	JOIN Request_Category_Request_Type rcrt on r.Rcrtid = rcrt.Rcrtid
	JOIN Request_Category rc on rcrt.Rcid = rc.Rcid
	JOIN @CSStatusesIDs cs on rdh.CSStatusID = cs.CSStatusID
WHERE ((r.Received <= @QueuedDate and r.StatusID = 3) 
	OR EXISTS (Select 1 
		from @StatusesIDs s
		where s.StatusID = r.StatusID))
	AND (r.Received >= @StartDate and r.Received <= @EndDate)

Open in new window


Is there possibly a different way to send a dynamic list of statuses that could be incorporated into the procedure?  I tried to pass the status lists as strings and assign them to a variable and use the variable in the IN statement but wasn’t successful.  Is there some way of doing this?

ALTER PROC [dbo].ViewByDH2_IDTest_V3
	@ParameterXML xml

as set nocount on

-- STEP 1
DECLARE @StatusesString varchar(MAX)
Set @StatusesString = (SELECT ParamValues.Statuses.value('(Statuses/text())[1]','varchar(MAX)') FROM @ParameterXML.nodes('/Parameters') as ParamValues(Statuses))


SELECT ... (Simplified version) 
FROM Requests r
WHERE r.StatusID in (@StatusesString)


ViewByDH2_IDTest_V3 '
<Parameters>
  <Statuses>1,2,3,4</Statuses>
</Parameters>'

Open in new window


Running it gives me this error which would be expected…

Msg 245, Level 16, State 1, Procedure ViewByDH2_IDTest_V3, Line 11
Conversion failed when converting the varchar value '1,2,3,4' to data type int.

can the procedure be “tricked” into taking the string as a list of integers?

Thanks,
Greg
ASKER CERTIFIED SOLUTION
Avatar of Scott Pletcher
Scott Pletcher
Flag of United States of America image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Huge difference Scott! Assuming it must be the primary key that’s making the difference. It runs about 5 times faster than before :)  Returning 163 records in 0-5 seconds compared to around 25 seconds.  For future reference the full procedure is:

ALTER PROC [dbo].[ViewByDH2_FINAL]
	@ParameterXML xml

as set nocount on

-- STEP 1
CREATE TABLE #StatusesIDs (StatusID int PRIMARY KEY)
INSERT INTO #StatusesIDs (StatusID) SELECT DISTINCT ParamValues.StatusID.value('(.)[1]', 'int') FROM @ParameterXML.nodes('/Parameters/StatusesIDs/StatusID') as ParamValues(StatusID)

-- STEP 2
CREATE TABLE #CSStatusesIDs (CSStatusID int PRIMARY KEY)
INSERT INTO #CSStatusesIDs (CSStatusID) SELECT DISTINCT ParamValues.CSStatusID.value('.','int') FROM @ParameterXML.nodes('/Parameters/CSStatusesIDs/CSStatusID') as ParamValues(CSStatusID)

-- STEP 3
DECLARE @StartDate DateTime
Set @StartDate = (SELECT ParamValues.StartDate.value('(StartDate/text())[1]','DateTime') FROM @ParameterXML.nodes('/Parameters') as ParamValues(StartDate))

-- STEP 4
DECLARE @EndDate DateTime
Set @EndDate = (SELECT ParamValues.EndDate.value('(EndDate/text())[1]','DateTime') FROM @ParameterXML.nodes('/Parameters') as ParamValues(EndDate))

-- STEP 5
DECLARE @QueuedDate DateTime
Set @QueuedDate = (SELECT ParamValues.QueuedDate.value('(QueuedDate/text())[1]','DateTime') FROM @ParameterXML.nodes('/Parameters') as ParamValues(QueuedDate))

SELECT 
	r.Received,
	r.ReferenceNumber,
	rdh.CSStatus as CSStatus,
	rc.Name as RcName,
	rcrt.Name as RcrtName,
	r.Status,
	r.OrderNumber,
	r.Rid,
	r.ServiceDetails_Exist,
	r.Requests_Files_Count,
	r.Requests_Notes_Count_Client,
	r.Requests_Notes_Count_Internal,
	r.SignedOut_Uid

FROM Requests r
	JOIN Requests_DH rdh on r.Rid = rdh.Rid
	JOIN Request_Category_Request_Type rcrt on r.Rcrtid = rcrt.Rcrtid
	JOIN Request_Category rc on rcrt.Rcid = rc.Rcid
	JOIN #CSStatusesIDs cs on rdh.CSStatusID = cs.CSStatusID
WHERE ((r.Received <= @QueuedDate and r.StatusID = 3) OR EXISTS(SELECT 1 FROM #StatusesIDs s WHERE s.StatusID = r.StatusID))
AND   (EXISTS(SELECT 1 FROM #CSStatusesIDs cs WHERE rdh.CSStatusID = cs.CSStatusID)) 
 AND   (r.Received >= @StartDate and r.Received <= @EndDate)

Open in new window


Also, just for curiosity sake I tried the other option for the where statement with similar responsiveness.

WHERE ((r.Received <= @QueuedDate and r.StatusID = 3) OR (r.StatusID in (SELECT s.StatusID FROM #StatusesIDs s)))

Open in new window


This was the XML passed for these last tests

[ViewByDH2_IDTest_V4] '
<Parameters>
  <StatusesIDs>
    <StatusID>1</StatusID>
    <StatusID>2</StatusID>
    <StatusID>3</StatusID>
    <StatusID>6</StatusID>
    <StatusID>7</StatusID>
  </StatusesIDs>
  <CSStatusesIDs>
    <CSStatusID>1</CSStatusID>
    <CSStatusID>2</CSStatusID>
    <CSStatusID>3</CSStatusID>
    <CSStatusID>4</CSStatusID>
    <CSStatusID>5</CSStatusID>
  </CSStatusesIDs>
  <StartDate>1/1/1900 12:00:00 AM</StartDate>
  <EndDate>5/3/2017 12:00:00 AM</EndDate>
  <QueuedDate>4/26/2017 8:35:38 AM</QueuedDate>
</Parameters>'

Open in new window


Thanks to all who contributed!
The PRIMARY KEY helps, combined with changing to a temp table.

One quirk of table variables is that SQL always uses a row count estimate of 1 for them.  That in turn means that SQL uses a loop join to join to them, since that type of join works great for 1 row.  However, if there are a lot of rows -- esp. a whole lot rows -- a loop join can be terribly inefficient.  But for a temp table, SQL will update stats and use the actual row count, often producing a better join.  You could check on this by looking at the query plans for both versions of the query, temp table version and table variable version.
Thanks for the explanation Scott.  I'll definitely play around with the 2 versions to better understand.