Greg_L_WER
asked on
SQL Store Procedure - Slow Performance
Hi all,
I've got a SQL stored procedure that is working but I'm hoping that the performance could be improved. Note the 2 WHERE clause sections... there's about a 20 second difference in response times when the IN is being drawn from the declared table that is populated from the XML compared to being hardcoded with the same values. It is returning about 50 records (out of tables containing over 1.5M records) Any help improving the procedure would be appreciated.
Thanks,
Greg
I've got a SQL stored procedure that is working but I'm hoping that the performance could be improved. Note the 2 WHERE clause sections... there's about a 20 second difference in response times when the IN is being drawn from the declared table that is populated from the XML compared to being hardcoded with the same values. It is returning about 50 records (out of tables containing over 1.5M records) Any help improving the procedure would be appreciated.
Thanks,
Greg
ALTER PROC [dbo].[ViewByDH]
@ParameterXML xml
as set nocount on
-- STEP 1
DECLARE @StatusesIDs TABLE (StatusID int)
INSERT INTO @StatusesIDs (StatusID) SELECT ParamValues.StatusID.value('(.)[1]', 'int') FROM @ParameterXML.nodes('/Parameters/StatusesIDs/StatusID') as ParamValues(StatusID)
OPTION (OPTIMIZE FOR (@ParameterXML = NULL))
-- STEP 2
DECLARE @CSStatusesIDs TABLE (CSStatusID int)
INSERT INTO @CSStatusesIDs (CSStatusID) SELECT ParamValues.CSStatusID.value('.','int') FROM @ParameterXML.nodes('/Parameters/CSStatusesIDs/CSStatusID') as ParamValues(CSStatusID)
OPTION (OPTIMIZE FOR (@ParameterXML = NULL))
-- STEP 3
DECLARE @StartDate DateTime
Set @StartDate = (SELECT ParamValues.StartDate.value('(StartDate/text())[1]','DateTime') FROM @ParameterXML.nodes('/Parameters') as ParamValues(StartDate))
-- STEP 4
DECLARE @EndDate DateTime
Set @EndDate = (SELECT ParamValues.EndDate.value('(EndDate/text())[1]','DateTime') FROM @ParameterXML.nodes('/Parameters') as ParamValues(EndDate))
-- STEP 5
DECLARE @QueuedDate DateTime
Set @QueuedDate = (SELECT ParamValues.QueuedDate.value('(QueuedDate/text())[1]','DateTime') FROM @ParameterXML.nodes('/Parameters') as ParamValues(QueuedDate))
SELECT
r.Received,
r.ReferenceNumber,
rdh.CSStatus as CSStatus,
rc.Name as RcName,
rcrt.Name as RcrtName,
r.Status,
r.OrderNumber,
r.Rid,
r.ServiceDetails_Exist,
r.Requests_Files_Count,
r.Requests_Notes_Count_Client,
r.Requests_Notes_Count_Internal,
r.SignedOut_Uid
FROM Requests r
JOIN Requests_DH rdh on r.Rid = rdh.Rid
JOIN Request_Category_Request_Type rcrt on r.Rcrtid = rcrt.Rcrtid
JOIN Request_Category rc on rcrt.Rcid = rc.Rcid
-- USING THIS WHERE SECTION IT EXECUTES IN ABOUT 20 SECONDS
WHERE ((r.Received <= @QueuedDate and r.StatusID = 3) OR r.StatusID in (Select s.StatusID from @StatusesIDs s))
AND (rdh.CSStatusID in (Select cs.CSStatusID from @CSStatusesIDs cs))
AND (r.Received >= @StartDate and r.Received <= @EndDate)
-- USING THIS WHERE SECTION IT EXECUTES ALMOST IMMEDIATELY (BOTH USING THE XML BELOW)
WHERE ((r.Received <= @QueuedDate and r.StatusID = 3) OR r.StatusID in (1,2,6,7))
AND (rdh.CSStatusID in (1,2,3,4,5))
AND (r.Received >= @StartDate and r.Received <= @EndDate)
-- RUNNING PROCEDURE HERE...
[ViewByDH2]
'<Parameters>
<StatusesIDs>
<StatusID>1</StatusID><StatusID>2</StatusID><StatusID>6</StatusID><StatusID>7</StatusID>
</StatusesIDs>
<CSStatusesIDs>
<CSStatusID>1</CSStatusID><CSStatusID>2</CSStatusID><CSStatusID>3</CSStatusID><CSStatusID>4</CSStatusID><CSStatusID>5</CSStatusID>
</CSStatusesIDs>
<StartDate>1/1/1900 12:00:00 AM</StartDate>
<EndDate>4/27/2017 12:00:00 AM</EndDate>
<QueuedDate>4/26/2017 8:35:38 AM</QueuedDate>
</Parameters>'
It's always more faster if you don't need to query tables, even a table variable.
I've made some changes in your SELECT. Can you try this version and let me know if the performance improved?
I've made some changes in your SELECT. Can you try this version and let me know if the performance improved?
SELECT
r.Received,
r.ReferenceNumber,
rdh.CSStatus as CSStatus,
rc.Name as RcName,
rcrt.Name as RcrtName,
r.Status,
r.OrderNumber,
r.Rid,
r.ServiceDetails_Exist,
r.Requests_Files_Count,
r.Requests_Notes_Count_Client,
r.Requests_Notes_Count_Internal,
r.SignedOut_Uid
FROM Requests r
JOIN Requests_DH rdh on r.Rid = rdh.Rid
JOIN Request_Category_Request_Type rcrt on r.Rcrtid = rcrt.Rcrtid
JOIN Request_Category rc on rcrt.Rcid = rc.Rcid
JOIN @CSStatusesIDs cs on rdh.CSStatusID = cs.CSStatusID
WHERE ((r.Received <= @QueuedDate and r.StatusID = 3)
OR EXISTS (Select 1
from @StatusesIDs s
where s.StatusID = r.StatusID))
AND (r.Received >= @StartDate and r.Received <= @EndDate)
@Vitor Montalvão
the exists part will run for every and each records after join, which will be bad imo
WHERE ((r.Received <= @QueuedDate and r.StatusID = 3)
OR EXISTS (Select 1
from @StatusesIDs s
where s.StatusID = r.StatusID))
the exists part will run for every and each records after join, which will be bad imo
Huseyin, how that can be worst compared to your proposed solution?
SELECT ...
FROM (select * from Requests
where ((r.Received <= @QueuedDate and r.StatusID = 3) OR r.StatusID in (Select s.StatusID from @StatusesIDs s)
AND (r.Received >= @StartDate and r.Received <= @EndDate)
JOIN Requests_DH rdh on r.Rid = rdh.Rid and (rdh.CSStatusID in (Select cs.CSStatusID from @CSStatusesIDs cs))
JOIN Request_Category_Request_T ype rcrt on r.Rcrtid = rcrt.Rcrtid
JOIN Request_Category rc on rcrt.Rcid = rc.Rcid
SELECT ...
FROM (select * from Requests
where ((r.Received <= @QueuedDate and r.StatusID = 3) OR r.StatusID in (Select s.StatusID from @StatusesIDs s)
AND (r.Received >= @StartDate and r.Received <= @EndDate)
JOIN Requests_DH rdh on r.Rid = rdh.Rid and (rdh.CSStatusID in (Select cs.CSStatusID from @CSStatusesIDs cs))
JOIN Request_Category_Request_T
JOIN Request_Category rc on rcrt.Rcid = rc.Rcid
one is even before join, limiting the joined records
the other is after join and will run for every/each record... which will be very slow...
need to test...
the other is after join and will run for every/each record... which will be very slow...
need to test...
Don't get fooled by the JOIN statements. Internally SQL Server will perform the same operations. You can test and check it by yourself. Don't forget to enable the query plan so you can see it better.
@Vitor Montalvão
do you think all will perform same?
do you think all will perform same?
select *
from a inner join b on a.i=b.id
where a.id in (select id from c)
select *
from a inner join b on a.i=b.id and a.id in (select id from c)
select *
from (select * from a where id in select id from c) a inner join b on a.i=b.id
with t as ( select * from a where id in select id from c)
select *
from t inner join b on t.id=b.id
I would say the first 2 should provide the same query execution plan and the last 2 also with the 1st pair of selects with better performance than the 2nd pair of selects.
ASKER
Good morning all,
Thanks so much for your input. I probably should have prefaced my question with a bit more information. I’m all self-taught in SQL so there’s a good possibility that there could be a completely different/better way to get the result that I am needing. As for testing the 2 options...
This option returned 32 records in 19 seconds. I did have to modify it slightly as it was giving me an “Incorrect syntax near 'r'.” error initially so I switched ‘r’ to ‘r2’ in the initial select.
The other option is giving me very similar results. It returned 32 records in 22 seconds.
Is there possibly a different way to send a dynamic list of statuses that could be incorporated into the procedure? I tried to pass the status lists as strings and assign them to a variable and use the variable in the IN statement but wasn’t successful. Is there some way of doing this?
Running it gives me this error which would be expected…
Msg 245, Level 16, State 1, Procedure ViewByDH2_IDTest_V3, Line 11
Conversion failed when converting the varchar value '1,2,3,4' to data type int.
can the procedure be “tricked” into taking the string as a list of integers?
Thanks,
Greg
Thanks so much for your input. I probably should have prefaced my question with a bit more information. I’m all self-taught in SQL so there’s a good possibility that there could be a completely different/better way to get the result that I am needing. As for testing the 2 options...
This option returned 32 records in 19 seconds. I did have to modify it slightly as it was giving me an “Incorrect syntax near 'r'.” error initially so I switched ‘r’ to ‘r2’ in the initial select.
SELECT ...
FROM (select * from Requests r2
where ((r2.Received <= @QueuedDate and r2.StatusID = 3) OR r2.StatusID in (Select s.StatusID from @StatusesIDs s))
AND (r2.Received >= @StartDate and r2.Received <= @EndDate)
) r
JOIN Requests_DH rdh on r.Rid = rdh.Rid and (rdh.CSStatusID in (Select cs.CSStatusID from @CSStatusesIDs cs))
JOIN Request_Category_Request_Type rcrt on r.Rcrtid = rcrt.Rcrtid
JOIN Request_Category rc on rcrt.Rcid = rc.Rcid
The other option is giving me very similar results. It returned 32 records in 22 seconds.
SELECT ...
FROM Requests r
JOIN Requests_DH rdh on r.Rid = rdh.Rid
JOIN Request_Category_Request_Type rcrt on r.Rcrtid = rcrt.Rcrtid
JOIN Request_Category rc on rcrt.Rcid = rc.Rcid
JOIN @CSStatusesIDs cs on rdh.CSStatusID = cs.CSStatusID
WHERE ((r.Received <= @QueuedDate and r.StatusID = 3)
OR EXISTS (Select 1
from @StatusesIDs s
where s.StatusID = r.StatusID))
AND (r.Received >= @StartDate and r.Received <= @EndDate)
Is there possibly a different way to send a dynamic list of statuses that could be incorporated into the procedure? I tried to pass the status lists as strings and assign them to a variable and use the variable in the IN statement but wasn’t successful. Is there some way of doing this?
ALTER PROC [dbo].ViewByDH2_IDTest_V3
@ParameterXML xml
as set nocount on
-- STEP 1
DECLARE @StatusesString varchar(MAX)
Set @StatusesString = (SELECT ParamValues.Statuses.value('(Statuses/text())[1]','varchar(MAX)') FROM @ParameterXML.nodes('/Parameters') as ParamValues(Statuses))
SELECT ... (Simplified version)
FROM Requests r
WHERE r.StatusID in (@StatusesString)
ViewByDH2_IDTest_V3 '
<Parameters>
<Statuses>1,2,3,4</Statuses>
</Parameters>'
Running it gives me this error which would be expected…
Msg 245, Level 16, State 1, Procedure ViewByDH2_IDTest_V3, Line 11
Conversion failed when converting the varchar value '1,2,3,4' to data type int.
can the procedure be “tricked” into taking the string as a list of integers?
Thanks,
Greg
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER
Huge difference Scott! Assuming it must be the primary key that’s making the difference. It runs about 5 times faster than before :) Returning 163 records in 0-5 seconds compared to around 25 seconds. For future reference the full procedure is:
Also, just for curiosity sake I tried the other option for the where statement with similar responsiveness.
This was the XML passed for these last tests
Thanks to all who contributed!
ALTER PROC [dbo].[ViewByDH2_FINAL]
@ParameterXML xml
as set nocount on
-- STEP 1
CREATE TABLE #StatusesIDs (StatusID int PRIMARY KEY)
INSERT INTO #StatusesIDs (StatusID) SELECT DISTINCT ParamValues.StatusID.value('(.)[1]', 'int') FROM @ParameterXML.nodes('/Parameters/StatusesIDs/StatusID') as ParamValues(StatusID)
-- STEP 2
CREATE TABLE #CSStatusesIDs (CSStatusID int PRIMARY KEY)
INSERT INTO #CSStatusesIDs (CSStatusID) SELECT DISTINCT ParamValues.CSStatusID.value('.','int') FROM @ParameterXML.nodes('/Parameters/CSStatusesIDs/CSStatusID') as ParamValues(CSStatusID)
-- STEP 3
DECLARE @StartDate DateTime
Set @StartDate = (SELECT ParamValues.StartDate.value('(StartDate/text())[1]','DateTime') FROM @ParameterXML.nodes('/Parameters') as ParamValues(StartDate))
-- STEP 4
DECLARE @EndDate DateTime
Set @EndDate = (SELECT ParamValues.EndDate.value('(EndDate/text())[1]','DateTime') FROM @ParameterXML.nodes('/Parameters') as ParamValues(EndDate))
-- STEP 5
DECLARE @QueuedDate DateTime
Set @QueuedDate = (SELECT ParamValues.QueuedDate.value('(QueuedDate/text())[1]','DateTime') FROM @ParameterXML.nodes('/Parameters') as ParamValues(QueuedDate))
SELECT
r.Received,
r.ReferenceNumber,
rdh.CSStatus as CSStatus,
rc.Name as RcName,
rcrt.Name as RcrtName,
r.Status,
r.OrderNumber,
r.Rid,
r.ServiceDetails_Exist,
r.Requests_Files_Count,
r.Requests_Notes_Count_Client,
r.Requests_Notes_Count_Internal,
r.SignedOut_Uid
FROM Requests r
JOIN Requests_DH rdh on r.Rid = rdh.Rid
JOIN Request_Category_Request_Type rcrt on r.Rcrtid = rcrt.Rcrtid
JOIN Request_Category rc on rcrt.Rcid = rc.Rcid
JOIN #CSStatusesIDs cs on rdh.CSStatusID = cs.CSStatusID
WHERE ((r.Received <= @QueuedDate and r.StatusID = 3) OR EXISTS(SELECT 1 FROM #StatusesIDs s WHERE s.StatusID = r.StatusID))
AND (EXISTS(SELECT 1 FROM #CSStatusesIDs cs WHERE rdh.CSStatusID = cs.CSStatusID))
AND (r.Received >= @StartDate and r.Received <= @EndDate)
Also, just for curiosity sake I tried the other option for the where statement with similar responsiveness.
WHERE ((r.Received <= @QueuedDate and r.StatusID = 3) OR (r.StatusID in (SELECT s.StatusID FROM #StatusesIDs s)))
This was the XML passed for these last tests
[ViewByDH2_IDTest_V4] '
<Parameters>
<StatusesIDs>
<StatusID>1</StatusID>
<StatusID>2</StatusID>
<StatusID>3</StatusID>
<StatusID>6</StatusID>
<StatusID>7</StatusID>
</StatusesIDs>
<CSStatusesIDs>
<CSStatusID>1</CSStatusID>
<CSStatusID>2</CSStatusID>
<CSStatusID>3</CSStatusID>
<CSStatusID>4</CSStatusID>
<CSStatusID>5</CSStatusID>
</CSStatusesIDs>
<StartDate>1/1/1900 12:00:00 AM</StartDate>
<EndDate>5/3/2017 12:00:00 AM</EndDate>
<QueuedDate>4/26/2017 8:35:38 AM</QueuedDate>
</Parameters>'
Thanks to all who contributed!
The PRIMARY KEY helps, combined with changing to a temp table.
One quirk of table variables is that SQL always uses a row count estimate of 1 for them. That in turn means that SQL uses a loop join to join to them, since that type of join works great for 1 row. However, if there are a lot of rows -- esp. a whole lot rows -- a loop join can be terribly inefficient. But for a temp table, SQL will update stats and use the actual row count, often producing a better join. You could check on this by looking at the query plans for both versions of the query, temp table version and table variable version.
One quirk of table variables is that SQL always uses a row count estimate of 1 for them. That in turn means that SQL uses a loop join to join to them, since that type of join works great for 1 row. However, if there are a lot of rows -- esp. a whole lot rows -- a loop join can be terribly inefficient. But for a temp table, SQL will update stats and use the actual row count, often producing a better join. You could check on this by looking at the query plans for both versions of the query, temp table version and table variable version.
ASKER
Thanks for the explanation Scott. I'll definitely play around with the 2 versions to better understand.
Open in new window
where ... is line 29-41