Greg Watkins
asked on
SQL query takes a long time to run
I have SQL query that calculates some totals based upon entries between two tables. The problem I have is that the query takes a very long time to run. One of the tables is the rate_deck table and the other one is the sip_calls table. The rate_deck table has about 180,000 records and the sip_calls table has about 140,000 records. In this scenario it took over 3 hours for the query to run. I have tried creating indexes on both tables but it did not help. I am importing the data for both tables from csv files so I have full control over the tables schema. Below I have the current schema for both tables and some sample data and a copy of the query. What I need is some advice on what can be done to speed up the query.
CREATE TABLE sip_calls(
CalledNumber BIGINT NOT NULL
,Duration BIGINT NOT NULL
);
INSERT INTO sip_calls(CalledNumber,Duration) VALUES (13178268627,265);
INSERT INTO sip_calls(CalledNumber,Duration) VALUES (8884254800,24);
INSERT INTO sip_calls(CalledNumber,Duration) VALUES (7088505327,66);
INSERT INTO sip_calls(CalledNumber,Duration) VALUES (13132311580,0);
INSERT INTO sip_calls(CalledNumber,Duration) VALUES (4373413793,7);
INSERT INTO sip_calls(CalledNumber,Duration) VALUES (8004254800,24);
INSERT INTO sip_calls(CalledNumber,Duration) VALUES (19991234560,123);
INSERT INTO sip_calls(CalledNumber,Duration) VALUES (9991234567,12);
INSERT INTO sip_calls(CalledNumber,Duration) VALUES (13132311580,48);
CREATE TABLE rate_deck(
NPANXX INT NOT NULL PRIMARY KEY
,XS NUMERIC(7,5) NOT NULL
,ST NUMERIC(7,5) NOT NULL
,XX NUMERIC(7,5) NOT NULL
);
INSERT INTO rate_deck(NPANXX,XS,ST,XX) VALUES (317826,0.0033,0.00243,0.00243);
INSERT INTO rate_deck(NPANXX,XS,ST,XX) VALUES (313231,0.0031,0.00243,0.00243);
INSERT INTO rate_deck(NPANXX,XS,ST,XX) VALUES (708850,0.0019,0.00243,0.00243);
INSERT INTO rate_deck(NPANXX,XS,ST,XX) VALUES (888,0.001,0.00165,0.00165);
INSERT INTO rate_deck(NPANXX,XS,ST,XX) VALUES (800,0.001,0.00165,0.00165);
select
coalesce(NPANXX,0) as NPANXX
, CalledNumber as CalledNumber
, count(*) as Connects
, sum(duration) as Seconds
, round((sum(duration)/60.0),2,1) as Minutes
, XS as Rate
, sum((duration/60.0) * XS) as Rated
from sip_calls
inner join Rate_Deck on (CalledNumber / 10000) = (NPANXX + 1000000)
or (CalledNumber / 10000) = NPANXX
or (CalledNumber / 10000000) = NPANXX
WHERE DURATION > 0
group by
NPANXX, XS, CalledNumber
order by NPANXX asc
Given the computations in the JOIN, I'd suggest trying a separation of the requests and UNIONing the results:
select
coalesce(NPANXX,0) as NPANXX
, CalledNumber as CalledNumber
, count(*) as Connects
, sum(duration) as Seconds
, round((sum(duration)/60.0) ,2,1) as Minutes
, XS as Rate
, sum((duration/60.0) * XS) as Rated
from sip_calls
inner join Rate_Deck on NPANXX = CalledNumber / 10000 - 1000000 /* I changed the + 1000000 to - 1000000 on the other side */
WHERE DURATION > 0
group by
NPANXX, XS, CalledNumber
union all
select
coalesce(NPANXX,0) as NPANXX
, CalledNumber as CalledNumber
, count(*) as Connects
, sum(duration) as Seconds
, round((sum(duration)/60.0) ,2,1) as Minutes
, XS as Rate
, sum((duration/60.0) * XS) as Rated
from sip_calls
inner join Rate_Deck on NPANXX = CalledNumber / 10000
WHERE DURATION > 0
group by
NPANXX, XS, CalledNumber
union all
select
coalesce(NPANXX,0) as NPANXX
, CalledNumber as CalledNumber
, count(*) as Connects
, sum(duration) as Seconds
, round((sum(duration)/60.0) ,2,1) as Minutes
, XS as Rate
, sum((duration/60.0) * XS) as Rated
from sip_calls
inner join Rate_Deck on NPANXX = CalledNumber / 1000000
WHERE DURATION > 0
group by
NPANXX, XS, CalledNumber
order by NPANXX asc
select
coalesce(NPANXX,0) as NPANXX
, CalledNumber as CalledNumber
, count(*) as Connects
, sum(duration) as Seconds
, round((sum(duration)/60.0)
, XS as Rate
, sum((duration/60.0) * XS) as Rated
from sip_calls
inner join Rate_Deck on NPANXX = CalledNumber / 10000 - 1000000 /* I changed the + 1000000 to - 1000000 on the other side */
WHERE DURATION > 0
group by
NPANXX, XS, CalledNumber
union all
select
coalesce(NPANXX,0) as NPANXX
, CalledNumber as CalledNumber
, count(*) as Connects
, sum(duration) as Seconds
, round((sum(duration)/60.0)
, XS as Rate
, sum((duration/60.0) * XS) as Rated
from sip_calls
inner join Rate_Deck on NPANXX = CalledNumber / 10000
WHERE DURATION > 0
group by
NPANXX, XS, CalledNumber
union all
select
coalesce(NPANXX,0) as NPANXX
, CalledNumber as CalledNumber
, count(*) as Connects
, sum(duration) as Seconds
, round((sum(duration)/60.0)
, XS as Rate
, sum((duration/60.0) * XS) as Rated
from sip_calls
inner join Rate_Deck on NPANXX = CalledNumber / 1000000
WHERE DURATION > 0
group by
NPANXX, XS, CalledNumber
order by NPANXX asc
ASKER
Sorry David. I should have included the indexes that I built which didn't seem to help. Here they are. Maybe there is some problem with how I created the indexes?
USE [Netrio]
GO
/****** Object: Index [PK_NetrioSIPRateDeck] Script Date: 4/23/2019 10:22:07 AM ******/
ALTER TABLE [dbo].[Rate_Deck] ADD CONSTRAINT [PK_NetrioSIPRateDeck] PRIMARY KEY CLUSTERED
(
[NPANXX] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
GO
/****** Object: Index [NonClusteredIndex-20190418-133232] Script Date: 4/23/2019 10:18:35 AM ******/
CREATE NONCLUSTERED INDEX [NonClusteredIndex-20190418-133232] ON [dbo].[Rate_Deck]
(
[NPANXX] ASC
)
INCLUDE ( [XS]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
GO
USE [Netrio]
GO
/****** Object: Index [ClusteredIndex-20190418-133309] Script Date: 4/23/2019 10:21:00 AM ******/
CREATE CLUSTERED INDEX [ClusteredIndex-20190418-133309] ON [dbo].[Sip_Calls]
(
[CalledNumber] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
GO
/****** Object: Index [NonClusteredIndex-20190418-133325] Script Date: 4/23/2019 10:21:33 AM ******/
CREATE NONCLUSTERED INDEX [NonClusteredIndex-20190418-133325] ON [dbo].[Sip_Calls]
(
[CalledNumber] ASC
)
INCLUDE ( [Duration]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
GO
Hi Greg,
Do you have an explain plan of the query? For this to run for 3 hours on such a relatively small number of rows suggests that the indexes aren't being used, a Cartesian Product is producing a huge number of rows, or both.
Do you have an explain plan of the query? For this to run for 3 hours on such a relatively small number of rows suggests that the indexes aren't being used, a Cartesian Product is producing a huge number of rows, or both.
ASKER
Hi Kent. I'm s not sure what you mean by an "explain plan" but in general the query looks at the called number field in the sip calls table and tries to find a matching NPANXX value in the rate deck table for the first part of the called number and if it finds a match then it sums the duration of the calls with that called number, converts the duration seconds to minutes, and then multiplies it by the XS field value to come up with a Rated value (cost of the calls). I got the query from another expert on a separate question.
Hi Greg,
An explain plan will produce a description of how the query is broken down and run, with statistics to show where most of the resources are being consumed. It can be generated by executing the query, or by estimating what the query is expected to do.
In SSMS, run this statement:
SET SHOWPLAN_ALL ON
Then run your query. SQL Server should estimate the execution. If you can post the results here we'll have a better handle on what can be done to improve it.
When you're done, either close that window or in that same window run:
SET SHOWPLAN_ALL OFF
An explain plan will produce a description of how the query is broken down and run, with statistics to show where most of the resources are being consumed. It can be generated by executing the query, or by estimating what the query is expected to do.
In SSMS, run this statement:
SET SHOWPLAN_ALL ON
Then run your query. SQL Server should estimate the execution. If you can post the results here we'll have a better handle on what can be done to improve it.
When you're done, either close that window or in that same window run:
SET SHOWPLAN_ALL OFF
Again, with OR clauses in the JOIN, SQL must typically do a table scan of the second table for every row in the first table, rather than doing some type of lookup. That will result in ~140K-160K table scans, which will be very, very slow.
If you break apart the OR clauses into separate queries, SQL should be able to, hopefully, do some type of index seek/match on the rate_deck table. That would mean only 3 scans of the sip_calls table. Needless to say, 3 scans should be massively faster than 100K+ scans.
Given the ORs and the computations in the JOIN conditions, I wouldn't expect any index(es) to be able to be used for the original query.
If you're willing to try the UNIONed SQL, I posted it as the second post to this q.
If you break apart the OR clauses into separate queries, SQL should be able to, hopefully, do some type of index seek/match on the rate_deck table. That would mean only 3 scans of the sip_calls table. Needless to say, 3 scans should be massively faster than 100K+ scans.
Given the ORs and the computations in the JOIN conditions, I wouldn't expect any index(es) to be able to be used for the original query.
If you're willing to try the UNIONed SQL, I posted it as the second post to this q.
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER
Thanks John. I will give it try.
ASKER
Wow! John, that made a huge difference. It completed in about 4 seconds!
Thanks everyone!
Thanks everyone!
ASKER
John, I have one more similar query that I could use your help with. It uses the same tables but it calculates the data for calls in the sip_calls table that do not have a matching NPANXX in the rate_table. Here is the current query.
select
CalledNumber as CalledNumber
, count(*) as Connects
, sum(duration) as Seconds
, round((sum(duration)/60.0),2,1) as Minutes
from sip_calls
left join Rate_Deck on (CalledNumber / 10000) = (NPANXX + 1000000)
or (CalledNumber / 10000) = NPANXX
or (CalledNumber / 10000000) = NPANXX
WHERE sip_calls.DURATION > 0
AND Rate_Deck.NPANXX IS NULL
group by
CalledNumber
order by CalledNumber asc
As Scott already wrote the problem is for sure the OR in the JOIN.
I would add persisted computed columns to shift the calculus to the import. Then try indexing those.
And just for clean code, do the arithmetic's on only one side ( the table with lesser rows).
But the given sample seems not to be significant:
I would add persisted computed columns to shift the calculus to the import. Then try indexing those.
And just for clean code, do the arithmetic's on only one side ( the table with lesser rows).
ON CalledNumber / 10000 - 1000000 = NPANXX
OR CalledNumber / 10000 = NPANXX
OR CalledNumber / 10000000 = NPANXX
Persist these three call number derivates and index them (covering indices). Using then the UNION approach should be the fastest.But the given sample seems not to be significant:
USE tempdb;
GO
CREATE TABLE dbo.sip_calls (
CalledNumber BIGINT NOT NULL ,
Duration BIGINT NOT NULL ,
CN1 AS CalledNumber / 10000 - 1000000 PERSISTED ,
CN2 AS CalledNumber / 10000 PERSISTED ,
CN3 AS CalledNumber / 10000000 PERSISTED ,
HasDuration AS CAST(IIF(Duration > 0, 1, 0) AS BIT)PERSISTED
);
CREATE INDEX IX_CN1
ON dbo.sip_calls (
HasDuration DESC ,
CN1 ,
Duration ,
CalledNumber );
CREATE INDEX IX_CN2
ON dbo.sip_calls (
HasDuration DESC ,
CN2 ,
Duration ,
CalledNumber );
CREATE INDEX IX_CN3
ON dbo.sip_calls (
HasDuration DESC ,
CN3 ,
Duration ,
CalledNumber );
CREATE TABLE dbo.rate_deck (
NPANXX INT NOT NULL ,
XS NUMERIC(7, 5) NOT NULL ,
ST NUMERIC(7, 5) NOT NULL ,
XX NUMERIC(7, 5) NOT NULL ,
CONSTRAINT PK_RateDeack
PRIMARY KEY ( NPANXX )
);
INSERT INTO dbo.sip_calls ( CalledNumber ,
Duration )
VALUES ( 13178268627, 265 ) ,
( 8884254800, 24 ) ,
( 7088505327, 66 ) ,
( 13132311580, 0 ) ,
( 4373413793, 7 ) ,
( 8004254800, 24 ) ,
( 19991234560, 123 ) ,
( 9991234567, 12 ) ,
( 13132311580, 48 );
INSERT INTO dbo.rate_deck ( NPANXX ,
XS ,
ST ,
XX )
VALUES ( 317826, 0.0033, 0.00243, 0.00243 ) ,
( 313231, 0.0031, 0.00243, 0.00243 ) ,
( 708850, 0.0019, 0.00243, 0.00243 ) ,
( 888, 0.001, 0.00165, 0.00165 ) ,
( 800, 0.001, 0.00165, 0.00165 );
GO
SET STATISTICS IO ON;
SELECT COALESCE(NPANXX, 0) AS NPANXX ,
CalledNumber AS CalledNumber ,
COUNT(*) AS Connects ,
SUM(Duration) AS Seconds ,
ROUND(( SUM(Duration) / 60.0 ), 2, 1) AS Minutes ,
XS AS Rate ,
SUM(( Duration / 60.0 ) * XS) AS Rated
FROM dbo.sip_calls
INNER JOIN dbo.rate_deck ON ( CalledNumber / 10000 ) = ( NPANXX + 1000000 )
OR ( CalledNumber / 10000 ) = NPANXX
OR ( CalledNumber / 10000000 ) = NPANXX
WHERE Duration > 0
GROUP BY NPANXX ,
XS ,
CalledNumber
ORDER BY NPANXX ASC;
SELECT COALESCE(NPANXX, 0) AS NPANXX ,
CalledNumber AS CalledNumber ,
COUNT(*) AS Connects ,
SUM(Duration) AS Seconds ,
ROUND(( SUM(Duration) / 60.0 ), 2, 1) AS Minutes ,
XS AS Rate ,
SUM(( Duration / 60.0 ) * XS) AS Rated
FROM dbo.sip_calls
INNER JOIN dbo.rate_deck ON NPANXX IN ( CN1, CN2, CN3 )
WHERE HasDuration = 1
GROUP BY NPANXX ,
XS ,
CalledNumber
ORDER BY NPANXX ASC;
SELECT COALESCE(NPANXX, 0) AS NPANXX ,
CalledNumber AS CalledNumber ,
COUNT(*) AS Connects ,
SUM(Duration) AS Seconds ,
ROUND(( SUM(Duration) / 60.0 ), 2, 1) AS Minutes ,
XS AS Rate ,
SUM(( Duration / 60.0 ) * XS) AS Rated
FROM dbo.sip_calls
INNER JOIN dbo.rate_deck ON NPANXX = CN1
WHERE HasDuration = 1
GROUP BY NPANXX ,
XS ,
CalledNumber
UNION
SELECT COALESCE(NPANXX, 0) AS NPANXX ,
CalledNumber AS CalledNumber ,
COUNT(*) AS Connects ,
SUM(Duration) AS Seconds ,
ROUND(( SUM(Duration) / 60.0 ), 2, 1) AS Minutes ,
XS AS Rate ,
SUM(( Duration / 60.0 ) * XS) AS Rated
FROM dbo.sip_calls
INNER JOIN dbo.rate_deck ON NPANXX = CN2
WHERE HasDuration = 1
GROUP BY NPANXX ,
XS ,
CalledNumber
UNION
SELECT COALESCE(NPANXX, 0) AS NPANXX ,
CalledNumber AS CalledNumber ,
COUNT(*) AS Connects ,
SUM(Duration) AS Seconds ,
ROUND(( SUM(Duration) / 60.0 ), 2, 1) AS Minutes ,
XS AS Rate ,
SUM(( Duration / 60.0 ) * XS) AS Rated
FROM dbo.sip_calls
INNER JOIN dbo.rate_deck ON NPANXX = CN3
WHERE HasDuration = 1
GROUP BY NPANXX ,
XS ,
CalledNumber
ORDER BY NPANXX ASC;
SET STATISTICS IO OFF;
GO
DROP TABLE dbo.sip_calls;
DROP TABLE dbo.rate_deck;
GO
Worries me that a 3 hour execution can suddenly run in 4 seconds just by massaging the query, but if the results are correct then we've made excellent progress. I would use a similar technique for the rows that have no matching rate by changing to a right-join:
WITH SipCallsCTE AS
( SELECT CalledNumber
, CalledNumber1 = CalledNumber / 10000 - 1000000
, CalledNumber2 = CalledNumber / 10000
, CalledNumber3 = CalledNumber / 10000000
, duration
FROM sip_calls
WHERE DURATION > 0
)
SELECT CalledNumber
, Connects = count(*)
, Seconds = sum(duration)
, Minutes = round((sum(duration)/60.0),2,1)
FROM Rate_Deck
RIGHT
JOIN SipCallsCTE ON NPANXX IN ( CalledNumber1
, CalledNumber2
, CalledNumber3
)
WHERE NPANXX IS NULL
GROUP BY CalledNumber
ORDER BY CalledNumber
So with indexes... lookup will be fast...
With no indexes, 180K + 140K records must be read for each SELECT... which will take a very long time.
Fix: Add indexes for every column used as a lookup SELECT criteria.