Link to home
Start Free TrialLog in
Avatar of Greg Watkins
Greg Watkins

asked on

SQL query takes a long time to run

I have SQL query that calculates some totals based upon entries between two tables. The problem I have is that the query takes a very long time to run.  One of the tables is the rate_deck table and the other one is the sip_calls table.  The rate_deck table has about 180,000 records and the sip_calls table has about 140,000 records. In this scenario it took over 3 hours for the query to run. I have tried creating indexes on both tables but it did not help. I am importing the data for both tables from csv files so I have full control over the tables schema. Below I have the current schema for both tables and some sample data and a copy of the query. What I need is some advice on what can be done to speed up the query.

CREATE TABLE sip_calls(
  CalledNumber       BIGINT  NOT NULL
  ,Duration           BIGINT  NOT NULL  
);
INSERT INTO sip_calls(CalledNumber,Duration) VALUES (13178268627,265);
INSERT INTO sip_calls(CalledNumber,Duration) VALUES (8884254800,24);
INSERT INTO sip_calls(CalledNumber,Duration) VALUES (7088505327,66);
INSERT INTO sip_calls(CalledNumber,Duration) VALUES (13132311580,0);
INSERT INTO sip_calls(CalledNumber,Duration) VALUES (4373413793,7);
INSERT INTO sip_calls(CalledNumber,Duration) VALUES (8004254800,24);
INSERT INTO sip_calls(CalledNumber,Duration) VALUES (19991234560,123);
INSERT INTO sip_calls(CalledNumber,Duration) VALUES (9991234567,12);
INSERT INTO sip_calls(CalledNumber,Duration) VALUES (13132311580,48);

CREATE TABLE rate_deck(
   NPANXX INT  NOT NULL PRIMARY KEY
  ,XS     NUMERIC(7,5) NOT NULL
  ,ST     NUMERIC(7,5) NOT NULL
  ,XX     NUMERIC(7,5) NOT NULL
);
INSERT INTO rate_deck(NPANXX,XS,ST,XX) VALUES (317826,0.0033,0.00243,0.00243);
INSERT INTO rate_deck(NPANXX,XS,ST,XX) VALUES (313231,0.0031,0.00243,0.00243);
INSERT INTO rate_deck(NPANXX,XS,ST,XX) VALUES (708850,0.0019,0.00243,0.00243);
INSERT INTO rate_deck(NPANXX,XS,ST,XX) VALUES (888,0.001,0.00165,0.00165);
INSERT INTO rate_deck(NPANXX,XS,ST,XX) VALUES (800,0.001,0.00165,0.00165);

Open in new window


select
       coalesce(NPANXX,0)   as NPANXX
     , CalledNumber			as CalledNumber
     , count(*)             as Connects
     , sum(duration)        as Seconds
     , round((sum(duration)/60.0),2,1) as Minutes
     , XS					as Rate
     , sum((duration/60.0) * XS) as Rated
from sip_calls
inner join Rate_Deck on (CalledNumber / 10000) = (NPANXX + 1000000) 
                    or (CalledNumber / 10000) = NPANXX
                    or (CalledNumber / 10000000) = NPANXX
WHERE DURATION > 0
group by
      NPANXX, XS, CalledNumber 
order by NPANXX asc

Open in new window

Avatar of David Favor
David Favor
Flag of United States of America image

Looks like you have no indexes, so a full table scan must be done.

So with indexes... lookup will be fast...

With no indexes, 180K + 140K records must be read for each SELECT... which will take a very long time.

Fix: Add indexes for every column used as a lookup SELECT criteria.
Given the computations in the JOIN, I'd suggest trying a separation of the requests and UNIONing the results:


select
       coalesce(NPANXX,0)   as NPANXX
     , CalledNumber                  as CalledNumber
     , count(*)             as Connects
     , sum(duration)        as Seconds
     , round((sum(duration)/60.0),2,1) as Minutes
     , XS                              as Rate
     , sum((duration/60.0) * XS) as Rated
from sip_calls
inner join Rate_Deck on NPANXX = CalledNumber / 10000 - 1000000 /* I changed the + 1000000 to - 1000000 on the other side */
WHERE DURATION > 0
group by
      NPANXX, XS, CalledNumber
union all
select
       coalesce(NPANXX,0)   as NPANXX
     , CalledNumber                  as CalledNumber
     , count(*)             as Connects
     , sum(duration)        as Seconds
     , round((sum(duration)/60.0),2,1) as Minutes
     , XS                              as Rate
     , sum((duration/60.0) * XS) as Rated
from sip_calls
inner join Rate_Deck on NPANXX = CalledNumber / 10000
WHERE DURATION > 0
group by
      NPANXX, XS, CalledNumber
union all
select
       coalesce(NPANXX,0)   as NPANXX
     , CalledNumber                  as CalledNumber
     , count(*)             as Connects
     , sum(duration)        as Seconds
     , round((sum(duration)/60.0),2,1) as Minutes
     , XS                              as Rate
     , sum((duration/60.0) * XS) as Rated
from sip_calls
inner join Rate_Deck on NPANXX = CalledNumber / 1000000
WHERE DURATION > 0
group by
      NPANXX, XS, CalledNumber
order by NPANXX asc
Avatar of Greg Watkins
Greg Watkins

ASKER

Sorry David. I should have included the indexes that I built which didn't seem to help. Here they are. Maybe there is some problem with how I created the indexes?

USE [Netrio]
GO

/****** Object:  Index [PK_NetrioSIPRateDeck]    Script Date: 4/23/2019 10:22:07 AM ******/
ALTER TABLE [dbo].[Rate_Deck] ADD  CONSTRAINT [PK_NetrioSIPRateDeck] PRIMARY KEY CLUSTERED 
(
	[NPANXX] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
GO

/****** Object:  Index [NonClusteredIndex-20190418-133232]    Script Date: 4/23/2019 10:18:35 AM ******/
CREATE NONCLUSTERED INDEX [NonClusteredIndex-20190418-133232] ON [dbo].[Rate_Deck]
(
	[NPANXX] ASC
)
INCLUDE ( 	[XS]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
GO

Open in new window


USE [Netrio]
GO

/****** Object:  Index [ClusteredIndex-20190418-133309]    Script Date: 4/23/2019 10:21:00 AM ******/
CREATE CLUSTERED INDEX [ClusteredIndex-20190418-133309] ON [dbo].[Sip_Calls]
(
	[CalledNumber] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
GO

/****** Object:  Index [NonClusteredIndex-20190418-133325]    Script Date: 4/23/2019 10:21:33 AM ******/
CREATE NONCLUSTERED INDEX [NonClusteredIndex-20190418-133325] ON [dbo].[Sip_Calls]
(
	[CalledNumber] ASC
)
INCLUDE ( 	[Duration]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
GO

Open in new window

Hi Greg,

Do you have an explain plan of the query?  For this to run for 3 hours on such a relatively small number of rows suggests that the indexes aren't being used, a Cartesian Product is producing a huge number of rows, or both.
Hi Kent. I'm s not sure what you mean by an "explain plan" but in general the query looks at the called number field in the sip calls table and tries to find a matching NPANXX value in the rate deck table for the first part of the called number and if it finds a match then it sums the duration of the calls with that called number, converts the duration seconds to minutes, and then multiplies it by the XS field value to come up with a Rated value (cost of the calls). I got the query from another expert on a separate question.
Hi Greg,

An explain plan will produce a description of how the query is broken down and run, with statistics to show where most of the resources are being consumed.  It can be generated by executing the query, or by estimating what the query is expected to do.

In SSMS, run this statement:

  SET SHOWPLAN_ALL ON

Then run your query.  SQL Server should estimate the execution.  If you can post the results here we'll have a better handle on what can be done to improve it.

When you're done, either close that window or in that same window run:

  SET SHOWPLAN_ALL OFF
Again, with OR clauses in the JOIN, SQL must typically do a table scan of the second table for every row in the first table, rather than doing some type of lookup.  That will result in ~140K-160K table scans, which will be very, very slow.

If you break apart the OR clauses into separate queries, SQL should be able to, hopefully, do some type of index seek/match on the rate_deck table.  That would mean only 3 scans of the sip_calls table.  Needless to say, 3 scans should be massively faster than 100K+ scans.

Given the ORs and the computations in the JOIN conditions, I wouldn't expect any index(es) to be able to be used for the original query.

If you're willing to try the UNIONed SQL, I posted it as the second post to this q.
ASKER CERTIFIED SOLUTION
Avatar of John_Vidmar
John_Vidmar
Flag of Canada image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Thanks John. I will give it try.
Wow! John, that made a huge difference. It completed in about 4 seconds!

Thanks everyone!
John, I have one more similar query that I could use your help with. It uses the same tables but it calculates the data for calls in the sip_calls table that do not have a matching NPANXX in the rate_table. Here is the current query.

select
       CalledNumber			as CalledNumber
     , count(*)             as Connects
     , sum(duration)        as Seconds
     , round((sum(duration)/60.0),2,1) as Minutes
from sip_calls
left join Rate_Deck on (CalledNumber / 10000) = (NPANXX + 1000000) 
                    or (CalledNumber / 10000) = NPANXX
                    or (CalledNumber / 10000000) = NPANXX
WHERE sip_calls.DURATION > 0
  AND Rate_Deck.NPANXX IS NULL
group by
      CalledNumber 
order by CalledNumber asc

Open in new window

As Scott already wrote the problem is for sure the OR in the JOIN.

I would add persisted computed columns to shift the calculus to the import. Then try indexing those.

And just for clean code, do the arithmetic's on only one side ( the table with lesser rows).

ON CalledNumber / 10000 - 1000000 = NPANXX  
OR CalledNumber / 10000 = NPANXX
OR CalledNumber / 10000000 = NPANXX

Open in new window

Persist these three call number derivates and index them (covering indices). Using then the UNION approach should be the fastest.

But the given sample seems not to be significant:

USE tempdb;
GO

CREATE TABLE dbo.sip_calls (
    CalledNumber BIGINT NOT NULL ,
    Duration BIGINT NOT NULL ,
    CN1 AS CalledNumber / 10000 - 1000000 PERSISTED ,
    CN2 AS CalledNumber / 10000 PERSISTED ,
    CN3 AS CalledNumber / 10000000 PERSISTED ,
    HasDuration AS CAST(IIF(Duration > 0, 1, 0) AS BIT)PERSISTED
);

CREATE INDEX IX_CN1
    ON dbo.sip_calls (
    HasDuration DESC ,
    CN1 ,
    Duration ,
    CalledNumber );
CREATE INDEX IX_CN2
    ON dbo.sip_calls (
    HasDuration DESC ,
    CN2 ,
    Duration ,
    CalledNumber );
CREATE INDEX IX_CN3
    ON dbo.sip_calls (
    HasDuration DESC ,
    CN3 ,
    Duration ,
    CalledNumber );

CREATE TABLE dbo.rate_deck (
    NPANXX INT NOT NULL ,
    XS NUMERIC(7, 5) NOT NULL ,
    ST NUMERIC(7, 5) NOT NULL ,
    XX NUMERIC(7, 5) NOT NULL ,
    CONSTRAINT PK_RateDeack
        PRIMARY KEY ( NPANXX )
);

INSERT INTO dbo.sip_calls ( CalledNumber ,
                            Duration )
VALUES ( 13178268627, 265 ) ,
       ( 8884254800, 24 ) ,
       ( 7088505327, 66 ) ,
       ( 13132311580, 0 ) ,
       ( 4373413793, 7 ) ,
       ( 8004254800, 24 ) ,
       ( 19991234560, 123 ) ,
       ( 9991234567, 12 ) ,
       ( 13132311580, 48 );

INSERT INTO dbo.rate_deck ( NPANXX ,
                            XS ,
                            ST ,
                            XX )
VALUES ( 317826, 0.0033, 0.00243, 0.00243 ) ,
       ( 313231, 0.0031, 0.00243, 0.00243 ) ,
       ( 708850, 0.0019, 0.00243, 0.00243 ) ,
       ( 888, 0.001, 0.00165, 0.00165 ) ,
       ( 800, 0.001, 0.00165, 0.00165 );
GO

SET STATISTICS IO ON;
SELECT   COALESCE(NPANXX, 0) AS NPANXX ,
         CalledNumber AS CalledNumber ,
         COUNT(*) AS Connects ,
         SUM(Duration) AS Seconds ,
         ROUND(( SUM(Duration) / 60.0 ), 2, 1) AS Minutes ,
         XS AS Rate ,
         SUM(( Duration / 60.0 ) * XS) AS Rated
FROM     dbo.sip_calls
         INNER JOIN dbo.rate_deck ON ( CalledNumber / 10000 ) = ( NPANXX + 1000000 )
                                     OR ( CalledNumber / 10000 ) = NPANXX
                                     OR ( CalledNumber / 10000000 ) = NPANXX
WHERE    Duration > 0
GROUP BY NPANXX ,
         XS ,
         CalledNumber
ORDER BY NPANXX ASC;

SELECT   COALESCE(NPANXX, 0) AS NPANXX ,
         CalledNumber AS CalledNumber ,
         COUNT(*) AS Connects ,
         SUM(Duration) AS Seconds ,
         ROUND(( SUM(Duration) / 60.0 ), 2, 1) AS Minutes ,
         XS AS Rate ,
         SUM(( Duration / 60.0 ) * XS) AS Rated
FROM     dbo.sip_calls
         INNER JOIN dbo.rate_deck ON NPANXX IN ( CN1, CN2, CN3 )
WHERE    HasDuration = 1
GROUP BY NPANXX ,
         XS ,
         CalledNumber
ORDER BY NPANXX ASC;


SELECT   COALESCE(NPANXX, 0) AS NPANXX ,
         CalledNumber AS CalledNumber ,
         COUNT(*) AS Connects ,
         SUM(Duration) AS Seconds ,
         ROUND(( SUM(Duration) / 60.0 ), 2, 1) AS Minutes ,
         XS AS Rate ,
         SUM(( Duration / 60.0 ) * XS) AS Rated
FROM     dbo.sip_calls
         INNER JOIN dbo.rate_deck ON NPANXX = CN1
WHERE    HasDuration = 1
GROUP BY NPANXX ,
         XS ,
         CalledNumber
UNION
SELECT   COALESCE(NPANXX, 0) AS NPANXX ,
         CalledNumber AS CalledNumber ,
         COUNT(*) AS Connects ,
         SUM(Duration) AS Seconds ,
         ROUND(( SUM(Duration) / 60.0 ), 2, 1) AS Minutes ,
         XS AS Rate ,
         SUM(( Duration / 60.0 ) * XS) AS Rated
FROM     dbo.sip_calls
         INNER JOIN dbo.rate_deck ON NPANXX = CN2
WHERE    HasDuration = 1
GROUP BY NPANXX ,
         XS ,
         CalledNumber
UNION
SELECT   COALESCE(NPANXX, 0) AS NPANXX ,
         CalledNumber AS CalledNumber ,
         COUNT(*) AS Connects ,
         SUM(Duration) AS Seconds ,
         ROUND(( SUM(Duration) / 60.0 ), 2, 1) AS Minutes ,
         XS AS Rate ,
         SUM(( Duration / 60.0 ) * XS) AS Rated
FROM     dbo.sip_calls
         INNER JOIN dbo.rate_deck ON NPANXX = CN3
WHERE    HasDuration = 1
GROUP BY NPANXX ,
         XS ,
         CalledNumber
ORDER BY NPANXX ASC;

SET STATISTICS IO OFF;
GO

DROP TABLE dbo.sip_calls;
DROP TABLE dbo.rate_deck;
GO

Open in new window

Worries me that a 3 hour execution can suddenly run in 4 seconds just by massaging the query, but if the results are correct then we've made excellent progress.  I would use a similar technique for the rows that have no matching rate by changing to a right-join:
WITH SipCallsCTE AS
(	SELECT	CalledNumber
	,	CalledNumber1	= CalledNumber / 10000 - 1000000
	,	CalledNumber2	= CalledNumber / 10000
	,	CalledNumber3	= CalledNumber / 10000000
	,	duration
	FROM	sip_calls
	WHERE	DURATION > 0
)
SELECT	CalledNumber
,		Connects	= count(*)
,		Seconds		= sum(duration)
,		Minutes		= round((sum(duration)/60.0),2,1)
FROM	Rate_Deck
RIGHT
JOIN	SipCallsCTE	ON NPANXX IN	( CalledNumber1
						, CalledNumber2
						, CalledNumber3
						)
WHERE	NPANXX IS NULL
GROUP BY	CalledNumber 
ORDER BY	CalledNumber

Open in new window