Link to home
Start Free TrialLog in
Avatar of pzozulka
pzozulka

asked on

SQL Indexes on temp tables (best practices)

I came across an existing stored procedure that has performance issues.

The sproc below creates two temp tables, and creates clustered indexes for them.

My understanding was that clustered indexes are the same as the index in a textbook, which points directly at the data. Furthermore, it is my understanding that clustered indexes should only be used on unique columns such as primary keys.

Nonclustered indexes should be implemented on columns that are frequently used in WHERE clauses, and great for exact match queries.

1. Having said that, are the clustered indexes in the below sproc the correct choice of index?
2. Why is the clustered index being placed on so many columns? Shouldn't clustered indexes be only placed on a single primary key column?

The sproc creates a temp tables:
-- Accounts that match report filter criteria to be used as reference accounts to find duplicates   
CREATE TABLE #tempTableInRange      
(        
	CustomerAccountId INT NOT NULL,      
	CustomerAccountNumber INT,   
	AgentId INT,   
	StreetNumber VARCHAR(50),    
	PostalCode VARCHAR(50),    
	Address1 VARCHAR(50),    
	Address2 VARCHAR(50),    
	isReferenceAccount BIT ,    
	isPoBox BIT    
)     
CREATE CLUSTERED INDEX cx ON #tempTableInRange(CustomerAccountId, CustomerAccountNumber, AgentId, StreetNumber, PostalCode, Address1, Address2, isReferenceAccount, isPoBox)                  
  
-- Final result to find duplicate accounts 
CREATE TABLE #tempResult    
(      
CustomerAccountId INT NOT NULL,     
OriginalCustomerAccountId INT,      
OriginalCustomerAccountNumber INT,    
OriginalAgentId INT,
StreetNumber VARCHAR(50),    
PostalCode VARCHAR(50),    
Address1 VARCHAR(50),    
Address2 VARCHAR(50),    
isReferenceAccount BIT    
)   
CREATE CLUSTERED INDEX cx1 ON #tempResult(CustomerAccountId, OriginalCustomerAccountId, OriginalCustomerAccountNumber, OriginalAgentId, StreetNumber, PostalCode, Address1, Address2, isReferenceAccount)                  

Open in new window

Next, inserts records that match certain criteria into the temp tables.
 -- Street Address in Date Range  (Reference Accounts to which we are comparing) 
INSERT INTO #tempTableInRange    
SELECT ...
FROM [address] A    
	join TableB on ...
        join TableC on ...
WHERE  ... AND    
	(A.Address1 NOT LIKE '%PO BOX%'  AND      
	 A.Address1 NOT LIKE '%P O BOX%' AND      
	 A.Address1 NOT LIKE '%P.O.BOX%' AND      
	 A.Address1 NOT LIKE '%P.O. BOX%' AND    
	 A.Address1 NOT LIKE '%P. O. Box%'    
	) 

-- POBox in date range (Reference Accounts to which we are comparing)    
INSERT INTO #tempTableInRange    
SELECT ...    
FROM     
	(      
	SELECT ...
	FROM [address] A    
		join ...  
	WHERE   ...
                 AND    
		(A.Address1 LIKE '%PO BOX%' OR      
		 A.Address1 LIKE '%P O BOX%' OR      
		 A.Address1 LIKE '%P.O.BOX%' OR      
		 A.Address1 LIKE '%P.O. BOX%'   OR    
		 A.Address1 LIKE '%P. O. Box%'    --P.o.  Box 243    
		)             
	) TrimedAddressForPOBOX  


-- Find any duplicate street address  (Regardless of report criteria) 
INSERT INTO #tempResult    
SELECT DISTINCT OtherAccounts.Partyid,     
#tempTableInRange.CustomerAccountId AS OriginalCustomerAccountId,    
#tempTableInRange.CustomerAccountNumber AS OriginalCustomerAccountNumber, 
#tempTableInRange.AgentId AS OriginalAgentId,   
OtherAccounts.StreetNumber,     
... 
FROM     
(    
 select ...
 from [address] A    
 join...    
 WHERE ...    
) OtherAccounts    
JOIN #tempTableInRange ON  #tempTableInRange.isPoBox = 0 AND    
#tempTableInRange.StreetNumber = OtherAccounts.StreetNumber AND    
#tempTableInRange.PostalCode = OtherAccounts.PostalCode                     

WHERE   
OtherAccounts.PartyId NOT IN (SELECT CustomerAccountId FROM  #tempTableInRange)    
AND OtherAccounts.Address1 NOT LIKE '%PO BOX%'  AND      
    OtherAccounts.Address1 NOT LIKE '%P O BOX%' AND      
    OtherAccounts.Address1 NOT LIKE '%P.O.BOX%' AND      
    OtherAccounts.Address1 NOT LIKE '%P.O. BOX%' AND    
    OtherAccounts.Address1 NOT LIKE '%P. O. Box%'    
      
GROUP BY  #tempTableInRange.CustomerAccountId,            --Reference account    
#tempTableInRange.CustomerAccountNumber,        --Reference account    
#tempTableInRange.AgentId,						-- Reference account
OtherAccounts.Partyid,     
OtherAccounts.StreetNumber,      
OtherAccounts.PostalCode,      
AddressId,     
OtherAccounts.Address1,     
OtherAccounts.Address2  



-- Insert the reference accounts (the ones that we are comparing to)    
INSERT INTO #tempResult     
SELECT DISTINCT #tempTableInRange.CustomerAccountId,    
#tempTableInRange.CustomerAccountId AS OriginalCustomerAccountId,      
#tempTableInRange.CustomerAccountNumber AS OriginalCustomerAccountNumber,       
#tempTableInRange.AgentId AS OriginalAgentId,
#tempTableInRange.StreetNumber,     
#tempTableInRange.PostalCode,     
#tempTableInRange.Address1,     
#tempTableInRange.Address2,    
#tempTableInRange.isReferenceAccount      
FROM #tempTableInRange    
JOIN #tempResult ON     #tempResult.OriginalCustomerAccountId = #tempTableInRange.CustomerAccountId    
AND #tempResult.StreetNumber = #tempTableInRange.StreetNumber    
AND #tempResult.PostalCode = #tempTableInRange.PostalCode    
AND #tempResult.CustomerAccountId <> #tempTableInRange.CustomerAccountId 

Open in new window


Final Select
select *
from Entity
left join ...
left join ...
left join
(
   select #tempResult.OriginalAgentId, #tempResult.OriginalCustomerAccountNumber, ...
   from #tempResult
   group by #tempResult.OriginalAgentId, #tempResult.OriginalCustomerAccountNumber
   having sum(case when isReferenceAccount = 0 then 1 else 0 end) > 0
)  dups on ... = ... 

where .... 

Open in new window

ASKER CERTIFIED SOLUTION
Avatar of Scott Pletcher
Scott Pletcher
Flag of United States of America image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of pzozulka
pzozulka

ASKER

1. Why No.

2. I've been doing a lot of research in the recent days, and read many stack overflow posts, and walked away with the understanding that you should use clustered indexes (one per table) on primary key column, and nonclustered indexes (one or many per table) on columns used for frequent exact searches (IDs/foreign keys, dates, etc.)

If the above is wrong, could you please point me to a resource you find helpful.

3. What is the rule on when to use a clustered index, and when to use a non clustered index.
SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
So from what I've gathered here, it appears that the clustered index was a fine choice (clus vs nonclus), but the columns were incorrect. Having said that, how would I correct the situation? Replace the columns  in that clus index with columns that are heavily used by the joins and where clause? Would you say this is a step in the right direction?

If yes, should I also add any nonclus indexes after fixing the above clus index?
SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
I'm curious, does the query perform better now?  How much?
The only thing I changed thus far was the indexes per your recommendation. While I can't get the true results from production environment, here are the results from the dev server.

Before: 14 seconds
After: 8 seconds

Does the order of keys matter when creating your index?
CREATE CLUSTERED INDEX cx ON #tempTbl ( PostalCode, StreetNumber, CustomerAccountId )
vs.
CREATE CLUSTERED INDEX cx ON #tempTbl ( CustomerAccountId , StreetNumber, PostalCode )

More details below in case you're curious. The huge scan count on one of the tables is freaking me out.

Before:
(347 row(s) affected)
Table '#tempResult_________________________________________________________________________________________________________00000000079A'. Scan count 0, logical reads 20725, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 83959, logical reads 317572, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#tempTableInRange___________________________________________________________________________________________________000000000799'. Scan count 83961, logical reads 168776, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Address'. Scan count 1, logical reads 1411, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'CustomerAccount'. Scan count 1, logical reads 11, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(10150 row(s) affected)
Table '#tempResult_________________________________________________________________________________________________________00000000079A'. Scan count 0, logical reads 4768, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#tempTableInRange___________________________________________________________________________________________________000000000799'. Scan count 3163, logical reads 72770, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'CustomerAccount'. Scan count 0, logical reads 4204, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Address'. Scan count 1, logical reads 1411, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Open in new window

After
(347 row(s) affected)
Table '#tempResult_________________________________________________________________________________________________________00000000079E'. Scan count 0, logical reads 21142, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#tempTableInRange___________________________________________________________________________________________________00000000079D'. Scan count 83961, logical reads 383550, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 84322, logical reads 184915, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Address'. Scan count 1, logical reads 1411, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'CustomerAccount'. Scan count 1, logical reads 11, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(10150 row(s) affected)
Table '#tempResult_________________________________________________________________________________________________________00000000079E'. Scan count 0, logical reads 3242, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#tempTableInRange___________________________________________________________________________________________________00000000079D'. Scan count 3163, logical reads 39327, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'CustomerAccount'. Scan count 0, logical reads 4204, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Address'. Scan count 1, logical reads 1411, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Open in new window

Yes, the order of keys is critical.  I put them in the correct order for the queries given.  The existence check for a CustomerAccountId will likely have to scan the table.

But SQL does love unique indexes, so if we can make them unique that could help:

CREATE TABLE #tempTableInRange      
(        
      Ident int IDENTITY(1, 1) NOT NULL,
      CustomerAccountId INT NOT NULL,      
      CustomerAccountNumber INT,  
      AgentId INT,  
      StreetNumber VARCHAR(50),    
      PostalCode VARCHAR(50),    
      Address1 VARCHAR(50),    
      Address2 VARCHAR(50),    
      isReferenceAccount BIT ,    
      isPoBox BIT    
)    
 
-- Final result to find duplicate accounts
CREATE TABLE #tempResult    
(      
Ident int IDENTITY(1, 1) NOT NULL,
CustomerAccountId INT NOT NULL,    
OriginalCustomerAccountId INT,      
OriginalCustomerAccountNumber INT,    
OriginalAgentId INT,
StreetNumber VARCHAR(50),    
PostalCode VARCHAR(50),    
Address1 VARCHAR(50),    
Address2 VARCHAR(50),    
isReferenceAccount BIT    
)  


CREATE UNIQUE CLUSTERED INDEX cx ON #tempTableInRange ( PostalCode, StreetNumber, CustomerAccountId, Ident )

CREATE UNIQUE CLUSTERED INDEX cx1 ON #tempResult ( PostalCode, StreetNumber, OriginalCustomerAccountId, Ident )
If order matters, I think the correct order should be StreetNumber, PostalCode, CustomerAccountId, Ident.

In the first Insert into #tempTableInRange, all the JOINs and WHERE conditions are happening on physical real columns, not relating to temp tables. So I'm ignoring this block of code.

In the second insert into #tempTableInRange, same story, all the data being JOINed is on real tables. Ignoring this.

Then when inserting into #rempResult, we JOIN on #tempTableInRange on the following:
JOIN
#tempTableInRange ON  #tempTableInRange.isPoBox = 0 AND    
#tempTableInRange.StreetNumber = OtherAccounts.StreetNumber AND    
#tempTableInRange.PostalCode = OtherAccounts.PostalCode

WHERE  
OtherAccounts.PartyId NOT IN (SELECT CustomerAccountId FROM  #tempTableInRange)    
AND OtherAccounts.StreetNumber <> ''    
  AND (    OtherAccounts.Address1 NOT LIKE '%PO BOX%'  AND      
    OtherAccounts.Address1 NOT LIKE '%P O BOX%' AND      
    OtherAccounts.Address1 NOT LIKE '%P.O.BOX%' AND      
    OtherAccounts.Address1 NOT LIKE '%P.O. BOX%' AND    
    OtherAccounts.Address1 NOT LIKE '%P. O. Box%'    
     )


Then on the next insert into #tempResult, again, the same JOIN happens as above, on StreetNumber first, then on PostalCode.


Finally, the last insert into #tempResult before we do a final select on everything. This insert has the following JOIN:
FROM
#tempTableInRange    
JOIN #tempResult ON     #tempResult.OriginalCustomerAccountId = #tempTableInRange.CustomerAccountId    
AND #tempResult.StreetNumber = #tempTableInRange.StreetNumber    
AND #tempResult.PostalCode = #tempTableInRange.PostalCode    
AND #tempResult.CustomerAccountId <> #tempTableInRange.CustomerAccountId

Shouldn't the index key order for #tempResult be (OriginalCustomerAccountId , StreetNumber , PostalCode , Ident)?


Either way, the Ident helped only slightly, here's the results:

Original: 13 sec
After applying the correct index: 8.74 sec
After making the index unique and adding Ident: 8.16 sec
Any further tuning would require the *full* SQL text and the resulting query plan.

PostalCode will group records better for processing, which is why it is the first key.  SQL can still test the PostalCode first as a key even if you code it second in the SQL query.

That is, if the table is: tableA ( col1 int, col2 varchar(30), primary key ( col1, col2 ))
if i write the query as:
select *
from tablea
where col2 = 'abc' and col1 = 1 /* rather than col1 = 1 and col2 = 'abc' */
SQL will still be able to do a single, keyed lookup for ( 1, 'abc' ) using the clus index.