SQL Indexes on temp tables (best practices)

I came across an existing stored procedure that has performance issues.

The sproc below creates two temp tables, and creates clustered indexes for them.

My understanding was that clustered indexes are the same as the index in a textbook, which points directly at the data. Furthermore, it is my understanding that clustered indexes should only be used on unique columns such as primary keys.

Nonclustered indexes should be implemented on columns that are frequently used in WHERE clauses, and great for exact match queries.

1. Having said that, are the clustered indexes in the below sproc the correct choice of index?
2. Why is the clustered index being placed on so many columns? Shouldn't clustered indexes be only placed on a single primary key column?

The sproc creates a temp tables:
-- Accounts that match report filter criteria to be used as reference accounts to find duplicates   
CREATE TABLE #tempTableInRange      
(        
	CustomerAccountId INT NOT NULL,      
	CustomerAccountNumber INT,   
	AgentId INT,   
	StreetNumber VARCHAR(50),    
	PostalCode VARCHAR(50),    
	Address1 VARCHAR(50),    
	Address2 VARCHAR(50),    
	isReferenceAccount BIT ,    
	isPoBox BIT    
)     
CREATE CLUSTERED INDEX cx ON #tempTableInRange(CustomerAccountId, CustomerAccountNumber, AgentId, StreetNumber, PostalCode, Address1, Address2, isReferenceAccount, isPoBox)                  
  
-- Final result to find duplicate accounts 
CREATE TABLE #tempResult    
(      
CustomerAccountId INT NOT NULL,     
OriginalCustomerAccountId INT,      
OriginalCustomerAccountNumber INT,    
OriginalAgentId INT,
StreetNumber VARCHAR(50),    
PostalCode VARCHAR(50),    
Address1 VARCHAR(50),    
Address2 VARCHAR(50),    
isReferenceAccount BIT    
)   
CREATE CLUSTERED INDEX cx1 ON #tempResult(CustomerAccountId, OriginalCustomerAccountId, OriginalCustomerAccountNumber, OriginalAgentId, StreetNumber, PostalCode, Address1, Address2, isReferenceAccount)                  

Open in new window

Next, inserts records that match certain criteria into the temp tables.
 -- Street Address in Date Range  (Reference Accounts to which we are comparing) 
INSERT INTO #tempTableInRange    
SELECT ...
FROM [address] A    
	join TableB on ...
        join TableC on ...
WHERE  ... AND    
	(A.Address1 NOT LIKE '%PO BOX%'  AND      
	 A.Address1 NOT LIKE '%P O BOX%' AND      
	 A.Address1 NOT LIKE '%P.O.BOX%' AND      
	 A.Address1 NOT LIKE '%P.O. BOX%' AND    
	 A.Address1 NOT LIKE '%P. O. Box%'    
	) 

-- POBox in date range (Reference Accounts to which we are comparing)    
INSERT INTO #tempTableInRange    
SELECT ...    
FROM     
	(      
	SELECT ...
	FROM [address] A    
		join ...  
	WHERE   ...
                 AND    
		(A.Address1 LIKE '%PO BOX%' OR      
		 A.Address1 LIKE '%P O BOX%' OR      
		 A.Address1 LIKE '%P.O.BOX%' OR      
		 A.Address1 LIKE '%P.O. BOX%'   OR    
		 A.Address1 LIKE '%P. O. Box%'    --P.o.  Box 243    
		)             
	) TrimedAddressForPOBOX  


-- Find any duplicate street address  (Regardless of report criteria) 
INSERT INTO #tempResult    
SELECT DISTINCT OtherAccounts.Partyid,     
#tempTableInRange.CustomerAccountId AS OriginalCustomerAccountId,    
#tempTableInRange.CustomerAccountNumber AS OriginalCustomerAccountNumber, 
#tempTableInRange.AgentId AS OriginalAgentId,   
OtherAccounts.StreetNumber,     
... 
FROM     
(    
 select ...
 from [address] A    
 join...    
 WHERE ...    
) OtherAccounts    
JOIN #tempTableInRange ON  #tempTableInRange.isPoBox = 0 AND    
#tempTableInRange.StreetNumber = OtherAccounts.StreetNumber AND    
#tempTableInRange.PostalCode = OtherAccounts.PostalCode                     

WHERE   
OtherAccounts.PartyId NOT IN (SELECT CustomerAccountId FROM  #tempTableInRange)    
AND OtherAccounts.Address1 NOT LIKE '%PO BOX%'  AND      
    OtherAccounts.Address1 NOT LIKE '%P O BOX%' AND      
    OtherAccounts.Address1 NOT LIKE '%P.O.BOX%' AND      
    OtherAccounts.Address1 NOT LIKE '%P.O. BOX%' AND    
    OtherAccounts.Address1 NOT LIKE '%P. O. Box%'    
      
GROUP BY  #tempTableInRange.CustomerAccountId,            --Reference account    
#tempTableInRange.CustomerAccountNumber,        --Reference account    
#tempTableInRange.AgentId,						-- Reference account
OtherAccounts.Partyid,     
OtherAccounts.StreetNumber,      
OtherAccounts.PostalCode,      
AddressId,     
OtherAccounts.Address1,     
OtherAccounts.Address2  



-- Insert the reference accounts (the ones that we are comparing to)    
INSERT INTO #tempResult     
SELECT DISTINCT #tempTableInRange.CustomerAccountId,    
#tempTableInRange.CustomerAccountId AS OriginalCustomerAccountId,      
#tempTableInRange.CustomerAccountNumber AS OriginalCustomerAccountNumber,       
#tempTableInRange.AgentId AS OriginalAgentId,
#tempTableInRange.StreetNumber,     
#tempTableInRange.PostalCode,     
#tempTableInRange.Address1,     
#tempTableInRange.Address2,    
#tempTableInRange.isReferenceAccount      
FROM #tempTableInRange    
JOIN #tempResult ON     #tempResult.OriginalCustomerAccountId = #tempTableInRange.CustomerAccountId    
AND #tempResult.StreetNumber = #tempTableInRange.StreetNumber    
AND #tempResult.PostalCode = #tempTableInRange.PostalCode    
AND #tempResult.CustomerAccountId <> #tempTableInRange.CustomerAccountId 

Open in new window


Final Select
select *
from Entity
left join ...
left join ...
left join
(
   select #tempResult.OriginalAgentId, #tempResult.OriginalCustomerAccountNumber, ...
   from #tempResult
   group by #tempResult.OriginalAgentId, #tempResult.OriginalCustomerAccountNumber
   having sum(case when isReferenceAccount = 0 then 1 else 0 end) > 0
)  dups on ... = ... 

where .... 

Open in new window

LVL 8
pzozulkaAsked:
Who is Participating?

[Product update] Infrastructure Analysis Tool is now available with Business Accounts.Learn More

x
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

Scott PletcherSenior DBACommented:
1. No.
2. NO.  The clustered index key(s) should be based on how the table is actually read/searched, not on some arbitrary rule, actually a myth, about a single column key.

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
pzozulkaAuthor Commented:
1. Why No.

2. I've been doing a lot of research in the recent days, and read many stack overflow posts, and walked away with the understanding that you should use clustered indexes (one per table) on primary key column, and nonclustered indexes (one or many per table) on columns used for frequent exact searches (IDs/foreign keys, dates, etc.)

If the above is wrong, could you please point me to a resource you find helpful.

3. What is the rule on when to use a clustered index, and when to use a non clustered index.
Scott PletcherSenior DBACommented:
1.  Because the indexes are too wide (too many columns) and the leading columns are not what are in WHERE clauses or JOINs.

2.  Again, NO.  The clustered index and the PK do not necessarily have anything to do with each other.  For some tables, they will be the same, but NOT for many tables.  I have many tables that do not have a PK at all (yes, relational purists don't like it, but functionally, I don't need one).

3.  If you have no clustered index, use a clus rather than nonclus.  Never create a non-clus index without a clus index (there might be a few extremely rare exceptions to this, but overall it's a very good rule).
Big Business Goals? Which KPIs Will Help You

The most successful MSPs rely on metrics – known as key performance indicators (KPIs) – for making informed decisions that help their businesses thrive, rather than just survive. This eBook provides an overview of the most important KPIs used by top MSPs.

pzozulkaAuthor Commented:
So from what I've gathered here, it appears that the clustered index was a fine choice (clus vs nonclus), but the columns were incorrect. Having said that, how would I correct the situation? Replace the columns  in that clus index with columns that are heavily used by the joins and where clause? Would you say this is a step in the right direction?

If yes, should I also add any nonclus indexes after fixing the above clus index?
Scott PletcherSenior DBACommented:
CREATE CLUSTERED INDEX cx ON #tempTableInRange ( PostalCode, StreetNumber, CustomerAccountId )

CREATE CLUSTERED INDEX cx1 ON #tempResult ( PostalCode, StreetNumber, OriginalCustomerAccountId )

As to nonclustered indexes, I wouldn't create any.  It's very rare that temp tables needs nonclustered indexes.  They would have to referenced multiple times for it to be worth the time to build them.


Also, the WHERE conditions bolded below should be moved into the query that creates the derived table "OtherAccounts".  It's wasted effort to have SQL output a bunch of rows just to immediately filter them out.

-- Find any duplicate street address  (Regardless of report criteria)
INSERT INTO #tempResult    
SELECT DISTINCT OtherAccounts.Partyid,    
#tempTableInRange.CustomerAccountId AS OriginalCustomerAccountId,    
#tempTableInRange.CustomerAccountNumber AS OriginalCustomerAccountNumber,
#tempTableInRange.AgentId AS OriginalAgentId,  
OtherAccounts.StreetNumber,    
...
FROM    
(    
 select ...
 from [address] A    
 join...    
 WHERE ...   --++Add WHERE conditions from below into here
) OtherAccounts    
JOIN #tempTableInRange ON  #tempTableInRange.isPoBox = 0 AND    
#tempTableInRange.StreetNumber = OtherAccounts.StreetNumber AND    
#tempTableInRange.PostalCode = OtherAccounts.PostalCode                    

WHERE  
OtherAccounts.PartyId NOT IN (SELECT CustomerAccountId FROM  #tempTableInRange)    
AND OtherAccounts.Address1 NOT LIKE '%PO BOX%'  AND      
    OtherAccounts.Address1 NOT LIKE '%P O BOX%' AND      
    OtherAccounts.Address1 NOT LIKE '%P.O.BOX%' AND      
    OtherAccounts.Address1 NOT LIKE '%P.O. BOX%' AND    
    OtherAccounts.Address1 NOT LIKE '%P. O. Box%'


GROUP BY ...
Scott PletcherSenior DBACommented:
I'm curious, does the query perform better now?  How much?
pzozulkaAuthor Commented:
The only thing I changed thus far was the indexes per your recommendation. While I can't get the true results from production environment, here are the results from the dev server.

Before: 14 seconds
After: 8 seconds

Does the order of keys matter when creating your index?
CREATE CLUSTERED INDEX cx ON #tempTbl ( PostalCode, StreetNumber, CustomerAccountId )
vs.
CREATE CLUSTERED INDEX cx ON #tempTbl ( CustomerAccountId , StreetNumber, PostalCode )

More details below in case you're curious. The huge scan count on one of the tables is freaking me out.

Before:
(347 row(s) affected)
Table '#tempResult_________________________________________________________________________________________________________00000000079A'. Scan count 0, logical reads 20725, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 83959, logical reads 317572, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#tempTableInRange___________________________________________________________________________________________________000000000799'. Scan count 83961, logical reads 168776, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Address'. Scan count 1, logical reads 1411, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'CustomerAccount'. Scan count 1, logical reads 11, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(10150 row(s) affected)
Table '#tempResult_________________________________________________________________________________________________________00000000079A'. Scan count 0, logical reads 4768, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#tempTableInRange___________________________________________________________________________________________________000000000799'. Scan count 3163, logical reads 72770, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'CustomerAccount'. Scan count 0, logical reads 4204, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Address'. Scan count 1, logical reads 1411, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Open in new window

After
(347 row(s) affected)
Table '#tempResult_________________________________________________________________________________________________________00000000079E'. Scan count 0, logical reads 21142, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#tempTableInRange___________________________________________________________________________________________________00000000079D'. Scan count 83961, logical reads 383550, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 84322, logical reads 184915, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Address'. Scan count 1, logical reads 1411, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'CustomerAccount'. Scan count 1, logical reads 11, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(10150 row(s) affected)
Table '#tempResult_________________________________________________________________________________________________________00000000079E'. Scan count 0, logical reads 3242, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#tempTableInRange___________________________________________________________________________________________________00000000079D'. Scan count 3163, logical reads 39327, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'CustomerAccount'. Scan count 0, logical reads 4204, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Address'. Scan count 1, logical reads 1411, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Open in new window

Scott PletcherSenior DBACommented:
Yes, the order of keys is critical.  I put them in the correct order for the queries given.  The existence check for a CustomerAccountId will likely have to scan the table.

But SQL does love unique indexes, so if we can make them unique that could help:

CREATE TABLE #tempTableInRange      
(        
      Ident int IDENTITY(1, 1) NOT NULL,
      CustomerAccountId INT NOT NULL,      
      CustomerAccountNumber INT,  
      AgentId INT,  
      StreetNumber VARCHAR(50),    
      PostalCode VARCHAR(50),    
      Address1 VARCHAR(50),    
      Address2 VARCHAR(50),    
      isReferenceAccount BIT ,    
      isPoBox BIT    
)    
 
-- Final result to find duplicate accounts
CREATE TABLE #tempResult    
(      
Ident int IDENTITY(1, 1) NOT NULL,
CustomerAccountId INT NOT NULL,    
OriginalCustomerAccountId INT,      
OriginalCustomerAccountNumber INT,    
OriginalAgentId INT,
StreetNumber VARCHAR(50),    
PostalCode VARCHAR(50),    
Address1 VARCHAR(50),    
Address2 VARCHAR(50),    
isReferenceAccount BIT    
)  


CREATE UNIQUE CLUSTERED INDEX cx ON #tempTableInRange ( PostalCode, StreetNumber, CustomerAccountId, Ident )

CREATE UNIQUE CLUSTERED INDEX cx1 ON #tempResult ( PostalCode, StreetNumber, OriginalCustomerAccountId, Ident )
pzozulkaAuthor Commented:
If order matters, I think the correct order should be StreetNumber, PostalCode, CustomerAccountId, Ident.

In the first Insert into #tempTableInRange, all the JOINs and WHERE conditions are happening on physical real columns, not relating to temp tables. So I'm ignoring this block of code.

In the second insert into #tempTableInRange, same story, all the data being JOINed is on real tables. Ignoring this.

Then when inserting into #rempResult, we JOIN on #tempTableInRange on the following:
JOIN
#tempTableInRange ON  #tempTableInRange.isPoBox = 0 AND    
#tempTableInRange.StreetNumber = OtherAccounts.StreetNumber AND    
#tempTableInRange.PostalCode = OtherAccounts.PostalCode

WHERE  
OtherAccounts.PartyId NOT IN (SELECT CustomerAccountId FROM  #tempTableInRange)    
AND OtherAccounts.StreetNumber <> ''    
  AND (    OtherAccounts.Address1 NOT LIKE '%PO BOX%'  AND      
    OtherAccounts.Address1 NOT LIKE '%P O BOX%' AND      
    OtherAccounts.Address1 NOT LIKE '%P.O.BOX%' AND      
    OtherAccounts.Address1 NOT LIKE '%P.O. BOX%' AND    
    OtherAccounts.Address1 NOT LIKE '%P. O. Box%'    
     )


Then on the next insert into #tempResult, again, the same JOIN happens as above, on StreetNumber first, then on PostalCode.


Finally, the last insert into #tempResult before we do a final select on everything. This insert has the following JOIN:
FROM
#tempTableInRange    
JOIN #tempResult ON     #tempResult.OriginalCustomerAccountId = #tempTableInRange.CustomerAccountId    
AND #tempResult.StreetNumber = #tempTableInRange.StreetNumber    
AND #tempResult.PostalCode = #tempTableInRange.PostalCode    
AND #tempResult.CustomerAccountId <> #tempTableInRange.CustomerAccountId

Shouldn't the index key order for #tempResult be (OriginalCustomerAccountId , StreetNumber , PostalCode , Ident)?


Either way, the Ident helped only slightly, here's the results:

Original: 13 sec
After applying the correct index: 8.74 sec
After making the index unique and adding Ident: 8.16 sec
Scott PletcherSenior DBACommented:
Any further tuning would require the *full* SQL text and the resulting query plan.

PostalCode will group records better for processing, which is why it is the first key.  SQL can still test the PostalCode first as a key even if you code it second in the SQL query.

That is, if the table is: tableA ( col1 int, col2 varchar(30), primary key ( col1, col2 ))
if i write the query as:
select *
from tablea
where col2 = 'abc' and col1 = 1 /* rather than col1 = 1 and col2 = 'abc' */
SQL will still be able to do a single, keyed lookup for ( 1, 'abc' ) using the clus index.
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
Microsoft SQL Server

From novice to tech pro — start learning today.