?
Solved

Help with index formulation

Posted on 2002-07-17
12
Medium Priority
?
160 Views
Last Modified: 2008-02-26
Hi, I have a query which goes something like,

select T1.A,T1.B,T1.C,T1.D,T1.E,T2.A
from T1 Left Outer Join T2 On T1.D = T2.M
where T1.A = 'zz' and T1.B = "yy"

Should I have a index on T1.A,T1.B and a seperate index on T1.D or should I combine them into one??
0
Comment
Question by:doshi_dipen
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 4
  • 3
  • 3
  • +2
12 Comments
 
LVL 143

Expert Comment

by:Guy Hengel [angelIII / a3]
ID: 7159758
At first glance i would say to keep the indexes as they are. You might check with the query analysis tools if another index is recommended...

CHeers
0
 

Author Comment

by:doshi_dipen
ID: 7159786
Will the sql server do the where condition first or will it left join first and than filter the result set based on the where clause??
0
 
LVL 32

Expert Comment

by:Brendt Hess
ID: 7159902
You can see what SQL Server will do by running the statement in Query Analyzer, and selecting Query/Show Execution Plan.  This will show you exactly how SQL Server runs the query.

You can also use Query/Display Estimated Execution Plan to get an idea of how SQL Server will run the query.
Based on your query, you only need an index on T1 on A and B, and an index on T2 on M
0
Industry Leaders: We Want Your Opinion!

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

 
LVL 69

Expert Comment

by:Scott Pletcher
ID: 7160031
Whether or not you need an index requires a deeper look I think.

I agree on T2 -- you almost certainly need an index on M (unless T2 is a VERY SMALL table).

For T1, do you always, or almost always, look up values by columns A or B or is this just one isolated query?  If (almost) always, you should create an index; in fact, you should consider a clustered index.  If very rare to lookup that way, you  probably don't need an index.

Do you always check for the same values of A/B or does it vary?  If it's the same (for example, it's status columns of some kind), what % of total rows have that value?  If it's a small percentage (roughly 1-10%), the index will be used.  If it's a large percentage (roughly 10-15%+) it likely will not.

How often are A/B updated?  If it's often, an index will be more overhead because it will change often.

Next, when does this query run?  If it's in batch, during off-peak times, you may want to avoid creating an index even if it helps that one query because it will slighly slow down your prime-time inserts/updates.  If it's run as a critical process during the day, then obviously an index looks more appropriate.

Finally, as bhess1 mentioned, if you do create an index you should do a SHOWPLAN to make sure it's actually being used.
0
 

Expert Comment

by:kek1102
ID: 7160231
For your query I would have three indexes.

Two indexes to support the JOIN:  T1.D and T2.M

One index to support the WHERE:  T1.A, T1.B

I wouldn't try to outguess SQL as to which index(es) it will use to run the query.  Access decisions can be based on the number of rows in each table and the distribution of distinct values within columns (among other things, of course).  These can change over time and consequently so can the access path that SQL chooses.  
0
 
LVL 69

Expert Comment

by:Scott Pletcher
ID: 7160419
I see no reason for an index on T1.D, because I don't believe SQL would ever use such an index.

I wouldn't try to "outguess" SQL either.  I would simply see which method it decided to use.  I would also do tests to determine how long it takes using the index and not using the index (you can force this using appropriate hints; be sure to clear the data buffers (DBCC DROPCLEANBUFFERS) between each run so the results aren't skewed by data already residing in the buffer).
0
 

Expert Comment

by:kek1102
ID: 7160442
SQL can use the index on T1.D to find rows that are eligible to join to the other table.  
0
 
LVL 69

Expert Comment

by:Scott Pletcher
ID: 7160464
But SQL needs to know the value of columns A and B to determine the eligible rows.  Why waste time looking up all the Ds that match another table/table's index when it may only need a small part of them?  I won't believe SQL would do that until I see a SHOWPLAN that shows it actually doing it.  I can't picture that on routine-sized tables (perhaps if one table or the other were artifically small it might).
0
 

Expert Comment

by:kek1102
ID: 7160494
At some point SQL will have to match values to make the left outer join.  Why would it want to scan T1 table or a result set of T1 table for values in column D when it could simply match the two indexes on T1.D and T2.M

Perhaps you are right in wanting to actually SEE a plan that shows this taking place (perhaps on an artificially small table, etc).  As I said in my original reply, the characteristics of the data may change over a period of time and so may SQL's access decisions.  It's a tough call predicting what a data base may look like in a few months or a few years.  

For me, I would just as soon throw the index on and let SQL make the decision.
0
 
LVL 69

Accepted Solution

by:
Scott Pletcher earned 200 total points
ID: 7160549
Since these tables are related, I would expect most values in T2.M to be found somewhere in T1.D, so I wouldn't think matching the indexes first would reduce the number of rows to be considered much, if at all.  Thus, it would be much more overhead to read two indexes first and then read nearly every row from T1 using its index.  

It would make more sense, and follow SQL's normal approach, to satisfy the WHERE clause first.  The real question is whether or not SQL will use an index to match T1.A and T1.B.  If it does, and T1.D is a small column type (SMALLINT or perhaps INT), and T1 is a large table, and the query is important enough for an index, I would seriously consider making one index with T1.A, T1.B and T1.D -- then SQL could resolve the query on T1 entirely from one index and never have to go the main T1 row at all.
0
 

Author Comment

by:doshi_dipen
ID: 7166033
Guys, great discussion. This is what I wanted. Unfortunately, I can only give points to one person and I did test it with 2 indexes as well as 1 index of all 3 columns. Seems like the 1 index solution seems faster. But thanks, that was good.
0
 

Author Comment

by:doshi_dipen
ID: 7166036
Sorry, had to reject kek1102's answer to award points. :(
0

Featured Post

Back Up Your Microsoft Windows Server®

Back up all your Microsoft Windows Server – on-premises, in remote locations, in private and hybrid clouds. Your entire Windows Server will be backed up in one easy step with patented, block-level disk imaging. We achieve RTOs (recovery time objectives) as low as 15 seconds.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Slowly Changing Dimension Transformation component in data task flow is very useful for us to manage and control how data changes in SSIS.
In the first part of this tutorial we will cover the prerequisites for installing SQL Server vNext on Linux.
Familiarize people with the process of utilizing SQL Server functions from within Microsoft Access. Microsoft Access is a very powerful client/server development tool. One of the SQL Server objects that you can interact with from within Microsoft Ac…
Viewers will learn how to use the SELECT statement in SQL and will be exposed to the many uses the SELECT statement has.
Suggested Courses

800 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question