Link to home
Start Free TrialLog in
Avatar of doshi_dipen
doshi_dipen

asked on

Help with index formulation

Hi, I have a query which goes something like,

select T1.A,T1.B,T1.C,T1.D,T1.E,T2.A
from T1 Left Outer Join T2 On T1.D = T2.M
where T1.A = 'zz' and T1.B = "yy"

Should I have a index on T1.A,T1.B and a seperate index on T1.D or should I combine them into one??
Avatar of Guy Hengel [angelIII / a3]
Guy Hengel [angelIII / a3]
Flag of Luxembourg image

At first glance i would say to keep the indexes as they are. You might check with the query analysis tools if another index is recommended...

CHeers
Avatar of doshi_dipen
doshi_dipen

ASKER

Will the sql server do the where condition first or will it left join first and than filter the result set based on the where clause??
You can see what SQL Server will do by running the statement in Query Analyzer, and selecting Query/Show Execution Plan.  This will show you exactly how SQL Server runs the query.

You can also use Query/Display Estimated Execution Plan to get an idea of how SQL Server will run the query.
Based on your query, you only need an index on T1 on A and B, and an index on T2 on M
Whether or not you need an index requires a deeper look I think.

I agree on T2 -- you almost certainly need an index on M (unless T2 is a VERY SMALL table).

For T1, do you always, or almost always, look up values by columns A or B or is this just one isolated query?  If (almost) always, you should create an index; in fact, you should consider a clustered index.  If very rare to lookup that way, you  probably don't need an index.

Do you always check for the same values of A/B or does it vary?  If it's the same (for example, it's status columns of some kind), what % of total rows have that value?  If it's a small percentage (roughly 1-10%), the index will be used.  If it's a large percentage (roughly 10-15%+) it likely will not.

How often are A/B updated?  If it's often, an index will be more overhead because it will change often.

Next, when does this query run?  If it's in batch, during off-peak times, you may want to avoid creating an index even if it helps that one query because it will slighly slow down your prime-time inserts/updates.  If it's run as a critical process during the day, then obviously an index looks more appropriate.

Finally, as bhess1 mentioned, if you do create an index you should do a SHOWPLAN to make sure it's actually being used.
For your query I would have three indexes.

Two indexes to support the JOIN:  T1.D and T2.M

One index to support the WHERE:  T1.A, T1.B

I wouldn't try to outguess SQL as to which index(es) it will use to run the query.  Access decisions can be based on the number of rows in each table and the distribution of distinct values within columns (among other things, of course).  These can change over time and consequently so can the access path that SQL chooses.  
I see no reason for an index on T1.D, because I don't believe SQL would ever use such an index.

I wouldn't try to "outguess" SQL either.  I would simply see which method it decided to use.  I would also do tests to determine how long it takes using the index and not using the index (you can force this using appropriate hints; be sure to clear the data buffers (DBCC DROPCLEANBUFFERS) between each run so the results aren't skewed by data already residing in the buffer).
SQL can use the index on T1.D to find rows that are eligible to join to the other table.  
But SQL needs to know the value of columns A and B to determine the eligible rows.  Why waste time looking up all the Ds that match another table/table's index when it may only need a small part of them?  I won't believe SQL would do that until I see a SHOWPLAN that shows it actually doing it.  I can't picture that on routine-sized tables (perhaps if one table or the other were artifically small it might).
At some point SQL will have to match values to make the left outer join.  Why would it want to scan T1 table or a result set of T1 table for values in column D when it could simply match the two indexes on T1.D and T2.M

Perhaps you are right in wanting to actually SEE a plan that shows this taking place (perhaps on an artificially small table, etc).  As I said in my original reply, the characteristics of the data may change over a period of time and so may SQL's access decisions.  It's a tough call predicting what a data base may look like in a few months or a few years.  

For me, I would just as soon throw the index on and let SQL make the decision.
ASKER CERTIFIED SOLUTION
Avatar of Scott Pletcher
Scott Pletcher
Flag of United States of America image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Guys, great discussion. This is what I wanted. Unfortunately, I can only give points to one person and I did test it with 2 indexes as well as 1 index of all 3 columns. Seems like the 1 index solution seems faster. But thanks, that was good.
Sorry, had to reject kek1102's answer to award points. :(