Solved

Help with index formulation

Posted on 2002-07-17
12
153 Views
Last Modified: 2008-02-26
Hi, I have a query which goes something like,

select T1.A,T1.B,T1.C,T1.D,T1.E,T2.A
from T1 Left Outer Join T2 On T1.D = T2.M
where T1.A = 'zz' and T1.B = "yy"

Should I have a index on T1.A,T1.B and a seperate index on T1.D or should I combine them into one??
0
Comment
Question by:doshi_dipen
  • 4
  • 3
  • 3
  • +2
12 Comments
 
LVL 142

Expert Comment

by:Guy Hengel [angelIII / a3]
Comment Utility
At first glance i would say to keep the indexes as they are. You might check with the query analysis tools if another index is recommended...

CHeers
0
 

Author Comment

by:doshi_dipen
Comment Utility
Will the sql server do the where condition first or will it left join first and than filter the result set based on the where clause??
0
 
LVL 32

Expert Comment

by:bhess1
Comment Utility
You can see what SQL Server will do by running the statement in Query Analyzer, and selecting Query/Show Execution Plan.  This will show you exactly how SQL Server runs the query.

You can also use Query/Display Estimated Execution Plan to get an idea of how SQL Server will run the query.
Based on your query, you only need an index on T1 on A and B, and an index on T2 on M
0
 
LVL 69

Expert Comment

by:ScottPletcher
Comment Utility
Whether or not you need an index requires a deeper look I think.

I agree on T2 -- you almost certainly need an index on M (unless T2 is a VERY SMALL table).

For T1, do you always, or almost always, look up values by columns A or B or is this just one isolated query?  If (almost) always, you should create an index; in fact, you should consider a clustered index.  If very rare to lookup that way, you  probably don't need an index.

Do you always check for the same values of A/B or does it vary?  If it's the same (for example, it's status columns of some kind), what % of total rows have that value?  If it's a small percentage (roughly 1-10%), the index will be used.  If it's a large percentage (roughly 10-15%+) it likely will not.

How often are A/B updated?  If it's often, an index will be more overhead because it will change often.

Next, when does this query run?  If it's in batch, during off-peak times, you may want to avoid creating an index even if it helps that one query because it will slighly slow down your prime-time inserts/updates.  If it's run as a critical process during the day, then obviously an index looks more appropriate.

Finally, as bhess1 mentioned, if you do create an index you should do a SHOWPLAN to make sure it's actually being used.
0
 

Expert Comment

by:kek1102
Comment Utility
For your query I would have three indexes.

Two indexes to support the JOIN:  T1.D and T2.M

One index to support the WHERE:  T1.A, T1.B

I wouldn't try to outguess SQL as to which index(es) it will use to run the query.  Access decisions can be based on the number of rows in each table and the distribution of distinct values within columns (among other things, of course).  These can change over time and consequently so can the access path that SQL chooses.  
0
 
LVL 69

Expert Comment

by:ScottPletcher
Comment Utility
I see no reason for an index on T1.D, because I don't believe SQL would ever use such an index.

I wouldn't try to "outguess" SQL either.  I would simply see which method it decided to use.  I would also do tests to determine how long it takes using the index and not using the index (you can force this using appropriate hints; be sure to clear the data buffers (DBCC DROPCLEANBUFFERS) between each run so the results aren't skewed by data already residing in the buffer).
0
How to run any project with ease

Manage projects of all sizes how you want. Great for personal to-do lists, project milestones, team priorities and launch plans.
- Combine task lists, docs, spreadsheets, and chat in one
- View and edit from mobile/offline
- Cut down on emails

 

Expert Comment

by:kek1102
Comment Utility
SQL can use the index on T1.D to find rows that are eligible to join to the other table.  
0
 
LVL 69

Expert Comment

by:ScottPletcher
Comment Utility
But SQL needs to know the value of columns A and B to determine the eligible rows.  Why waste time looking up all the Ds that match another table/table's index when it may only need a small part of them?  I won't believe SQL would do that until I see a SHOWPLAN that shows it actually doing it.  I can't picture that on routine-sized tables (perhaps if one table or the other were artifically small it might).
0
 

Expert Comment

by:kek1102
Comment Utility
At some point SQL will have to match values to make the left outer join.  Why would it want to scan T1 table or a result set of T1 table for values in column D when it could simply match the two indexes on T1.D and T2.M

Perhaps you are right in wanting to actually SEE a plan that shows this taking place (perhaps on an artificially small table, etc).  As I said in my original reply, the characteristics of the data may change over a period of time and so may SQL's access decisions.  It's a tough call predicting what a data base may look like in a few months or a few years.  

For me, I would just as soon throw the index on and let SQL make the decision.
0
 
LVL 69

Accepted Solution

by:
ScottPletcher earned 50 total points
Comment Utility
Since these tables are related, I would expect most values in T2.M to be found somewhere in T1.D, so I wouldn't think matching the indexes first would reduce the number of rows to be considered much, if at all.  Thus, it would be much more overhead to read two indexes first and then read nearly every row from T1 using its index.  

It would make more sense, and follow SQL's normal approach, to satisfy the WHERE clause first.  The real question is whether or not SQL will use an index to match T1.A and T1.B.  If it does, and T1.D is a small column type (SMALLINT or perhaps INT), and T1 is a large table, and the query is important enough for an index, I would seriously consider making one index with T1.A, T1.B and T1.D -- then SQL could resolve the query on T1 entirely from one index and never have to go the main T1 row at all.
0
 

Author Comment

by:doshi_dipen
Comment Utility
Guys, great discussion. This is what I wanted. Unfortunately, I can only give points to one person and I did test it with 2 indexes as well as 1 index of all 3 columns. Seems like the 1 index solution seems faster. But thanks, that was good.
0
 

Author Comment

by:doshi_dipen
Comment Utility
Sorry, had to reject kek1102's answer to award points. :(
0

Featured Post

Do You Know the 4 Main Threat Actor Types?

Do you know the main threat actor types? Most attackers fall into one of four categories, each with their own favored tactics, techniques, and procedures.

Join & Write a Comment

Ever needed a SQL 2008 Database replicated/mirrored/log shipped on another server but you can't take the downtime inflicted by initial snapshot or disconnect while T-logs are restored or mirror applied? You can use SQL Server Initialize from Backup…
Ever wondered why sometimes your SQL Server is slow or unresponsive with connections spiking up but by the time you go in, all is well? The following article will show you how to install and configure a SQL job that will send you email alerts includ…
This video shows, step by step, how to configure Oracle Heterogeneous Services via the Generic Gateway Agent in order to make a connection from an Oracle session and access a remote SQL Server database table.
Viewers will learn how the fundamental information of how to create a table.

728 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

12 Experts available now in Live!

Get 1:1 Help Now