• Status: Solved

query huge table with WHERE on one column - index question

There is a table with 57M records. It has a date field with 50 distinct date values (so approximately 1.13M rows per date on average).

There is a query that does a SELECT from this table with its only condition on the date field:
SELECT 6 columns FROM table WHERE field = '2011-09-30'

Is a nonclustered index on that field recommended? Will it help, given that there are so many records and so few distinct dates for the WHERE condition in the SELECT query?
Asked by: 25112
 
sdstuber commented:
What's the data distribution within the table?

Are your 50 values distributed more or less uniformly throughout the table?

That is, if you read your table block-by-block from disk will at least one of those 1.13 million rows be in each block or nearly so?

If so, then an index won't help, even if the optimizer tries to use one, because you'll still be reading the whole table (or nearly so) anyway. In this case the index will actually make things worse, because you have to process the index in addition to reading the whole table.
 
COANetwork commented:
It will help, somewhat, since an index seek is, as a rule, much faster than a table scan. You can run an estimated execution plan to see what SQL Server suggests. On the other hand, if your table gets a lot of inserts and/or updates, those operations will be slowed by the index. Also, with so few distinct values, the index may be ignored if the optimizer finds it faster to scan than to seek.
Indexed views would probably offer the best performance, but in your case that may be cumbersome, since you would have to create 50 of them.
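
To make the "run an estimated execution plan" suggestion concrete, here is a minimal sketch. The table and column names (dbo.SeasonData, SeasonDate, Col1 to Col6) are placeholders I made up, since the real schema was not posted. The first part asks for the estimated plan without running the query; the second part runs it and reports the reads, so the numbers can be compared before and after adding an index.

```sql
-- Hypothetical names: dbo.SeasonData, SeasonDate, Col1..Col6 are placeholders.

-- Part 1: return the estimated plan as XML; the query itself is NOT executed.
SET SHOWPLAN_XML ON;
GO
SELECT Col1, Col2, Col3, Col4, Col5, Col6
FROM dbo.SeasonData
WHERE SeasonDate = '20110930';
GO
SET SHOWPLAN_XML OFF;
GO

-- Part 2: actually run the query and report logical/physical reads and timings.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

SELECT Col1, Col2, Col3, Col4, Col5, Col6
FROM dbo.SeasonData
WHERE SeasonDate = '20110930';

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

In the plan, look for whether the operator on the table is a scan or a seek; in the Messages output, compare the "logical reads" figure between index designs.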
 
lcohan (Database Analyst) commented:
You MUST be careful to match EXACTLY the data type of the table column in ALL the code variables compared against it for the index to be used properly, e.g. datetime <> smalldatetime.

I would INCLUDE the KEY of that row in the index as well.
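
A minimal sketch of the data type point, assuming the date column is declared as datetime. The names dbo.SeasonData, SeasonDate and Col1 to Col6 are placeholders, not the real schema. The first query only looks up the declared type of the column; the second declares the variable with that same type before using it in the WHERE clause.

```sql
-- Hypothetical names: dbo.SeasonData, SeasonDate, Col1..Col6 are placeholders.

-- Check the declared type of the filter column.
SELECT c.name AS column_name, t.name AS type_name
FROM sys.columns AS c
JOIN sys.types   AS t ON t.user_type_id = c.user_type_id
WHERE c.object_id = OBJECT_ID('dbo.SeasonData')
  AND c.name = 'SeasonDate';

-- Assumption: the query above reported datetime, so the variable is datetime too.
DECLARE @SeasonDate datetime = '20110930';

SELECT Col1, Col2, Col3, Col4, Col5, Col6
FROM dbo.SeasonData
WHERE SeasonDate = @SeasonDate;
```

If the lookup reports a different type (date, smalldatetime, ...), declare the variable with that type instead.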
 
25112 (Author) commented:
>>What's the data distribution within the table?
It is in order. It is a date value: a season of data has one date, and after a season all the records get the next date, and so forth.

There are only bulk inserts into this table, no updates or deletes.
 
Lowfatspread commented:
Your nonclustered date index is probably not going to be used, as the data is likely to be spread evenly across the underlying table, so a table scan will be deemed quickest.

You may wish to consider using filtered indexes, however, or including the 6 columns you want in the index.

Can you explain your scenario in more detail?

Are your queries generally on the latest set(s) of data?
    Setting up filtered indexes for the sets in use may be appropriate.

Is it always the same 6 columns to be extracted?
Would it make sense to change the clustering key?

...
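
As a rough illustration of the filtered index idea (available in SQL Server 2008 and later), the sketch below indexes only the rows for one date and carries the selected columns along. All names (dbo.SeasonData, SeasonDate, Col1 to Col6, the index name) are made up for the example, and it assumes the queries mostly target one known date such as the latest season.

```sql
-- Hypothetical names: dbo.SeasonData, SeasonDate, Col1..Col6 are placeholders.
-- Only rows matching the WHERE predicate are stored in this index,
-- so it stays small compared with an index over all 57M rows.
CREATE NONCLUSTERED INDEX IX_SeasonData_20110930
ON dbo.SeasonData (SeasonDate)
INCLUDE (Col1, Col2, Col3, Col4, Col5, Col6)
WHERE SeasonDate = '20110930';
```

One such index is needed per date you want to serve this way, so this fits best when only a few of the 50 dates are queried regularly.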
 
Scott Pletcher (Senior DBA) commented:
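There's virtually no chance SQL Server would use a plain nonclustered index on the date column to satisfy that query: with roughly 1.13M matching rows, each needing a lookup back into the table for the other columns, a scan is cheaper.

So a nonclustered index will help only if it is a "covering index": it would have to be keyed by the date column and include all 6 selected columns.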
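
A minimal sketch of such a covering index, with made-up names (dbo.SeasonData, SeasonDate, Col1 to Col6) standing in for the real table and its 6 selected columns:

```sql
-- Hypothetical names: dbo.SeasonData, SeasonDate, Col1..Col6 are placeholders.
-- Keyed by the date column; the 6 selected columns ride along at the leaf level,
-- so the query can be answered from the index without touching the base table.
CREATE NONCLUSTERED INDEX IX_SeasonData_SeasonDate_Covering
ON dbo.SeasonData (SeasonDate)
INCLUDE (Col1, Col2, Col3, Col4, Col5, Col6);

-- The query this index is meant to cover.
SELECT Col1, Col2, Col3, Col4, Col5, Col6
FROM dbo.SeasonData
WHERE SeasonDate = '20110930';
```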

Depending on other queries to the table, you may need to cluster that table on the date column. Technically a covering index will do fewer reads than a clustered index, but SQL Server must constantly maintain the extra covering index, and you must modify the index every time the SELECT query changes. For example, if you add a 7th column to the SELECT, you must add it as another included column in the index.
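
Both options sketched, again with placeholder names (dbo.SeasonData, SeasonDate, Col1 to Col7, the index names). Option A assumes the table has no clustered index yet; if it does (for example a clustered primary key), that one would have to be dropped or rebuilt as nonclustered first. Option B assumes a covering index with the given name already exists and rebuilds it with a 7th included column.

```sql
-- Hypothetical names: dbo.SeasonData, SeasonDate, Col1..Col7 are placeholders.

-- Option A: cluster the table on the date column.
-- Assumption: the table is currently a heap (no existing clustered index).
CREATE CLUSTERED INDEX CIX_SeasonData_SeasonDate
ON dbo.SeasonData (SeasonDate);

-- Option B: the SELECT gained a 7th column, so rebuild the covering index
-- in place with the extra included column.
-- Assumption: IX_SeasonData_SeasonDate_Covering already exists.
CREATE NONCLUSTERED INDEX IX_SeasonData_SeasonDate_Covering
ON dbo.SeasonData (SeasonDate)
INCLUDE (Col1, Col2, Col3, Col4, Col5, Col6, Col7)
WITH (DROP_EXISTING = ON);
```

On a 57M-row table either statement is a heavy operation, so it is worth scheduling it outside the bulk insert window.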
 
lcohan (Database Analyst) commented:
Aside from all the good advice above, in my opinion 57M (million rows, to be specific) is not quite a huge table unless you have a "huge record" with lots of columns of character types (ntext is the worst) and/or datetime data types. If the table structure is not confidential, could you post it here as well?

FYI, I have quite a few tables in my SQL database(s) with over half a billion rows, not partitioned, and everything works great. In my experience, the effort a query requires, its execution time, and the pressure it puts on your hardware depend on many different factors.

SQL Server's own Performance Dashboard reports could help you with that server in general, not just with this query in particular, in addition to the query execution plan.
 
25112 (Author) commented:
Thanks. The covering index seems to do much better.