Link to home
Start Free TrialLog in
Avatar of curiouswebster
curiouswebsterFlag for United States of America

asked on

Any benefit to adding a Clustered index here?

On my Exceptions table, with 15 million records, I found that the Clustered index is the default, on the primary key Log_ID. I was considering making a Clustered index on the "Timestamp" column (the row creation datetime) to speed queries, since all of my queries would filter by only records in the past one year. Be but I found an index already existed.

Attached is the image of the Non-Clustered index containing Timestamp, so I would like to understand if I would gain or lose by:

Removing both the:

PK_Log (Clustered) Index
and
log_selectindex (Non-Unique, Non-Clustered) Index

and adding a new Clustered Index on Timestamp

It seems that there might be a slight performance improvement when querying on Timestamp only because the existing log_selectindex needs to do a secondary lookup when the return row (or rows) require a secondary lookup, versus the Clustered Index on Timestamp where the data rows would already be stored in the sequence required.

BUT, by removing the default PK_Log (Clustered) Index, confusing results would be returned when someone does a simple query like:

select TOP 100 * from Log

since there is no guarantee the Log_ID would be sequential and if the user then included "order by Log_ID DESC" for example, this would slow the query way down.

Do I have it about right?

Thanks.
ASKER CERTIFIED SOLUTION
Avatar of Jim Horn
Jim Horn
Flag of United States of America image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of curiouswebster

ASKER

thanks
Just comments... There are possible tradeoffs everywhere. When a table contains an int or bigint PK with no inherent meaning (a good thing), I will often look at the queries issued against the it, and if there is something out there that executes frequently, performs some sort of lookup against non-PK keys, and selects most or all of the columns, I will use that predicate to define the clustered index. (Sometimes other keys can be added to match additional queries; sometimes there is a tradeoff on key column order between column selectivity and needing to add more indexes.)

In those situations (with a non-clustered PK) the PK is often used only for validation, such as foreign key and EXISTS checks, or is used infrequently as a lookup key such that the performance hits from bookmark lookups isn't significant. If the PK is also used more heavily for lookups, so that the performance matters, I will define a separate non-clustered covering index for that.

In the worst cases, there are multiple sets of keys used for frequent wide lookups, and the non-clustered index storage can end up far exceeding the size of the clustered index. If the table contains billions of rows, that can be quite a mess. I try to see it coming and design to avoid it, but if it has already happened it can be hard to fix.
Another thought....

It appears that the LOG_ID and TIMESTAMP column are both increasing value columns so that a clustered index on either column would result in the same order of the data.

If so, consider putting the clustered index on the column that is more often used in queries against the table.  A clustered index of INT or TIMESTAMP values would be less than 1/10 the size of a normal index.  Index caching would require less memory giving you better performance.