Link to home
Start Free TrialLog in
Avatar of xersoft
xersoft

asked on

When to use a compound index and how to determine order of fields

I have  fact table defined in part like this
[id] [int] IDENTITY(1,1) NOT NULL,
[Date] [datetime] NOT NULL, --default GetDate()
[UserID] [int] NOT NULL,

I query this table a lot as you can imagine. In the vast majority of cases I query with a date range and a user id. In most cases I then filter based on other fields in the table but a constant is filtering on date range and the user id.

My question is what would be the best index to place on this table and why?
I could put a single index on both the userID and Date separately.
I could put a compound index on UserID, Date or on Date, UserID

My thought is that Date, UserID would be best.

My reasoning is that since items will be added with the current date in most cases the index shouldn't need to reorder itself as much. If the compound index was on UserID, Date then every time a row was added it would need to insert into the proper users section of the index and then in the proper date range.

Am I thinking about this correctly? Please provide details regarding any advice offered.
Thanks!
Avatar of Nightman
Nightman
Flag of Australia image

I would make id your clustered index (you may even want this to be the PK), and have a non-clustered covering index on Date,UserID. You will only get page splits (the inserts you are talking about) if the data is clustered on the other two (UserID, because it is never going to be sequential, should not be part of the clustered index).
Avatar of xersoft
xersoft

ASKER

oops forgot to mention I did make the ID clustered and it is the primary key.

So Date, UserID is better then UserID, Date?

Do indexes go through these page splits? And would there be more page splits with UserID, Date instead of Date, UserID?

Thanks for the help Nightman
Only the clustered index will cause page splits (as the clustered data essentially represented the page).

As for Date, UserID - it doesn't really make much of a difference either way.
>> Only the clustered index will cause page splits
A data page can split because the clustered index forces a new row to be added or if a row is updated and the page is now too small. Index pages can also be split.

If you are querying on date, userid then that is probably the best clustered index. Note that the clustered index is included in all other indexes so these columns will also be included in the id index making that larger (and slower to maintain) but I suspect the gain in accessing the page will outweigh that.
I would put date first as you probably will want date range queries too.
>>Index pages can also be split
True, but this is not going to cause a pages split on the clustered index because of data coming in out of sequence, which I think was the concern.

As I said, if the data is not entered in sequence the date should not be part of the cluster. If the data is always in date sequence then clustering on the date would make sense. If the insert volume is low, this will improve your reporting performance.

However, clustering on date, userid for a high volume OLTP application will force page splits on a regular basis, incurring high IO costs and resulting in high index fragmentation. In this case, I would select the id as the clustered index (creating a hotspot for SQL as the last page is always in memory) and a non-clustered covering index on date, userid to improve performance for reporting.

The key is really what your requirement is - high volume inserts, or primarily static data for reporting.
SOLUTION
Avatar of nigelrivett
nigelrivett

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
True - if there are many other columns in the select it would not be covering. It really depends on the use of the table.
Just jumping in here.... does anyone have any thoughts on the order of the columns if the date column is highly selective and the userid is not.

I seem to recall reading somewhere that it is better to put the selective column first and the non-selective column last.
SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
ASKER CERTIFIED SOLUTION
Avatar of Scott Pletcher
Scott Pletcher
Flag of United States of America image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial