Opions on Clustered Index Use

When I create my database tables, the clustered index is automatically created on the identity (Pkey) Column.

It has been suggested to me that I need to create a clustered index on a different set of columns, researching this, I read that it should always include the primary key of the table - that is a contraction to what I understood.  

Here's the situation.   I have report tables, REPORTS - REPORTPRODUCTS - REPORTPRODUCTYEARS, etc.    When a user is creating a new report, he gets a report_ID from the top table Reports.  Based on his reporting criteria, several child tables are populated.  If he changes his criteria, the child records are deleted an repopulated with the new data.  Then he can create PDF files and Excel files from the tabulated Child Tables.

It has been suggested that I should create CLUSTERED indexes on the child tables like this...

For the ReportProduct table (a child of Reports)...
   REPORT_ID (parent),  PRODUCT_ID

And for the ReportProductYears table (grandchild...)
   REPORT_ID (grandparent),   REPORTPRODUCT_ID (parent),  YEAR

My question - seems like a good strategy.   Any concerns doing this?   This would mean the primary key of the child and grandchild tables would NOT have a clustered index on them.   I would then create just a unique index on these primary key fields, yes??

Advice...?   Thank you!
LVL 39
gdemariaAsked:
Who is Participating?

[Product update] Infrastructure Analysis Tool is now available with Business Accounts.Learn More

x
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

MlandaTCommented:
I think it's important to first understand the meaning of these keys/indexes.

Primary key: an identifier (column or set of columns) that uniquely identifies each row in your table. Many times, we just use the identity column for this, but it doesn't have to be. Often times, there is perhaps another set of fields that form,  "the real" primary key, but it's just convenient to use an identity column.

Clustered index: an index that affects how data is stored in the table. Unlike non-clustered indexes, which are stored outside the table and use a pointer to link to the record....the Clustered index actually affects how the data in the table gets written to and organised on the  disk, no pointers. Therefore only one Clustered index per table.


With this background, we can state:

Primary is not always necessarily the best for your Clustered index... Especially in cases where you are using an identity column. The identity does not necessarily need to be part of your clustered index.

Clustered index is very important for reporting, and when correctly done, improves performance significantly because typically, the same (or a contiguous) block of pages needs to be accessed. Just think of a novel... The clustered index is the arrangement of the chapters in the book. The default sequence. Imagine they were all scattered, and you always had to refer to the topic index to figure out where to read next.

Your best indexing strategy is dependent on understanding your queries. What are your filters? Are you accessing everything just based on the ReportID? Should you create a Clustered index based on both...  report and product? In terms of how you are reading the data, what physical organisation of the data would work best for your reports? (imagine the report writer reading data from disk into your reports) If most of your access is just on ReportID, then you can perhaps just put primary key and clustered index on it. But you've just got to think of how that data should be organized in the table and what will make things fast. Depends on your reports and it might be more than just the Reportid and the product ID.

Index tuning is tricky without data. You might find that a set of indexes on the same data works well for a given set of reports but does not work so well for a different set of reports. Had a lot to do with the specifics of the reports on question

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
HuaMin ChenProblem resolverCommented:
The clustered index definitely is help to quicken the mapping of both master and detail tables, based on their common key, which is FK of master table and PK of detail table.

So on master table, you still can have one clustered index on this FK column.
gdemariaAuthor Commented:
@MlandaT, thank for that great information!

Your best indexing strategy is dependent on understanding your queries. What are your filters? Are you accessing everything just based on the ReportID? Should you create a Clustered index based on both...  report and product? In terms of how you are reading the data, what physical organisation of the data would work best for your reports?

I have lots of large tables that would only be access based on their parent ID (foreign key) and a date or other attribute.  For example,  Product Pricing - we keep a history.  So the price of a product is a large table ProductPrices with its own primary key productPriceID, but I would always fetch the price based on Product ID and Effective Date.   I think you are saying these two fields should be my clustered index.  

The same is for my reports.   All the child and grandchildren table would ONLY be touched if they are part of the REPORT_ID.   One user running a report NEVER touches a record of another ReportID, so it seems that the REPORT_ID should be the first element of my clustered index, then add other fields that may help narrow the search.  These other fields may vary though based on the criteria of the search (show all products by date, all products by category, by location).    What would I do for a clustered key here?   REPORT_ID seems obvious, but what next in line?

Implementation - can I just drop my cluster index and create a new one and SQL Server will reorganize everything?  Is it that easy to change?

Thank for the help!!
SolarWinds® VoIP and Network Quality Manager(VNQM)

WAN and VoIP monitoring tools that can help with troubleshooting via an intuitive web interface. Review quality of service data, including jitter, latency, packet loss, and MOS. Troubleshoot call performance and correlate call issues with WAN performance for Cisco and Avaya calls

gdemariaAuthor Commented:
@HuaMinChen  - thank you for your post!

The clustered index definitely is help to quicken the mapping of both master and detail tables, based on their common key, which is FK of master table and PK of detail table.

So on master table, you still can have one clustered index on this FK column.

I think I understand what you're saying, but I think you reversed FK and PK ?    So the master table can still have on clustered index on the Master Primary Key (Not the FK).   And on the Child table the clustered index could be the FKey to the Master plus the PKey of the child.  If that is true, then I understand, thank you.
HuaMin ChenProblem resolverCommented:
Yes, even for Child table, you can have one other index, specifically for the PK only!
gdemariaAuthor Commented:
@MlandaT, any follow up from you?   See previous post.

Also question - can I just drop my cluster index and create a new one and SQL Server will reorganize everything?  Is it that easy to change?
HuaMin ChenProblem resolverCommented:
Yes, you can re-create the clustered index.
Scott PletcherSenior DBACommented:
Yes, you could just drop the existing clustered index and create a new one.

However, you may not need the nonclustered indexes you have once the clustered index changes.  Therefore, I'd recommend instead that you:

1) drop all nonclus indexes
2) create the new clus index
3) wait a while, then look at missing index and index usage stats and see if (A) the new clus index is effective and (B) a nonclus index(es) are needed.
MlandaTCommented:
@gdemaria... I think the comments given by the other experts give enough additional insights on top of my initial comments.

show all products by date, all products by category, by location
Columns that you FILTER on might need indexes. ReportID is probable a good candidate for inclusion in a clustered index. The other benefit you will get from a clustered index is SORT ORDER. You can include a few additional columns with the default sort order you will commonly want to use for your reports. (Of course your users will filter on other columns and sort in different ways,,. but there might be some natural sorting order... like date for example. That said, you don't want to make the clustered index too wide (i.e. too many columns).

Implementation - can I just drop my cluster index and create a new one and SQL Server will reorganize everything?  Is it that easy to change?
Yes. Fire at will :) Although, for large tables, building the clustered index might take time on large tables. So plan ahead, and avoid doing it all the time.

Database Maintenance
As you add more data to your system, also make sure to have a database maintenance plan which regularly rebuilds fragmented database indexes (https://ola.hallengren.com/sql-server-index-and-statistics-maintenance.html)

Additional reading
Really nice discussion on the subject: https://www.simple-talk.com/sql/learn-sql-server/effective-clustered-indexes/
Also (unrelated but we sometimes fall into the trap of then become too focussed on indexes, and forget other considerations and lose balance): https://www.simple-talk.com/sql/database-administration/ten-common-database-design-mistakes/

Warehousing
It is also important to note that sometimes we worry too much about real time reporting when in fact, the business can operate just fine with yesterday's data. It's still OK to run a batch job in the middle of the night to pre-calculate some large tables (which you need for reporting) and rebuild indexes on it, then have the users access the precalculated stuff in the morning. Depends on your requirements of course, but setting realistic reporting targets and requirements also helps the business to still operate efficiently without creating unrealistic expectations :) Your indexes will help a lot, but sometimes, managing the user's expectations will ease the pressure :)

All the best with your reports :)
Scott PletcherSenior DBACommented:
but I would always fetch the price based on Product ID and Effective Date.   I think you are saying these two fields should be my clustered index.  

Absolutely, 100%.  That is the perfect clustering key for that table.  That should also be the primary key.  Contrary to myth, every table does NOT "need" an identity column.  In particular, what in data modeling we call "intersection tables" -- such as products and prices/dates -- often can just use the clustering key as the primary key also.


All the child and grandchildren table would ONLY be touched if they are part of the REPORT_ID.   One user running a report NEVER touches a record of another ReportID, so it seems that the REPORT_ID should be the first element of my clustered index, then add other fields that may help narrow the search.  These other fields may vary though based on the criteria of the search (show all products by date, all products by category, by location).    What would I do for a clustered key here?   REPORT_ID seems obvious, but what next in line?

Again, 100% right, REPORT_ID first.  The first key is absolutely critical, later keys not as much, but they can still help, so whatever column you commonly use an equal comparison on that most limits the rows.
gdemariaAuthor Commented:
Thank you all - probably the best answers I have ever gotten on a question.   Wish I had more points to share!
MlandaTCommented:
Glad to have helped.
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
Microsoft SQL Server

From novice to tech pro — start learning today.