asked on

Best Practices for Cluster Index

I am creating a clustered index strategy for my reporting tables.

Reports (report_id)

ReportProducts
Pk reportProduct_id
Fk report_id

ReportProductOptions
Pk reportProductOption_id
Fk reportProduct_id
fk report_id
optionCode
optionName
lastSold_Date

I am going to create clustered index so that the report_id is always first in the cluster. I hope this will prevent the constant deadlocks I am getting.

My question is... if I am going to access my ReportProductOptions table directly to the primary key, do I need to use the report_id in the where clause?

update reportProductOptions
set optionName = 'Hello'
where report_id = 123
and reportProductOption_id = 34567

...or....

update reportProductOptions
set optionName = 'Hello'
where reportProductOption_id = 34567

I suspect my clustered index would NOT hold the table's primary key

reportProductOptions Cluster Index: report_id, product_id, optionCode, optionName

Regular (non-cluster) Index on the primary key: reportProductOptions reportProductOption_id

Advise please? Is this the right approach?

Peter Chan

Hi,
Yes, we usually refer to column (having the index) when referring or linking the tables. Index is for big table having lots of records inside.

ste5an

As far as I understand you: You will have a clustered index and a different primary key?

Then the answer is yes, you need the report ID in the WHERE conditions. otherwise the clustered index may not be used and your execution plan may use a primary key scan/seek and lookups.

But you need to review your model. Cause such an strategy sounds like you have in reality a combined primary key. But even when you don't have it, then you need a two column foreign key relationship between ReportProducts and ReportProductOptions.

I would reconsider the entire approach. When it's about reporting, then chose a dimensional model instead of a normalized relational model. Cause reports are calculated once, there is normally no need to update them like in your sample code snippets.

btw, can you explain your use-case of reporting? Cause

gdemaria

ASKER

Hi ste5an - thanks for your response.

Our reports are very complex, they are not simply pulling data, they are running lots of calculations with drill downs to products, options and annual values. At each level many values are calculated on a running basis (conditionally compounded every year). I simplified the model in my example, there is an annual table ReportProductOptionYears and this table has about 5,000 record created for every report_id.

Even though the tables are cleared out in the middle of the night, they can quickly reach over 1 million records. The problem is deadlocks. Every user is assigned a report_id and his transactions should never touch any other report_id, so how can there be deadlocks? The advise of others have given is to cluster index starting with the report_id in every table. That will hopefully reduce deadlocks of records the user will never touch. I am trying to figure out the details of that. For the deep child tables, do I cluster around the primary key?

The table ReportProductOptionYears is 4 levels deep from the reporting table. Do I setup a cluster like this:

Report_id, ReportProductOption_Id, YearNumber
and then I leave the primary key as it's own index (non-clustered)

Or do I just include the Primary Key as part of the clustered index?

Report_id, ReportProductOptionYear_ID, YearNumber

In order to update a value on the ReportProductOptionYears table, is it better to ...

Can I go directly to the primary key without the report_id... if the primary key is not in the cluster but its own index?

update ReportProductOptionYears
set costFactor = xxxx
where ReportProductOptionYear_ID = 111111

Or this....

update ReportProductOptionYears
set costFactor = xxxx
where report_id = 1111111
and ReportProductOptionYear_ID = 111111

How far do I go? Do I include every fkey in the cluster and then add to the where clause?

update ReportProductOptionYears
set costFactor = xxxx
where report_id = 1111111
and ReportProductOption_ID = 111111
and ReportProductOptionYear_ID = 111111

Peter Chan

Hi，
If report_id is first indexed column of one index, you can do it like

update ReportProductOptionYears
    set costFactor = xxxx
where report_id = 1111111
and     ReportProductOption_ID = 111111
and     ReportProductOptionYear_ID = 111111

Open in new window

Scott Pletcher

There's nothing wrong with multiple-key clustering indexes, especially for child tables (as these are). Such combined keys produce more efficient execution plans across all queries. That should indeed eliminate (or at least drastically reduce) the deadlocks you're currently seeing.

Here's my suggested indexing (Cl=clustered index):

ReportProducts
reportProduct_id
Fk report_id
Cl (report_id, reportProduct_id)

ReportProductOptions
reportProductOption_id
reportProduct_id
Fk report_id
optionCode
optionName
lastSold_Date
Fk (report_id, reportProduct_id)
Cl (report_id, reportProduct_id, reportProductOption_id) --yes, 3 levels of keys. If that's what the data structure needs,
then that's what you should use, not artificially limit it to one column to follow some generic "rule".
Of course make sure you've done proper diligence on the initial design and selection of keys to make sure they will be stable later. Don't over-rush the design process.

ste5an

For the deadlocks:
When deadlocks happen you need to look at the reason. Is it an INSERT or UPDATE?

In the first case you need to play with the fill factor. Using a value between 0.6 and 0.9 may help. In the second case you need to look into supporting indices. Your sample UPDATE

UPDATE reportProductOptions
SET    optionName = 'Hello'
WHERE  report_id = 123
       AND reportProductOption_id = 34567;

Open in new window

requires an covering index over ReportProductOptions ( Report_ID, ReportProductOption, OptionName ) and change the query to

UPDATE reportProductOptions
SET    optionName = 'Hello'
WHERE  report_id = 123
       AND reportProductOption_id = 34567
       AND optionName != 'Hello'
       OR optionName IS NULL;

Open in new window

ste5an

For the query predicates:

Always use the clustered index columns in your WHERE predicates. Cause (Report_ID, ReportProductOption_ID) is for sure in reality a not strict monotone function. Thus using the first column of the clustered key should enable a index/table seek, where as without you should end in index scan otherwise.

This question needs an answer!

Become an EE member today

7 DAY FREE TRIAL

Members can start a 7-Day Free trial then enjoy unlimited access to the platform.

View membership options

Learn why we charge membership fees

We get it - no one likes a content blocker. Take one extra minute and find out why we block content.