Solved

Surrogate primary key index VS. index on composite key(s)

Posted on 2006-10-27
638 Views
Last Modified: 2008-02-01
First off, I want to thank all of you for the help (particularly aneeshattingal & Scott Pletcher, among others) -- you have all been incredibly helpful in my foray into index optimization.

This answer seems obvious to me...but of course in IT there's rarely *one foolproof answer*, so I'm looking for expert input:

BACKGROUND: I'm working with a table which has a composite primary key (varchar2, varchar3, int, int, int, int, int) and there's a nonclustered index on those columns.   There are 3660 rows in the table.

QUESTION: While I know there are valid reasons for creating the table as such, it seems I should delete the nonclustered index, add a nice, simple surrogate int primary key, and allow it to cluster -- then update all procs that reference [the previous composite primary key] to reference the surrogate instead.

That is, there's too much overhead in the nonclustered index, so I should remove it.
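Roughly the change I have in mind, sketched out (all table/column names here are made up, not the real schema):

```sql
-- Hypothetical schema; the real table and columns differ.
ALTER TABLE dbo.MyTable DROP CONSTRAINT PK_MyTable;   -- drop the composite PK

ALTER TABLE dbo.MyTable ADD MyTableID int IDENTITY(1, 1) NOT NULL;

ALTER TABLE dbo.MyTable ADD CONSTRAINT PK_MyTable
    PRIMARY KEY CLUSTERED (MyTableID);
```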

ADDITIONAL CONSIDERATION: How would you answer the same question if the rows grow to 100,000?  

Thanks!
Question by:nespa
9 Comments
 
LVL 8

Assisted Solution

by:drydenhogg
drydenhogg earned 100 total points
I'm not sure what 'too much overhead' you are finding with the nonclustered index -- can we assume it is causing RID lookups / bookmark lookups within your query plans? Whether 3K or 100K records, it is still a small table.
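One quick way to measure it is to turn on I/O statistics and compare logical reads for a suspect query (a sketch -- the query and names are placeholders):

```sql
SET STATISTICS IO ON;

-- Any query that uses the nonclustered index; names are made up here.
SELECT *
FROM dbo.MyTable
WHERE KeyCol1 = 'US'
  AND KeyCol2 = 'NYC';

SET STATISTICS IO OFF;  -- 'logical reads' shows up in the Messages tab
```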

If your current primary key is unique and you want uniqueness to be enforced, you're going to have to add a constraint to do it.
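Something along these lines (names are placeholders):

```sql
-- Preserves uniqueness of the old composite key after it stops being the PK.
ALTER TABLE dbo.MyTable ADD CONSTRAINT UQ_MyTable_NaturalKey
    UNIQUE NONCLUSTERED (KeyCol1, KeyCol2, KeyCol3, KeyCol4,
                         KeyCol5, KeyCol6, KeyCol7);
```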

 
LVL 10

Accepted Solution

by:AaronAbend
AaronAbend earned 300 total points
Well, why not cluster on that key?  It may cost a bit on the insert side I suppose.

But when it comes to performance, the only thing I am sure of is the quote attributed to Knuth: "Premature optimization is the root of all evil".

So what is your problem? Overhead is not a problem in and of itself - unless you can measure its impact on performance and report that measurement in minutes or seconds. Short of that, I have been working with this SQL engine for 20 years and it still does things I cannot explain - and do not care to. We cannot change the optimizer's behavior, only watch it.  And if it is running OK, do not change things.

Now this previous analysis is unrelated to another problem you may be thinking about: complexity. A primary key with 7 elements - if I am reading you right - is definitely a candidate for a surrogate, but that is not because of the primary key itself, but when you need to reference it as a foreign key, because indexing the foreign key on that many columns in a lot of tables will likely cause a lot of insert and update performance problems, not to mention the coding complexity.
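To make that concrete, compare what a foreign key looks like in each case (hypothetical names throughout):

```sql
-- Referencing the 7-column natural key: every child table must carry,
-- and usually index, all 7 columns.
ALTER TABLE dbo.Child ADD CONSTRAINT FK_Child_Parent_Natural
    FOREIGN KEY (KeyCol1, KeyCol2, KeyCol3, KeyCol4, KeyCol5, KeyCol6, KeyCol7)
    REFERENCES dbo.Parent (KeyCol1, KeyCol2, KeyCol3, KeyCol4, KeyCol5, KeyCol6, KeyCol7);

-- Referencing a surrogate: one int column.
ALTER TABLE dbo.Child ADD CONSTRAINT FK_Child_Parent
    FOREIGN KEY (ParentID) REFERENCES dbo.Parent (ParentID);
```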

The number of rows is tiny by any standard -- I do not think you will see any issues until you get to 500,000 rows, and probably not even then.
 
LVL 69

Assisted Solution

by:ScottPletcher
ScottPletcher earned 100 total points
>> varchar2, varchar3 <<

Does that mean the lengths are 2 and 3?  If so, make them CHAR, not VARCHAR.  Every VARCHAR takes two extra bytes, which is insane for a column that's only 2-3 bytes anyway.

That is a lot of columns, but if they are *natural* keys -- that is, people can naturally think of them and use them -- it might be better from a usability standpoint than assigning a completely random #.  Also, the total length (25 bytes) is still not unreasonable, especially for so small a table.
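I.e. (column names made up here):

```sql
-- char(n) stores exactly n bytes; varchar(n) adds a 2-byte length prefix.
-- (The columns would first have to be dropped from any PK/index they are in.)
ALTER TABLE dbo.MyTable ALTER COLUMN CountryCode char(2) NOT NULL;
ALTER TABLE dbo.MyTable ALTER COLUMN RegionCode  char(3) NOT NULL;
```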
 
LVL 1

Author Comment

by:nespa
First off, apologies for writing varchar2, varchar3 -- that was sloppy; I meant varchar(2) and varchar(3).

Secondly, I realize I should have added another important point: this DB is having SERIOUS performance issues and I'm looking into everything imaginable to help out, including the indexes (although the biggest performance gain is decreasing the # of JOINs -- in some procs I'm seeing 10-15 JOINs!).  So, given that this table is one out of 15 being JOINed, it seems like it would be a good candidate for a surrogate key, even though there are only 3K records.

Also, these are the only columns in the table, and the pattern is like:
CountryID, ProvinceID, CityID, DeptID, plus a few extra columns, if that helps -- no extra cols other than the full composite key.  Reminds me of a fact table in OLAP, except that these aren't all foreign keys to other tables (but 4 are).

drydenhogg - yep, the composite key *is* unique, so uniqueness is enforced; another reason I was considering the surrogate instead.

AaronAbend - yep, I agree with the coding complexity introduced by the composite key - yet another point for the surrogate.

Scott - I would think that one col (the surrogate ID col) would be easier than remembering the combination of 7 keys.
And also, 1 out of 15 joins -- wouldn't it be better to join on one int (4 bytes) than on 25 bytes?  That was my original assumption anyway.

The joins are the main problem (as far as I can see, from trying everything else) -- there are simply too many procs to deal with up-front, so I'm tackling this piece by piece (taking highest-priority procs first).

Given the above - does it still make sense to use a surrogate ID col?
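I.e., the per-join difference I'm picturing (made-up names again):

```sql
-- Today: 7 predicates, 25 bytes compared per row.
SELECT a.Val
FROM dbo.A a
JOIN dbo.B b
  ON  b.KeyCol1 = a.KeyCol1 AND b.KeyCol2 = a.KeyCol2
  AND b.KeyCol3 = a.KeyCol3 AND b.KeyCol4 = a.KeyCol4
  AND b.KeyCol5 = a.KeyCol5 AND b.KeyCol6 = a.KeyCol6
  AND b.KeyCol7 = a.KeyCol7;

-- With a surrogate: 1 predicate, 4 bytes.
SELECT a.Val
FROM dbo.A a
JOIN dbo.B b ON b.AID = a.AID;
```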

 
LVL 10

Expert Comment

by:AaronAbend
Are performance problems on inserts, updates, deletes or queries?


 
LVL 69

Expert Comment

by:ScottPletcher
>> i would think that 1 col. (surrogate ID col) would be easier than remembering the combination of 7 keys. <<

Depends on the specific values.  Would it be easier to remember your address, phone number and initials or some new random 8-10 digit number?


>> and also, 1 out of 15 joins -- wouldn't it be better to join on one int (4) than on 25 bytes?   That was my original assumption anyway. <<

Of course, yes.  It would be better still to use a smallint if possible (only two bytes), including negative values if necessary.
 
LVL 1

Author Comment

by:nespa
Aaron - great point; more info: *reads* are most important here.  A few inserts & updates here & there, but reading/aggregating data is the most important action... so much so that I've considered denormalizing into star-schema/OLAP-type structures (for read performance).

Scott - yes, also a good point about the info... but this DB has an app front end, so the users are abstracted from all that stuff.  Smallint is a good idea, too - but the table holds user info, so there could be more than 32K users at some point.
 
LVL 1

Author Comment

by:nespa
I guess I should have rephrased my question --

When initially creating tables, what are the benefits of using a surrogate key vs. a composite key with many columns?

To use your answers in response: it seems that if the key size is kept small (25 bytes), the table won't hold many records (10K or fewer), and the data is logically arranged in a way that you'd consider the columns natural keys, then use the composite.

But overall it seems a unique surrogate ID is the easier choice, and one that keeps working as the table grows.

To close this question out, my *main* concern overall is the creation of the index pages -- are all nonclustered indexed columns stored in the index pages themselves?  Or is some compact "id" or key stored instead of all of those values?

So... it seems the clustered, unique surrogate should win out, since that's the only info stored in the index pages.
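One way I've been trying to check this -- comparing page counts and record sizes per index with the 2005+ DMV (object name is a placeholder):

```sql
SELECT i.name, s.index_type_desc, s.page_count, s.avg_record_size_in_bytes
FROM sys.dm_db_index_physical_stats(
        DB_ID(), OBJECT_ID('dbo.MyTable'), NULL, NULL, 'DETAILED') AS s
JOIN sys.indexes AS i
  ON i.object_id = s.object_id AND i.index_id = s.index_id;
```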
 
LVL 1

Author Comment

by:nespa
I wound up *leaving* the key as-is (instead focusing on all the JOINs in the stored procs).  The discussion about overhead helped; you're right - there's not *that* much overhead, and the table's growth will probably be modest (certainly nowhere near 500,000 rows).

Thanks for the discussion / info - have a great weekend.
