

Surrogate primary key index VS. index on composite key(s)

Posted on 2006-10-27
Medium Priority
Last Modified: 2008-02-01
First off, I want to thank all of you for the help (particularly aneeshattingal & Scott Pletcher, among others) -- you have all been incredibly helpful in my foray into index optimization.

This answer seems obvious to me...but of course in IT there's rarely *one foolproof answer*, so I'm looking for expert input:

BACKGROUND: I'm working with a table which has a composite primary key (varchar2, varchar3, int, int, int, int, int) and there's a nonclustered index on those columns.   There are 3660 rows in the table.

QUESTION: While I know there are valid reasons for creating the table as such, it seems I should delete the nonclus index & put a nice, simple surrogate primary key int in there, and allow it to cluster.  Then grab all procs which reference [the previous composite primary key] and reference the surrogate instead.

That is, there's too much overhead in the nonclus index... and I should remove it.
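A rough sketch of the proposed change, with hypothetical table, column, and constraint names (the real ones aren't given in this thread), might look like this — note the unique constraint at the end still creates a nonclustered index, so the natural key's uniqueness is not lost:

```sql
-- All object names below are placeholders for illustration.
ALTER TABLE dbo.UserLocation
    ADD UserLocationID int IDENTITY(1,1) NOT NULL;

-- Drop the existing composite PK (assumed name), then recluster
-- on the new surrogate.
ALTER TABLE dbo.UserLocation DROP CONSTRAINT PK_UserLocation;
ALTER TABLE dbo.UserLocation
    ADD CONSTRAINT PK_UserLocation PRIMARY KEY CLUSTERED (UserLocationID);

-- Keep enforcing uniqueness of the old seven-column natural key.
ALTER TABLE dbo.UserLocation
    ADD CONSTRAINT UQ_UserLocation_NaturalKey
    UNIQUE NONCLUSTERED (CountryID, ProvinceID, CityID, DeptID, ColE, ColF, ColG);
```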

ADDITIONAL CONSIDERATION: How would you answer the same question if the rows grow to 100,000?  

Question by:nespa

Assisted Solution

drydenhogg earned 400 total points
ID: 17819567
I'm not sure what 'too much overhead' you are finding with the nonclustered index — can we assume it is causing RID lookups / bookmark lookups in your query plans? Whether 3K or 100K records, it is still a small table.

If your current primary key is unique and you want uniqueness to be enforced, you're going to have to add a constraint to do it.
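One way to check whether those lookups are actually happening is the index-usage DMV (SQL Server 2005+); the object name below is a placeholder. `user_lookups` counts bookmark/key lookups driven by nonclustered index seeks:

```sql
-- Placeholder object name; run in the target database.
SELECT i.name AS index_name,
       s.user_seeks, s.user_scans, s.user_lookups
FROM sys.dm_db_index_usage_stats AS s
JOIN sys.indexes AS i
  ON i.object_id = s.object_id
 AND i.index_id  = s.index_id
WHERE s.database_id = DB_ID()
  AND s.object_id   = OBJECT_ID('dbo.UserLocation');
```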


Accepted Solution

AaronAbend earned 1200 total points
ID: 17819673
Well, why not cluster on that key?  It may cost a bit on the insert side I suppose.

But when it comes to performance, the only thing I am sure of is the quote attributed to Knuth: "Premature optimization is the root of all evil".

So what is your problem? Overhead is not a problem in and of itself - unless you can measure its impact on performance and report that measurement in minutes or seconds. Short of that, I have been working with this SQL engine for 20 years and it still does things I cannot explain - and do not care to. We cannot change the optimizer's behavior, only watch it.  And if it is running OK, do not change things.

Now this previous analysis is unrelated to another problem you may be thinking about: complexity. A primary key with 7 elements - if I am reading you right - is definitely a candidate for a surrogate, but that is not because of the primary key itself, but when you need to reference it as a foreign key, because indexing the foreign key on that many columns in a lot of tables will likely cause a lot of insert and update performance problems, not to mention the coding complexity.
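The foreign-key point above can be sketched like this (the table and column names are invented for illustration — it's one constraint or the other, not both):

```sql
-- With the composite key, every referencing table must repeat
-- all seven columns, and index them for join performance:
ALTER TABLE dbo.SomeChildTable
    ADD CONSTRAINT FK_Child_Parent_Composite
    FOREIGN KEY (CountryID, ProvinceID, CityID, DeptID, ColE, ColF, ColG)
    REFERENCES dbo.ParentTable (CountryID, ProvinceID, CityID, DeptID, ColE, ColF, ColG);

-- With a surrogate, the reference collapses to one 4-byte int:
ALTER TABLE dbo.SomeChildTable
    ADD CONSTRAINT FK_Child_Parent_Surrogate
    FOREIGN KEY (ParentID) REFERENCES dbo.ParentTable (ParentID);
```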

The number of rows is tiny by any standard — I do not think you will see any issues until you get to 500,000 rows, and probably not even then.

Assisted Solution

by:Scott Pletcher
Scott Pletcher earned 400 total points
ID: 17819923
>> varchar2, varchar3 <<

Does that mean the lengths are 2 and 3?  If so, make them CHAR, not VARCHAR.  Every VARCHAR takes two extra bytes, which is insane for a column that's only 2-3 bytes anyway.

That is a lot of columns, but if they are *natural* keys — that is, people can naturally think of them and use them — it might be better from a usability standpoint than assigning a completely random #.  Also, the total length (25 bytes) is still not unreasonable, especially for so small a table.
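A sketch of that CHAR conversion, with hypothetical column names — note that SQL Server will not ALTER a column that belongs to an index or primary key, so the existing key on these columns would have to be dropped and recreated around this change:

```sql
-- char(2)/char(3) avoid the 2-byte length overhead varchar carries per value.
ALTER TABLE dbo.UserLocation ALTER COLUMN CountryCode  char(2) NOT NULL;
ALTER TABLE dbo.UserLocation ALTER COLUMN ProvinceCode char(3) NOT NULL;
```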


Author Comment

ID: 17821365
first off, apologies for writing varchar2, varchar3 - that was sloppy; i meant varchar(2) and varchar(3), not the varchar2 type.

secondly, i realize i should have added another important point - this DB is having SERIOUS performance issues and i'm looking into everything imaginable to help out, including the indexes (although the biggest performance gain is decreasing the number of JOINs - some procs have 10-15 JOINs!).  so, given that this table is one of 15 being JOINed, it seems like a good candidate for a surrogate key, even though there are only 3K records.

also, these are the only columns in the table, and the pattern is like:
 CountryID, ProvinceID, CityID, DeptID, plus a few extra columns, if that helps - no extra cols other than the full composite key.   reminds me of a facts table in OLAP, except that these aren't all foreign keys to other tables (but 4 are).

drydenhogg - yep, the composite key *is* unique, so uniqueness is enforced; another reason i was considering the surrogate instead.

AaronAbend - yep, i agree with the coding complexity introduced by the composite key - yet another for surrogate.

scott - i would think that 1 col. (surrogate ID col) would be easier than remembering the combination of 7 keys.
and also, 1 out of 15 joins -- wouldn't it be better to join on one int (4) than on 25 bytes?   That was my original assumption anyway.

The joins are the main problem (as I can see - from trying everything else) -- there are simply too many procs to deal with up-front so i'm tackling this piece-by-piece (and taking highest-priority procs first).

Given the above - does it still make sense to use a surrogate id col?

Expert Comment

ID: 17821482
Are performance problems on inserts, updates, deletes or queries?


Expert Comment

by:Scott Pletcher
ID: 17821531
>> i would think that 1 col. (surrogate ID col) would be easier than remembering the combination of 7 keys. <<

Depends on the specific values.  Would it be easier to remember your address, phone number and initials or some new random 8-10 digit number?

>> and also, 1 out of 15 joins -- wouldn't it be better to join on one int (4) than on 25 bytes?   That was my original assumption anyway. <<

Of course, yes.  It would be better still to use a smallint if possible (only two bytes), including negative values if necessary.

Author Comment

ID: 17821678
Aaron - great point; more info:   *reads* are most important here.   a few inserts & updates here & there, but reading/aggregating data seems to be the most important action...so much so i've considered denormalizing into star-schema/OLAP-type structures (for read performance)

scott - yes, also good point about the info...but this DB has an app front end, so the users are abstracted from all that stuff.     smallint is a good idea, too - but the table holds user info so there could be more than 32k users at some point.

Author Comment

ID: 17821738
i guess i should have rephrased my question --

when initially creating tables, what are the benefits of using a surrogate key vs. a composite key with many columns?  

to use your answers in response: it seems if the key stays small (25 bytes), the table won't hold many records (10K or fewer), and the data is logically arranged in a way you'd consider the columns natural keys, then use the composite.

but overall it seems a unique surrogate id is the easier choice, and as the table grows the key grows with it.

to close this question out, my *main* concern overall is how the index pages are built - are all nonclustered indexed columns stored in the index pages themselves?   or is some compact "id" or key stored instead of all of those values?

so... it seems the clustered, unique surrogate should win out, since that's the only info stored in the index pages.
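For what it's worth: in SQL Server, a nonclustered index row stores all of its key columns in full, plus a row locator — the RID on a heap, or the clustered index key on a clustered table. So the 25-byte composite is carried in every index row today, whereas with a clustered int surrogate, other nonclustered indexes would carry only a 4-byte locator. One hedged way to see the difference (object name is a placeholder):

```sql
-- Compare average index record sizes per index (SQL Server 2005+).
SELECT index_id, index_level, avg_record_size_in_bytes, page_count
FROM sys.dm_db_index_physical_stats(
         DB_ID(), OBJECT_ID('dbo.UserLocation'), NULL, NULL, 'DETAILED');
```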

Author Comment

ID: 17822269
I wound up *leaving* the key as-is (and focusing instead on all the JOINs in the stored procs).   The discussion about overhead helped; you're right - it's not *that* much overhead, and table growth will probably be modest (nowhere near 500,000 rows).

Thanks for the discussion / info - have a great weekend.
