Surrogate primary key index vs. index on composite key(s)

Posted on 2006-10-27
Last Modified: 2008-02-01
First off, I want to thank all of you for the help (particularly aneeshattingal & Scott Pletcher, among others) -- you have all been incredibly helpful in my foray into index optimization.

This answer seems obvious to me...but of course in IT there's rarely *one foolproof answer*, so I'm looking for expert input:

BACKGROUND: I'm working with a table which has a composite primary key (varchar2, varchar3, int, int, int, int, int) and there's a nonclustered index on those columns.   There are 3660 rows in the table.

QUESTION: While I know there are valid reasons for creating the table as such, it seems I should delete the nonclus index & put a nice, simple surrogate primary key int in there, and allow it to cluster.  Then grab all procs which reference [the previous composite primary key] and reference the surrogate instead.

That is, there's too much overhead in the nonclus index... and I should remove it.
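Concretely, the change being weighed might look something like this (all table and column names here are hypothetical, and I'm assuming the two string columns are varchar(2) and varchar(3)):

```sql
-- Current shape (hypothetical names): composite PK backed by a nonclustered index
CREATE TABLE dbo.UserLocation (
    CountryCode varchar(2) NOT NULL,
    RegionCode  varchar(3) NOT NULL,
    CountryID   int NOT NULL,
    ProvinceID  int NOT NULL,
    CityID      int NOT NULL,
    DeptID      int NOT NULL,
    UserID      int NOT NULL,
    CONSTRAINT PK_UserLocation PRIMARY KEY NONCLUSTERED
        (CountryCode, RegionCode, CountryID, ProvinceID, CityID, DeptID, UserID)
);

-- Proposed shape: a simple surrogate int as the clustered PK
CREATE TABLE dbo.UserLocation2 (
    UserLocationID int IDENTITY(1,1) NOT NULL,
    CountryCode varchar(2) NOT NULL,
    RegionCode  varchar(3) NOT NULL,
    CountryID   int NOT NULL,
    ProvinceID  int NOT NULL,
    CityID      int NOT NULL,
    DeptID      int NOT NULL,
    UserID      int NOT NULL,
    CONSTRAINT PK_UserLocation2 PRIMARY KEY CLUSTERED (UserLocationID)
);
```

Note that the second form, by itself, no longer enforces uniqueness of the original seven columns.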

ADDITIONAL CONSIDERATION: How would you answer the same question if the rows grow to 100,000?  

Question by:nespa

Assisted Solution

drydenhogg earned 100 total points
ID: 17819567
I'm not sure what 'too much overhead' you are finding with the nonclustered index -- can we assume it is causing RID lookups / bookmark lookups in your query plans? Whether 3K or 100K records, it is still a small table.

If your current primary key is unique and you want that uniqueness to remain enforced, you're going to have to add a constraint to do it.
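For example (table and column names hypothetical), keeping the natural key unique alongside a surrogate PK would look like:

```sql
-- Surrogate is the clustered PK; uniqueness of the original
-- composite key is preserved via a separate constraint
ALTER TABLE dbo.UserLocation
    ADD CONSTRAINT UQ_UserLocation_NaturalKey UNIQUE
        (CountryCode, RegionCode, CountryID, ProvinceID, CityID, DeptID, UserID);
```

Keep in mind a UNIQUE constraint is itself backed by an index, so the overhead does not disappear -- it moves.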


Accepted Solution

AaronAbend earned 300 total points
ID: 17819673
Well, why not cluster on that key?  It may cost a bit on the insert side I suppose.

But when it comes to performance, the only thing I am sure of is the quote attributed to Knuth: "Premature optimization is the root of all evil".

So what is your problem? Overhead is not a problem in and of itself - unless you can measure its impact on performance and report that measurement in minutes or seconds. Short of that, I have been working with this SQL engine for 20 years and it still does things I cannot explain - and do not care to. We cannot change the optimizer's behavior, only watch it.  And if it is running OK, do not change things.

Now this previous analysis is unrelated to another problem you may be thinking about: complexity. A primary key with 7 elements -- if I am reading you right -- is definitely a candidate for a surrogate. But that is not because of the primary key itself; it is because of what happens when you need to reference it as a foreign key. Indexing a foreign key on that many columns in a lot of tables will likely cause a lot of insert and update performance problems, not to mention the coding complexity.
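To illustrate that foreign-key point (hypothetical names throughout): a child table referencing the composite key must carry, and index, all seven columns, while a surrogate reference is a single int.

```sql
-- Composite FK: all seven key columns repeated in every referencing table
CREATE TABLE dbo.Assignment_Composite (
    AssignmentID int NOT NULL PRIMARY KEY,
    CountryCode varchar(2) NOT NULL,
    RegionCode  varchar(3) NOT NULL,
    CountryID   int NOT NULL,
    ProvinceID  int NOT NULL,
    CityID      int NOT NULL,
    DeptID      int NOT NULL,
    UserID      int NOT NULL,
    CONSTRAINT FK_Assignment_Composite FOREIGN KEY
        (CountryCode, RegionCode, CountryID, ProvinceID, CityID, DeptID, UserID)
        REFERENCES dbo.UserLocation
        (CountryCode, RegionCode, CountryID, ProvinceID, CityID, DeptID, UserID)
);

-- Surrogate FK: one 4-byte column
CREATE TABLE dbo.Assignment_Surrogate (
    AssignmentID   int NOT NULL PRIMARY KEY,
    UserLocationID int NOT NULL
        CONSTRAINT FK_Assignment_Surrogate
        REFERENCES dbo.UserLocation2 (UserLocationID)
);
```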

The number of rows is tiny by any standard -- I do not think you will see any issues until you get to 500,000 rows, and probably not even then.

Assisted Solution

Scott Pletcher earned 100 total points
ID: 17819923
>> varchar2, varchar3 <<

Does that mean the lengths are 2 and 3?  If so, make them CHAR, not VARCHAR.  Every VARCHAR takes two extra bytes, which is insane for a column that's only 2-3 bytes anyway.

That is a lot of columns, but if they are *natural* keys -- that is, people can naturally think of them and use them -- it might be better from a usability standpoint than assigning a completely random #.  Also, the total length (25) is still not unreasonable, especially for so small a table.
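As a sketch of the CHAR suggestion (column names hypothetical):

```sql
-- varchar(n) stores the data plus a 2-byte length prefix per value;
-- for fixed 2- and 3-character codes, char avoids that overhead
ALTER TABLE dbo.UserLocation ALTER COLUMN CountryCode char(2) NOT NULL;
ALTER TABLE dbo.UserLocation ALTER COLUMN RegionCode  char(3) NOT NULL;
```

Note that a column participating in an index or primary key generally has to have that index dropped before ALTER COLUMN will succeed.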

Author Comment

ID: 17821365
first off, apologies for putting varchar2, varchar3 - that was plain stupid as i didn't mean to indicate varchar2 but rather varchar(2).

secondly, i realize i should have added another important point - this DB is having SERIOUS performance issues and i'm looking into everything imaginable to help out, including the indexes (although the biggest performance gain is decreasing the # of JOINs -- in some procs i'm seeing 10-15 JOINs!).  so, given the fact that this table is one out of 15 being JOINed, it seems like this table would be a good candidate for a surrogate key, even though there are only 3K records.

also, these are the only columns in the table, and the pattern is like:
 CountryID, ProvinceID, CityID, DeptID, plus a few extra columns, if that helps - no extra cols other than the full composite key.   reminds me of a fact table in OLAP, except that these aren't all foreign keys to other tables (but 4 are).

drydenhogg - yep, the composite key *is* unique, so uniqueness is enforced; another reason i was considering the surrogate instead.

AaronAbend - yep, i agree with the coding complexity introduced by the composite key - yet another for surrogate.

scott - i would think that 1 col. (surrogate ID col) would be easier than remembering the combination of 7 keys.
and also, 1 out of 15 joins -- wouldn't it be better to join on one int (4) than on 25 bytes?   That was my original assumption anyway.

The joins are the main problem (as I can see - from trying everything else) -- there are simply too many procs to deal with up-front so i'm tackling this piece-by-piece (and taking highest-priority procs first).

Given the above - does it still make sense to use a surrogate id col?

Expert Comment

ID: 17821482
Are performance problems on inserts, updates, deletes or queries?


Expert Comment

by:Scott Pletcher
ID: 17821531
>> i would think that 1 col. (surrogate ID col) would be easier than remembering the combination of 7 keys. <<

Depends on the specific values.  Would it be easier to remember your address, phone number and initials or some new random 8-10 digit number?

>> and also, 1 out of 15 joins -- wouldn't it be better to join on one int (4) than on 25 bytes?   That was my original assumption anyway. <<

Of course, yes.  It would be better still to use a smallint if possible (only two bytes), including negative values if necessary.

Author Comment

ID: 17821678
Aaron - great point; more info:   *reads* are most important here.   a few inserts & updates here & there, but reading/aggregating data seems to be the most important -- so much so that i've considered denormalizing into star-schema/OLAP-type structures (for read performance)

scott - yes, also good point about the info...but this DB has an app front end, so the users are abstracted from all that stuff.     smallint is a good idea, too - but the table holds user info so there could be more than 32k users at some point.

Author Comment

ID: 17821738
i guess i should have rephrased my question --

when initially creating tables, what are the benefits of using a surrogate key vs. a composite key with many columns?  

to use your answers in response, it seems if the column size is kept small (25 bytes) and the table won't hold many records (10K or fewer), and the data is logically arranged in a way that you'd consider the keys natural keys, then use composite.

but overall it seems a unique surrogate id is the easier choice, and as the table grows the key grows with it.

to close this question out, my *main* concern overall is the content of the index pages -- are all non-clus. indexed columns stored in the index pages themselves?   or is a compact "id" or key stored instead of all of those values?  

so... it seems the clustered, unique surrogate should win out, since that's the only info stored in the index pages.
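For reference: a nonclustered index's pages do store every key column, plus a row locator (the RID for a heap, or the clustering key if the table is clustered). One way to measure what each index actually occupies, assuming SQL Server 2005 or later and a hypothetical table name:

```sql
-- Page counts and average record size per index on the table
SELECT i.name AS index_name,
       s.index_type_desc,
       s.page_count,
       s.avg_record_size_in_bytes
FROM sys.dm_db_index_physical_stats(
         DB_ID(), OBJECT_ID(N'dbo.UserLocation'), NULL, NULL, 'DETAILED') AS s
JOIN sys.indexes AS i
  ON i.object_id = s.object_id
 AND i.index_id  = s.index_id;
```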

Author Comment

ID: 17822269
I wound up *leaving* the key as-is (and focusing instead on all the JOINs in the stored procs).   The discussion about overhead helped; you're right - not *that* much overhead, and table growth will probably not be much (certainly nowhere near 500,000 rows).

Thanks for the discussion / info - have a great weekend.
