Surrogate primary key index vs. index on composite key(s)

First off, I want to thank all of you for the help (particularly aneeshattingal & Scott Pletcher, among others) -- you have all been incredibly helpful in my foray into index optimization.

This answer seems obvious to me...but of course in IT there's rarely *one foolproof answer*, so I'm looking for expert input:

BACKGROUND: I'm working with a table which has a composite primary key (varchar2, varchar3, int, int, int, int, int), and there's a nonclustered index on those columns. There are 3,660 rows in the table.
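
For concreteness, here is a rough sketch of the table (the column names are hypothetical, and -- per my correction below -- the first two columns are really varchar(2) and varchar(3)):

-- Hypothetical reconstruction; the seven key columns are the only columns.
CREATE TABLE dbo.OrgUnit (
    CountryCode varchar(2) NOT NULL,
    RegionCode  varchar(3) NOT NULL,
    CountryID   int        NOT NULL,
    ProvinceID  int        NOT NULL,
    CityID      int        NOT NULL,
    DeptID      int        NOT NULL,
    SubDeptID   int        NOT NULL,
    -- the composite primary key, backed by a nonclustered index
    CONSTRAINT PK_OrgUnit PRIMARY KEY NONCLUSTERED
        (CountryCode, RegionCode, CountryID, ProvinceID, CityID, DeptID, SubDeptID)
);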

QUESTION: While I know there are valid reasons for creating the table as such, it seems I should drop the nonclustered index, add a nice, simple surrogate int primary key, and cluster on it. Then update all procs which reference [the previous composite primary key] to reference the surrogate instead.

That is, there's too much overhead in the nonclustered index... and I should remove it.
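
In other words, something roughly like this (a sketch against the hypothetical names above, not a tested migration):

-- Add a surrogate identity column, drop the composite primary key,
-- and re-key the table on the surrogate, clustered.
ALTER TABLE dbo.OrgUnit ADD OrgUnitID int IDENTITY(1,1) NOT NULL;

ALTER TABLE dbo.OrgUnit DROP CONSTRAINT PK_OrgUnit;

ALTER TABLE dbo.OrgUnit
    ADD CONSTRAINT PK_OrgUnit PRIMARY KEY CLUSTERED (OrgUnitID);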

ADDITIONAL CONSIDERATION: How would you answer the same question if the rows grow to 100,000?  


drydenhogg commented:
I'm not sure what 'too much overhead' you are finding in the nonclustered index -- can we assume it is causing RID Lookup / Bookmark Lookup operators in your query plans? Whether 3K or 100K records, it is still a small table.
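
For what it's worth, one way to check (using the hypothetical names sketched above): run the suspect query with the actual execution plan enabled in SSMS and look for RID Lookup / Key Lookup operators.

SET STATISTICS IO ON;

SELECT CountryID, ProvinceID
FROM dbo.OrgUnit
WHERE CountryCode = 'US';   -- should seek the nonclustered index

SET STATISTICS IO OFF;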

If your current primary key is unique and you want uniqueness to be enforced, you're going to have to add a constraint to do it.
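
Something like this, against the hypothetical names above:

-- Re-assert the natural key's uniqueness once the surrogate takes over
-- as primary key; without this, duplicate natural keys become possible.
ALTER TABLE dbo.OrgUnit
    ADD CONSTRAINT UQ_OrgUnit_NaturalKey UNIQUE
        (CountryCode, RegionCode, CountryID, ProvinceID, CityID, DeptID, SubDeptID);

Note that a UNIQUE constraint is itself backed by a nonclustered index, so much of the 'overhead' you remove would come right back.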

AaronAbend commented:
Well, why not cluster on that key? It may cost a bit on the insert side, I suppose.

But when it comes to performance, the only thing I am sure of is the quote attributed to Knuth: "Premature optimization is the root of all evil".

So what is your problem? Overhead is not a problem in and of itself - unless you can measure its impact on performance and report that measurement in minutes or seconds. Short of that, I have been working with this SQL engine for 20 years and it still does things I cannot explain - and do not care to. We cannot change the optimizer's behavior, only watch it.  And if it is running OK, do not change things.

Now, this analysis is separate from another problem you may be thinking about: complexity. A primary key with 7 elements - if I am reading you right - is definitely a candidate for a surrogate. Not because of the primary key itself, but because of what happens when you need to reference it as a foreign key: indexing a foreign key on that many columns in a lot of tables will likely cause insert and update performance problems, not to mention the coding complexity.
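
To illustrate with a hypothetical child table (names follow the sketch above):

-- Referencing the 7-column natural key means every child table repeats
-- all seven columns in its rows, its foreign key, and its indexes.
CREATE TABLE dbo.Assignment (
    AssignmentID int IDENTITY(1,1) NOT NULL PRIMARY KEY,
    CountryCode  varchar(2) NOT NULL,
    RegionCode   varchar(3) NOT NULL,
    CountryID    int NOT NULL,
    ProvinceID   int NOT NULL,
    CityID       int NOT NULL,
    DeptID       int NOT NULL,
    SubDeptID    int NOT NULL,
    CONSTRAINT FK_Assignment_OrgUnit FOREIGN KEY
        (CountryCode, RegionCode, CountryID, ProvinceID, CityID, DeptID, SubDeptID)
        REFERENCES dbo.OrgUnit
        (CountryCode, RegionCode, CountryID, ProvinceID, CityID, DeptID, SubDeptID)
);
-- With a surrogate, this collapses to one column:
--     OrgUnitID int NOT NULL REFERENCES dbo.OrgUnit (OrgUnitID)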

The number of rows is tiny by any standard -- I do not think you will see any issues until you get to 500,000 rows, and probably not even then.

Scott Pletcher (Senior DBA) commented:
>> varchar2, varchar3 <<

Does that mean the lengths are 2 and 3?  If so, make them CHAR, not VARCHAR.  Every VARCHAR takes two extra bytes, which is insane for a column that's only 2-3 bytes anyway.
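
As a sketch (hypothetical names; any index or constraint that includes these columns would have to be dropped and recreated around the change):

-- char(2)/char(3) store 2 and 3 bytes flat; varchar(2)/varchar(3) store
-- the same data plus a 2-byte length entry per column per row.
ALTER TABLE dbo.OrgUnit ALTER COLUMN CountryCode char(2) NOT NULL;
ALTER TABLE dbo.OrgUnit ALTER COLUMN RegionCode  char(3) NOT NULL;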

That is a lot of columns, but if they are *natural* keys -- that is, people can naturally think of them and use them -- it might be better from a usability standpoint than assigning a completely random number. Also, the total key length (25 bytes) is still not unreasonable, especially for so small a table.

nespa (Author) commented:
First off, apologies for putting varchar2, varchar3 - that was plain stupid, as I didn't mean to indicate varchar2 but rather varchar(2) and varchar(3).

Secondly, I realize I should have added another important point - this DB is having SERIOUS performance issues, and I'm looking into everything imaginable to help, including the indexes (although the biggest performance gain is decreasing the number of JOINs -- in some procs I'm seeing 10-15 JOINs!). So, given that this table is one of 15 being JOINed, it seems like it would be a good candidate for a surrogate key, even though there are only 3K records.

Also, these are the only columns in the table, and the pattern is like:
CountryID, ProvinceID, CityID, DeptID, plus a few extra columns, if that helps - no extra cols other than the full composite key. Reminds me of a fact table in OLAP, except that these aren't all foreign keys to other tables (but 4 are).

drydenhogg - yep, the composite key *is* unique, so uniqueness is enforced; another reason I was considering the surrogate instead.

AaronAbend - yep, I agree with the coding complexity introduced by the composite key - yet another point for the surrogate.

Scott - I would think that 1 col. (surrogate ID col) would be easier than remembering the combination of 7 keys.
And also, 1 out of 15 joins -- wouldn't it be better to join on one int (4 bytes) than on 25 bytes? That was my original assumption anyway.
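
Roughly, the per-join difference I'm picturing (with a hypothetical fact table, following the earlier sketch):

-- Joining on the full natural key: seven predicates, ~25 bytes compared.
SELECT f.Amount
FROM dbo.FactSales AS f
JOIN dbo.OrgUnit   AS o
    ON  o.CountryCode = f.CountryCode
    AND o.RegionCode  = f.RegionCode
    AND o.CountryID   = f.CountryID
    AND o.ProvinceID  = f.ProvinceID
    AND o.CityID      = f.CityID
    AND o.DeptID      = f.DeptID
    AND o.SubDeptID   = f.SubDeptID;

-- Joining on the surrogate: one predicate, 4 bytes compared.
SELECT f.Amount
FROM dbo.FactSales AS f
JOIN dbo.OrgUnit   AS o
    ON o.OrgUnitID = f.OrgUnitID;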

The joins are the main problem (as far as I can see, from trying everything else) -- there are simply too many procs to deal with up-front, so I'm tackling this piece by piece (taking highest-priority procs first).

Given the above - does it still make sense to use a surrogate ID col?

AaronAbend commented:
Are the performance problems on inserts, updates, deletes, or queries?

Scott Pletcher (Senior DBA) commented:
>> I would think that 1 col. (surrogate ID col) would be easier than remembering the combination of 7 keys. <<

Depends on the specific values. Would it be easier to remember your address, phone number, and initials, or some new random 8-10 digit number?

>> And also, 1 out of 15 joins -- wouldn't it be better to join on one int (4 bytes) than on 25 bytes? That was my original assumption anyway. <<

Of course, yes.  It would be better still to use a smallint if possible (only two bytes), including negative values if necessary.
nespa (Author) commented:
Aaron - great point; more info: *reads* are most important here. A few inserts & updates here & there, but reading/aggregating data is the priority -- so much so that I've considered denormalizing into star-schema/OLAP-type structures (for read performance).

Scott - yes, also a good point about the info... but this DB has an app front end, so the users are abstracted from all that stuff. smallint is a good idea, too - but the table holds user info, so there could be more than 32K users at some point.
nespa (Author) commented:
I guess I should have rephrased my question --

When initially creating tables, what are the benefits of using a surrogate key vs. a composite key with many columns?

To apply your answers: it seems that if the total key size is kept small (25 bytes), the table won't hold many records (10K or fewer), and the data is logically arranged in a way that you'd consider the columns natural keys, then use the composite.

But overall it seems a unique surrogate ID is the easier choice, and one that holds up as the table grows.

To close this question out, my *main* concern overall is how the index pages get built -- are all nonclustered-index columns stored in the index pages themselves, or is some compact "id" or key stored instead of all of those values?

So... it seems the clustered, unique surrogate should win out, since that would be the only info stored in the index pages.
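
For anyone checking the same thing: in SQL Server, a nonclustered index's leaf rows contain the index key columns plus a row locator (the clustering key, or a RID on a heap). Actual row widths can be inspected like this (hypothetical table name, per the earlier sketch):

-- Inspect index row widths and page counts; DETAILED mode scans
-- every page, which is fine for a 3,660-row table.
SELECT i.name               AS index_name,
       ips.index_level,
       ips.page_count,
       ips.avg_record_size_in_bytes
FROM sys.dm_db_index_physical_stats(
         DB_ID(), OBJECT_ID(N'dbo.OrgUnit'), NULL, NULL, 'DETAILED') AS ips
JOIN sys.indexes AS i
    ON  i.object_id = ips.object_id
    AND i.index_id  = ips.index_id;
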
nespa (Author) commented:
I wound up *leaving* the key as-is (and focusing instead on all the JOINs in the stored procs). The discussion about overhead helped; you're right - not *that* much overhead, and the table's growth will probably not be much (certainly nowhere near 500,000 rows).

Thanks for the discussion / info - have a great weekend.