Solved

Surrogate primary key index VS. index on composite key(s)

Posted on 2006-10-27
Medium Priority
661 Views
Last Modified: 2008-02-01
First off, I want to thank all of you for the help (particularly aneeshattingal & Scott Pletcher, among others) -- you have all been incredibly helpful in my foray into index optimization.

This answer seems obvious to me...but of course in IT there's rarely *one foolproof answer*, so I'm looking for expert input:

BACKGROUND: I'm working with a table which has a composite primary key (varchar2, varchar3, int, int, int, int, int) and there's a nonclustered index on those columns.   There are 3660 rows in the table.

QUESTION: While I know there are valid reasons for creating the table as such, it seems I should delete the nonclus index & put a nice, simple surrogate primary key int in there, and allow it to cluster.  Then grab all procs which reference [the previous composite primary key] and reference the surrogate instead.

That is, there's too much overhead in the nonclus index... and I should remove it.
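Roughly, the change I have in mind looks like this (table and column names here are placeholders, not my real schema):

-- Current shape: composite primary key, nonclustered (placeholder names)
CREATE TABLE dbo.MyTable (
    Code1 VARCHAR(2) NOT NULL,
    Code2 VARCHAR(3) NOT NULL,
    Key3  INT NOT NULL,
    Key4  INT NOT NULL,
    Key5  INT NOT NULL,
    Key6  INT NOT NULL,
    Key7  INT NOT NULL,
    CONSTRAINT PK_MyTable
        PRIMARY KEY NONCLUSTERED (Code1, Code2, Key3, Key4, Key5, Key6, Key7)
);

-- Proposed: add a surrogate identity column and let it cluster
-- (any FKs pointing at the old key would have to be dropped/recreated first)
ALTER TABLE dbo.MyTable ADD MyTableID INT IDENTITY(1,1) NOT NULL;
ALTER TABLE dbo.MyTable DROP CONSTRAINT PK_MyTable;
ALTER TABLE dbo.MyTable ADD CONSTRAINT PK_MyTable PRIMARY KEY CLUSTERED (MyTableID);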

ADDITIONAL CONSIDERATION: How would you answer the same question if the rows grow to 100,000?  

Thanks!
Question by:nespa
9 Comments
 
LVL 8

Assisted Solution

by:drydenhogg
drydenhogg earned 400 total points
ID: 17819567
I'm not sure what 'too much overhead' you are finding with the nonclustered index - can we assume it is causing RID lookups / bookmark lookups in your query plans? Whether 3K or 100K records, it is still a small table.

If your current primary key is unique and you want uniqueness to stay enforced, you're going to have to add a constraint to do it.
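For example, if you do go surrogate, something like this (hypothetical names, same shape as your composite key):

-- Preserve the natural-key uniqueness the old PK was enforcing
ALTER TABLE dbo.MyTable
    ADD CONSTRAINT UQ_MyTable_NaturalKey
    UNIQUE (Code1, Code2, Key3, Key4, Key5, Key6, Key7);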

 
LVL 10

Accepted Solution

by:AaronAbend
AaronAbend earned 1200 total points
ID: 17819673
Well, why not cluster on that key? It may cost a bit on the insert side, I suppose.

But when it comes to performance, the only thing I am sure of is the quote attributed to Knuth: "Premature optimization is the root of all evil".

So what is your problem? Overhead is not a problem in and of itself - not unless you can measure its impact on performance and report that measurement in minutes or seconds. Short of that: I have been working with this SQL engine for 20 years and it still does things I cannot explain - and do not care to. We cannot change the optimizer's behavior, only watch it. If it is running OK, do not change things.

Now, this analysis is separate from another problem you may be thinking about: complexity. A primary key with 7 elements - if I am reading you right - is definitely a candidate for a surrogate. Not because of the primary key itself, but because of what happens when you reference it as a foreign key: indexing a foreign key on that many columns in a lot of tables will likely cause insert and update performance problems, not to mention the coding complexity.

The number of rows is tiny by any standard - I do not think you will see any issues until you get to 500,000 rows, and probably not even then.
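To make the foreign-key point concrete (made-up names): every child table has to carry, and usually index, all seven columns, versus a single 4-byte int with a surrogate:

-- Referencing the 7-column composite key: 25+ bytes repeated in every child table
CREATE TABLE dbo.ChildA (
    ChildAID INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
    Code1 VARCHAR(2) NOT NULL,
    Code2 VARCHAR(3) NOT NULL,
    Key3 INT NOT NULL, Key4 INT NOT NULL, Key5 INT NOT NULL,
    Key6 INT NOT NULL, Key7 INT NOT NULL,
    CONSTRAINT FK_ChildA_Parent FOREIGN KEY
        (Code1, Code2, Key3, Key4, Key5, Key6, Key7)
        REFERENCES dbo.MyTable (Code1, Code2, Key3, Key4, Key5, Key6, Key7)
);

-- Referencing a surrogate: one int column
CREATE TABLE dbo.ChildB (
    ChildBID INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
    MyTableID INT NOT NULL
        CONSTRAINT FK_ChildB_Parent REFERENCES dbo.MyTable (MyTableID)
);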
 
LVL 69

Assisted Solution

by:Scott Pletcher
Scott Pletcher earned 400 total points
ID: 17819923
>> varchar2, varchar3 <<

Does that mean the lengths are 2 and 3?  If so, make them CHAR, not VARCHAR.  Every VARCHAR takes two extra bytes, which is insane for a column that's only 2-3 bytes anyway.
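In other words (invented column names):

-- CHAR(n) stores a fixed n bytes; VARCHAR(n) stores up to n bytes
-- PLUS 2 bytes of length overhead per value
CREATE TABLE dbo.Example (
    CountryCode CHAR(2) NOT NULL,   -- instead of VARCHAR(2)
    RegionCode  CHAR(3) NOT NULL    -- instead of VARCHAR(3)
);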

That is a lot of columns, but if they are *natural* keys - that is, people can naturally think of them and use them - it might be better from a usability standpoint than assigning a completely random number. Also, the total key length (25 bytes) is still not unreasonable, especially for so small a table.
 
LVL 1

Author Comment

by:nespa
ID: 17821365
First off, apologies for putting varchar2, varchar3 - that was careless of me; I didn't mean a varchar2 type, but rather varchar(2) and varchar(3).

Secondly, I realize I should have added another important point: this DB is having SERIOUS performance issues, and I'm looking into everything imaginable to help out, including the indexes (although the biggest performance gain is decreasing the number of JOINs - in some procs I'm seeing 10-15 JOINs!). So, given that this table is one out of 15 being JOINed, it seems like this table would be a good candidate for a surrogate key, even though there are only 3K records.

Also, these are the only columns in the table, and the pattern is like:
 CountryID, ProvinceID, CityID, DeptID, plus a few extra columns, if that helps - no extra cols other than the full composite key. It reminds me of a fact table in OLAP, except that these aren't all foreign keys to other tables (but 4 are).

drydenhogg - yep, the composite key *is* unique, so uniqueness is enforced; another reason I was considering the surrogate instead.

AaronAbend - yep, I agree about the coding complexity introduced by the composite key - yet another point for the surrogate.

Scott - I would think that one column (a surrogate ID) would be easier to remember than the combination of 7 keys.
And also, on 1 out of 15 joins - wouldn't it be better to join on one int (4 bytes) than on 25 bytes? That was my original assumption, anyway.

The joins are the main problem (as far as I can see, from trying everything else) - there are simply too many procs to deal with up front, so I'm tackling this piece by piece (taking the highest-priority procs first).

Given the above - does it still make sense to use a surrogate ID column?
0
 
LVL 10

Expert Comment

by:AaronAbend
ID: 17821482
Are the performance problems on inserts, updates, deletes, or queries?


 
LVL 69

Expert Comment

by:Scott Pletcher
ID: 17821531
>> i would think that 1 col. (surrogate ID col) would be easier than remembering the combination of 7 keys. <<

Depends on the specific values.  Would it be easier to remember your address, phone number and initials or some new random 8-10 digit number?


>> and also, 1 out of 15 joins -- wouldn't it be better to join on one int (4) than on 25 bytes?   That was my original assumption anyway. <<

Of course, yes.  It would be better still to use a smallint if possible (only two bytes), including negative values if necessary.
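E.g. (hypothetical), seeding the identity at the low end of the range gets you the full ~65K values out of two bytes:

CREATE TABLE dbo.SmallKeyed (
    -- -32,768 through 32,767: ~65K rows from a 2-byte key
    SmallID SMALLINT IDENTITY(-32768, 1) NOT NULL PRIMARY KEY
);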
 
LVL 1

Author Comment

by:nespa
ID: 17821678
Aaron - great point; more info: *reads* are the most important thing here. There are a few inserts & updates here & there, but reading/aggregating data is the dominant action... so much so that I've considered denormalizing into star-schema/OLAP-type structures (for read performance).

Scott - yes, also a good point about the info... but this DB has an app front end, so the users are abstracted from all that stuff. Smallint is a good idea, too - but the table holds user info, so there could be more than 32K users at some point.
 
LVL 1

Author Comment

by:nespa
ID: 17821738
I guess I should have rephrased my question --

When initially creating tables, what are the benefits of using a surrogate key vs. a composite key with many columns?

To use your answers in response: it seems that if the key size is kept small (25 bytes), the table won't hold many records (10K or fewer), and the data is logically arranged in a way that you'd consider the columns natural keys, then use the composite key.

But overall it seems a unique surrogate ID is the easier choice, and one that keeps working as the table grows.


To close this question out, my *main* concern overall is with the creation of the index pages - are all of the nonclustered index's columns stored in the index pages themselves? Or is some compact "id" or key stored instead of all of those values?

So... it seems the clustered, unique surrogate should win out, since that ID is the only info stored in the index pages.
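(For my own verification, something like this should show what each index actually costs in pages - SQL Server 2005+, table name hypothetical:)

SELECT  i.name AS index_name,
        ips.index_type_desc,
        ips.page_count,
        ips.avg_record_size_in_bytes
FROM    sys.dm_db_index_physical_stats(
            DB_ID(), OBJECT_ID('dbo.MyTable'), NULL, NULL, 'DETAILED') AS ips
JOIN    sys.indexes AS i
    ON  i.object_id = ips.object_id
    AND i.index_id  = ips.index_id;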
 
LVL 1

Author Comment

by:nespa
ID: 17822269
I wound up *leaving* the key as-is (and focusing instead on all the JOINs in the stored procs). The discussion about overhead helped; you're right - there's not *that* much overhead, and the table's growth will probably not be significant (certainly nowhere near 500,000 rows).

Thanks for the discussion / info - have a great weekend.
