Solved

Do you always use an IDENTITY as a PK?

Posted on 2016-09-08
20
39 Views
Last Modified: 2016-09-13
I typically create an Identity ID column for the PKs of all my tables.  My thinking is I have full control over changing the table columns and it seems cleaner especially when using foreign keys.

I was thinking about a downside to this approach.  Consider a DB which tracks a number of businesses, each with their own product categories and products.  Below is how I would typically deal with this:

Schema
The problem I see is it is it is possible to join products with product categories of a different business unit:

Schema 1 example
I can fix it by making the ProductCategory PK BusinessId and Name fields.  Then the Product table's BusinessId must match.  This enforces integrity but seems ugly to me.  Is there a better way or do I just need to let go of my habit of using identities for all PKs?

Alt Schema
0
Comment
Question by:canuckconsulting
  • 8
  • 6
  • 4
  • +2
20 Comments
 
LVL 45

Expert Comment

by:Vitor Montalvão
ID: 41789436
Do you always use an IDENTITY as a PK?
No. Mostly never.
0
 

Author Comment

by:canuckconsulting
ID: 41789473
Thanks Vitor.

Could I trouble you (and future respondents) to elaborate a bit?
0
 
LVL 45

Assisted Solution

by:Kdo
Kdo earned 100 total points
ID: 41789499
An Identity is a DBMS generated value that ensures a unique key in every row of data.  Your business rules may require that the row data have it's own set of unique value(s) such as employee ID, SSN, etc.

Unless there's a good reason not to, I generally do like you do and have an Identity column in each table.  If I have a uniqueness requirement I'll apply a unique scalar or composite index over the appropriate column(s).

You'll generally get better performance if you can structure the data so that joins are done on scalar integer keys instead of composite mixed keys.  The best case is if that integer key makes for an appropriate clustered index.


Good Luck!
Kent
0
 
LVL 45

Expert Comment

by:Vitor Montalvão
ID: 41789504
Well, I think the main question is why someone should use it.
You said: "My thinking is I have full control over changing the table columns and it seems cleaner especially when using foreign keys"

So, why a non-identity PK doesn't offer you the same?
For example, in an Employees table why not use Employee Number instead of creating an Identiy column that has a number that can't be searchable by the users?
And who says Employees can also say a Product table (Products has unique serial numbers, for example) or any other table that can use an unique identifier without being an identity field (Social Security number, Part-Number, Driver license numbering, ...).
0
 

Author Comment

by:canuckconsulting
ID: 41789549
Kdo - Thanks very much.  How would you deal with the situation above where Products could be associated with ProductCategories of the wrong business?  

Vitor - I think a meaningless auto-generated key is cleaner and future-proof.  

Using your Employees example, what happens if companies merge and the new company needs to continue using their existing IDs even if they conflict with the old key.  Also, why deal with generating a unique key for each employee (ex "Victor01", etc)?    I don't understand your comment about why this id would not be searchable.  

Using your product table example, if the serial # is your PK, your system is now dependent on that being true. Should you receive an old part with a duplicate serial # you have to deal with that.  With an Identity on the table duplicate serials are not an issue.  Ditto with driver's license..what if you have to compensate for foreign licenses which may conflict?
0
 
LVL 45

Assisted Solution

by:Vitor Montalvão
Vitor Montalvão earned 400 total points
ID: 41789569
I think a meaningless auto-generated key is cleaner and future-proof.
Where's the technical proof in this statement?

Using your Employees example, what happens if companies merge and the new company needs to continue using their existing IDs even if they conflict with the old key.
What happens if it's an identity field? Wouldn't both companies have same identity ids also?

I don't understand your comment about why this id would not be searchable.  
If you want to search for employee "Victor01" the user just need to add WHERE EmployeeID = 'Victor01' so it will search by the PK.

if the serial # is your PK, your system is now dependent on that being true
Well, that's relational database should mean.

Should you receive an old part with a duplicate serial # you have to deal with that.
I don't think this can happen. But if it happens is good I think so your system won't allow to insert a part that shouldn't exist in the system (no duplicate records). With an identity PK you'll accept duplicates unless you create unique indexes on the table.

Ditto with driver's license..what if you have to compensate for foreign licenses which may conflict?
Also I'm not sure if foreign licenses will match national driver licenses. But if does you should have the country column to be part of the PK.

But ofc there are always exceptions and that's why I told you I mostly not use identity fields.
0
 
LVL 45

Expert Comment

by:Kdo
ID: 41789571
It appears that the issue with the Product and Product Category relationships is that the tables have a 1 to 1 relationship and use the same key for the relationship.  If there is a 1 to 1 relationship you probably don't need separate tables here.  But the data suggests that there should be a 1 to many relationship.  i.e, You can have multiple products in the same category.

Will your system support multiple vendors with the same product?  If so, will the product (and possibly the product category) be repeated for each authorized vendor?  I would expect that you would not repeat them and let the data define the relationship.
0
 

Accepted Solution

by:
canuckconsulting earned 0 total points
ID: 41789618
Kdo
I've created a lousy example.  I intended a simple One-To-Many relationship from ProductCategory to Product; ie a single category can correspond to many products.  Each ProductCategory relates to a single Business.  Assume that each vendor has its' own categories.

So the issue in the example is both the Yankee Cap and Blue Jays Cap are correctly associated with ProductCategoryId 1 but the Blue Jays Cap is also associated with BusinessUnitId 2 which isn't possible.  I can't get my head around how I can enforce this without changing the PK on the ProductCategory table.


Vitor
 
Thanks for the detailed reply.  You've given me lots to think about.

Where's the technical proof in this statement?
This referenced my subsequent Employee and Product examples.

What happens if it's an identity field? Wouldn't both companies have same identity ids also?

Good point.  I still think i would prefer the Identity though as I could add a new "Original Employee ID" without impacting referencing FKs.

If you want to search for employee "Victor01" the user just need to add WHERE EmployeeID = 'Victor01' so it will search by the PK.

If using IDENTITY, If you want to search for employee "1234" you likewise would add WHERE Employee 1234.  I think I'm missing your point here.  

Well, that's relational database should mean

No, I meant your PK is dependent on an external ID outside of your control being unique.  I take your point that it is good to restrict duplicate serial #s if that is a business rule. But if the rule changes allowing duplicate serials to compensate for a new supplier you now need to update all tables with that FK.  With an identity you would just need to drop the unique constraint on the serial # index.

But ofc there are always exceptions and that's why I told you I mostly not use identity fields.

Understood.
0
 
LVL 45

Assisted Solution

by:Vitor Montalvão
Vitor Montalvão earned 400 total points
ID: 41789635
If using IDENTITY, If you want to search for employee "1234" you likewise would add WHERE Employee 1234.  I think I'm missing your point here.  
But "1234" isn't employee number, right? It's only a PK in the table so nobody knows it.

But if the rule changes allowing duplicate serials to compensate for a new supplier you now need to update all tables with that FK.
Well, then per definition we can't use the serial number as PK, right? :)

Few examples than comes to my mind where I'ld use an identity field are type tables.
Color
  1. Blue
  2. Yellow
  3. Red
  4. Green
  5. Black
  6. White
  7. Brown

Marital status
  1. Single
  2. Married
  3. Divorced
  4. Widow

And tables where you'll need to have more columns to perform a composite PK.
0
 
LVL 45

Expert Comment

by:Kdo
ID: 41789646
It looks like the join between Business and Product Category should go through Product.

SELECT *
FROM Business
LEFT JOIN Product ON ...
LEFT JOIN ProductCategory ON ...

That way any business can sell any authorized product, and you can still identify the product categories available to any business by the products authorized to that business.

The data may be fine as it is.
0
How your wiki can always stay up-to-date

Quip doubles as a “living” wiki and a project management tool that evolves with your organization. As you finish projects in Quip, the work remains, easily accessible to all team members, new and old.
- Increase transparency
- Onboard new hires faster
- Access from mobile/offline

 

Author Comment

by:canuckconsulting
ID: 41789649
But "1234" isn't employee number, right? It's only a PK in the table so nobody knows it.

No, it would be the employee number.  It's autogenerated but would be visible; like an autogenerated order id.  It's just meaningless.

Well, then per definition we can't use the serial number as PK, right? :)

Yes, if the business rules changed the serial # could no longer be used as the PK.  But by never using a meaning-full # (serial #, license #, etc) as the PK in the first place the system should be able to more easily deal with fundamental changes like this.  (ie FKs remain unaffected).


Following your comments Vitor I can see this is clearly not a yes/no question.  I was hoping for some discussion to help me better think about my initial instinctive (ie not thought out) decision to always use PKs and this has certainly done that!
0
 

Author Comment

by:canuckconsulting
ID: 41789685
Thanks Kdo.  I think my example was poor as it hasn't communicated what I'm after.  

A better example is as follows.  A business provides our company licenses we can print on our products.  A license is exclusive to that business that provides them.  A product is always associated with one of these businesses and can optionally be associated with one of their products.  The problem is how to ensure that a product is never associated with a license for a another business.

License schema
License Data
0
 
LVL 45

Assisted Solution

by:Vitor Montalvão
Vitor Montalvão earned 400 total points
ID: 41789709
No, it would be the employee number.  It's autogenerated but would be visible; like an autogenerated order id.  It's just meaningless.
Never saw a solution like that for employees ID just because usually they aren't totally numeric. But if that's the case then an Identity would work just fine, ofc.

Following your comments Vitor I can see this is clearly not a yes/no question.  
Sure, is not. And also not a right or wrong answer either. Important is that your database model is well designed and can answers the business demands. At the end you may even using both approaches.

A good example (at least for me) on why I prefer to avoid Identity columns is imagine if you want to know employees salaries (or absences) you can query directly the table Employee_Salary to get the EmployeedID and the Salary:
SELECT EmployeeID, Salary
FROM EmployeeSalary

Open in new window

The same isn't true if the EmployeeID is an Identity and for the sake of this example, let say is not really the Employee number, so you'll always need to join to the Employee table to retrieve that info:
SELECT E.EmployeeNumber, S.Salary
FROM EmployeeSalary S
    INNER JOIN Employee E ON E.EmployeeID=S.EmployeeID

Open in new window

Ofc, who says Employee can also say a Product or a Pacient, Student, etc...
0
 
LVL 45

Expert Comment

by:Kdo
ID: 41789772
You've got a data issue.  There are redundant (not normalized) features and they are getting in the way.

Product ID 6 (Spiderman Hat) has a Business ID of 1 and a License ID of 3.  It's in disagreement with License # 3 which states that it is associated with Business ID 2.

So Product 6 is authorized to Business 1, but business 1 has no license for the product.
0
 

Author Comment

by:canuckconsulting
ID: 41789788
Exactly.  I created it this to illustrate what I'm trying to prevent.

I don't think it is a normalization issue though.  A product must have a business but the license is optional.
0
 
LVL 45

Expert Comment

by:Vitor Montalvão
ID: 41789792
About your other question, I think you have a design fail.
ProductCategory is a linked table and linked tables shouldn't have PK composed by the main tables. In this case Product.ID + Business.ID. Also, Name column should be removed. Btw, the table name should be ProductBusiness since you don't have any Category there.
And remove BusinessID and ProductCategoryID from Product table.
0
 

Author Comment

by:canuckconsulting
ID: 41789800
VItor-> Product Category is not a linked table; it's is own object.  The second example shows better what I intended

I think it is a separate question though so I'm going to close this PK issue now and open as a separate issue.

Thank you both for your help.
0
 
LVL 69

Expert Comment

by:ScottPletcher
ID: 41789870
Do you always use an IDENTITY as a PK?

Absolutely not!  Frankly it's a bit of laziness that some people always create such a "key".  Everyone is always in a rush to short-circuit, or even completely skip, the logical design process and get to a "real" (physical) table, but that's a huge mistake.

In particular, child and intersection tables should be clustered (keyed) first by the parent key(s).  This provides not only clearer and cleaner business logic but vastly better overall performance as well.

For example, an OrderItems table should be clustered (keyed) first on OrderID, not some meaningless identity value.
0
 
LVL 27

Expert Comment

by:tliotta
ID: 41790821
...what happens if companies merge and the new company needs to continue using their existing IDs even if they conflict with the old key.
I'm not clear on how that makes a non-identity PK less desirable. An identity PK could easily be even more troublesome for merged databases.

Regardless, I definitely prefer having an auto-generated/identity column for practically all volatile data tables. That doesn't necessarily mean that the column should be the PK though. Using an identity value often has different motivations behind it from using a PK. The fundamental purpose of the column (or even compound columns) should determine what the PK consists of.
0
 

Author Closing Comment

by:canuckconsulting
ID: 41795707
This generated some good discussion points to think about.  For my purposes I'll continue to use Identities as PK's in most situations.
0

Featured Post

How to improve team productivity

Quip adds documents, spreadsheets, and tasklists to your Slack experience
- Elevate ideas to Quip docs
- Share Quip docs in Slack
- Get notified of changes to your docs
- Available on iOS/Android/Desktop/Web
- Online/Offline

Join & Write a Comment

This article explains all about SQL Server Piecemeal Restore with examples in step by step manner.
Load balancing is the method of dividing the total amount of work performed by one computer between two or more computers. Its aim is to get more work done in the same amount of time, ensuring that all the users get served faster.
This video shows how to set up a shell script to accept a positional parameter when called, pass that to a SQL script, accept the output from the statement back and then manipulate it in the Shell.
Using examples as well as descriptions, and references to Books Online, show the documentation available for datatypes, explain the available data types and show how data can be passed into and out of variables.

760 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

19 Experts available now in Live!

Get 1:1 Help Now