Solved

Primary key - autonumber VS text

Posted on 2013-12-20
11
509 Views
Last Modified: 2013-12-22
Can anyone clarify and explain the pros and cons of having a primary key that is text VS a primary key that is auto number –

Also – which could be used in what possible situations – for example joing many to many junction tables –
Example – there is a table called [country] with fields – countryName,countryID like this
countryName | countryID
USA | US
Canada | CN


There is a [customer] table with fields > customerName,customerAddrss like this
customerName | customerAddrss
Rony | Redmond WA
Tony | Bellevue WN

Between country and customer – its many to many so …..

Thank you
0
Comment
Question by:Rayne
11 Comments
 
LVL 75

Accepted Solution

by:
DatabaseMX (Joe Anderson - Access MVP) earned 50 total points
ID: 39733128
Disclaimer:
You will be opening a can of worms with this question :-)
===

There is virtually no justification for using anything but the Access Auto Number for the Primary Key of any Table. It's the simplest and most elegant approach with the least amount of headaches.

m
0
 
LVL 76

Assisted Solution

by:GrahamSkan
GrahamSkan earned 50 total points
ID: 39733130
It is largely as matter of convenience as well as fitting with convention.

Autonumbers are self-handling. You can, or course, set a text ID to be a unique index, and that would work just as well, however it might require some extra verification code.
0
 
LVL 75
ID: 39733131
typo:
"This is " >> "There is ..."
0
 
LVL 75
ID: 39733219
thx. Might as well delete the 'typo' comment then.
0
 
LVL 13

Assisted Solution

by:sameer2010
sameer2010 earned 100 total points
ID: 39733307
If we use autonumber, business logic of de-duplication is to be handled by the programmer. So if it is something like a lookup code (country code in your example) that is standardized, it may not make sense adding auto number. However, if you want to carry that as a key in multiple tables, then perhaps adding a number would help.
On the other hand, if we have a table that stores anything that comes its way e.g. upload table, which does not care about content of column data, but it is to be worried by other apps, autonumber would well go with it (may be with other important column like timestamp, etc.)

Having said this, it all depends on how you want to use the column being auto numbered. As a rule of thumb, if it is not going to be a foreign key anywhere else (or at limited places), then it would be better to go without number.
0
Threat Intelligence Starter Resources

Integrating threat intelligence can be challenging, and not all companies are ready. These resources can help you build awareness and prepare for defense.

 
LVL 57

Assisted Solution

by:Jim Dettman (Microsoft MVP/ EE MVE)
Jim Dettman (Microsoft MVP/ EE MVE) earned 100 total points
ID: 39733505
<<You will be opening a can of worms with this question :-)>>

 Well you got that part right.

<<There is virtually no justification for using anything but the Access Auto Number for the Primary Key of any Table. It's the simplest and most elegant approach with the least amount of headaches. >>

 That's not entirely true and not always the case.

 To answer the question directly, the difference is one of performance and that's why meaningless keys (an autonumber) and surrogate keys are used.  But the very fact that an autonumber is meaningless does yield problems.

 Read through the following and see if it all makes sense:

http://www.experts-exchange.com/Microsoft/Development/MS_Access/A_2041-The-great-PK-debate-Natural-Keys-vs-Surrogates-again.html

Jim.
0
 
LVL 30

Assisted Solution

by:hnasr
hnasr earned 100 total points
ID: 39733511
Adding:
Auto-number VS text depends on context.

Pros: Allows typing, thoughts to be analyzed later.
1 my city
2 mu city
3 new city
4 remove spaces in entries
5 I have to clean my data to allow for proper querying.

Cons: Accepts duplicate concept, but different expressions.
1 tomatos
2 tomat
3 tomato
4 ketcup

But you can use auto-number, and set unique indexes for some fields to check for duplication.

If you are counting visitors, auto-number goes fine. If you want to spot repeating visitors, unique index the field which specifies the relevant visitor.
0
 
LVL 84

Assisted Solution

by:Scott McDaniel (Microsoft Access MVP - EE MVE )
Scott McDaniel (Microsoft Access MVP - EE MVE ) earned 100 total points
ID: 39733512
You're asking about "Natural" versus "Synthetic" Primary keys. As Databasemx said, you'll have quite a few differing opinions on this one!!

DISCLAIMER: This is really more of a personal choice than anything else, and the opinions stated below are my own. Take them as you wish, but realize there are plenty of people out there who have much different opinions on this, and who are very passionate about this topic. I'm not that passionate about it ( :) ), so I'm not going to get involved in the Natural vs Surrogate debate.

In general, the Primary Key should be a value that can be used to uniquely identify a specific record. The value can be a single field, or could be comprised of multiple fields. The PK value should be one that would not change, but that is not an absolute necessity, since you could certainly just update all Child records with those changes.

A "Natural" key is one that has meaning to the data. A good example of that would be a US Social Security Number, an Email Address, a TelephoneNumber, ect. Those bits of data would be unique to a specific person, and you could easily identify a person based on that. For example, if someone phones and asks about the status of their Order, and you've used the Email address as the PR for your Customer table, then you could ask the caller for their Email Address and you'd be able to find all Orders related to that Email address. You could not do that directly with a Surrogate key without joining to the Customer table with the Surrogate key.

A bad example of that would be data which may easily change, like Telephone_Number or EamilAddress. That can change for various reasons, and therefore would not be a great choice for a Primary Key in many cases.

You can also have a Natural Key that is comprised of multiple columns, like First_Name+Last_Name+Phone_Number. However, in my opinion this would also be a poor choice since (a) there are likely to be more than one person with exactly the same First_Name+Last_Name+Phone_Number, and (b) there are likely to be persons with NO Phone_Number and (c) those values could easily change.

The disadvantage to the Natural PK is that you must also ensure that your field or fields are indeed unique to the data, and you must also store all of the values in the related tables, which can increase the size of your database (as well as the size of the Indexes of your database). You must also be very, very sure that the Natural key you choose is the appropriate one for you business rules. Also, if your business rules change, you could end up with some very invasive changes to the database, and the use of Natural Keys could become difficult to manage. For example, if you start out only working with US-based customers, and you use the SSN field as the Primary Key, and you then expand your business to also include non-US customers, your SSN field becomes somewhat useless for those non-US customers. If you instead had used a Surrogate Key, you could simply make the SSN field to be not required and continue on as before.

The advantage to using Natural keys is that it uniquely identifies a row of data with no other information required, and it mirrors the "real world" nature of that data - in other words, it mimics exactly the way humans interact with that specific object. For example, if I were trying to identify a specific person in my real-world office, I could ask them for a copy of their Social Security card, which would uniquely identify them in a US-based system. If I used Surrogate keys, I could not ask them for their "Surrogate Key Number", since they'd have no idea what that might be. It also makes searches some easier, since you would not need to join other tables to that "child" table in order to find records associated with the "parent" data. For example, if I'm storing the Social_Security_Number in a "child" table containing Order information, and I need to retrieve all Orders for a specific Person, I can just search on the Order_Header.Customer_SSN field to find those Orders. If I'm instead used a Surrogate key, I'd have to Join the Order_Header table to the Customer table, and then Search on the Customer.Social_Security_Number field. Over time, as the database grows, this could result in poor performance.

A "Synthetic" or "Surrogate" Primary Key is one that has no meaning to the data (i.e. an AutoNumber field). The only purpose of that Synthetic key is to provide a handle to that specific row of data. The data actually contained in the row is what makes it unique, and the AutoNumber value is of no meaning. If you use Surrogate keys, you should also include at least one other Unique Index that is comprised of one or more Fields in the table, and that Unique Index should distinctly define a specific record. For example, in an Order_Header table, perhaps the "uniqueness" is defined by the Customer_ID+Order_Date+ShipTo_Address+PO_Number. If so, then you'd create a multi-field Index with those 4 fields, along with a PK AutoNumber field, and the multi-field index would insure data uniqueness, while the PK field would be used when relating that record to other tables (such as the Order_Details table).

The advantage to using Surrogate keys is that your related tables are not impacted by changes to the parent data's "uniqueness data". For example, if I'm using a Natural_Key defined as Customer_ID+Order_Date+ShipTo_Address+PO_Number, and I find that I've made an error when entering the PO Number for a Customer's Order, I can just change that PO value in the Order_Header, and any related records in Order_Details would immediately pick up those changes without the need to make changes in the Order_Details table. Had I instead used those 4 fields as my Primary Key in the Order_Header table, then I'd have to update data in (a) the Order_Header table and (b) the Order_Details table. If I somehow forgot to make changes in the Order_Details table when changing those values in Order_Header, then I'd end up with "orphaned" records in Order_Details.

The disadvantage to using Surrogate keys is that you must add new columns and indexes to tables, which can increase the size and "bloat" of your database, and which could possibly impact performance. Also, as mentioned earlier, you must Join the Parent table in order to search on the unique values of the Parent table. IMO this is a valid trade-off, especially with modern machines
0
 
LVL 30

Expert Comment

by:hnasr
ID: 39733525
Adding:
If you read a number or a text and visualize it in your head, then ir is not a proper key for computer software. It is a primary key for human manual record keeping.

A key is not meant to be interpreted any furthur in our heads, the computer is supposed to relief us from that extra burden.

Meaningful data comes into other table fields.

Telephone numbers, for example, are not used as primary keys, in general, they are used as extra fields in table for search purposed to find the proper key for that record. They, as LSMConsulting mentioned, may change.

As LSMConsulting started, a can is opened :).
Ending: If you need an auto-number key use it for the main table. The related foreign keys should be numeric.
0
 

Author Closing Comment

by:Rayne
ID: 39734195
Thank you All,

This is really helpful.

Thank you for also correcting my English, excuse my french
0

Featured Post

How to improve team productivity

Quip adds documents, spreadsheets, and tasklists to your Slack experience
- Elevate ideas to Quip docs
- Share Quip docs in Slack
- Get notified of changes to your docs
- Available on iOS/Android/Desktop/Web
- Online/Offline

Join & Write a Comment

PL/SQL can be a very powerful tool for working directly with database tables. Being able to loop will allow you to perform more complex operations, but can be a little tricky to write correctly. This article will provide examples of basic loops alon…
This article describes how to use the timestamp of existing data in a database to allow Tableau to calculate the prior work day instead of relying on case statements or if statements to calculate the days of the week.
In Microsoft Access, learn different ways of passing a string value within a string argument. Also learn what a “Type Mis-match” error is about.
The viewer will learn how to  create a slide that will launch other presentations in Microsoft PowerPoint. In the finished slide, each item launches a new PowerPoint presentation and when each is finished it automatically comes back to this slide: …

747 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

13 Experts available now in Live!

Get 1:1 Help Now