Primary key - autonumber VS text

Posted on 2013-12-20
Last Modified: 2013-12-22
Can anyone clarify and explain the pros and cons of having a primary key that is text VS a primary key that is auto number –

Also – which could be used in what possible situations – for example joing many to many junction tables –
Example – there is a table called [country] with fields – countryName,countryID like this
countryName | countryID
Canada | CN

There is a [customer] table with fields > customerName,customerAddrss like this
customerName | customerAddrss
Rony | Redmond WA
Tony | Bellevue WN

Between country and customer – its many to many so …..

Thank you
Question by:Rayne
LVL 75

Accepted Solution

DatabaseMX (Joe Anderson - Access MVP) earned 50 total points
ID: 39733128
You will be opening a can of worms with this question :-)

There is virtually no justification for using anything but the Access Auto Number for the Primary Key of any Table. It's the simplest and most elegant approach with the least amount of headaches.

LVL 76

Assisted Solution

GrahamSkan earned 50 total points
ID: 39733130
It is largely as matter of convenience as well as fitting with convention.

Autonumbers are self-handling. You can, or course, set a text ID to be a unique index, and that would work just as well, however it might require some extra verification code.
LVL 75
ID: 39733131
"This is " >> "There is ..."
Master Your Team's Linux and Cloud Stack!

The average business loses $13.5M per year to ineffective training (per 1,000 employees). Keep ahead of the competition and combine in-person quality with online cost and flexibility by training with Linux Academy.

LVL 75
ID: 39733219
thx. Might as well delete the 'typo' comment then.
LVL 13

Assisted Solution

sameer2010 earned 100 total points
ID: 39733307
If we use autonumber, business logic of de-duplication is to be handled by the programmer. So if it is something like a lookup code (country code in your example) that is standardized, it may not make sense adding auto number. However, if you want to carry that as a key in multiple tables, then perhaps adding a number would help.
On the other hand, if we have a table that stores anything that comes its way e.g. upload table, which does not care about content of column data, but it is to be worried by other apps, autonumber would well go with it (may be with other important column like timestamp, etc.)

Having said this, it all depends on how you want to use the column being auto numbered. As a rule of thumb, if it is not going to be a foreign key anywhere else (or at limited places), then it would be better to go without number.
LVL 57

Assisted Solution

by:Jim Dettman (Microsoft MVP/ EE MVE)
Jim Dettman (Microsoft MVP/ EE MVE) earned 100 total points
ID: 39733505
<<You will be opening a can of worms with this question :-)>>

 Well you got that part right.

<<There is virtually no justification for using anything but the Access Auto Number for the Primary Key of any Table. It's the simplest and most elegant approach with the least amount of headaches. >>

 That's not entirely true and not always the case.

 To answer the question directly, the difference is one of performance and that's why meaningless keys (an autonumber) and surrogate keys are used.  But the very fact that an autonumber is meaningless does yield problems.

 Read through the following and see if it all makes sense:

LVL 30

Assisted Solution

hnasr earned 100 total points
ID: 39733511
Auto-number VS text depends on context.

Pros: Allows typing, thoughts to be analyzed later.
1 my city
2 mu city
3 new city
4 remove spaces in entries
5 I have to clean my data to allow for proper querying.

Cons: Accepts duplicate concept, but different expressions.
1 tomatos
2 tomat
3 tomato
4 ketcup

But you can use auto-number, and set unique indexes for some fields to check for duplication.

If you are counting visitors, auto-number goes fine. If you want to spot repeating visitors, unique index the field which specifies the relevant visitor.
LVL 84

Assisted Solution

by:Scott McDaniel (Microsoft Access MVP - EE MVE )
Scott McDaniel (Microsoft Access MVP - EE MVE ) earned 100 total points
ID: 39733512
You're asking about "Natural" versus "Synthetic" Primary keys. As Databasemx said, you'll have quite a few differing opinions on this one!!

DISCLAIMER: This is really more of a personal choice than anything else, and the opinions stated below are my own. Take them as you wish, but realize there are plenty of people out there who have much different opinions on this, and who are very passionate about this topic. I'm not that passionate about it ( :) ), so I'm not going to get involved in the Natural vs Surrogate debate.

In general, the Primary Key should be a value that can be used to uniquely identify a specific record. The value can be a single field, or could be comprised of multiple fields. The PK value should be one that would not change, but that is not an absolute necessity, since you could certainly just update all Child records with those changes.

A "Natural" key is one that has meaning to the data. A good example of that would be a US Social Security Number, an Email Address, a TelephoneNumber, ect. Those bits of data would be unique to a specific person, and you could easily identify a person based on that. For example, if someone phones and asks about the status of their Order, and you've used the Email address as the PR for your Customer table, then you could ask the caller for their Email Address and you'd be able to find all Orders related to that Email address. You could not do that directly with a Surrogate key without joining to the Customer table with the Surrogate key.

A bad example of that would be data which may easily change, like Telephone_Number or EamilAddress. That can change for various reasons, and therefore would not be a great choice for a Primary Key in many cases.

You can also have a Natural Key that is comprised of multiple columns, like First_Name+Last_Name+Phone_Number. However, in my opinion this would also be a poor choice since (a) there are likely to be more than one person with exactly the same First_Name+Last_Name+Phone_Number, and (b) there are likely to be persons with NO Phone_Number and (c) those values could easily change.

The disadvantage to the Natural PK is that you must also ensure that your field or fields are indeed unique to the data, and you must also store all of the values in the related tables, which can increase the size of your database (as well as the size of the Indexes of your database). You must also be very, very sure that the Natural key you choose is the appropriate one for you business rules. Also, if your business rules change, you could end up with some very invasive changes to the database, and the use of Natural Keys could become difficult to manage. For example, if you start out only working with US-based customers, and you use the SSN field as the Primary Key, and you then expand your business to also include non-US customers, your SSN field becomes somewhat useless for those non-US customers. If you instead had used a Surrogate Key, you could simply make the SSN field to be not required and continue on as before.

The advantage to using Natural keys is that it uniquely identifies a row of data with no other information required, and it mirrors the "real world" nature of that data - in other words, it mimics exactly the way humans interact with that specific object. For example, if I were trying to identify a specific person in my real-world office, I could ask them for a copy of their Social Security card, which would uniquely identify them in a US-based system. If I used Surrogate keys, I could not ask them for their "Surrogate Key Number", since they'd have no idea what that might be. It also makes searches some easier, since you would not need to join other tables to that "child" table in order to find records associated with the "parent" data. For example, if I'm storing the Social_Security_Number in a "child" table containing Order information, and I need to retrieve all Orders for a specific Person, I can just search on the Order_Header.Customer_SSN field to find those Orders. If I'm instead used a Surrogate key, I'd have to Join the Order_Header table to the Customer table, and then Search on the Customer.Social_Security_Number field. Over time, as the database grows, this could result in poor performance.

A "Synthetic" or "Surrogate" Primary Key is one that has no meaning to the data (i.e. an AutoNumber field). The only purpose of that Synthetic key is to provide a handle to that specific row of data. The data actually contained in the row is what makes it unique, and the AutoNumber value is of no meaning. If you use Surrogate keys, you should also include at least one other Unique Index that is comprised of one or more Fields in the table, and that Unique Index should distinctly define a specific record. For example, in an Order_Header table, perhaps the "uniqueness" is defined by the Customer_ID+Order_Date+ShipTo_Address+PO_Number. If so, then you'd create a multi-field Index with those 4 fields, along with a PK AutoNumber field, and the multi-field index would insure data uniqueness, while the PK field would be used when relating that record to other tables (such as the Order_Details table).

The advantage to using Surrogate keys is that your related tables are not impacted by changes to the parent data's "uniqueness data". For example, if I'm using a Natural_Key defined as Customer_ID+Order_Date+ShipTo_Address+PO_Number, and I find that I've made an error when entering the PO Number for a Customer's Order, I can just change that PO value in the Order_Header, and any related records in Order_Details would immediately pick up those changes without the need to make changes in the Order_Details table. Had I instead used those 4 fields as my Primary Key in the Order_Header table, then I'd have to update data in (a) the Order_Header table and (b) the Order_Details table. If I somehow forgot to make changes in the Order_Details table when changing those values in Order_Header, then I'd end up with "orphaned" records in Order_Details.

The disadvantage to using Surrogate keys is that you must add new columns and indexes to tables, which can increase the size and "bloat" of your database, and which could possibly impact performance. Also, as mentioned earlier, you must Join the Parent table in order to search on the unique values of the Parent table. IMO this is a valid trade-off, especially with modern machines
LVL 30

Expert Comment

ID: 39733525
If you read a number or a text and visualize it in your head, then ir is not a proper key for computer software. It is a primary key for human manual record keeping.

A key is not meant to be interpreted any furthur in our heads, the computer is supposed to relief us from that extra burden.

Meaningful data comes into other table fields.

Telephone numbers, for example, are not used as primary keys, in general, they are used as extra fields in table for search purposed to find the proper key for that record. They, as LSMConsulting mentioned, may change.

As LSMConsulting started, a can is opened :).
Ending: If you need an auto-number key use it for the main table. The related foreign keys should be numeric.

Author Closing Comment

ID: 39734195
Thank you All,

This is really helpful.

Thank you for also correcting my English, excuse my french

Featured Post

Migrating Your Company's PCs

To keep pace with competitors, businesses must keep employees productive, and that means providing them with the latest technology. This document provides the tips and tricks you need to help you migrate an outdated PC fleet to new desktops, laptops, and tablets.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

My experience with Windows 10 over a one year period and suggestions for smooth operation
Phishing attempts can come in all forms, shapes and sizes. No matter how familiar you think you are with them, always remember to take extra precaution when opening an email with attachments or links.
In Microsoft Access, when working with VBA, learn some techniques for writing readable and easily maintained code.
With Microsoft Access, learn how to start a database in different ways and produce different start-up actions allowing you to use a single database to perform multiple tasks. Specify a start-up form through options: Specify an Autoexec macro: Us…

776 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question