Choosing the primary key is an important data modeling step. Choosing between the two alternatives (natural or surrogate key) is something often discussed between experts.
There are several reasons to choose for one or for the other.
Some people encourage to choose one alternative for the whole data model, such as 'always use natural keys' or 'always introduce a surrogate key' and discussions about that often tend to be 'religious'.
In this paper, I will study all the pros and cons that I've seen given as a reason for the choice. I will take each one, give an example on a simple case, try to conclude for a 'rule' that helps to decide, and point to the consequences.
Because the reader may not want to read one more paper on natural vs. surrogate key, I'll first summarize my conclusions:
- there is no general rule, neither 'always use natural keys', nor 'always introduce surrogate keys'
- the choice must be done on a case by case basis: it depends on the table, the volume, the use case, the performance criteria, etc.
- the choice may be straightforward, or can necessitate a compromise (usually between performance and maintainability)
- when the choice is done, several Oracle features can help to limit the drawbacks of the chosen alternative.
- the difficulty to identify a natural key may come from an incomplete functional analysis, and a surrogate key sometimes hides a business concept that was forgotten.
We will analyze each of the arguments on several cases related with a simple data model, showing the pros and cons of the 2 alternatives, and the conclusion will help to understand the advantages or disadvantages, in order to do the right choice in front of a specific situation.
From the functional specification to the physical design.
The functional analysis describes the objects that are manipulated by the business: Business Objects - in OOA (Object Oriented Analysis) - or Entities .
They are modeled as a Computer Independent Model: the Domain Model - in OOA (Object Oriented Analysis) - or the Conceptual Data Model.
The design will transform those specification to an IT specification, the Platform Independent Model: the Design Class Diagram - in OOD (Object Oriented Design) - or the Logical Data Model.
The implementation of that model within a relational database transforms it to the Platform Specific Model: the Physical Data Model (and the Object-Relational Mapping in OOD)
Business Entities (Business Objects or Business Concepts)
The business object specification gives a definition of the objects, their attributes, their relationsips.
The definition may include the attributes that identify an instance of the business concept, and those identifiers are usually be part of the business object definition.
Those business concepts become entities in the logical model (or UML Classes in design model).
The identifying attributes become the candidate key(s).
When implementing that model in relational database, we need to define:
- one primary key per table (a table implements entities and relationships)
- alternate keys to enforce uniqueness for other candidate keys
- foreign keys referencing a primary key to enforce referential integrity
Some definitions about keys are detailed in the following article
The chosen primary key can be either:
- one of the candidate keys (coming from the identifying attributes).
It is often called 'natural key' but we can also encounter the terms:
'business key' as it is composed of business attributes
'intelligent key','meaningful key' as it has a meaning for end-users
- or one attribute created at design, generated at insert time, for each row, if we don't want to use a candidate key to identify rows and enforce referential integrity.
It is often called 'surrogate key' because it replaces a natural key, but we can also encounter the terms:
'technical key','system-generated key' as it is introduced by the implementation
'non-intelligent key','meaningless key' as it has no meaning for the end-user
'internal key' as it has no meaning outside the system
'synthetic key','artificial key' as opposed to natural key
A natural key
is a business concept that is used technically to implement referential integrity in relational databases.
It is unique, always defined, and immutable, all that within the scope of the business domain (must be the same for the same object across several databases or applications) .
A surrogate key
is a technical value only, that have no meaning to business, and is not visible to them.
It is unique, always defined, and immutable, all that within the scope of the IT system (cannot be used across several databases or applications).
Tables that implement the relationships (many-to-may) includes the primary keys from both tables, and the whole is a composite primary key for the association table.
An analysis on different use cases is done in the full paper
The summary of conclusion
are chosen when:
1. There are business attributes that identifies the business object instance
2. Those attributes are known during the whole business object life cycle (from its creation)
3. Those attributes values do not change during the whole business object life cycle - or it is exceptional.
4. Those attributes are used to access to specific business objects (and then are used in indexes, partition keys)
5. For each business object instance, the value of those attributes has a meaning for business.
6. The data has to be exchanged among different system.
7. We don't expect future evolution on the business object definition (and the key definition).
8. The key is composed with few attributes (5 starts to be high)
are chosen when:
1. No business attributes identifies the business concept, and business do not need an identifier.
2. The business identifier is not static during the object life cycle
3. The business identifier values must also include special technical (non-business) values
4. The functional specification cannot be made stable enough to limit future evolution of the business key
5. The surrogate key will never be visible outside if the system (in prints, reports, external systems, datawarehouses, etc)
6. There is no critical performance requirements to query that object
This is a case-by-case analysis, and some compromises may have to be done.
If a compromise has to be done, some RDBMS features can be implemented to limit the drawbacks
If surrogate key seems to be necessary, it is often a good idea to iterate on the functional specifications to be sure that it does not hide a missed business concept.
The goal the document was to list all concerns related with the choice of natural or surrogate key, so that the choice is not done by a systematic 'rule of thumb', but with a complete knowledge of the consequences.