Primary key,Foreign key

VIVEKANANDHAN_PERIASAMY
VIVEKANANDHAN_PERIASAMY used Ask the Experts™
on
What is use of primary key,foreign key, when though if there is no constraints applied to column.it's possible to join the rows.I am curious to know why it is used.
Comment
Watch Question

Do more with

Expert Office
EXPERT OFFICE® is a registered trademark of EXPERTS EXCHANGE®
Aaron TomoskyDirector of Solutions Consulting

Commented:
It's an index so things go faster. You can generate diagrams and entity frameworks auomagically. Anyone not the db designer can see what the heck is going on...

Commented:
In the following link there are many info about:
Primary Key,
Foreign Key,
Unique Key.

Have a look please, it will give you an idea of what is what :)
My advise: Create PK and use it in FK. Without PK (with Improved Unique Index) you still need KEY CANDIDATE (with Regular Unique Index). Even query cannot help you here.

Details:

1. Primary Key (PK):
        a. It is a constraint.
        b. it is the unique identifier of the row - main purpose
        c. it has an index (without it the performance degrade very very significantly)
        d. it is a special index for performance purposes

2. Foreign Key (FK) - provides data integrity for master-detail relation
        a. It is a constraint.
        b. In relation master-detail it does not allow to have a detail record without master record.
        c. If you try to delete master record it provides different behavior of your choice:
               - it fails DELETE MASTER ROW
               - it deletes detail records of the master record, too - CASCADE DELETE.
        d. If you try to insert a detail record with the reference to not existing master - it fails


3. It is a good practice to have index (not unique) on Foreign Key for performance purpose.
4. SSMS Designer does not allow you to create FK without appropriate PK in master table.
5. T-SQL does not allow you to create FK without appropriate PK or KEY CANDIDATE in the master table.
6. You can create FK without PK by query - BUT YOU MUST HAVE key candidate - unique index.


CREATE TABLE tab1 (pk1 int NOT NULL, col1 VarChar(20))
CREATE UNIQUE INDEX tab1keycandidate ON tab1(pk1)
CREATE TABLE tab2 (pk2 int NOT NULL, fk1 int, col2 VarChar(20))
ALTER TABLE tab2 ADD CONSTRAINT fk FOREIGN KEY (fk1) REFERENCES tab1(pk1)
INSERT INTO tab1 VALUES (1,'tab1_row1')
INSERT INTO tab2 VALUES (1,1,'tab2_row1'), (2,1,'tab2_row2')
SELECT * FROM  tab1
SELECT * FROM  tab2
SELECT * FROM INFORMATION_SCHEMA.CONSTRAINT_COLUMN_USAGE WHERE TABLE_NAME = 'tab1' OR TABLE_NAME = 'tab2'
EXEC sys.sp_indexes @table_name = 'tab1', @table_server = 'local', @table_schema='dbo', @table_catalog = 'MedLab'
EXEC sys.sp_indexes @table_name = 'tab2', @table_server = 'local', @table_schema='dbo', @table_catalog = 'MedLab'
SELECT * FROM    INFORMATION_SCHEMA.CONSTRAINT_TABLE_USAGE WHERE TABLE_NAME = 'tab1' OR TABLE_NAME = 'tab2'

Open in new window

Acronis in Gartner 2019 MQ for datacenter backup

It is an honor to be featured in Gartner 2019 Magic Quadrant for Datacenter Backup and Recovery Solutions. Gartner’s MQ sets a high standard and earning a place on their grid is a great affirmation that Acronis is delivering on our mission to protect all data, apps, and systems.

so i have a query here, we are using primary key and foreign key for performance oriented and it's not but to speak about relationship right?


because we can join two tables without have these constraints as well.
Aaron TomoskyDirector of Solutions Consulting

Commented:
It's is a relationship that helps performance. Correct you don't NEED It to do a join.

Commented:
>What is use of primary key,foreign key if there is no constraints applied to column.
Huh? Withouth constraints, you do not haveprimary keys or foreign keys.  

For a table to be relational, it must not permit duplicate rows. The primary key constrainst is one way to guarantee that.  A table may have many candidate (unique keys) keys. From mong them, the designer declares one of them as primary.  It's almost as important to declare the others others as unique, but that step is often overlooked (and subsequently the topic of a the most pervasive of all EE database questions: "how do I get rid of duplicates?")

A foreign key constrains a column to the primary or unique key of another table. It's a declarative way to guarantee that the value in the foreign key column exists in the referenced table.    

>it's possible to join the rows.I am curious to know why it is used.

Sure, you can join on keys that are not declared; you can even join on columns that are not keys.

It's true that Primary Keys and Unique Keys generate unique indexes in virtually all databases, and some databases generate indexes for foreign keys, too.  But do not equate primary key  (a logical constraint) with the unque index (a physical structure) used to implement it.    

Top Expert 2012

Commented:
Exactly.  Primary/Foreign keys are constraints and confusing them with indexes and the related side-effect in SQL Server of improved performance, is in my view beside the point.  
dqmq:thanks for explaining!Ibut i have some more doubts,can you pls explain

 But do not equate primary key  (a logical constraint) with the unque index (a physical structure) used to implement it.    
 I am bit confused here.

If clustered will be applied on both primary and foreign key,how the data are stored in b-tree structure?will there not be any conflict while storing data?


I know practically, there is no conflict in the database.But i like to know the back ground process.
Top Expert 2012

Commented:
You are once again confusing indexes with constraints.  Forget about clustered or non-clustered (in SQL Server a primary key can be either) indexes and focus on what you are trying to achieve.
okay,then why we are using primary and foreign key?purpose?

we can join two tables without this keys definition as well.

If we are going to say, that sql will give good performance by designing the tables with these key, how the performance are improved or achieved?
Constraint - the rule that DB enforces on data. UNIQUE values in a column, for example. The table can have several columns like that.

Primary key - the column(or set of columns - composite PK) that represents unique value for every row. It is convenient to use PK to point on exact row we need.

Foreign key - the link (pointer) in master-detail relation. Several detail rows can point on the same master row.

Index - a database object that speeds up the search .
--------------
When you create PK in SSMS:
1. It sets NOT NULL to all PK columns.
2. It creates constraint UNIQUE.
3. It creates INDEX for PK.

When you create FK with validation:
1. It checks if for every DETAIL record MASTER record exists.
2. It create constraint FK.

When you delete a record from master FK constraint checks if detail record exists. If yes then either FAIL or CASCADE DELETE (removes corresponding detail records).

When you insert/update a record in the DETAIL FK checks if references master record exists. If not - insert/update fails.




Olaf DoschkeSoftware Developer

Commented:
Regarding the topic of primary keys and clustered index, normally the clustered index would be on  asecondary key field (or a combination of such fields), because if the PK column in one of the two most often used primary keys 1. an integer identity (autoincrementing number) or a uniqueidentifier.

Clustering records bsed on an integer PK makes more sense, as sql server then does store records in order of that integer, while for a heap table, with no clustered index, sql server stores records, where it finds enough place, which in a fragmented database means not in chronological order, so data gets scattered more and more.

Clustering by uniqueidentifiers results in a random order of records which contradict the goal of a clustered index and should therefore be avoided.

Or from the point of view of clustered indexes: The reason there can only be one clustered index is, that there can only be one physical order of data, the nodes of the clustered index b-tree are the records themselves, not just a pointer to the record, as in other indexes. So a clustered index is best put on fields grouping records that you would query as a group, eg a clustered index on a foreign key is useful in that respect, as you most often will query the group of records belonging to a parent record, but it's also true that most of the time the records with the same foreign key are in a sequence of pk, when using integers.

It's recommended to use a combination of fields for a clustered index, that could be used as a primary key, but are having a sorting effect in sorting together physically what will likely be queried together. But that's just the perspective of the clustered index. For simple joins you can of course profit from simple indexes on one field, too.

Thinking about the clustered index is rather thinking about how you'd like to order your data physically. In many cases eg with integer identity fields, the chronlogical order, in which records are stored with a clustered index on the pk integer is not the badest, typically you query newest data and older data get's less interesting and the data is sorted that way the more frequently used data is at the end of the table, together, not spread along the whole table.

In the end this helps keeping the need to move hdd heads less, in the perspective of multiple users, large RAM, SDD etc. this get's less important, as that really is just a problem of the hdd performance, with random access memory the random access does not have a seek time to move a read head somewhere, with multiple users, what is more important is a hdd controller putting requests for disc sectors into an order for best read (or write) performance.

Coming back to clustered index, the reason it should be unique is, it will be made unique anyway, if double key values come in, sql server adds a uniquifier integer value for a simple reason: The records have to be in a certain physical order, if keys are non unique theoretically two records would be stored into the same node of the index tree, and sql server prevents that, all records are therefore in a order.

You actually don't need to be too afraid of not having a fully unique combination of fields despite the primary key, if you eg have a persons lastname, firstname and dob, this does not need to be unique in a large dataset, but is already sorting good enough to let sql server add that uniquifier when needed, which you can think of as the sort order within the node of all records with the same index value. A clustered index degrades in perfomance the less good the uniqueness of the combined fields of it is.

Clustered indexes don't need to follow the rules of how you would normally sort data, it can also be fruitful to eg sort firstname with higher relevance than lastname, it really depends on what physical record order your queries will profit better.

It can even be good to have parts of the data double with different clustered indexes, eg in a social network the user data sorted by most frequent use of the site and by how prominent a user is (how often his profile is visited) may be two valid sort orders you may want or need. That can perhaps also be solved by more tables, to have more clustered indexes, eg instead of only a single user table, also have a [frequent users] and [prominent users] table with the last two being 1:1 related tables towards users, just having the main users.id and perhaps nickname or some other side info like their logincount or visits.

The clustered index used in that way does actually does not have the focus on joins only, for these the rule of thumb (for me) is, to have a normal index on both PK and FK, if I don't already have a clustered index on the PK for not having a better purpose of it.

Bye, Olaf.
Top Expert 2012

Commented:
>>If we are going to say, that sql will give good performance by designing the tables with these key, how the performance are improved or achieved? <<
No.  I did not say that.  In general CONSTRAINTS are there to ensure the integrity of your data and they have nothing to do with performance.
Racim BOUDJAKDJIDatabase Architect - Dba - Data Scientist

Commented:
<<okay,then why we are using primary and foreign key?purpose?>>
The purpose of a logical primary key is to insure distinguishability between tuples.  The purpose of a physical key as implemented in SQL Server is to have a mechanism implementing the logical primary key.   A logical primary key is a subset of attributes selected over another subset of attributes (candidate keys) to meet best the following criteria:

> Uniqueness
> Irreducibility
> Stability
> Familiarity


To understand this answer more thoughroughly, you need to understand what is the relational model.  Here is some reading for you:

http://www.amazon.com/dp/0321399420?tag=databasede095-20&camp=14573&creative=327641&linkCode=as1&creativeASIN=0321399420&adid=0C5260RV38257PDNK1MR&

Also, primary key purpose has nothing to do with either indexes or physical layer.
Racim BOUDJAKDJIDatabase Architect - Dba - Data Scientist

Commented:
<<In general CONSTRAINTS are there to ensure the integrity of your data and they have nothing to do with performance.>>
Not in general.: Always.  Constraints are logical mechanism to insure domain specialization in the declarative perspective of relational model.  And you are right: it has nothing to do with the physical layer (indexes, tables etc).  Some particular constraints can be implemented as indexes, others as check or default values so forth...

This is a typical case of confusion between the logical and physical layer in relational model.
Racim BOUDJAKDJIDatabase Architect - Dba - Data Scientist

Commented:
<You actually don't need to be too afraid of not having a fully unique combination of fields despite the primary key, if you eg have a persons lastname, firstname and dob, this does not need to be unique in a large dataset, but is already sorting good enough to let sql server add that uniquifier when needed, which you can think of as the sort order within the node of all records with the same index value. A clustered index degrades in perfomance the less good the uniqueness of the combined fields of it is.>>
Duplicates ? Result correctness ? Logical Inferences ?  Relational Model ? Ever heard of that ?

Doing data modeling according to physical criterias such as response time and/or physical data sorting is just another recipe for disaster.
Racim BOUDJAKDJIDatabase Architect - Dba - Data Scientist

Commented:
<<It can even be good to have parts of the data double with different clustered indexes>>
This is both wrong and misleading. Data redundancy is never a solution, only a problem.

Duplicates are one of the primary causes of incorrect results, performance overhead.
Olaf DoschkeSoftware Developer

Commented:
Racimo, you're cutting statements out of their context.

In regard to clustered indexes, if you create it without the UNIQUE constraint, sql server will take care of making the clustered index nodes unique, that's why I said you don't have to be afraid of not being able to specify a field combination guaranteeing unique tuples. If you don't agree to that, you have not understood the way sql server manages clustered indexes.

You also haven't quite understood the example I gave to create two side tables to have two different clustered indexes based on computed values like login count and visits count. I can agree to the general observation, that duplicates are prmary causes of incorrect results, but all I sugegsted to duplicate is the primary key. login and visit counts can be wrongly updated, but that could a) be corrected by recomputing these and b) would only invalidate unimportant rankings, eg it would still be possible to determine the 1000 most frequent users or most visited and maybe be wrong about 5% of them. This would still allow to keep almost the correct amount data with better availibilty through whatever mechanisms with the only disadvantage of 5% of them still being served with less performant query responses. Doesn't matter in that case.

I made a very lengthy offspin in clustered indexes, in general I can agree to being criticised for that, but let VIVEKANANDHAN_PERIASAMY decide, if that is of value to him.

Bye, Olaf.
Olaf DoschkeSoftware Developer

Commented:
As a very short answer to the original question:

1. The aspect of documentation and automatisms
If you don't just define primary and foreign IDs, but also specify the constraints accordingly, you document the database, you enable reverse engineering the database model, you enable sql server to automatically create joins in query designers and in ERM diagrams. You also enable to define relational integrity and let the database watch over that on it's own, no matter what clients access and contribute to the database.
 
2. aspect of natural vs surrogate keys.
A field dedicated to just act as a key value without itself contributing any meaning to the rest of the record is called surrogate, and while that word 'surrogate' has a bad reputation in regard of eg food, this is best practice in database design for being less impeding of data entry, updates and also schema changes, see http://www.techrepublic.com/article/the-great-primary-key-debate/1045050, especially see the table at the end of the article.

To use primary key values as foreign keys to enable a join is just following from the definition of foreign keys, you can't actually use any other value to specify the parent record you want to reference via foreign key. And this is making the impeding constraint (even without any technical constraint) in changing the actual data, as that results in the need to cascade all changes of values contributing to a natural primary key to foreign key fields of child tables.

The last aspect alone should easily make you choose surrogate keys as the better primary keys. The need of primary keys should be self evident. If you actually also want some field with natural data in it to be unique, a surrogate key doesn't hinder you to do that on top of the surrogate key, you can define secondary keys as many as you like to quarantee uniqueness of eg mail adresses, nicknames or whatever, still those are likely to change and thus would be a bad choice for use as foreign key in regard of the cascade of foreign key updates needed to propagate primary key changes.

That said I can link to my toughts about clustered indexes: As they really define the physical order of data, you actually could rather define them via a combination of fields that would define a natural primary key, as srting data that way is more likely to put data physically close to each other, that you also query together.

Bye, Olaf.
Racim BOUDJAKDJIDatabase Architect - Dba - Data Scientist

Commented:
<<In regard to clustered indexes, if you create it without the UNIQUE constraint, sql server will take care of making the clustered index nodes unique, that's why I said you don't have to be afraid of not being able to specify a field combination guaranteeing unique tuples. If you don't agree to that, you have not understood the way sql server manages clustered indexes.>>
If you define logical primary keys according to SQL Server's indexing limitations that simply means you do not understand what is the purpose of a primary key.  You are confusing the logical and physical layer.  The logical determines the physical implementation not the opposite.  

The structure of the indexing on direct image system (SQL Server, Oracle, DB2...) can implement any logical or surrogate key with any performance requirement , due to SQL Server limitation.  

> If only a familiar natural key is to be implemented, a simple unique non clustered OR unique clustered index (or unique constraint) can be placed on the subset of columns identifying the row.For  random accesses,unique non clustered indexes is adequate.  For Sequential Accesses, clustered unique indexes (or constraints) are to be used to facilitate pre sort operations.
> If a surrogate key is to be additionally implemented, due to lack of Familiarity of the Natural Key, then
both a unique index is to be implemented on the natural key for logical distinguishability and
a unique index on the surrogate key to guarantee 1:1 cardinality between the natural and surrogate key.  In this case, surrogate keys can be implemented as unique clustered indexes on increment columns (ex: identity) associated with unique non clustered indexes on the natural key.

Additionally, a deterministic surrogate key, such as a formula or concatenation of natural key columns and can be implemented as a simple materialized column whereas a non deterministic surrogate key, such as a counter can be implemented as an identity.

<<You also haven't quite understood the example I gave to create two side tables to have two different clustered indexes based on computed values like login count and visits count. >>
What I understand is that you claim that derived computed values could actually make a decent a primary key, without even meeting the basic criteria of Stability for any logical key.  You simply do not understand what is a logical primary key.
Olaf DoschkeSoftware Developer

Commented:
Racimo, you're not understanding that a clustered index has nothing to do with a primary key at all.

Bye, Olaf.
Racim BOUDJAKDJIDatabase Architect - Dba - Data Scientist

Commented:
<<Racimo, you're not understanding that a clustered index has nothing to do with a primary key at all.>>

A clustered index is simply the only mechanism by which a direct image system can actually implement uniqueness.  Independence between the logical and physical layer does not mean that the physical implementation is not determined by the logical.

Since you ignore the fundamentals of relational modeling, I see it impossible for you to understand why and where your statement are false.  Sorry but I can not tell more except that we should agree to disagree and move on.

Olaf DoschkeSoftware Developer

Commented:
<<a clustered index is simply the only mechanism by which a direct image system can actually implement uniquenes>>
No, not at all.

I do understand the principles of relational modeling, you're judging me from this single positing and your misunderstanding of the point I want to make. This could surely result from me not getting my points through, but you're also just reading some keywords signaling misunderstanding to you.

I was making an off topic discussion of clustered indexes as they surely are related to primary keys, especially in regard of performance, which is what VIVEKANANDHAN_PERIASAMY adresses in another related question about joins, but a primary key, primary index or constraint is not based on a clustered index and vice versa.

I agree, though, that we should not waste our time in teaching each other, as that's pointless at this point and not helping VIVEKANANDHAN_PERIASAMY.

Bye, Olaf.
Racim BOUDJAKDJIDatabase Architect - Dba - Data Scientist

Commented:
<<I do understand the principles of relational modeling>>
Anybody who has a read a single book from Date, Darwen, McGoveran, Fabian Pascal on relational modeling can realize how absurd and misleading some of your claims sound.

<<I was making an off topic discussion of clustered indexes as they surely are related to primary keys, especially in regard of performance, which is what VIVEKANANDHAN_PERIASAMY adresses in another related question about joins, but a primary key, primary index or constraint is not based on a clustered index and vice versa.>>
Confusion.  Further confusion.

<<I agree, though, that we should not waste our time in teaching each other, as that's pointless at this point and not helping VIVEKANANDHAN_PERIASAMY.>>
You don't get it, don't you ?  Given what you believe as true and wrote, there is no way you can teach anything anybody about database modeling.  You need to read a book about relational modeling and you will realize how wrong you are.
Racim BOUDJAKDJIDatabase Architect - Dba - Data Scientist

Commented:
<< <<a clustered index is simply the only mechanism by which a direct image system can actually implement uniquenes>>
No, not at all.

--> OK so what other mechanism insures uniquenesss on SQL Server.  A unique constraint always relies on an underlying index.  AIf you claim there are other mechanisms to do that, then it will be easy for you to say which one.
Aaron ShiloChief Database Architect

Commented:
hi

first of all yes you can allways join two tables regardless of an existence of a PK or FK.
all you need is the datatype to be the same and ofcourse the data should match.

you would normaly use PK,FK,UQ,CK,DF,NN when you want to secure your database
and generate a NORMALIZED erd in your rdbms.

Commented:
--> OK so what other mechanism insures uniquenesss on SQL Server.
I can think of several, but none that are as convenient as Unique, non-clustered, non-null index or triggers.
Racim BOUDJAKDJIDatabase Architect - Dba - Data Scientist

Commented:
<<I can think of several,>>
Please read the entire thread.  We are talking precisely about physical primary keys,not uniqueness in general that can be implemented using check constraints or other mechanisms.  
Olaf DoschkeSoftware Developer

Commented:
Racimo,

you can define a primary key constraint on a table without a clustered index (heap table). That's the only thing you need to know that MS SQL Server does not need a clustered index to provide or check the primary key constraint.

On the other hand: There can only be one clustered index per table, as that physically sorts records and they can only be in one order. But you don't need a unique index to specify it as the clustered index.

So also from that perspectives there is no technical link between primary key and clustered index, or I have a misinterpretation of how clustered translates into german.

Bye, Olaf.
Racim BOUDJAKDJIDatabase Architect - Dba - Data Scientist

Commented:
<<you can define a primary key constraint on a table without a clustered index (heap table). That's the only thing you need to know that MS SQL Server does not need a clustered index to provide or check the primary key constraint.
>>
Only you claim that a primary key constraint has to use a clustered index.   A primary key constraint always relies on a clustered OR nonclustered index associated with a referencing/dependency mechanism.

As BOL states...

"a unique index is just an index, whereas a unique constraint is a unique index that's listed as a constraint object in the database. "

and as Paul Randal says:

There are no significant differences between creating a UNIQUE constraint and creating a unique index independent of a constraint. Data validation occurs in the same manner and the query optimizer does not differentiate between a unique index created by a constraint or manually created.

Paul Randal
Dev Lead, Microsoft SQL Server Storage Engine

<<So also from that perspectives there is no technical link between primary key and clustered index, or I have a misinterpretation of how clustered translates into german.>>
See above.  I somehow believe that your problem has little to do with language.

Commented:
Racimo,
You are right, of course.  But your comments understandably incite a little push back:

"A primary key constraint always relies on a clustered OR nonclustered index associated with a referencing/dependency mechanism."

"The logical determines the physical implementation not the opposite. "
 
"A clustered index is simply the only mechanism by which a direct image system can actually implement uniqueness. "

"OK so what other [than clustered index] mechanism insures uniquenesss on SQL Server."

"We are talking precisely about physical primary keys,not uniqueness in general that can be implemented using check constraints or other mechanisms."  
Racim BOUDJAKDJIDatabase Architect - Dba - Data Scientist

Commented:
<< But your comments understandably incite a little push back>>
OK perhaps some of my statements could have been more precise, but I am surprised that you push back harder on valid things than on the  misleading fallacies posted on this thread.

The claims, for instance, that there would be no relationship between a primary key constraint and clustered/nonclustered indexes (though the two are separate logical and physical concepts).  The suggestion for instance that dupplicates could actually be good practice.

Perhaps you are correct that this is a translation issue...
....

Regards....

Commented:
>The suggestion for instance that dupplicates could actually be good practice.

As long as the logical model is not compromised, I have little concern about duplication (derived tables in the physical design) that achieves a performance goal. Any index, after all, amounts to duplication of a sort.

I cannot speak to whether the physical construct envisioned to support a second clustered index is worthwhile, but if it is, I have no objection, provided it's implemented in such a way that maintains the data integrity conveyed by the logical model.  

You've said that "logical determines physical".  I want to agree, but it cannot mean that for a given logical design there is one-and-only-one physical design that follows (as in "deterministically").  Rather, I think it means, that the logical comes first and that the physical should faithfully represent the logical upon which it is founded.






Racim BOUDJAKDJIDatabase Architect - Dba - Data Scientist

Commented:
<<As long as the logical model is not compromised, I have little concern about duplication (derived tables in the physical design) that achieves a performance goal. Any index, after all, amounts to duplication of a sort.>>
In relational modeling, duplication has established meaning that is different from physical redundancy that is acceptable for performance reasons.  Logical or Physical , dupplicates are always a poor idea, especially in the context of primary keys.

<<I have no objection, provided it's implemented in such a way that maintains the data integrity conveyed by the logical model>>
Chances to maintain data integrity without a properly implemented logical key is nill.

<<ou've said that "logical determines physical".  I want to agree, but it cannot mean that for a given logical design there is one-and-only-one physical design that follows (as in "deterministically"). >>
Nobody said that there is only one physical design for each logical design.  In fact, I said the opposite since I proposed many cases of physical implementations based on performance and familiarity requirements.  The use of the deterministic term exclusively refers to the way a surrogate can be implemented.

The logical should always determine the physical.  Having the physical running around by itself is simply a recipe for disaster.
Racim BOUDJAKDJIDatabase Architect - Dba - Data Scientist

Commented:
<<the logical comes first and that the physical should faithfully represent the logical upon which it is founded.>>
Just as a side note, need for implementing surrogate key is a part of logical design since it has to do with the fourth criteria that must be met by a logical primary key: Familiarity.

Commented:
>>Just as a side note, need for implementing surrogate key is a part of logical design since it has to do with the fourth criteria that must be met by a logical primary key: Familiarity

On that point, I beg to disagree.  First, because a surrogate key, by defintion, is not familiar, and, by good practice, should not become so. When an alternate key (usually single-column, unique, non-null, identity) is needed in the logical design, I contend that is the discovery of a formerly missing or unrecognized natural key. And the day will not be far away that it evolves to convey business meaning.

I hold to a fairly strict definition of surrogate key: a meaningless key introduced in the physical design for programming convenience and/or improved performance.


   
 
Racim BOUDJAKDJIDatabase Architect - Dba - Data Scientist

Commented:
<<First, because a surrogate key, by defintion, is not familiar, and, by good practice, should not become so. >>
Date disagrees with you.  He proved that this requirement breaks the Information Principle.

<<I hold to a fairly strict definition of surrogate key: a meaningless key introduced in the physical design for programming convenience and/or improved performance.>>
Where do you hold a definition that is in contradiction with relational theorists definition (Codd, Date, Darwen, MacGoverhan, Pascal...) ?.  Surrogate Keys indeed belong to the Logical Layer but can be implemented physically.  

Codd on User Controlled Keys (old name of Surrogate Keys)
In the paper ACM Transactions on Database Systems, Vol. 4, No. 4, December 1979 pages 409 - 410 Codd introduces the concept of a surrogate key, the key points of his text

Two relations may have user-controlled keys defined on distinct domains (e.g., one uses social security, while the other uses employee serial number) and yet the entities denoted are the same.

Date on Surrogate Keys...(aka as user controlled keys)
From C. J. Date's book "An Introduction to Database Systems, C J Date, 8th Edition":
Page 434, reference 14.21 - he refers to P. Hall, J. Owlett and S J P Todd "Relations and Entities" and says:

Surrogate Keys are keys in the usual Relational sense but have the following specific properties:

> They always involve exactly one attribute. --> Surrogate concern relations not physical tables
>Their values serve solely as surrogate (hence the name) for the entities they stand for ......
> When a new entity is inserted into the database, it is given a surrogate key value that has never been used before and will never be used again, even if the entity in question is subsequently deleted.
> Ideally surrogate keys value would be system-generated.
> Surrogates must not be concealed from the user because of the Information Principle- tuple ID's should be however.

Commented:
When Codd introduced surrogate keys, he distiquished those that were "user-controlled" and DO belong in the logical model from and those that were "system-generated" do not.  Of course, he did not say it that way; what he said was:

"..Database users may cause the system to generate or delete a surrogate, but they have no control over its value, nor is its value ever displayed to them ..."(Dr. Codd in ACM TODS, pp 409-410) and Codd, E.  (1979), Extending the database relational model to capture more meaning.  ACM Transactions on Database Systems, 4(4).  pp.  397-434.

He used SSN and employee number (and the sentence you quoted above) to illustrate how user-controlled surrogates are problematic when they are exposed. It is wrong to infer from this that he meant SSN or employee number should be concealed.  And that was Date's point--you must expose them, else violate the ever-so-imporant Information Principal.  Rather, he meant system-generated surrogates are a better alternative because they do not exhibit the same problems.

I don't know if Date addressed system-generated surrogates with respect to the Information Principal, but it only becomes an issue if one insists on including them in the logical model.  The point is, in the physical model system generated surrogates are meaningless--they are not information--and the information principal does not apply.  Just as it does not apply to indexes and other physical constructs that are not exposed to users.  

I believe in common usage today, "surrogate key" refers only the system generated type.
(For sure, that's my usage of it).   SSN and employee number, for example, certainly belong on the logical model, they just do not belong in my definition of "surrogate key".  To continue by example, a system-generated ID column in the same table does conform to my definition, carries no business meaning, should not be revealed, and there's really not much to be said about it in logical design.    


Racim BOUDJAKDJIDatabase Architect - Dba - Data Scientist

Commented:
<< don't know if Date addressed system-generated surrogates with respect to the Information Principal, but it only becomes an issue if one insists on including them in the logical model. >>
He did.  

Date established specific terminology to distinguish between surrogate keys and tuple ID.  The former should not be concealed from the user since it carries business meaning while the latter is internal, system generated and hidden from users.  

As he says clearly:

In a nutshell: Surrogates are a model concept; tuple IDs are an implementation concept

Hence, surrogate key do remain in logical layer to be operated relationally.  Putting them in the physical layer results in a lot of confusion between the two layers.  
After reading some books , basic concept was

primary key and foreign are used for maintaining database integrity and there is nothing to do with index or physical structure of database  or do with joining as well.
Racim BOUDJAKDJIDatabase Architect - Dba - Data Scientist
Commented:
Exactly.  Primary keys are logical concepts at fundamental level.  They are primarily physically implemented as unique indexes or unique constraints in SQL Server.
conclusion is primary and foreign key are used for database integrity
Top Expert 2012

Commented:
>>After reading some books , basic concept was <<
Yes, if you had been paying attention I believe that was mentioned nearly 2 weeks ago in this thread.

Commented:
I agree, Primary Key and Foreign Key are logical concepts and are not required for joining.

However, in SQL Server (and most other SQL databases, for that matter), declaring a Primary Key Constraint creates a corresponding unique index in the physical database.  While Primary Key (logical) and Unique Index (physical) are separate concepts, DDL syntax tends to blur the distinction.



 
Racim BOUDJAKDJIDatabase Architect - Dba - Data Scientist

Commented:
<<DDL syntax tends to blur the distinction.>>
Exactly.  

And because a faulty technological implementation can induce confusion, logical clear concepts are not to be redefined.  That is why logical should drive the physical.  The opposite is simply a recipe for disaster.

Do more with

Expert Office
Submit tech questions to Ask the Experts™ at any time to receive solutions, advice, and new ideas from leading industry professionals.

Start 7-Day Free Trial