Multi column primary key, which column to use as leading edge?

Given SQL Server 2000 and the following table:

CREATE TABLE Person
(
    CityID int NOT NULL,
    PersonID bigint NOT NULL
    -- ...plus some other nullable attribute columns (omitted here)
)
where CityID is a foreign key to the Cities table, and PersonID is unique only within a city. (Off topic, but FYI: PersonID is imported from another application that does not use globally unique identifiers.)

The primary key options are:
(CityId,PersonId)
or
(PersonId,CityId)
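
For reference, a minimal sketch of how the two options might be declared in T-SQL. The constraint name PK_Person is an assumption made for illustration, and only one of the two statements can be applied to the table.

```sql
-- Option 1: CityID is the leading column of the key.
ALTER TABLE Person
    ADD CONSTRAINT PK_Person PRIMARY KEY (CityID, PersonID);

-- Option 2: PersonID is the leading column of the key.
-- (Alternative to option 1; a table can have only one primary key.)
ALTER TABLE Person
    ADD CONSTRAINT PK_Person PRIMARY KEY (PersonID, CityID);
```

Both declarations enforce the same uniqueness rule; they differ only in the column order of the index that backs the constraint.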

My question is: which primary key should I specify, and how do I figure out which one is the right choice?

Thank you!

Lowfatspread

(CityID, PersonID) may be of use...

But what is the cardinality of the two columns (how many distinct values does each have)?

What do you normally search by: city or person?

To a certain extent it may not matter, unless you intend to cluster on the key as well...
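
One way to answer the cardinality question is to count the distinct values in each column. This is only a sketch against the Person table from the question; the column aliases are illustrative.

```sql
-- Compare the number of distinct values in each key column
-- with the total number of rows in the table.
SELECT COUNT(*)                 AS TotalRows,
       COUNT(DISTINCT CityID)   AS DistinctCities,
       COUNT(DISTINCT PersonID) AS DistinctPersonIDs
FROM Person;
```

The closer a column's distinct count is to the total row count, the more selective that column is on its own.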

Nightman

PersonID, CityId

As a general principle, you should have your most selective column first.

ASKER CERTIFIED SOLUTION

lahousden

mjmarlow

ASKER

Some of my more complex queries include a CityId qualifier in the join or WHERE clause. I noticed that the SQL Server index optimizer recommended, and then added, an index on CityId for my Person table. I wonder whether I would spare myself the overhead of that extra index if I used CityId as the leading edge instead of following the book and putting the most selective column first.
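
A hedged sketch of how this could be checked, assuming the primary key has been created as (CityID, PersonID). The index name IX_Person_CityID and the value 42 are hypothetical; substitute the name the optimizer actually generated.

```sql
-- Show the estimated plan without running the query.
SET SHOWPLAN_TEXT ON
GO
-- With CityID as the leading key column, a filter on CityID alone
-- can seek on the primary key index.
SELECT PersonID
FROM Person
WHERE CityID = 42
GO
SET SHOWPLAN_TEXT OFF
GO
-- If the plan uses the primary key index, the separate single-column
-- index is likely redundant (SQL Server 2000 syntax: table.index).
DROP INDEX Person.IX_Person_CityID
GO
```

It is worth checking the plans of the more complex queries as well before dropping the index, since this sketch covers only the simplest case.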

Nightman

That suggests that most of your queries are by CityId rather than by PersonId. If that is the case, CityId would be the better leading column, as explained by lahousden.

mjmarlow

ASKER

Lowfatspread:

Cardinality:
Persons: 705151
Cities: 159

I often search for a subset of 5 to 1,000 persons, selected by city and by their presence in another table, via a join on CityId and PersonId.
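
The access pattern described above might look roughly like the following. The table name OtherTable and the city value 42 are assumptions made for illustration; the thread does not name the other table.

```sql
-- Persons in one city who also appear in another table,
-- matched on both key columns.
SELECT p.CityID, p.PersonID
FROM Person AS p
INNER JOIN OtherTable AS o          -- hypothetical table name
    ON  o.CityID   = p.CityID
    AND o.PersonID = p.PersonID
WHERE p.CityID = 42;                -- illustrative city value
```

Because the join supplies both columns, either key order can resolve the match; the CityID filter is the part that benefits from CityID being the leading column.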

I do not intend to cluster on the key, but from what I have read, the primary key is clustered automatically if no other clustered index is specified.
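
If the key should not be clustered, the constraint can say so explicitly. A minimal sketch, again assuming the hypothetical constraint name PK_Person and CityID as the leading column:

```sql
-- NONCLUSTERED overrides the default, under which a PRIMARY KEY
-- constraint creates a clustered index when the table has none.
ALTER TABLE Person
    ADD CONSTRAINT PK_Person PRIMARY KEY NONCLUSTERED (CityID, PersonID);
```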