Link to home
Start Free TrialLog in
Avatar of silo77
silo77

asked on

Best DB schema for city/county/state

I am trying to figure out the pros and cons of two different schemas for city/county/state

the first:

city -> foreign keyed to -> county -> foreigned keyed to -> state
------
city.county_id  references county.id
county.state_id references state_id

the second:

     --> foreign keyed -> state
city
    --> foreign keyed ->  county
------
city.state_id references state_id
city.county_id  references county.id


lookups will primarily be done with city/state.  Any suggestions of this would be appreciated


Avatar of silo77
silo77

ASKER

typo:

the first:

city -> foreign keyed to -> county -> foreigned keyed to -> state
------
city.county_id  references county.id
county.state_id references state.id

the second:

     --> foreign keyed -> state
city
    --> foreign keyed ->  county
------
city.state_id references state.id
city.county_id  references county.id
Avatar of Member_2_2484401
Hmmm .... One state contains many counties, and one county contains many cities.

Therefore, it seems logical that the County table would contain a foreign-key to the State table, and the City table would contain a foreign-key to County.

State
-----
stateID (PK)
stateName

County
--------
countyID (PK)
countyName
stateID (FK)

City
----
cityID (PK)
cityName
countyID (FK)

HTH,
DaveSlash
it may seem logical dave's way, but if you are saying that lookups will primarily be done by city/state, then you may want to go with the second one:


City
----
cityID (PK)
cityName
countyID (FK)
stateID (FK)

County
--------
countyID (PK)
countyName

State
------
stateID (PK)
stateName


you could also make County have a key to the State table, just incase, so have it like:
City
----
cityID (PK)
cityName
countyID (FK)
stateID (FK)

County
--------
countyID (PK)
countyName
stateID (FK)

State
------
stateID (PK)
stateName
If you search using both city and state it's best to have all the information in the city table.  Like:

State
-----
stateID (PK)
stateName

County
------
countyID (PK)
countyName
stateID

City
-----
cityID (PK)
cityName
stateName
stateID (FK)
countyID (FK)

That way you can fill your address record querying only the city table.  The con is a little extra storage but with a considerable gain in performance and simplicity.  Also notice using the same name for the stateID in all tables makes it more clear and will help you auto-relate tables in many applications.
if you go with one of my ways than city/state lookups will be faster than with dave's way, because for dave's way you would need to join all 3 tables, but for my way you would only need 2.
Avatar of silo77

ASKER

Thanks for all the feedback - I orignally spec'd it out exactly like Dave suggested, but then realized the performance boost that could be achieved using the method described by wertyk.

My question is, in regards to the solutions where city links to county and state and county links to state  .. isnt that considered poor normalization since there are overlapping keys??  That was always the impression I had

I am still torn, we are a ral estate site, and our primary db hits are address and user info - so I may go with wetyk solution.  However, I dont like the concept of the county table have non-unique entries and it being a linked-to table (ie there are many counties with the same name, different state)

I have to digest this some more, and look into the performance issues .. if any.

Will keep you posted and determine which is best.

Thanks
ASKER CERTIFIED SOLUTION
Avatar of JesterToo
JesterToo
Flag of United States of America image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
>>>   However, I dont like the concept of the county table have non-unique entries <<<

Define a unique constraint on the combination of columns that mustn't be duplicated.  For instance:

1.   in the County table, make the combination of StateName and CountyName a unique constraint.

2.   in the City table, make the combination of StateName and CityName a unique constraint.  I'd recommend this index be built with the CityName first followed by StateName.  You could also add CountyName but it probably won't be very useful.  Since these 3 tables will, for the most part, probably become static after they're built, you can afford to create plenty of indexes on them.

As these tables approach being fully populated, the nature of their use becomes less OLTP and more OLAP.
Avatar of silo77

ASKER

I had contemplated putting the county and state names right in the city table.  This would then allow for ease of query.  
There is a state.name and state.abbreviation, but I could always do a lookup based on the abbreviation if I want to display the full name

In theory, I wouldnt even need the county table, and the state table would just be used for populating select boxes with all the choices.

Currently I am going to do a full city/county/state import and it is looking to be about 60,000 cities (which contains about 25K of alias city names) ... hmm, the more I think about this, and the heavy use of addresses in this app, I like this approach ...
Avatar of silo77

ASKER

Anyone want to comment on this proposed schema - it uses the flatfile concept suggested by JESTER.  It uses 'code' which winds up being the state abbreviation as the foreign key between STATE/COUNTY and STATE/CITY, which reduces the overhead one step further in that I have the abbreviation right there, and can join on the code if I need the full name.  Thoughts.

TABLE state (
         code
         name
         PRIMARY KEY (code),
         UNIQUE code (code),
      UNIQUE name (name)
)


TABLE county (
      id  UNIQUE,
      name
      state_code
      FOREIGN KEY (state_code) REFERENCES state(code)
      ON DELETE RESTRICT ON UPDATE RESTRICT,
      PRIMARY KEY (id)
      unique(state_code,name)
)


TABLE city (
      id UNIQUE,
      name
      alias
      type
      county
      state_code FOREIGN KEY (state_code) REFERENCES state(code)
      ON DELETE RESTRICT ON UPDATE RESTRICT,
      PRIMARY KEY (id),
)
it looks ok db wise. Though there could be a potential problem (depending on the country) with this on setting the state code to unique. by having one state table you would be including states of different countries and some codes are unique only for one country meaning another might be using the same state code. ex washington and Western Australia that has WA as the state code. You might want to try the other suggested approaches or the following

merging the country and state
using the state name instead of the state code as the foreign key
use a state code of your own that is unique

Though if the states of all the countries that you will be including are unique when added together this wouldnt really be a problem