How Well My Table is Normalized? - Relational Database

Posted on 2009-12-26
Last Modified: 2012-06-22
I posted two table DDLs - Motel and MotelRoom. These tables are intended to be normalized up to 3rd normal form.

Regarding Motel:

I had Country, City and Province each seperated in its own table and then
i just put the foreign key CountryID, ProvinceID and CityID in the Motel's table. However,
an expert long time ago didn't recommend to do it that way because it just adds an overhead when you want to query the data. I'm not convinced yet since his recommendation doesn't fulfill the normalization process. I don't know i might be wrong. Any idea?

Regarding MotelRoom:

I had roomType and bedType each in its own table and then i just added
the foreign key roomTypeID and bedTypeID in MotelRoom. However, one
of the dba's suggest me to put rid of roomType and bedType tables and
put these two as columns with CHECK constraints. I'm kind of confused
now, does this fullfill normalization process?



        motelID     NUMBER(7)    ,  CONSTRAINT motel_motelid_pk     PRIMARY KEY(motelID),

        branchname  VARCHAR2(50)    CONSTRAINT motel_branchname_nn  NOT NULL            ,

        numofrooms  NUMBER(10)      CONSTRAINT motel_numofrooms_nn  NOT NULL            ,

        address1    VARCHAR2(100)   CONSTRAINT motel_address1_nn    NOT NULL            ,

        address2    VARCHAR2(100)                                                       ,

        city        VARCHAR2(50)    CONSTRAINT motel_city_nn        NOT NULL            ,

        province    VARCHAR2(50)    CONSTRAINT motel_province_nn    NOT NULL            ,

        postalcode  VARCHAR2(50)    CONSTRAINT motel_postalcode_nn  NOT NULL            ,

        country     VARCHAR2(50)    CONSTRAINT motel_country_nn     NOT NULL





        roomID       NUMBER(7),     CONSTRAINT motelroom_roomid_pk      PRIMARY KEY(roomID)                                                      ,

        motelID      NUMBER(7),     CONSTRAINT motelRoom_motelid_fk     FOREIGN KEY(motelID) REFERENCES Motel(motelID)                            ,

        roomtype     CHAR(3)        CONSTRAINT motelroom_roomtype_nn    NOT NULL                                                                  ,

                                    CONSTRAINT motelroom_roomtype_ck    CHECK(roomtype IN('STD', 'MOD', 'SUP', 'DLX', 'ROH', 'JST', 'STE', 'STU')),

        bedtype      CHAR(1)        CONSTRAINT motelroom_bedtype_nn     NOT NULL                                                                  ,

                                    CONSTRAINT motelroom_bedtype_ck     CHECK(bedtype  IN ('K', 'C', 'Q', 'D', 'T'))                              ,

        price        NUMBER(6,2)    CONSTRAINT motelroom_price_nn       NOT NULL                                                                  ,

        notes        VARCHAR2(500)                                                                                                                ,

        isavailable  CHAR(1)        CONSTRAINT motelroom_isavailable_nn NOT NULL                                                                  ,

                                    CONSTRAINT motelroom_isavailable_ck CHECK(isavailable IN ('Y', 'N'))


Open in new window

Question by:F-J-K
    LVL 28

    Assisted Solution

    by:Bill Bach
    Normalization for a course exercise is VERY different from real-world database design.  If this is for a homework project, then you should follow the rules of your instructor for normalization.

    If you are doing this for a real-world application, then I agree that the separation of the City, Province and Country is probably overkill, and will result in decreased performance with little added value.  The idea behind normalization is to reduce data duplication, and thus the chance for errors.  You can certainly come up with a separate primary key for each of these fields, but it doesn't seem worthwhile.  Why not just use the name string as the primary key?  If you have no other attributes for the city, then why not eliminate the extra table altogether?  You can probably prevent spelling errors in the key during data entry by creating a separate table of city names first, but the performance degradation needed to do an additional table lookup all the time wouldn't be worth it.

    As for the check constraints, I think I weighed in on this topic last time -- you must ask yourself how frequently will the room types and bed types be changing?  If they will be static for a long time, then you can certainly use the check constraint.  However, I still believe that the "price" attribute will be based not only on the room number, but it will ALSO be based on the room type.  In fact, I'd argue that the price depends MORE on the room type than on anything else, since a Suite will usually cost more than a standard room, regardless of the individual room number.  Secondly, when the pricing for a suite changes, do you want to update ALL of the individual ROOM records that have a STE roomtype, or do you just want to update the ONE record in the roomtype table for the Suite?

    I can imagine that BedType won't change that much, and pricing may not be based on this attrbute at all, so this should be fine as a check constraint.
    LVL 51

    Assisted Solution

    by:Mark Wills
    Hi, tend to agree with BillBach.

    There is a phrase I use a lot it is "PNF" which stands for practical normal form and represents real life scenarios whenever the purists start talking database normalisation.

    There is a catch - the educational institutions do not yet recognise the official term :)

    When designing tables need to think in terms of entity and attributes. Which attributes are grouped to form an entity.

    For example, MotelRoom might have a few "fixed" attributes such as windows, balconies and such like. However they can often be configured differently with furnishings such as King Bed twins etc. Those type of configurations might be best represented as RoomType where a "Suite" has certain characteristics, differently from "King Room" etc. In that case I would e inclined to keep those "set" of attributes as a seperate table rather than a constraint list within "MotelRoom"

    That would also apply to BillBach's example of price - quite likely another table because of characteristics attached to price (such as seasonality) and most likely determined by roomtype.

    So, the key to "PNF" is the logical grouping of attributes that belong together in the one spot. So, for things like "Motel" there is only one address for a physical location, so it is fair and reasonable to hold that in the Motel table. However, there could be multiple locations within the chain, so, each "Motel" might then belong to a "Chain" etc...

    Always start with the indiviual pieces of information you need, then look how they interrelate. One thing we need to do is avoid repeating information, and if you find yourself repeating the same data (ie content of a field) over multiple rows then that should be your "warning bell" and a key indicator that a new data relationship is required.

    So in your design so far, Would indicate that RoomType does warrant its own table rather than a check constraint, then maybe the bedtype could be a list of bed configurations available for that roomtype - and because it might depend on the room type (e.g. King v Single) then also suggests a table.

    Motel is then the physical location. MotelRoom is similarly a physical space within Motel. RoomType is a set of room configurations for the Motel of which a MotelRoom can use. Starts to make more sense when you construct a sentence...
    LVL 15

    Assisted Solution

    by:Franck Pachot

    If you refer to the normalization theory, just follow the rules: identify the candidate keys, and apply normal forms.

    If this is a real world problem, the basic idea of normalisation is to avoid tables that group different meanings.
    And the main reason for that is:
     - if a table covers different meanings, there may be integrity issues (because updating one row may need to update several other rows, deleting one row may delete more information than required, ...)
    However, some of these anomalies can be prevented by check constraints (but not all of them) in SQL databases.

     - if a table implements different business meanings, then you may encounter issues in future application evolution.

    So in your exemple, If an address (address lines, city, country,...) has its own menaing in your system, then you should have it in a separate table. That will be the case if - for example:
     - you need to manage addresses even if there is no motel
     - you need to query motels by city name. Then you probably want to avoid typos in city name, and check the postalcode->city dependency (thus having a cities table)
     - ...

    About roomtype and bedtype, it is the same idea: if they have their own meaning (for example, one day you will need to have a description field, or your user interface must show a list of possible values) then they should have their own tables. If they are just attributes of MotelRoom and have no meaning by themselves, then you can leave them as check constraints.

    This is my non-mathematical way to see normalization ;)

    LVL 22

    Assisted Solution

    Regarding your question about CountryID, ProvinceID, roomTypeID, etc. I assume these are meant to be surrogate keys. Adding surrogate keys to your model has nothing to do with normalization per se. Nor do check constraints that simply define the domain of values for an attribute. Normalization is about analyzing dependencies and eliminating redundancy. The actual type of the attributes concerned (numeric, varchar, etc) is irrelevant.
    LVL 47

    Assisted Solution

    Everything depends on the volume of the data.
    Also the expected response time is very important factor.

    Case 1.

    You are asked to design the tables for fast insert of the data and short volume of data (avoid redundancy!), but the extract time is not important. In this case you have to use strictly normalized tables.
    This in DW (Data Warehousing) is achieved by Snowflake model.

    Case 2.

    You are asked to do a model with fast extraction, but it is allowed to sacrify the speed of insert and updates and also waste of disk space. In this case you have to use partitially denormalized design (star schema in DW). The designers use also snapshots (materialized views) by static data - when the data changes rarely.

    So the aim of the system is the leading factor in the design.
    LVL 1

    Author Comment

    Thanks for your responses!


    "if you find yourself repeating the same data (ie content of a field) over multiple rows then that should be your "warning bell" and a key indicator that a new data relationship is required."

    This is why i was thinking to make City and Country each in its own table because if i have million records and i wanted to query e.g. all records that is in US then i will only get records that has US. USA won't be returned. Thus, during records insertions there might be some tiny inconsistency which will cost alot in the future. I might be wrong, i don't know. Anything would you like to say?
    LVL 22

    Assisted Solution

    There would be advantages and disadvantages to creating a table for Country, but it won't stop anyone creating countries for both US and USA.

    Creating a city table and using CityID in the Motel table in place of a city name has nothing to do with normalization however. There is no non-key dependency being removed here - you are just substituting one attribute for another. So it is false to think that your new design would be "more normalized" than before. That doesn't make it "wrong" of course, but I suggest you think carefully about how the City table would actually benefit you and whether it is really an advantage in this case.

    LVL 21

    Expert Comment

    lot of comments..
    Here is one file for self help.
    LVL 51

    Accepted Solution

    Addresses are always a hard example (or a good one depending on the argument)...

    Even the expert organisations do not get that right - I remember doing a job for our national postal service. While you would reckon they could get it right, alas, it was not the case. The datawarehouse covering years and years of data had 4 different representations of the one State.

    I think a big part of the design is limiting the maintenance functions to those who will ensure data integrity in the first place. That way you are less likely to get a US / USA / U.S.A / America type representations creeping in. Often (as schwertner implies) it is more a business convention or methodology that determines accuracy of data. That issue is different to database design as dportas suggests above.

    As for addresses, well I do believe a lookup table is required. But (technically) you often need to start from the bottom up, though, the user experience suggests you start at street then suburb then post/zip then state then country. How many websites have you seen where the last field is the Country and you have waded through all the inappropriate states before you get there.

    The real challenge with addresses is building the lookup table. I have tried using Postcode as a major key, but then soon find a few suburbs sharing the one postcode. Conversely, a suburb (particularly central business districts) can have multiple postcodes. And often the case that a suburb name exists in multiple states.

    So that lookup table quickly becomes a victim of "Practical Normal Form" because the independant data relationships quickly become unwieldy if we try to follow the letter of the law in 3NF and higher. So a multiplely indexed table of "Postal/Zipcodes" has postcode, suburb, state, country and we do end up with some repeated pieces of information. But the real reason for that table to exist is to ensure data integrity from a business use perspective, and is often used as a "pick list" to embed that same information back into an address block held by the owner of the address. Or, if you do have a simple surrogate key representing the combination of zip+suburb+state+country then you can achieve some practical normalisation at least within the owner of the address (ie just use that surrogate key instead of individual columns). So, if we decide to go that way, we can then contemplate building that "postal/zipcode" table - that in turn from a maintenane function would then need to point to a selection of states or known countries (for example). Probably, but what about new zoning / new suburbs / new postcodes - you would have to make sure that everyone of them exists before hand - or - accommodate the possibility of "unknown" - and that sucks when it could easily be typed in. So, maybe only a few attributes of address need to be controlled - which ones of course comes back to your business model. For a national postal service, the more controls over the more address attributes is so much more important. If we look at a typical business they might only be interested in say suburb (city) / state / country - right up until the time comes to geocode those individual sites. Which is why addresses are always an interesting case study (and dont get me started on phone numbers).

    But even with 'N'th normal form in our address design, what is to stop me from entering all of : US / USA / U.S.A / America ? I would say nothing is stopping me, other than a list that someone is held accountable for, and that is a business decision, not a design decision.

    So, "to make City and Country each in its own table", does not really give you the desired outcome as such (echoing dportas comments).

    But if the only countries that can be selected are constrained by a "controlled" list of countries, then yes you would need a table for Countries. Fortunately there are some very strong standards in place with ISO 3166 and FIPS 10-4.

    So, the design question is really how do we adopt those established standards...

    Back to the Motel Address... Let us also consider a physical address, postal address, head office address. It is possible that all three are different, or, for small boutique hotels might be one in the same. From a user perspective trying to find a place in their destination of choice then physical address is most likely. Let us also consider that physical address is unique to that motel. Where as a head office address is likely to be shared by all memers of that chain. In the latter case it might be first considered very reasonable to have a seperate address table - or - do we simply point back to the "chain" ? So the "business object" known as address has a lot of different types, but their ownership can normally be clearly defined (e.g. head office address is held by the "chain" and members point to the chain)

    And that is where design really starts - who is the real owner of data - the physical address is owned by the motel, the head office address is owned by the "motel chain" so addresses do not often repeat as it turns out. True, various attributes of an address do repeat, but those attributes are needed as part of the address object to create that unique location. Again, the true consumer of the attribute "country" is an address object. The fact that there is a lookup table to select countries is (as far as I am concerned) a referential integrity and data predictability / consistancy issue.

    Bottom line, design your systems to achieve the business requirements. Databases are getting bigger, faster, stronger all the time with improved compression, optimisation and retrieval, so a couple of small "technical" misdemeanors are easily tolerable so long as there is the business requirement to reinforce that design decision.

    So, having City as a pointer to the city table, or Country as a pointer to the country table versus having their names held within the Motel table is a bit moot in some regards - if there are no other attributes - then what you really need is referential integrity and that is achieved by a lookup table (returning code or name). It is similar in some regards (ie referential integrity) to moving those check constraints from Motel Room to a RoomType table - however, it is easy to imagine additional attributes for roomtype.

    Another aspect to consider is "how many" if only a few thousand, then that is not so many rows in a Motel table. If several hundred thousand then the impacts are a lot more significant. Again, that comes back to your business model. Using country code again, for me, I would be using coded information - simply because of the standards in place for country codes. In reality, every time you go to access address it is highly likely that the descriptive information is being used/ retrieved (anyway). Over a few thousand rows, fine, over several hundred thousand rows then there are architectural considerations that come into play (such as index tree storage issues) that need to be considered in your designs.

    In all honesty, there is far too little information to give you the "best" advice about your specific designs.

    All the experts here have given valuable snippets of advice, and the best advice I can give is to look back through all those considerations and learn from them to come up with your own design. Then model it for a bit (even just pen and paper) does it achieve your business requirements ? How important is it to comply with "nth degree normal forms" ?

    I noticed VARCHAR2 datatypes, and also notice SQL2008 - are you doing Oracle or SQL ?
    Not that it makes too much difference, just curious...
    LVL 47

    Assisted Solution

    Creating a city table is directly connected with the normalization.


    NAME                 CITY
    ====                =====
    ray                    Dallas
    blue                  Sofia
    yellow               Hayward

    First at all see the place you need to store the name - at least 30 bytes
    Even more, because there cities with the same name in different states.

    Theoretically (if the city changes the name, but pratically if it is enetered incorrect due misunderstanding!) see
    in how much records you will need to do the changes and how will you distinguish the wrong from the correct names.  

    There are also another considerations to normalize the table set.
    LVL 22

    Expert Comment

    >> Creating a city table is directly connected with the normalization.

    I don't see why you would think that. Your example is incomplete but I assume MotelID is the only key and the relevant dependencies are MotelID->Name and MotelID->City. I would say therefore that the Motels table is already in 5NF. Creating another table for City won't change that and has nothing to do with normalization that I can see.
    LVL 15

    Expert Comment

    by:Franck Pachot
    Hi dportas,
    You're right but some normal forms are also concerned by dependecies between non-key attributes. Motel table is not 5NF because of the (supposed) dependency City -> Country, and maybe the City -> Postalcode as well.
    LVL 1

    Author Comment

    Your replies are detailed and well explained. They helped me make up my decision! I'm grateful of your tremendous support.

    To close up,


    "I noticed VARCHAR2 datatypes, and also notice SQL2008 - are you doing Oracle or SQL ?
    Not that it makes too much difference, just curious..."

    The syntax i posted is written in Oracle. I was into the design aspect rather than the syntax. I don't think
    the syntax has an impact of the design i make.

    Best Regards,
    LVL 1

    Author Closing Comment

    Well Done! A+

    Featured Post

    Do You Know the 4 Main Threat Actor Types?

    Do you know the main threat actor types? Most attackers fall into one of four categories, each with their own favored tactics, techniques, and procedures.

    Join & Write a Comment

    This article explains how to reset the password of the sa account on a Microsoft SQL Server.  The steps in this article work in SQL 2005, 2008, 2008 R2, 2012, 2014 and 2016.
    In this article we will get to know that how can we recover deleted data if it happens accidently. We really can recover deleted rows if we know the time when data is deleted by using the transaction log.
    This video shows how to copy a database user from one database to another user DBMS_METADATA.  It also shows how to copy a user's permissions and discusses password hash differences between Oracle 10g and 11g.
    This video shows setup options and the basic steps and syntax for duplicating (cloning) a database from one instance to another. Examples are given for duplicating to the same machine and to different machines

    733 members asked questions and received personalized solutions in the past 7 days.

    Join the community of 500,000 technology professionals and ask your questions.

    Join & Ask a Question

    Need Help in Real-Time?

    Connect with top rated Experts

    19 Experts available now in Live!

    Get 1:1 Help Now