DB Design Question. Over denormailising to One To One Mappings.

Posted on 2005-04-19
Last Modified: 2010-03-19
Just looking for opinions on this design issue. Our database contains different types of documents (artiles,whitepapers etc) stored in the tables
Document (DocumentID,dateWritten,author,AdminCountryid,dateAvailable,format,price.....)
DocumentLocal (LocalID,DocumentID,title,Publishingcountryid,dateAvailable,priceband,....)

Where the Document contains header details about a document and DocumentLocal contains information about local verions of the document. For example some documents might have different titles or prices for different countries. Traditionally all these documents where available only
in hard copy format. Now some of these documents will become exclusively available online in digital format, some will remain only in hardcopy format and some in both format. This also applies to the local version

The design I inherited was to extend Document, and DocumentLocal with bit fields to determine format and create a new table for Digital Documents.
Proposed solution
Document (DocumentID,DateWritten,Author,AdminCountryid,isSoftCopy,isHardCopy,DateAvailable, Format, Price...)
DocumentLocal (LocalID,DocumentID,title,Publishingcountryid,isHardCopy,HardCopydateAvailable, HardCopypriceband,isSoftCopy,SoftCopyDateAvailable,softCopyPriceband....)
So the Document table is made up of details common to the soft and hard copies and all hard copy data if isHardCopy = 1. if isSoftCopy = 1 then there will be a row in DigitalDocuments. This was done mainly because 95% of documents are available in HardCopy version. But this seems wrong to me as the hardcopy version should be treated like the digital and separated out. I would prefer to restructure our DB as follows.

Document (DocumentID,DateWritten,Author,AdminCountryid)
DocumentLocal (LocalID,DocumentID,title,Publishingcountryid)
HardCopyDocument(DocumentID,DateAvailable, Format, Price)
HardCopyDocumentLocal(LocalDocumentID, dateAvailable, Priceband)
DigitalDocumentLocal(LocalDocumentID, dateAvailable, Priceband)

Is this taking normalisation too far as I have introducted more joins. To get a full documents details you now have to do a left outer join on document, DigitialDocument and hardCopyDocument. Same with their local versions.
Question by:JardIT
    LVL 8

    Assisted Solution

    This would seem to be a more balanced normalisation:-

    Document (DocumentID,DateWritten,Author,AdminCountryid)
    DocumentLocal (LocalID,DocumentID,title,Publishingcountryid)

    DocumentFormats(DocumentID, FormatID, DateAvailable, Format, Priice)
    DocumentFormatsLocal(LocalDocumentID, FormatID, dateAvailable, PriceBand)

    With a Format Table

    FormatID, Format
    1, "HardCopy"
    2, "Digital"

    LVL 2

    Author Comment

    Yea I thaught of that but, and forgive me for not explaining it clearer above, but there some fields that apply to the digital format only. And there is talk that more will be added in the furture that will apply only to the digital format. So if I went with the DocumentFormats table and I need to include some fields that will only be populated when FormatID = Digital. And for 95% of the data these will be nulls but over time will drop as most new documents will have a digital version and when historical documents are retyped in a soft copy.

    So is it a bad thing to have these null fields repeated ?
    LVL 7

    Accepted Solution

    from just looking at table and column names it would seem that the original format of the db would cover hardcopy and electronic versions given the fact that it has a "format" coulmn.

    if the above was the case it would recive a unique documentid if there is an electronic version and a hardcopy version avalible, don't know if this is what you would want or if the column name is just deciving and has nothing to do with if its an electronic version or not.

    also its kinda weird that "dateAvailable" is in both of the original tables, i guess it is actually two diffrent things both given the same name (dateAvailable) but thats a diffrent issue.

    as for the proposed change that is suggested, well, it is certinally de-normalized! (you wernt kidding about that :) it looks like a non-it buissness man (who should never never never even LOOK at a db designed it), might be some frustration shown here because of how often i run into this type of thing myself.

    as for your own suggestion, it's not too shabby, i dont understand why you dont include the price in the "DigitalDocument(DocumentID,DateAvailable,Format)" perhaps this is just how te buisness works.  i dont think the xtra joins will be a big deal, compared to db understandability (if thats even a word) the xtra tables are nothing at all to deal with.  with the correct index's and views no one will know the better.

    anthonywjones66's version seems good as well, aside from his and your own version theres really not many other (reasonable) ways too look at the problem.  i think its important to point out the PK's and FK's to avoid any confusion ill take the liberty here, correct and forgive me if im wrong anthony

    [Almost a Quote]
    Document (DocumentID(PK),DateWritten,Author,AdminCountryid)
    DocumentLocal (LocalID(PK),DocumentID(FK),title,Publishingcountryid)

    DocumentFormats(DocumentID(PK), FormatID(PK)(FK), DateAvailable, Format, Priice)
    DocumentFormatsLocal(LocalDocumentID(PK), FormatID(PK)(FK), dateAvailable, PriceBand)

    With a Format Table

    FormatID(PK), Format
    1, "HardCopy"
    2, "Digital"
    [/Almost a Quote]

    important to note that each document id will be repeated in the "documentformats" table and "documentformatslocal"

    overall this db design is assuming that for any given document the price (and other info?) will vary depending upon the location and format that it has.

    good luck


    LVL 7

    Expert Comment

    just read your last post, the most simple way to deal with new data that reguards only digital documents would be to add another table that would contain..


    or if the information reguarding a digital document would vary depending on location...


    You wouldnt really have to worry about anything involving format in a seprate table because it would only contain documents that are digital anyhow, and if its only a few number of records the joins should be fast.  it would eliminate the nulls.

    LVL 2

    Author Comment

    Hi Navicerts,

    Just to explain things clearer format can be different for both the electronic version (doc, pdf, text) and hardcopy (typed,scripted).  So there is a format table
    as follows Format (FormatID(pk), Description). Bad name I know buts it legacy stuff. The dateAvailable represents when that format of the document is available. Some documents need to be modified before being available in certain countries so their local dateAvailable will be different than when the original document was made available in the source country.

    My original thinking (which I will re-write now identifying keys hopefully to make things clearer)

    Document (DocumentID(PK),DateWritten,Author,AdminCountryid(FK),.....other document header fields)
    DocumentLocal (LocalID(PK),DocumentID(FK),title,PublishingCountryid(FK),... other data related to local versions of the documents...)
    HardCopyDocument(DocumentID(PK)(FK), DateAvailable, Format, Price, binding, cover,NoCopies)
    HardCopyDocumentLocal(LocalDocumentID(PK)(FK), dateAvailable, Priceband)
    DigitalDocumentLocal(LocalDocumentID(PK)(FK), dateAvailable, Priceband)

    The reason I was thinking of this schema was because there should (!!) only ever be a digital copy and a hardcopy versions of a document. Some fields only apply to the digital and some only apply to the hard copy. While the local versions could be stored in a DocumentFormatsLocal table as suggested I have been told that their could be fields added that apply to a local verions of a document available in digital format (and you guessed it the same for hard copy).

    I want to go back to the team and say
    1) using IsSoftCopy and IsHardCopy indicators isnt the most efficient way of doing things because...(as the developers think it makes things clear to them what formats the document is available in and only needs 1 select and no joins)
    2) storing fields that dont related to the key (such as HardCopydateAvailable, HardCopypriceband which depend on the isHardCopy indicator) is inefficent and should be spilt out in to different tables, even though it results in more joins and left outer joins as it may or may not have a digital or hardcopy version because....



    Featured Post

    How your wiki can always stay up-to-date

    Quip doubles as a “living” wiki and a project management tool that evolves with your organization. As you finish projects in Quip, the work remains, easily accessible to all team members, new and old.
    - Increase transparency
    - Onboard new hires faster
    - Access from mobile/offline

    Join & Write a Comment

    Suggested Solutions

    Title # Comments Views Activity
    SQL Query 18 66
    I've got to change the date range for this select. How? 4 25
    SSD vs SAS disk for SQL 3 9
    SQL Split output 2 0
    I wrote this interesting script that really help me find jobs or procedures when working in a huge environment. I could I have written it as a Procedure but then I would have to have it on each machine or have a link to a server-related search that …
    JSON is being used more and more, besides XML, and you surely wanted to parse the data out into SQL instead of doing it in some Javascript. The below function in SQL Server can do the job for you, returning a quick table with the parsed data.
    This video shows how to set up a shell script to accept a positional parameter when called, pass that to a SQL script, accept the output from the statement back and then manipulate it in the Shell.
    Via a live example, show how to shrink a transaction log file down to a reasonable size.

    732 members asked questions and received personalized solutions in the past 7 days.

    Join the community of 500,000 technology professionals and ask your questions.

    Join & Ask a Question

    Need Help in Real-Time?

    Connect with top rated Experts

    18 Experts available now in Live!

    Get 1:1 Help Now