[Okta Webinar] Learn how to a build a cloud-first strategyRegister Now


DB Design Question. Over denormailising to One To One Mappings.

Posted on 2005-04-19
Medium Priority
Last Modified: 2010-03-19
Just looking for opinions on this design issue. Our database contains different types of documents (artiles,whitepapers etc) stored in the tables
Document (DocumentID,dateWritten,author,AdminCountryid,dateAvailable,format,price.....)
DocumentLocal (LocalID,DocumentID,title,Publishingcountryid,dateAvailable,priceband,....)

Where the Document contains header details about a document and DocumentLocal contains information about local verions of the document. For example some documents might have different titles or prices for different countries. Traditionally all these documents where available only
in hard copy format. Now some of these documents will become exclusively available online in digital format, some will remain only in hardcopy format and some in both format. This also applies to the local version

The design I inherited was to extend Document, and DocumentLocal with bit fields to determine format and create a new table for Digital Documents.
Proposed solution
Document (DocumentID,DateWritten,Author,AdminCountryid,isSoftCopy,isHardCopy,DateAvailable, Format, Price...)
DocumentLocal (LocalID,DocumentID,title,Publishingcountryid,isHardCopy,HardCopydateAvailable, HardCopypriceband,isSoftCopy,SoftCopyDateAvailable,softCopyPriceband....)
So the Document table is made up of details common to the soft and hard copies and all hard copy data if isHardCopy = 1. if isSoftCopy = 1 then there will be a row in DigitalDocuments. This was done mainly because 95% of documents are available in HardCopy version. But this seems wrong to me as the hardcopy version should be treated like the digital and separated out. I would prefer to restructure our DB as follows.

Document (DocumentID,DateWritten,Author,AdminCountryid)
DocumentLocal (LocalID,DocumentID,title,Publishingcountryid)
HardCopyDocument(DocumentID,DateAvailable, Format, Price)
HardCopyDocumentLocal(LocalDocumentID, dateAvailable, Priceband)
DigitalDocumentLocal(LocalDocumentID, dateAvailable, Priceband)

Is this taking normalisation too far as I have introducted more joins. To get a full documents details you now have to do a left outer join on document, DigitialDocument and hardCopyDocument. Same with their local versions.
Question by:JardIT
  • 2
  • 2

Assisted Solution

anthonywjones66 earned 375 total points
ID: 13817823
This would seem to be a more balanced normalisation:-

Document (DocumentID,DateWritten,Author,AdminCountryid)
DocumentLocal (LocalID,DocumentID,title,Publishingcountryid)

DocumentFormats(DocumentID, FormatID, DateAvailable, Format, Priice)
DocumentFormatsLocal(LocalDocumentID, FormatID, dateAvailable, PriceBand)

With a Format Table

FormatID, Format
1, "HardCopy"
2, "Digital"


Author Comment

ID: 13818080
Yea I thaught of that but, and forgive me for not explaining it clearer above, but there some fields that apply to the digital format only. And there is talk that more will be added in the furture that will apply only to the digital format. So if I went with the DocumentFormats table and I need to include some fields that will only be populated when FormatID = Digital. And for 95% of the data these will be nulls but over time will drop as most new documents will have a digital version and when historical documents are retyped in a soft copy.

So is it a bad thing to have these null fields repeated ?

Accepted Solution

Navicerts earned 375 total points
ID: 13818154
from just looking at table and column names it would seem that the original format of the db would cover hardcopy and electronic versions given the fact that it has a "format" coulmn.

if the above was the case it would recive a unique documentid if there is an electronic version and a hardcopy version avalible, don't know if this is what you would want or if the column name is just deciving and has nothing to do with if its an electronic version or not.

also its kinda weird that "dateAvailable" is in both of the original tables, i guess it is actually two diffrent things both given the same name (dateAvailable) but thats a diffrent issue.

as for the proposed change that is suggested, well, it is certinally de-normalized! (you wernt kidding about that :) it looks like a non-it buissness man (who should never never never even LOOK at a db designed it), might be some frustration shown here because of how often i run into this type of thing myself.

as for your own suggestion, it's not too shabby, i dont understand why you dont include the price in the "DigitalDocument(DocumentID,DateAvailable,Format)" perhaps this is just how te buisness works.  i dont think the xtra joins will be a big deal, compared to db understandability (if thats even a word) the xtra tables are nothing at all to deal with.  with the correct index's and views no one will know the better.

anthonywjones66's version seems good as well, aside from his and your own version theres really not many other (reasonable) ways too look at the problem.  i think its important to point out the PK's and FK's to avoid any confusion ill take the liberty here, correct and forgive me if im wrong anthony

[Almost a Quote]
Document (DocumentID(PK),DateWritten,Author,AdminCountryid)
DocumentLocal (LocalID(PK),DocumentID(FK),title,Publishingcountryid)

DocumentFormats(DocumentID(PK), FormatID(PK)(FK), DateAvailable, Format, Priice)
DocumentFormatsLocal(LocalDocumentID(PK), FormatID(PK)(FK), dateAvailable, PriceBand)

With a Format Table

FormatID(PK), Format
1, "HardCopy"
2, "Digital"
[/Almost a Quote]

important to note that each document id will be repeated in the "documentformats" table and "documentformatslocal"

overall this db design is assuming that for any given document the price (and other info?) will vary depending upon the location and format that it has.

good luck



Expert Comment

ID: 13818213
just read your last post, the most simple way to deal with new data that reguards only digital documents would be to add another table that would contain..


or if the information reguarding a digital document would vary depending on location...


You wouldnt really have to worry about anything involving format in a seprate table because it would only contain documents that are digital anyhow, and if its only a few number of records the joins should be fast.  it would eliminate the nulls.


Author Comment

ID: 13820191
Hi Navicerts,

Just to explain things clearer format can be different for both the electronic version (doc, pdf, text) and hardcopy (typed,scripted).  So there is a format table
as follows Format (FormatID(pk), Description). Bad name I know buts it legacy stuff. The dateAvailable represents when that format of the document is available. Some documents need to be modified before being available in certain countries so their local dateAvailable will be different than when the original document was made available in the source country.

My original thinking (which I will re-write now identifying keys hopefully to make things clearer)

Document (DocumentID(PK),DateWritten,Author,AdminCountryid(FK),.....other document header fields)
DocumentLocal (LocalID(PK),DocumentID(FK),title,PublishingCountryid(FK),... other data related to local versions of the documents...)
HardCopyDocument(DocumentID(PK)(FK), DateAvailable, Format, Price, binding, cover,NoCopies)
HardCopyDocumentLocal(LocalDocumentID(PK)(FK), dateAvailable, Priceband)
DigitalDocumentLocal(LocalDocumentID(PK)(FK), dateAvailable, Priceband)

The reason I was thinking of this schema was because there should (!!) only ever be a digital copy and a hardcopy versions of a document. Some fields only apply to the digital and some only apply to the hard copy. While the local versions could be stored in a DocumentFormatsLocal table as suggested I have been told that their could be fields added that apply to a local verions of a document available in digital format (and you guessed it the same for hard copy).

I want to go back to the team and say
1) using IsSoftCopy and IsHardCopy indicators isnt the most efficient way of doing things because...(as the developers think it makes things clear to them what formats the document is available in and only needs 1 select and no joins)
2) storing fields that dont related to the key (such as HardCopydateAvailable, HardCopypriceband which depend on the isHardCopy indicator) is inefficent and should be spilt out in to different tables, even though it results in more joins and left outer joins as it may or may not have a digital or hardcopy version because....



Featured Post

Free Tool: Site Down Detector

Helpful to verify reports of your own downtime, or to double check a downed website you are trying to access.

One of a set of tools we are providing to everyone as a way of saying thank you for being a part of the community.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

When trying to connect from SSMS v17.x to a SQL Server Integration Services 2016 instance or previous version, you get the error “Connecting to the Integration Services service on the computer failed with the following error: 'The specified service …
Microsoft Access has a limit of 255 columns in a single table; SQL Server allows tables with over 255 columns, but reading that data is not necessarily simple.  The final solution for this task involved creating a custom text parser and then reading…
Via a live example, show how to extract information from SQL Server on Database, Connection and Server properties
Using examples as well as descriptions, and references to Books Online, show the documentation available for datatypes, explain the available data types and show how data can be passed into and out of variables.
Suggested Courses

867 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question