DB Design Question. Over denormailising to One To One Mappings.

Just looking for opinions on this design issue. Our database contains different types of documents (artiles,whitepapers etc) stored in the tables
Document (DocumentID,dateWritten,author,AdminCountryid,dateAvailable,format,price.....)
DocumentLocal (LocalID,DocumentID,title,Publishingcountryid,dateAvailable,priceband,....)

Where the Document contains header details about a document and DocumentLocal contains information about local verions of the document. For example some documents might have different titles or prices for different countries. Traditionally all these documents where available only
in hard copy format. Now some of these documents will become exclusively available online in digital format, some will remain only in hardcopy format and some in both format. This also applies to the local version

The design I inherited was to extend Document, and DocumentLocal with bit fields to determine format and create a new table for Digital Documents.
Proposed solution
Document (DocumentID,DateWritten,Author,AdminCountryid,isSoftCopy,isHardCopy,DateAvailable, Format, Price...)
DocumentLocal (LocalID,DocumentID,title,Publishingcountryid,isHardCopy,HardCopydateAvailable, HardCopypriceband,isSoftCopy,SoftCopyDateAvailable,softCopyPriceband....)
So the Document table is made up of details common to the soft and hard copies and all hard copy data if isHardCopy = 1. if isSoftCopy = 1 then there will be a row in DigitalDocuments. This was done mainly because 95% of documents are available in HardCopy version. But this seems wrong to me as the hardcopy version should be treated like the digital and separated out. I would prefer to restructure our DB as follows.

Document (DocumentID,DateWritten,Author,AdminCountryid)
DocumentLocal (LocalID,DocumentID,title,Publishingcountryid)
HardCopyDocument(DocumentID,DateAvailable, Format, Price)
HardCopyDocumentLocal(LocalDocumentID, dateAvailable, Priceband)
DigitalDocumentLocal(LocalDocumentID, dateAvailable, Priceband)

Is this taking normalisation too far as I have introducted more joins. To get a full documents details you now have to do a left outer join on document, DigitialDocument and hardCopyDocument. Same with their local versions.
Who is Participating?
NavicertsConnect With a Mentor Commented:
from just looking at table and column names it would seem that the original format of the db would cover hardcopy and electronic versions given the fact that it has a "format" coulmn.

if the above was the case it would recive a unique documentid if there is an electronic version and a hardcopy version avalible, don't know if this is what you would want or if the column name is just deciving and has nothing to do with if its an electronic version or not.

also its kinda weird that "dateAvailable" is in both of the original tables, i guess it is actually two diffrent things both given the same name (dateAvailable) but thats a diffrent issue.

as for the proposed change that is suggested, well, it is certinally de-normalized! (you wernt kidding about that :) it looks like a non-it buissness man (who should never never never even LOOK at a db designed it), might be some frustration shown here because of how often i run into this type of thing myself.

as for your own suggestion, it's not too shabby, i dont understand why you dont include the price in the "DigitalDocument(DocumentID,DateAvailable,Format)" perhaps this is just how te buisness works.  i dont think the xtra joins will be a big deal, compared to db understandability (if thats even a word) the xtra tables are nothing at all to deal with.  with the correct index's and views no one will know the better.

anthonywjones66's version seems good as well, aside from his and your own version theres really not many other (reasonable) ways too look at the problem.  i think its important to point out the PK's and FK's to avoid any confusion ill take the liberty here, correct and forgive me if im wrong anthony

[Almost a Quote]
Document (DocumentID(PK),DateWritten,Author,AdminCountryid)
DocumentLocal (LocalID(PK),DocumentID(FK),title,Publishingcountryid)

DocumentFormats(DocumentID(PK), FormatID(PK)(FK), DateAvailable, Format, Priice)
DocumentFormatsLocal(LocalDocumentID(PK), FormatID(PK)(FK), dateAvailable, PriceBand)

With a Format Table

FormatID(PK), Format
1, "HardCopy"
2, "Digital"
[/Almost a Quote]

important to note that each document id will be repeated in the "documentformats" table and "documentformatslocal"

overall this db design is assuming that for any given document the price (and other info?) will vary depending upon the location and format that it has.

good luck


anthonywjones66Connect With a Mentor Commented:
This would seem to be a more balanced normalisation:-

Document (DocumentID,DateWritten,Author,AdminCountryid)
DocumentLocal (LocalID,DocumentID,title,Publishingcountryid)

DocumentFormats(DocumentID, FormatID, DateAvailable, Format, Priice)
DocumentFormatsLocal(LocalDocumentID, FormatID, dateAvailable, PriceBand)

With a Format Table

FormatID, Format
1, "HardCopy"
2, "Digital"

JardITAuthor Commented:
Yea I thaught of that but, and forgive me for not explaining it clearer above, but there some fields that apply to the digital format only. And there is talk that more will be added in the furture that will apply only to the digital format. So if I went with the DocumentFormats table and I need to include some fields that will only be populated when FormatID = Digital. And for 95% of the data these will be nulls but over time will drop as most new documents will have a digital version and when historical documents are retyped in a soft copy.

So is it a bad thing to have these null fields repeated ?
just read your last post, the most simple way to deal with new data that reguards only digital documents would be to add another table that would contain..


or if the information reguarding a digital document would vary depending on location...


You wouldnt really have to worry about anything involving format in a seprate table because it would only contain documents that are digital anyhow, and if its only a few number of records the joins should be fast.  it would eliminate the nulls.

JardITAuthor Commented:
Hi Navicerts,

Just to explain things clearer format can be different for both the electronic version (doc, pdf, text) and hardcopy (typed,scripted).  So there is a format table
as follows Format (FormatID(pk), Description). Bad name I know buts it legacy stuff. The dateAvailable represents when that format of the document is available. Some documents need to be modified before being available in certain countries so their local dateAvailable will be different than when the original document was made available in the source country.

My original thinking (which I will re-write now identifying keys hopefully to make things clearer)

Document (DocumentID(PK),DateWritten,Author,AdminCountryid(FK),.....other document header fields)
DocumentLocal (LocalID(PK),DocumentID(FK),title,PublishingCountryid(FK),... other data related to local versions of the documents...)
HardCopyDocument(DocumentID(PK)(FK), DateAvailable, Format, Price, binding, cover,NoCopies)
HardCopyDocumentLocal(LocalDocumentID(PK)(FK), dateAvailable, Priceband)
DigitalDocumentLocal(LocalDocumentID(PK)(FK), dateAvailable, Priceband)

The reason I was thinking of this schema was because there should (!!) only ever be a digital copy and a hardcopy versions of a document. Some fields only apply to the digital and some only apply to the hard copy. While the local versions could be stored in a DocumentFormatsLocal table as suggested I have been told that their could be fields added that apply to a local verions of a document available in digital format (and you guessed it the same for hard copy).

I want to go back to the team and say
1) using IsSoftCopy and IsHardCopy indicators isnt the most efficient way of doing things because...(as the developers think it makes things clear to them what formats the document is available in and only needs 1 select and no joins)
2) storing fields that dont related to the key (such as HardCopydateAvailable, HardCopypriceband which depend on the isHardCopy indicator) is inefficent and should be spilt out in to different tables, even though it results in more joins and left outer joins as it may or may not have a digital or hardcopy version because....


Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.

All Courses

From novice to tech pro — start learning today.