Link to home
Start Free TrialLog in
Avatar of JardIT
JardIT

asked on

DB Design Question. Over denormailising to One To One Mappings.

Just looking for opinions on this design issue. Our database contains different types of documents (artiles,whitepapers etc) stored in the tables
Document (DocumentID,dateWritten,author,AdminCountryid,dateAvailable,format,price.....)
DocumentLocal (LocalID,DocumentID,title,Publishingcountryid,dateAvailable,priceband,....)

Where the Document contains header details about a document and DocumentLocal contains information about local verions of the document. For example some documents might have different titles or prices for different countries. Traditionally all these documents where available only
in hard copy format. Now some of these documents will become exclusively available online in digital format, some will remain only in hardcopy format and some in both format. This also applies to the local version

The design I inherited was to extend Document, and DocumentLocal with bit fields to determine format and create a new table for Digital Documents.
Proposed solution
Document (DocumentID,DateWritten,Author,AdminCountryid,isSoftCopy,isHardCopy,DateAvailable, Format, Price...)
DocumentLocal (LocalID,DocumentID,title,Publishingcountryid,isHardCopy,HardCopydateAvailable, HardCopypriceband,isSoftCopy,SoftCopyDateAvailable,softCopyPriceband....)
DigitalDocument(DocumentID,DateAvailable,Format,Price)
So the Document table is made up of details common to the soft and hard copies and all hard copy data if isHardCopy = 1. if isSoftCopy = 1 then there will be a row in DigitalDocuments. This was done mainly because 95% of documents are available in HardCopy version. But this seems wrong to me as the hardcopy version should be treated like the digital and separated out. I would prefer to restructure our DB as follows.

Document (DocumentID,DateWritten,Author,AdminCountryid)
DocumentLocal (LocalID,DocumentID,title,Publishingcountryid)
HardCopyDocument(DocumentID,DateAvailable, Format, Price)
DigitalDocument(DocumentID,DateAvailable,Format)
HardCopyDocumentLocal(LocalDocumentID, dateAvailable, Priceband)
DigitalDocumentLocal(LocalDocumentID, dateAvailable, Priceband)

Is this taking normalisation too far as I have introducted more joins. To get a full documents details you now have to do a left outer join on document, DigitialDocument and hardCopyDocument. Same with their local versions.
SOLUTION
Avatar of anthonywjones66
anthonywjones66

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of JardIT
JardIT

ASKER

Yea I thaught of that but, and forgive me for not explaining it clearer above, but there some fields that apply to the digital format only. And there is talk that more will be added in the furture that will apply only to the digital format. So if I went with the DocumentFormats table and I need to include some fields that will only be populated when FormatID = Digital. And for 95% of the data these will be nulls but over time will drop as most new documents will have a digital version and when historical documents are retyped in a soft copy.

So is it a bad thing to have these null fields repeated ?
ASKER CERTIFIED SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
just read your last post, the most simple way to deal with new data that reguards only digital documents would be to add another table that would contain..

SomeOtheDescriptiveName
-----------------------------------
DocumentID(PK)(FK)
Info1
Info2
Info3
Info4

or if the information reguarding a digital document would vary depending on location...

SomeOtheDescriptiveName
-----------------------------------
LocalID(PK)(FK)
Info1
Info2
Info3
Info4

You wouldnt really have to worry about anything involving format in a seprate table because it would only contain documents that are digital anyhow, and if its only a few number of records the joins should be fast.  it would eliminate the nulls.

-Navicerts
Avatar of JardIT

ASKER

Hi Navicerts,

Just to explain things clearer format can be different for both the electronic version (doc, pdf, text) and hardcopy (typed,scripted).  So there is a format table
as follows Format (FormatID(pk), Description). Bad name I know buts it legacy stuff. The dateAvailable represents when that format of the document is available. Some documents need to be modified before being available in certain countries so their local dateAvailable will be different than when the original document was made available in the source country.

My original thinking (which I will re-write now identifying keys hopefully to make things clearer)

Document (DocumentID(PK),DateWritten,Author,AdminCountryid(FK),.....other document header fields)
DocumentLocal (LocalID(PK),DocumentID(FK),title,PublishingCountryid(FK),... other data related to local versions of the documents...)
HardCopyDocument(DocumentID(PK)(FK), DateAvailable, Format, Price, binding, cover,NoCopies)
DigitalDocument(DocumentID(PK)(FK),DateAvailable,Format,accessability)
HardCopyDocumentLocal(LocalDocumentID(PK)(FK), dateAvailable, Priceband)
DigitalDocumentLocal(LocalDocumentID(PK)(FK), dateAvailable, Priceband)

The reason I was thinking of this schema was because there should (!!) only ever be a digital copy and a hardcopy versions of a document. Some fields only apply to the digital and some only apply to the hard copy. While the local versions could be stored in a DocumentFormatsLocal table as suggested I have been told that their could be fields added that apply to a local verions of a document available in digital format (and you guessed it the same for hard copy).

I want to go back to the team and say
1) using IsSoftCopy and IsHardCopy indicators isnt the most efficient way of doing things because...(as the developers think it makes things clear to them what formats the document is available in and only needs 1 select and no joins)
2) storing fields that dont related to the key (such as HardCopydateAvailable, HardCopypriceband which depend on the isHardCopy indicator) is inefficent and should be spilt out in to different tables, even though it results in more joins and left outer joins as it may or may not have a digital or hardcopy version because....

Cheers
A