JardIT
asked on
DB Design Question. Over denormailising to One To One Mappings.
Just looking for opinions on this design issue. Our database contains different types of documents (artiles,whitepapers etc) stored in the tables
Document (DocumentID,dateWritten,au thor,Admin Countryid, dateAvaila ble,format ,price.... .)
DocumentLocal (LocalID,DocumentID,title, Publishing countryid, dateAvaila ble,priceb and,....)
Where the Document contains header details about a document and DocumentLocal contains information about local verions of the document. For example some documents might have different titles or prices for different countries. Traditionally all these documents where available only
in hard copy format. Now some of these documents will become exclusively available online in digital format, some will remain only in hardcopy format and some in both format. This also applies to the local version
The design I inherited was to extend Document, and DocumentLocal with bit fields to determine format and create a new table for Digital Documents.
Proposed solution
Document (DocumentID,DateWritten,Au thor,Admin Countryid, isSoftCopy ,isHardCop y,DateAvai lable, Format, Price...)
DocumentLocal (LocalID,DocumentID,title, Publishing countryid, isHardCopy ,HardCopyd ateAvailab le, HardCopypriceband,isSoftCo py,SoftCop yDateAvail able,softC opyPriceba nd....)
DigitalDocument(DocumentID ,DateAvail able,Forma t,Price)
So the Document table is made up of details common to the soft and hard copies and all hard copy data if isHardCopy = 1. if isSoftCopy = 1 then there will be a row in DigitalDocuments. This was done mainly because 95% of documents are available in HardCopy version. But this seems wrong to me as the hardcopy version should be treated like the digital and separated out. I would prefer to restructure our DB as follows.
Document (DocumentID,DateWritten,Au thor,Admin Countryid)
DocumentLocal (LocalID,DocumentID,title, Publishing countryid)
HardCopyDocument(DocumentI D,DateAvai lable, Format, Price)
DigitalDocument(DocumentID ,DateAvail able,Forma t)
HardCopyDocumentLocal(Loca lDocumentI D, dateAvailable, Priceband)
DigitalDocumentLocal(Local DocumentID , dateAvailable, Priceband)
Is this taking normalisation too far as I have introducted more joins. To get a full documents details you now have to do a left outer join on document, DigitialDocument and hardCopyDocument. Same with their local versions.
Document (DocumentID,dateWritten,au
DocumentLocal (LocalID,DocumentID,title,
Where the Document contains header details about a document and DocumentLocal contains information about local verions of the document. For example some documents might have different titles or prices for different countries. Traditionally all these documents where available only
in hard copy format. Now some of these documents will become exclusively available online in digital format, some will remain only in hardcopy format and some in both format. This also applies to the local version
The design I inherited was to extend Document, and DocumentLocal with bit fields to determine format and create a new table for Digital Documents.
Proposed solution
Document (DocumentID,DateWritten,Au
DocumentLocal (LocalID,DocumentID,title,
DigitalDocument(DocumentID
So the Document table is made up of details common to the soft and hard copies and all hard copy data if isHardCopy = 1. if isSoftCopy = 1 then there will be a row in DigitalDocuments. This was done mainly because 95% of documents are available in HardCopy version. But this seems wrong to me as the hardcopy version should be treated like the digital and separated out. I would prefer to restructure our DB as follows.
Document (DocumentID,DateWritten,Au
DocumentLocal (LocalID,DocumentID,title,
HardCopyDocument(DocumentI
DigitalDocument(DocumentID
HardCopyDocumentLocal(Loca
DigitalDocumentLocal(Local
Is this taking normalisation too far as I have introducted more joins. To get a full documents details you now have to do a left outer join on document, DigitialDocument and hardCopyDocument. Same with their local versions.
SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
just read your last post, the most simple way to deal with new data that reguards only digital documents would be to add another table that would contain..
SomeOtheDescriptiveName
-------------------------- ---------
DocumentID(PK)(FK)
Info1
Info2
Info3
Info4
or if the information reguarding a digital document would vary depending on location...
SomeOtheDescriptiveName
-------------------------- ---------
LocalID(PK)(FK)
Info1
Info2
Info3
Info4
You wouldnt really have to worry about anything involving format in a seprate table because it would only contain documents that are digital anyhow, and if its only a few number of records the joins should be fast. it would eliminate the nulls.
-Navicerts
SomeOtheDescriptiveName
--------------------------
DocumentID(PK)(FK)
Info1
Info2
Info3
Info4
or if the information reguarding a digital document would vary depending on location...
SomeOtheDescriptiveName
--------------------------
LocalID(PK)(FK)
Info1
Info2
Info3
Info4
You wouldnt really have to worry about anything involving format in a seprate table because it would only contain documents that are digital anyhow, and if its only a few number of records the joins should be fast. it would eliminate the nulls.
-Navicerts
ASKER
Hi Navicerts,
Just to explain things clearer format can be different for both the electronic version (doc, pdf, text) and hardcopy (typed,scripted). So there is a format table
as follows Format (FormatID(pk), Description). Bad name I know buts it legacy stuff. The dateAvailable represents when that format of the document is available. Some documents need to be modified before being available in certain countries so their local dateAvailable will be different than when the original document was made available in the source country.
My original thinking (which I will re-write now identifying keys hopefully to make things clearer)
Document (DocumentID(PK),DateWritte n,Author,A dminCountr yid(FK),.. ...other document header fields)
DocumentLocal (LocalID(PK),DocumentID(FK ),title,Pu blishingCo untryid(FK ),... other data related to local versions of the documents...)
HardCopyDocument(DocumentI D(PK)(FK), DateAvailable, Format, Price, binding, cover,NoCopies)
DigitalDocument(DocumentID (PK)(FK),D ateAvailab le,Format, accessabil ity)
HardCopyDocumentLocal(Loca lDocumentI D(PK)(FK), dateAvailable, Priceband)
DigitalDocumentLocal(Local DocumentID (PK)(FK), dateAvailable, Priceband)
The reason I was thinking of this schema was because there should (!!) only ever be a digital copy and a hardcopy versions of a document. Some fields only apply to the digital and some only apply to the hard copy. While the local versions could be stored in a DocumentFormatsLocal table as suggested I have been told that their could be fields added that apply to a local verions of a document available in digital format (and you guessed it the same for hard copy).
I want to go back to the team and say
1) using IsSoftCopy and IsHardCopy indicators isnt the most efficient way of doing things because...(as the developers think it makes things clear to them what formats the document is available in and only needs 1 select and no joins)
2) storing fields that dont related to the key (such as HardCopydateAvailable, HardCopypriceband which depend on the isHardCopy indicator) is inefficent and should be spilt out in to different tables, even though it results in more joins and left outer joins as it may or may not have a digital or hardcopy version because....
Cheers
A
Just to explain things clearer format can be different for both the electronic version (doc, pdf, text) and hardcopy (typed,scripted). So there is a format table
as follows Format (FormatID(pk), Description). Bad name I know buts it legacy stuff. The dateAvailable represents when that format of the document is available. Some documents need to be modified before being available in certain countries so their local dateAvailable will be different than when the original document was made available in the source country.
My original thinking (which I will re-write now identifying keys hopefully to make things clearer)
Document (DocumentID(PK),DateWritte
DocumentLocal (LocalID(PK),DocumentID(FK
HardCopyDocument(DocumentI
DigitalDocument(DocumentID
HardCopyDocumentLocal(Loca
DigitalDocumentLocal(Local
The reason I was thinking of this schema was because there should (!!) only ever be a digital copy and a hardcopy versions of a document. Some fields only apply to the digital and some only apply to the hard copy. While the local versions could be stored in a DocumentFormatsLocal table as suggested I have been told that their could be fields added that apply to a local verions of a document available in digital format (and you guessed it the same for hard copy).
I want to go back to the team and say
1) using IsSoftCopy and IsHardCopy indicators isnt the most efficient way of doing things because...(as the developers think it makes things clear to them what formats the document is available in and only needs 1 select and no joins)
2) storing fields that dont related to the key (such as HardCopydateAvailable, HardCopypriceband which depend on the isHardCopy indicator) is inefficent and should be spilt out in to different tables, even though it results in more joins and left outer joins as it may or may not have a digital or hardcopy version because....
Cheers
A
ASKER
So is it a bad thing to have these null fields repeated ?