Beyond 3NF

When normalizing a database, should I go beyond 3NF?
Hwkranger Commented:
Aside from what everyone said, I'll give you a little bit of real-world advice on what normal form to use.

1)  What are you designing this application for?
   a) example: data warehouses often use the fact/dimension modeling approach
   b) example: reporting "data marts" are sometimes very flat so they can report data quickly
   c) example: many entities map closely to application objects in 3NF; a 5NF design can create a nightmare for the data access layer developers (which can also slow down application performance)
2) What is the life expectancy of the application?
  a) example: a quick tool doesn't require 3NF, BCNF, 4NF, etc.
  b) example: an enterprise application that will remain for years probably should not be a flat design
3)  What is the targeted audience that will be accessing the data directly?
  a) example: reporting analysts who are not skilled with SQL will prefer a flatter schema
  b) example: developers / DBAs who are extending the application and model will prefer a more normalized schema, which better supports extensibility and scalability
  c) example: if you allow power users to query with Excel or other tools, the more normalized the database, the more work they will have to do to get their data
4)  What is the maintenance plan for this database?  How do you expect it to grow?
  a) Flatter designs typically take up more space than more normalized ones, with the exception of very small databases in 4NF/5NF

If you're asking this question, my suggestion is to investigate as much as you can, both what you are developing and the normal forms. You'll find that some things don't make sense, while others do. You want a model that is scalable, extensible, reusable, and most importantly *maintainable*. If you can't maintain it, it's just as bad as never building it. (Or worse, since you spent resources on something that can't be maintained.)

Best of luck.
RiteshShah Commented:
Well, it depends on your needs, usage, and performance, because more normalization means more joins and less speed.
Aneesh Retnakaran, Database Administrator, Commented:
We usually won't go beyond 3NF.

tbsgadi Commented:
Hi Mr_Shaw,

Have a look at Normalization: How far is far enough?

schwertner Commented:
You go to BCNF and 4NF only in some special cases.
In these cases there are anomalies that can hurt the design.

BCNF violations occur when there are overlapping candidate keys.
Fourth Normal Form

4NF removes multi-valued dependencies.
A multi-valued dependency exists when three attributes (A, B, C)
exist in a relation and, for each value of A, there is a well-defined
set of values for B and a well-defined set of values for C, yet B
and C are independent of each other.
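The A/B/C pattern above can be made concrete. In this minimal sketch (the employee/skill/language names are invented for illustration), a single table must store every (skill, language) combination for an employee because the two facts are independent; the 4NF decomposition splits it into one projection per multi-valued dependency, and joining the projections reproduces the original rows losslessly:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Violates 4NF: an employee's skills and languages are independent facts,
# so every (skill, language) combination must be stored, redundantly.
cur.execute("CREATE TABLE emp_skill_lang (emp TEXT, skill TEXT, lang TEXT)")
rows = [("ann", s, l) for s in ("sql", "java") for l in ("en", "fr")]
cur.executemany("INSERT INTO emp_skill_lang VALUES (?, ?, ?)", rows)

# 4NF decomposition: one projection table per multi-valued dependency.
cur.execute("CREATE TABLE emp_skill (emp TEXT, skill TEXT, PRIMARY KEY (emp, skill))")
cur.execute("CREATE TABLE emp_lang (emp TEXT, lang TEXT, PRIMARY KEY (emp, lang))")
cur.execute("INSERT INTO emp_skill SELECT DISTINCT emp, skill FROM emp_skill_lang")
cur.execute("INSERT INTO emp_lang SELECT DISTINCT emp, lang FROM emp_skill_lang")

# Joining the two projections on emp reproduces the original table.
joined = cur.execute(
    "SELECT s.emp, s.skill, l.lang FROM emp_skill s JOIN emp_lang l ON s.emp = l.emp"
).fetchall()
print(len(joined))  # 4 rows: 2 skills x 2 languages
```

Note how the decomposed tables store 4 rows in total (2 + 2) instead of 4 redundant combinations; with 10 skills and 10 languages that would be 20 rows instead of 100.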
Why is your question not posted in the DB2, Ingres, Postgres, or MySQL threads?
I hope you take that in good fun.
dportas Commented:
I suggest you design for at least BCNF and preferably 5NF unless you have a good reason not to. Apart from the obvious reasons of eliminating potential anomalies, 5NF is desirable because it helps to minimise any "bias" in your design. That means your design is best able to support new and changing requirements with the minimum of effort later on.
mrjoltcola Commented:
Good opinions already, I will add:

Good design is not a formula. It takes into consideration more than the theoretical reasons for eliminating redundancy. It takes into consideration maintenance, performance, and ease of mapping to another domain or an application language. It also takes into consideration the limitations of the tools and of the people who develop and maintain the system.
Normalization is a way to check the logical design of the project and to flush out DML anomalies. It also uncovers mistakes in the logical design, and it leads to a better understanding of the business rules and the data model.

At the same time, normalization leads to fragmentation of the tables (the most common method of normalizing is to split unnormalized tables into projection tables and ensure logical links between them using primary-/foreign-key references). This brings wasted space to store the keys, wasted time to join the tables, complex query optimization, unwanted full table scans, a more complex description of the model, expensive programming, and possible errors. That's why, after normalization, major applications undergo denormalization to speed up the vital SELECT statements. The slogan of Kevin Loney ("Oracle 8i: The Complete Reference") is "No major application will run in Third Normal Forms". This is the basis of the denormalization steps.
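The fragmentation described above can be sketched with a toy schema (all names invented for illustration): repeated customer data is split out of the order rows into a projection table, linked back by a primary-/foreign-key reference, and the "vital select" then pays for a join:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Unnormalized: customer name and city would be repeated on every order row.
cur.execute("CREATE TABLE orders_flat (order_id INTEGER, cust_name TEXT, "
            "cust_city TEXT, amount REAL)")

# Normalized projection tables, linked by a primary-/foreign-key reference.
cur.execute("CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
cur.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, "
            "cust_id INTEGER REFERENCES customer(cust_id), amount REAL)")

cur.execute("INSERT INTO customer VALUES (1, 'Ann', 'New York')")
cur.executemany("INSERT INTO orders VALUES (?, 1, ?)", [(10, 9.5), (11, 12.0)])

# Reading the full picture back now requires a join (and space for the keys).
full = cur.execute(
    "SELECT o.order_id, c.name, c.city, o.amount "
    "FROM orders o JOIN customer c ON o.cust_id = c.cust_id"
).fetchall()
print(full)
```

The normalized form stores 'Ann'/'New York' once instead of once per order; the price is the stored cust_id keys and the join on every read, which is exactly the trade-off denormalization reverses.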
"No major application will run in Third Normal Forms"
Taken at face value that is utter nonsense of course, and provably so. In the context of Loney's book that quotation is used as a provocative rhetorical device to begin a discussion. It's a totally worthless comment when taken out of context.
I have seen applications in the United States that, due to normalization (after joining 15-20 tables),
"seem to run forever", as one of the leads wrote to the development staff.
Of course, the slogan is an open door for a discussion of where to stop normalizing.

Let us speak now very seriously!

Imagine a medium-sized project and a SELECT in it that involves 15 tables.
Think about the joins: the linking columns obviously should be indexed on both sides, the
parent-key side and the foreign-key side. Now think what the cost of a simple DML statement
will be when several of those tables (or even one) are involved. What will be the price of maintaining the indexes?
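The two sides of that indexing rule can be sketched in SQLite (standing in for Oracle here; table names are invented). The parent-key side is indexed automatically by its PRIMARY KEY; the foreign-key side must be indexed explicitly, and that extra index is exactly the structure every INSERT, UPDATE, and DELETE on the child table must then maintain:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Parent-key side: a PRIMARY KEY is indexed automatically.
cur.execute("CREATE TABLE parent (id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("CREATE TABLE child (id INTEGER PRIMARY KEY, "
            "parent_id INTEGER REFERENCES parent(id), payload TEXT)")
# Foreign-key side: must be indexed explicitly, or joins fall back to scans.
cur.execute("CREATE INDEX idx_child_parent ON child (parent_id)")

cur.execute("INSERT INTO parent VALUES (1, 'p1')")
# Each of these 100 inserts also updates idx_child_parent -- the hidden DML price.
cur.executemany("INSERT INTO child VALUES (?, 1, 'x')", [(i,) for i in range(100)])

count = cur.execute(
    "SELECT COUNT(*) FROM parent p JOIN child c ON c.parent_id = p.id"
).fetchone()[0]
print(count)  # 100
```

Multiply that by 15 tables and their linking indexes, and the maintenance cost of "simple" DML becomes visible.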

I will never forget Juliet Closs from New York: "I have the feeling that the application runs forever!"

So I will say that normalization helps you analyze more deeply the relationships between the primary keys (and their components) and the other columns in the tables. This is a very good approach and helps you avoid running into trouble in future development. But if the number of tables is too big and the SELECT statements "run forever" ... something has to be done.

What about the famous article by Tom Kyte (Oracle Magazine, January 2008), where he wrote: " ... do not use bind variables! Yes, I have said: do not use bind variables!" So myths end!
The old Oracle claim is that if the application uses bind variables, performance will improve. In fact this only avoids hard parsing; the execution plan that gets reused could be the worst one (and Tom Kyte shows such a case in the article!).
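For readers who haven't met the term: "bind variables" are just parameterized queries. A minimal sketch with Python's sqlite3 module (placeholder syntax and skew in the data are illustrative only) shows the trade-off being discussed: the bound statement is parsed once and reused for any value, but a plan chosen without seeing the value may suit a rare value and not a common one:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, status TEXT)")
# Heavily skewed data: 99 'open' rows, 1 'closed' row.
cur.executemany("INSERT INTO t (status) VALUES (?)",
                [("open",)] * 99 + [("closed",)])

# Literal SQL: every distinct value is a fresh statement to (hard) parse,
# but the optimizer sees the actual value when choosing a plan.
literal = cur.execute(
    "SELECT COUNT(*) FROM t WHERE status = 'closed'").fetchone()[0]

# Bind variable: one reusable statement, no repeated hard parse -- but the
# plan must suit *any* value: fine for 'closed' (1 row), possibly poor for
# 'open' (99 rows), which is the kind of case Kyte's article describes.
bound = cur.execute(
    "SELECT COUNT(*) FROM t WHERE status = ?", ("closed",)).fetchone()[0]
print(literal, bound)  # 1 1
```

Both forms return the same result; the argument is purely about parsing cost versus plan quality on skewed data.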

The moral of the story is not to use normal forms beyond 3NF, as I explained in my very first post. It is not worth it at all in 99% of situations, and the remaining 1% has yet to come up in my life. ;)
Ritesh, Very many (and perhaps even most) 3NF schemas are already in 5NF and/or BCNF. So what you say doesn't make much sense to me and probably isn't even possible as far as I can see. If you create a 3NF design that is also in 5NF then how can you "not use" 5NF!? If you have a real suggestion for actually *avoiding* normalized designs then I think you need to give more explanation.

It's fatuous to talk about performance without considering a database's physical implementation and the actual operations being performed on it. Normalization tells us nothing about either of those things. Saying "more normalization=less speed" is like saying "red cars go faster than blue ones" while explaining nothing about the engines and specs of the cars you are talking about.
I have no clue where to stop the normalization.
I have the feeling that the deeper you investigate and normalize, the better you will know
the relationships between the primary key and the other, non-key columns.
So you will be able to investigate the dark corners of the project.
Your findings could be critical if the model really needs normalization beyond
3NF. Because if you look at the history of the normal forms (there were steps;
in his first and original article Dr. Codd mentioned only 1NF: no repeating groups, and every row has a primary key):
1. E.F. Codd, "A Relational Model of Data for Large Shared Data  
    Banks", Comm. ACM 13 (6), June 1970, pp. 377-387.
    The original paper introducing the relational data model.

2. E.F. Codd, "Normalized Data Base Structure: A Brief Tutorial",
    ACM SIGFIDET Workshop on Data Description, Access, and
    Control, Nov. 11-12, 1971, San Diego, California, E.F. Codd and
    A.L. Dean (eds.).
   An early tutorial on the relational model and normalization.

3. E.F. Codd, "Further Normalization of the Data Base Relational
   Model", R. Rustin (ed.), Data Base Systems (Courant
   Computer Science Symposia 6), Prentice-Hall, 1972. Also IBM
   Research Report RJ909.
   The first formal treatment of second and third normal forms.

So every normal form was invented because there was a real situation that led to anomalies.

But on the other hand: nowadays applications have an enormous number of tables (Oracle E-Business Suite: about 9,000; SAP: over 14,000). The performance of both is bad; as markgeer has written many times, these products are not designed for performance.

Mr_Shaw (Author) Commented:
thanks for giving me so much info
Question has a verified solution.
