Beyond 3NF

When normalizing a database, should I go beyond 3NF?
Hwkranger Commented:
Aside from what everyone said, I'll give you a little bit of real-world advice on what normal form to use.

1)  What are you designing this application for?
   a) example: data warehouses often use the fact/dimension modeling approach
   b) example: reporting "data marts" are sometimes very flat so they can report data quickly
   c) example: many entities map closely to application objects in 3NF; a 5NF design can create a nightmare for the data access layer developers (which can also slow down application performance)
2) What is the life expectancy of the application?
  a) example: a quick tool doesn't require 3NF, BCNF, 4NF, etc.
  b) example: an enterprise application that will remain for years probably should not be a flat design
3)  What is the targeted audience that will be accessing the data directly?
  a) example: reporting analysts who are not skilled with SQL will prefer a flatter schema
  b) example: developers / DBAs who are extending the application and model will prefer a more normalized schema, which better supports extensibility and scalability
  c) example: if you allow power users to query with Excel or other tools, the more normalized the database, the more work they will have to do to get their data
4)  What is the maintenance plan for this database?  How do you expect it to grow?
  a) Flatter designs typically take up more space than more normalized ones, with the exception of very small databases in 4NF/5NF

If you're asking this question, my suggestion is to investigate as much as you can, both what you are developing and the normal forms. You'll find that some things don't make sense, while others do. You want a model that is scalable, extensible, reusable, and most importantly *maintainable*. If you can't maintain it, it's just as bad as never building it. (Or worse, since you spent resources on something that can't be maintained.)

Best of luck.
RiteshShah Commented:
Well, it depends on your needs, usage, and performance, because more normalization means more joins and less speed.
Aneesh Retnakaran, Database Administrator, Commented:
We usually won't go beyond 3NF.

tbsgadi Commented:
Hi Mr_Shaw,

Have a look at Normalization: How far is far enough?

schwertner Commented:
You go to BCNF and 4NF only in some special cases.
In these cases there are anomalies that can hurt the design.

BCNF violations occur when there are overlapping candidate keys.
Fourth Normal Form

4NF removes multi-valued dependencies.
A multi-valued dependency exists when three attributes (A, B, C)
exist in a relation and, for each value of A, there is a well-defined
set of values for B and a well-defined set of values for C, yet B
and C are independent of each other.
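The A/B/C pattern above can be made concrete. In this minimal sketch (the employee/skill/language names are invented for illustration), a single table must store every (skill, language) combination for an employee because the two facts are independent; the 4NF decomposition splits it into one projection per multi-valued dependency, and joining the projections reproduces the original rows losslessly:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Violates 4NF: an employee's skills and languages are independent facts,
# so every (skill, language) combination must be stored, redundantly.
cur.execute("CREATE TABLE emp_skill_lang (emp TEXT, skill TEXT, lang TEXT)")
rows = [("ann", s, l) for s in ("sql", "java") for l in ("en", "fr")]
cur.executemany("INSERT INTO emp_skill_lang VALUES (?, ?, ?)", rows)

# 4NF decomposition: one projection table per multi-valued dependency.
cur.execute("CREATE TABLE emp_skill (emp TEXT, skill TEXT, PRIMARY KEY (emp, skill))")
cur.execute("CREATE TABLE emp_lang (emp TEXT, lang TEXT, PRIMARY KEY (emp, lang))")
cur.execute("INSERT INTO emp_skill SELECT DISTINCT emp, skill FROM emp_skill_lang")
cur.execute("INSERT INTO emp_lang SELECT DISTINCT emp, lang FROM emp_skill_lang")

# Joining the two projections on emp reproduces the original table.
joined = cur.execute(
    "SELECT s.emp, s.skill, l.lang FROM emp_skill s JOIN emp_lang l ON s.emp = l.emp"
).fetchall()
print(len(joined))  # 4 rows: 2 skills x 2 languages
```

Note how the decomposed tables store 4 rows in total (2 + 2) instead of 4 redundant combinations; with 10 skills and 10 languages that would be 20 rows instead of 100.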
Why is your question not posted in the DB2, Ingres, Postgres, or MySQL threads?
I hope you take that in good fun.
dportas Commented:
I suggest you design for at least BCNF and preferably 5NF unless you have a good reason not to. Apart from the obvious reasons of eliminating potential anomalies, 5NF is desirable because it helps to minimise any "bias" in your design. That means your design is best able to support new and changing requirements with the minimum of effort later on.
mrjoltcola Commented:
Good opinions already, I will add:

Good design is not a formula. It takes into consideration more than the theoretical reasons for eliminating redundancy. It takes into consideration maintenance, performance, and ease of mapping to another domain or an application language. It also takes into consideration the limitations of the tools and of the people who develop and maintain the system.
Normalization is a way to check the logical design of the project and to flush out DML anomalies. It also uncovers mistakes in the logical design, and it leads to a better understanding of the business rules and the data model.

At the same time, normalization leads to fragmentation of the tables (the most common method of normalizing is to split unnormalized tables into projection tables and ensure logical links between them using primary-/foreign-key references). This brings wasted space to store the keys, wasted time to join the tables, complex query optimization, unwanted full table scans, a more complex description of the model, expensive programming, and possible errors. That's why, after normalization, major applications undergo denormalization to speed up the vital SELECT statements. The slogan of Kevin Loney ("Oracle 8i: The Complete Reference") is "No major application will run in Third Normal Forms". This is the basis of the denormalization steps.
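The fragmentation described above can be sketched with a toy schema (all names invented for illustration): repeated customer data is split out of the order rows into a projection table, linked back by a primary-/foreign-key reference, and the "vital select" then pays for a join:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Unnormalized: customer name and city would be repeated on every order row.
cur.execute("CREATE TABLE orders_flat (order_id INTEGER, cust_name TEXT, "
            "cust_city TEXT, amount REAL)")

# Normalized projection tables, linked by a primary-/foreign-key reference.
cur.execute("CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
cur.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, "
            "cust_id INTEGER REFERENCES customer(cust_id), amount REAL)")

cur.execute("INSERT INTO customer VALUES (1, 'Ann', 'New York')")
cur.executemany("INSERT INTO orders VALUES (?, 1, ?)", [(10, 9.5), (11, 12.0)])

# Reading the full picture back now requires a join (and space for the keys).
full = cur.execute(
    "SELECT o.order_id, c.name, c.city, o.amount "
    "FROM orders o JOIN customer c ON o.cust_id = c.cust_id"
).fetchall()
print(full)
```

The normalized form stores 'Ann'/'New York' once instead of once per order; the price is the stored cust_id keys and the join on every read, which is exactly the trade-off denormalization reverses.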
"No major application will run in Third Normal Forms"
Taken at face value that is utter nonsense of course, and provably so. In the context of Loney's book that quotation is used as a provocative rhetorical device to begin a discussion. It's a totally worthless comment when taken out of context.
I have seen applications in the United States that, due to normalization (after joining 15-20 tables),
"seem to run forever", as one of the leads wrote to the development staff.
Of course, the slogan is an open door for a discussion of where to stop normalizing.

Let us speak now very seriously!

Imagine a medium-sized project and a SELECT in it that involves 15 tables.
Think about the joins: the linking columns obviously should be indexed on both sides, the
parent-key side and the foreign-key side. Now think what the cost of a simple DML statement
will be when several of those tables (or even one) are involved. What will be the price of maintaining the indexes?
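The two sides of that indexing rule can be sketched in SQLite (standing in for Oracle here; table names are invented). The parent-key side is indexed automatically by its PRIMARY KEY; the foreign-key side must be indexed explicitly, and that extra index is exactly the structure every INSERT, UPDATE, and DELETE on the child table must then maintain:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Parent-key side: a PRIMARY KEY is indexed automatically.
cur.execute("CREATE TABLE parent (id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("CREATE TABLE child (id INTEGER PRIMARY KEY, "
            "parent_id INTEGER REFERENCES parent(id), payload TEXT)")
# Foreign-key side: must be indexed explicitly, or joins fall back to scans.
cur.execute("CREATE INDEX idx_child_parent ON child (parent_id)")

cur.execute("INSERT INTO parent VALUES (1, 'p1')")
# Each of these 100 inserts also updates idx_child_parent -- the hidden DML price.
cur.executemany("INSERT INTO child VALUES (?, 1, 'x')", [(i,) for i in range(100)])

count = cur.execute(
    "SELECT COUNT(*) FROM parent p JOIN child c ON c.parent_id = p.id"
).fetchone()[0]
print(count)  # 100
```

Multiply that by 15 tables and their linking indexes, and the maintenance cost of "simple" DML becomes visible.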

I will never forget Juliet Closs from New York: "I have the feeling that the application runs forever!"

So I will say that normalization helps you analyze more deeply the relationships between the primary keys (and their components) and the other columns in the tables. This is a very good approach and helps you avoid running into trouble in future development. But if the number of tables is too big and the SELECT statements "run forever" ... something has to be done.

What about the famous article by Tom Kyte (Oracle Magazine, January 2008), where he wrote: " ... do not use bind variables! Yes, I have said: do not use bind variables!" So myths end!
The old Oracle claim is that if the application uses bind variables, performance will improve. In fact this only avoids hard parsing; the execution plan that gets reused could be the worst one (and Tom Kyte shows such a case in the article!).
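For readers who haven't met the term: "bind variables" are just parameterized queries. A minimal sketch with Python's sqlite3 module (placeholder syntax and skew in the data are illustrative only) shows the trade-off being discussed: the bound statement is parsed once and reused for any value, but a plan chosen without seeing the value may suit a rare value and not a common one:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, status TEXT)")
# Heavily skewed data: 99 'open' rows, 1 'closed' row.
cur.executemany("INSERT INTO t (status) VALUES (?)",
                [("open",)] * 99 + [("closed",)])

# Literal SQL: every distinct value is a fresh statement to (hard) parse,
# but the optimizer sees the actual value when choosing a plan.
literal = cur.execute(
    "SELECT COUNT(*) FROM t WHERE status = 'closed'").fetchone()[0]

# Bind variable: one reusable statement, no repeated hard parse -- but the
# plan must suit *any* value: fine for 'closed' (1 row), possibly poor for
# 'open' (99 rows), which is the kind of case Kyte's article describes.
bound = cur.execute(
    "SELECT COUNT(*) FROM t WHERE status = ?", ("closed",)).fetchone()[0]
print(literal, bound)  # 1 1
```

Both forms return the same result; the argument is purely about parsing cost versus plan quality on skewed data.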

The moral of the story is not to use normal forms beyond 3NF, as I explained in my very first post. It is not worth it at all in 99% of situations, and the remaining 1% has yet to come up in my life. ;)
Ritesh, Very many (and perhaps even most) 3NF schemas are already in 5NF and/or BCNF. So what you say doesn't make much sense to me and probably isn't even possible as far as I can see. If you create a 3NF design that is also in 5NF then how can you "not use" 5NF!? If you have a real suggestion for actually *avoiding* normalized designs then I think you need to give more explanation.

It's fatuous to talk about performance without considering a database's physical implementation and the actual operations being performed on it. Normalization tells us nothing about either of those things. Saying "more normalization=less speed" is like saying "red cars go faster than blue ones" while explaining nothing about the engines and specs of the cars you are talking about.
I have no clue where to stop the normalization.
I have the feeling that the deeper you investigate and normalize, the better you will know
the relationships between the primary key and the other, non-key columns.
So you will be able to investigate the dark corners of the project.
Your findings could be critical if the model really needs normalization beyond
3NF. Because if you look at the history of the normal forms (there were steps;
in his first and original article Dr. Codd mentioned only 1NF: no repeating groups, and every row has a primary key):
1. E.F. Codd, "A Relational Model of Data for Large Shared Data  
    Banks", Comm. ACM 13 (6), June 1970, pp. 377-387.
    The original paper introducing the relational data model.

2. E.F. Codd, "Normalized Data Base Structure: A Brief Tutorial",
    ACM SIGFIDET Workshop on Data Description, Access, and
    Control, Nov. 11-12, 1971, San Diego, California, E.F. Codd and
    A.L. Dean (eds.).
   An early tutorial on the relational model and normalization.

3. E.F. Codd, "Further Normalization of the Data Base Relational
   Model", R. Rustin (ed.), Data Base Systems (Courant
   Computer Science Symposia 6), Prentice-Hall, 1972. Also IBM
   Research Report RJ909.
   The first formal treatment of second and third normal forms.

So every normal form was invented because there was a real situation that led to anomalies.

But on the other hand: nowadays applications have an enormous number of tables (Oracle E-Business Suite: about 9,000; SAP: over 14,000). The performance of both is bad; as markgeer has written many times, these products are not designed for performance.

Mr_Shaw (Author) Commented:
thanks for giving me so much info
Question has a verified solution.
