Solved

Beyond 3NF

Posted on 2009-05-12
15
854 Views
Last Modified: 2012-08-13
When normalizing a database should I go beyond 3NF
0
Comment
Question by:Mr_Shaw
  • 5
  • 3
  • 2
  • +5
15 Comments
 
LVL 31

Assisted Solution

by:RiteshShah
RiteshShah earned 70 total points
ID: 24364288
well, it is depends on your need, usage and performance because more normalization=less speed and more join
0
 
LVL 75

Assisted Solution

by:Aneesh Retnakaran
Aneesh Retnakaran earned 70 total points
ID: 24364318
We usually wont go beyond the 3NF
0
 
LVL 46

Assisted Solution

by:tbsgadi
tbsgadi earned 20 total points
ID: 24364387
Hi Mr_Shaw,

Have a look at Normalization: How far is far enough?
http://articles.techrepublic.com.com/5100-10878_11-6140413.html




Gary
0
 
LVL 9

Accepted Solution

by:
Hwkranger earned 90 total points
ID: 24364572
Aside from what everyone said, I'll give you a little bit of real-world advice on what normal form to use.

1)  For what are you designing this application?
   a) example: data warehouses many use the fact/dimension modeling approach
   b) example: reporting "data marts" are sometimes very flat to quickly report data
   c) example: many entities map closely to objects in an application in 3rd NF, a 5th NF can create a nightmare for the data access layer developers.  (Which also can slow down application performance)
2) What is the life expentancy of the application?
  a) example: a quick tool doesn't require 3rd, BCNF, 4th, etc normal forms.
  b) example: an enterprise application that will remain for years probably should not be a flat design
3)  What is the targetted audience that will be accessing the data directly
  a) example: Reporting analysts that are not skilled with SQL will prefer to have a flatter schema
  b) example: Developers / DBAs that are extending the application and model will prefer to have a more normalized schema which better supports extensability and scalability
  c) example: If you allow power users to query using excel or other tools, the more normalized the database the more work they will have to do to get their data.
4)  What is the maintenance plan for this database?  How do you expect it to grow?  
  a) Flatter designs typically take up more space than the more normalized, with the exception of very small databases in 4th/5th NF

If you're asking this question, my suggestion is to do as much investigation in what you are developing and the normal forms.  You'll find that some things dont' make sense, while others do.  You want a model that is scalable, extensable, reusable, and most importantly *Maintainable*.  If you can't maintain it, it's just as bad as never making it.  (Or worse, since you spent resources on something that can't be maintained.)

Best of luck.
   
0
 
LVL 47

Assisted Solution

by:schwertner
schwertner earned 90 total points
ID: 24364971
You go to 4NF and BCNF  in some special cases
In these cases there are some anomalies that can hurt the design.

BCNF  - Occurs in the case of overlapping candidate keys
Fourth Normal Form

4NF Removes multi-valued dependencies.
Multi-valued dependency  when 3 attributes (A, B, C)
exist in a relation and for each value of A there is a well
defined set of values for B and a well defined set of values
for C, yet B and C are independent of each other
0
 
LVL 47

Expert Comment

by:schwertner
ID: 24366618
Why is your queston not posted in DB2, Ingress, Postgress, MySQL threads?
I hope you understand the fun.
0
 
LVL 22

Assisted Solution

by:dportas
dportas earned 80 total points
ID: 24367499
I suggest you design for at least BCNF and preferably 5NF unless you have a good reason not to. Apart from the obvious reasons of eliminating potential anomalies, 5NF is desirable because it helps to minimise any "bias" in your design. That means your design is best able to support new and changing requirements with the minimum of effort later on.
0
How your wiki can always stay up-to-date

Quip doubles as a “living” wiki and a project management tool that evolves with your organization. As you finish projects in Quip, the work remains, easily accessible to all team members, new and old.
- Increase transparency
- Onboard new hires faster
- Access from mobile/offline

 
LVL 40

Assisted Solution

by:mrjoltcola
mrjoltcola earned 80 total points
ID: 24368097
Good opinions already, I will add:

Good design is not a formula. It takes into consideration more than theoretical reasons for eliminating redundancy. It takes into consideration maintenance, performance and ease of mapping to another domain or an application language. It takes into consideration limitations of tools and people used to develop and maintain.
0
 
LVL 47

Expert Comment

by:schwertner
ID: 24372484
Normalization is a way to check the logical design of the project and to figure out
DML anomalies. Also it encounters mistakes by the logical design. Also it leads to best understanding of
business rules and data model.
In the same time normalization leads to fragmentation of the tables (the most common method
to normalize is to divide unnormalized tables in projection tables and ensure logical links between
using "primary-foreign key references". This brings wasting space to store the keys, wasting time to join the tables,
complex optimization of the query, unwanted full table scans, complex description of the modell, expensive programming and possible errors. Thatswhy after normalization major application experience denormalization to speed up the vital select statements. The slogan of Kevin Loney ("Oracle 8i the complete Reference") is "No major application will run in Third Normal Forms". This is the basis of the denormalization steps.
0
 
LVL 22

Expert Comment

by:dportas
ID: 24372695
"No major application will run in Third Normal Forms"
Taken at face value that is utter nonsense of course, and provably so. In the context of Loney's book that quotation is used as a provocative rhetorical device to begin a discussion. It's a totally worthless comment when taken out of context.
0
 
LVL 47

Expert Comment

by:schwertner
ID: 24373453
I have seen applications in the United States that due the normalization (after joining 15-20 tables)
"seems to run forever" as one of the leads wrote to the developers staff.
Of course the slogan is an open door for discussion where to stop the normalization.

Let us speak now very seriosly!

Imagine a medium project and a select there in which are involved 15 tables.
Think about the joins - the links obviously should be indexed. From both sides - from
parent key side and from foreign key side. Now think what will be the cost of simple DML
if more tables (or even one) is involved. What will be the price of the maintenance of the indexes?

Will never forget Juliet Closs from New York: "I have the feeling  that the application runs forever!"

So i will say that the normalization helps to analyze deeper the relationships between the primary keys and components of them with the other columns in the tables. This is very good approach and helps to avoid running in troubles in the future development. But if the number of the tables is too big and the SELECT statements "run forever" ... something has to be done.

Portas,
what about the famous article of Tom Kyte (Oracle Magazine, january 2008), where he wrote: " ... do not use bind variables! Yes, I have said: do not use bind variables!" So myths end!
The old luy of Oracle that if the application uses bind variables this will speed up the performance. This will only exclude the hard parsing, but the used execution plan could be the worst (and Tom Kyte shows such case in the article!).
Myths!
0
 
LVL 31

Expert Comment

by:RiteshShah
ID: 24373470
schwertner,

moral of the story is, not to use NF beyond 3NF as I have explained in my very first post. not worth at all in 99% situation and remaining 1% situation yet to come in my life. ;)
0
 
LVL 22

Expert Comment

by:dportas
ID: 24373574
Ritesh, Very many (and perhaps even most) 3NF schemas are already in 5NF and/or BCNF. So what you say doesn't make much sense to me and probably isn't even possible as far as I can see. If you create a 3NF design that is also in 5NF then how can you "not use" 5NF!? If you have a real suggestion for actually *avoiding* normalized designs then I think you need to give more explanation.

It's fatuous to talk about performance wthout considering a database's physical implementation and the actual operations being performed on it. Normalization tells us nothing about either of those things. Saying "more normalization=less speed" is like saying "red cars go faster than blue ones" while explaining nothing about the engines and spec of the cars you are talking about.
0
 
LVL 47

Expert Comment

by:schwertner
ID: 24373607
I have no clue where to stop the normalization.
Have the feeling that the deeper you investigate and normalize the best will you know
the relationship between the primary key and the other non-key columns.
So you will be able to investigate the dark corners of the project.
Your findings could be critical if really the model needs normalization beyound
3NF. Because if you look at the history of the NF ( there are steps,
in his first and original article Dr. Codd mentione only 1NF - no repeating groups and every row has a primary key):
1. E.F. Codd, "A Relational Model of Data for Large Shared Data  
    Banks", Comm. ACM 13 (6), June 1970, pp. 377-387.
    The original paper introducing the relational data model.

2. E.F. Codd, "Normalized Data Base Structure: A Brief Tutorial",
    ACM SIGFIDET Workshop on Data Description, Access, and
    Control, Nov. 11-12, 1971, San Diego, California, E.F. Codd and
    A.L. Dean (eds.).
   An early tutorial on the relational model and normalization.

3. E.F. Codd, "Further Normalization of the Data Base Relational
   Model", R. Rustin (ed.), Data Base Systems (Courant  
   ComputerScience Symposia 6), Prentice-Hall, 1972. Also IBM
   Research Report RJ909.
   The first formal treatment of second and third normal forms.

So every normal forms is invented because there was a real situation that leads to anomalies.

But from other hand: now a days applications have enormous number of tables (E-Business_Suite - about 9000, SAP - over 14000). The performance of both is bad, as markgeer wrote many times these products are not designed for performance.

0
 

Author Closing Comment

by:Mr_Shaw
ID: 31580517
thanks for giving me so much info
0

Featured Post

Complete VMware vSphere® ESX(i) & Hyper-V Backup

Capture your entire system, including the host, with patented disk imaging integrated with VMware VADP / Microsoft VSS and RCT. RTOs is as low as 15 seconds with Acronis Active Restore™. You can enjoy unlimited P2V/V2V migrations from any source (even from a different hypervisor)

Join & Write a Comment

Checking the Alert Log in AWS RDS Oracle can be a pain through their user interface.  I made a script to download the Alert Log, look for errors, and email me the trace files.  In this article I'll describe what I did and share my script.
Entity Framework is a powerful tool to help you interact with the DataBase but still doesn't help much when we have a Stored Procedure that returns more than one resultset. The solution takes some of out-of-the-box thinking; read on!
This video shows syntax for various backup options while discussing how the different basic backup types work.  It explains how to take full backups, incremental level 0 backups, incremental level 1 backups in both differential and cumulative mode a…
Using examples as well as descriptions, and references to Books Online, show the documentation available for datatypes, explain the available data types and show how data can be passed into and out of variables.

747 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

9 Experts available now in Live!

Get 1:1 Help Now