Solved

Database normalization

Posted on 2014-02-06
16
225 Views
Last Modified: 2014-02-07
Just curious what the Guru's (and above) thoughts are on normalization whereby you separate TEXT fields from a table. I was brought up on where a field was an indeterminate length then you should separate it but is that the case with MySQL? Is it true within InnoDB and/or MyISAM?
Am I just wasting my time separating the data and joining tables?
Or would I be better off just putting those fields in the main table even where some of those fields maybe blank?
0
Comment
Question by:Gary
  • 6
  • 4
  • 3
  • +2
16 Comments
 
LVL 82

Assisted Solution

by:Dave Baldwin
Dave Baldwin earned 333 total points
ID: 39841209
I thought you were supposed to store 'constant' and reused fields in their own separate tables so that any changes would be reflected in all the other places that the data is used.

The storage requirements for the different 'engines' in MySQL are a little confusing.  Large TEXT and BLOB fields are stored in 'hidden tables' outside the main table file.  Sometimes.  This link will take you to the beginning of the description of the storage requirements and the different 'engines'.

https://dev.mysql.com/doc/refman/5.5/en/storage-requirements.html
0
 
LVL 57
ID: 39841567
There are a couple of very fundamental thoughts behind relational design.  The base idea is that of a relation, which is a table.

A table should describe one "thing", or entity.  For example, customers.    All the data about that "thing" belongs in that table and that table only.   For example, customer name belongs here and not in an orders table.  By doing this, you avoid problems with data being stored in multiple places.

As part of forming the relation, all fields should be "atomic".  That is that should not be able to be split.  Example  "1 hammer, 2 nails"  has a qty and description, and has multiples of both.  Those are no-no's.

The other part of a relation is that each row needs to be uniquely identifiable.  If by combining every fact (column), you cannot uniquely identify the row, you don't have enough facts.

For a relational database to function well and properly, you do need to normalize.   However if it is not a relational DBMS, then there is no real requirement to normalize.

However since normalization is a process for organizing data in a mathematical and logical way, it makes sense to use it in many situations.   And if your using SQL with the DBMS, then it really makes sense to try and use it.

The mathematical model behind relational design stems from set theory, and SQL is designed specifically to work with sets of records.

Jim.
0
 
LVL 58

Author Comment

by:Gary
ID: 39841868
The question is specifically about TEXT/BLOB fields, not about normal normalisation to preventing duplication.
Dave...
Large TEXT and BLOB fields are stored in 'hidden tables' outside the main table file. Sometimes.
What is the sometimes?
I am reading conflicting advice about the best method, from makes no difference to memory storage for fixed length rows...etc
Also just reading up on https://dev.mysql.com/doc/innodb/1.1/en/innodb-other-changes-file-formats.html
0
 
LVL 57
ID: 39841927
<<The question is specifically about TEXT/BLOB fields, not about normal normalisation to preventing duplication.>>

 Well you better clairfy the question then.  Doesn't seem like your talking about normalization at all.

 re-reading your question and your last comment, it seems more about data storage within a DBMS system, which is something an end user is typically not involved with.

 There no reason I'm aware of that one would ever want to break a text field from the rest of the record simply because it's variable length.  It's up to the DBMS to decide how something like that should be physically stored.

  With that said, there are times when you might want to poke a DBMS into doing something it ordnairly would not do, but you need a through understand of how the DBMS system works.

  A good example of that is with JET databases; you can get "record level" locking by understanding that JET never splits a record across multiple pages.  So if you pad the record past half a page size, you end up with one record per page and thus record level locking even though the DBMS is locking at page level.   To do that properly though, you also need to understand that the OLE and Memo field types are not stored with the rest of the record, but in another part of the database.

 It would help if you would calirfy exactly what your going after here.

Jim.
0
 
LVL 58

Author Comment

by:Gary
ID: 39841978
It is a shopping website.
I have a table with fixed length fields, about 5000 rows, which stores things like product name, weight, cost, price, etc
For every row there are two variable length Text fields - could be 100 characters, could be 1000's of characters. And some rows have nothing at all.
This is MySQL - not MSSQL or Access (you are mentioning Jet and Memo)

The question is specifically is there any advantage in InnoDB to seperating the Text fields to their own table.
0
 
LVL 108

Expert Comment

by:Ray Paseur
ID: 39842131
Normalization is a term of art in data base work.  Make a Google search for the exact phrase, "Should I Normalize my Database" and read the very good ideas from several different viewpoints.

If you have a table of 5,000 rows, you are unlikely to see any difference from just separating out the text columns.  Here's the math...

Assume 6,000 text bytes per row
Assume 5,000 rows
Data Size is 29 MB

That fits in memory.

This would be a different answer if the numbers were substantially higher, in which case I would recommend "normalization" of these long text columns.
0
 
LVL 58

Author Comment

by:Gary
ID: 39842176
Would using Barracuda file format make any difference?
0
 
LVL 108

Assisted Solution

by:Ray Paseur
Ray Paseur earned 167 total points
ID: 39842398
I'm not familiar with Barracuda, but I can share a general idea that might be useful.  About twice a month somebody posts a question here at EE surrounding the topic of putting images or videos into a data base.  We try to discourage that because those sorts of things are usually very large, perhaps megabytes.   SQL-type data bases are not well-designed for that kind of storage.  The folks who do not understand that also do not understand why they would not want to use SELECT * when querying the data base.  The symptom that often sends them to EE is "my data base is so slow."

The usual advice is to remove the images from the data base and store the images in the file system.  Store the URL of the images in the data base instead of the images.

If you removed the large wide columns from your DB and stored them in the file system, with a file pointer in the DB, you would be following the same general idea.  It might be interesting to time the application to get before and after comparisons.
0
Why You Should Analyze Threat Actor TTPs

After years of analyzing threat actor behavior, it’s become clear that at any given time there are specific tactics, techniques, and procedures (TTPs) that are particularly prevalent. By analyzing and understanding these TTPs, you can dramatically enhance your security program.

 
LVL 58

Author Comment

by:Gary
ID: 39842466
This is just text, no images et al
0
 
LVL 82

Accepted Solution

by:
Dave Baldwin earned 333 total points
ID: 39842518
In the link I posted, some of the MySQL engines will store only the first 256 bytes of a TEXT or BLOB field in the row and anything over that in a 'hidden' table.  Others will store it inline in the row if it will fit along with the other data in the row limit of 65,536 bytes.  That is why I said 'Sometimes'.
0
 
LVL 108

Expert Comment

by:Ray Paseur
ID: 39842588
This is just text, no images et al
Yes, I understand.  But I think the size of the text may make the same principles applicable.
0
 
LVL 58

Author Comment

by:Gary
ID: 39842626
@Dave
So basically storing it directly in the main table makes no difference as far as MySQL is concerned as it ends up being two tables anyway. This is the general concensus I found reading various sites.
0
 
LVL 82

Expert Comment

by:Dave Baldwin
ID: 39842659
That's what I understand except that the second table is not directly visible to you, it is automatically generated by the MySQL storage engine.  There are limits on what you can do with a TEXT or BLOB field.  I don't think you can index them.  But as Ray pointed out, if you "SELECT *" you will retrieve them which slow things down if you are not actually using the data.
0
 
LVL 58

Author Comment

by:Gary
ID: 39842670
They are only retrieved on one page and updated on another page, so no select * for every query
0
 
LVL 82

Expert Comment

by:Dave Baldwin
ID: 39842688
I usually set some limit on the text that can be entered thru a form.  I would say that if your data will fit in one row / 65535 bytes then I wouldn't worry about.  If it was image or PDF files, I would store them outside the database anyway and only put the file name in the database.
0
 
LVL 74

Expert Comment

by:käµfm³d 👽
ID: 39842704
Maybe you should ask yourself what benefit you would hope to achieve by storing the data directly in the database versus storing it in a file, with a pointer to the file being stored in the database. Cost-benefit analysis of either approach might have bearing on which way you go with your design.
0

Featured Post

How your wiki can always stay up-to-date

Quip doubles as a “living” wiki and a project management tool that evolves with your organization. As you finish projects in Quip, the work remains, easily accessible to all team members, new and old.
- Increase transparency
- Onboard new hires faster
- Access from mobile/offline

Join & Write a Comment

Occasionally there is a need to clean table columns, especially if you have inherited legacy data. There are obviously many ways to accomplish that, including elaborate UPDATE queries with anywhere from one to numerous REPLACE functions (even within…
Nothing in an HTTP request can be trusted, including HTTP headers and form data.  A form token is a tool that can be used to guard against request forgeries (CSRF).  This article shows an improved approach to form tokens, making it more difficult to…
The viewer will learn how to dynamically set the form action using jQuery.
The viewer will learn how to create and use a small PHP class to apply a watermark to an image. This video shows the viewer the setup for the PHP watermark as well as important coding language. Continue to Part 2 to learn the core code used in creat…

746 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

12 Experts available now in Live!

Get 1:1 Help Now