Link to home
Start Free TrialLog in
Avatar of Gary
GaryFlag for Ireland

asked on

Database normalization

Just curious what the Guru's (and above) thoughts are on normalization whereby you separate TEXT fields from a table. I was brought up on where a field was an indeterminate length then you should separate it but is that the case with MySQL? Is it true within InnoDB and/or MyISAM?
Am I just wasting my time separating the data and joining tables?
Or would I be better off just putting those fields in the main table even where some of those fields maybe blank?
SOLUTION
Avatar of Dave Baldwin
Dave Baldwin
Flag of United States of America image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
There are a couple of very fundamental thoughts behind relational design.  The base idea is that of a relation, which is a table.

A table should describe one "thing", or entity.  For example, customers.    All the data about that "thing" belongs in that table and that table only.   For example, customer name belongs here and not in an orders table.  By doing this, you avoid problems with data being stored in multiple places.

As part of forming the relation, all fields should be "atomic".  That is that should not be able to be split.  Example  "1 hammer, 2 nails"  has a qty and description, and has multiples of both.  Those are no-no's.

The other part of a relation is that each row needs to be uniquely identifiable.  If by combining every fact (column), you cannot uniquely identify the row, you don't have enough facts.

For a relational database to function well and properly, you do need to normalize.   However if it is not a relational DBMS, then there is no real requirement to normalize.

However since normalization is a process for organizing data in a mathematical and logical way, it makes sense to use it in many situations.   And if your using SQL with the DBMS, then it really makes sense to try and use it.

The mathematical model behind relational design stems from set theory, and SQL is designed specifically to work with sets of records.

Jim.
Avatar of Gary

ASKER

The question is specifically about TEXT/BLOB fields, not about normal normalisation to preventing duplication.
Dave...
Large TEXT and BLOB fields are stored in 'hidden tables' outside the main table file. Sometimes.
What is the sometimes?
I am reading conflicting advice about the best method, from makes no difference to memory storage for fixed length rows...etc
Also just reading up on https://dev.mysql.com/doc/innodb/1.1/en/innodb-other-changes-file-formats.html
<<The question is specifically about TEXT/BLOB fields, not about normal normalisation to preventing duplication.>>

 Well you better clairfy the question then.  Doesn't seem like your talking about normalization at all.

 re-reading your question and your last comment, it seems more about data storage within a DBMS system, which is something an end user is typically not involved with.

 There no reason I'm aware of that one would ever want to break a text field from the rest of the record simply because it's variable length.  It's up to the DBMS to decide how something like that should be physically stored.

  With that said, there are times when you might want to poke a DBMS into doing something it ordnairly would not do, but you need a through understand of how the DBMS system works.

  A good example of that is with JET databases; you can get "record level" locking by understanding that JET never splits a record across multiple pages.  So if you pad the record past half a page size, you end up with one record per page and thus record level locking even though the DBMS is locking at page level.   To do that properly though, you also need to understand that the OLE and Memo field types are not stored with the rest of the record, but in another part of the database.

 It would help if you would calirfy exactly what your going after here.

Jim.
Avatar of Gary

ASKER

It is a shopping website.
I have a table with fixed length fields, about 5000 rows, which stores things like product name, weight, cost, price, etc
For every row there are two variable length Text fields - could be 100 characters, could be 1000's of characters. And some rows have nothing at all.
This is MySQL - not MSSQL or Access (you are mentioning Jet and Memo)

The question is specifically is there any advantage in InnoDB to seperating the Text fields to their own table.
Normalization is a term of art in data base work.  Make a Google search for the exact phrase, "Should I Normalize my Database" and read the very good ideas from several different viewpoints.

If you have a table of 5,000 rows, you are unlikely to see any difference from just separating out the text columns.  Here's the math...

Assume 6,000 text bytes per row
Assume 5,000 rows
Data Size is 29 MB

That fits in memory.

This would be a different answer if the numbers were substantially higher, in which case I would recommend "normalization" of these long text columns.
Avatar of Gary

ASKER

Would using Barracuda file format make any difference?
SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of Gary

ASKER

This is just text, no images et al
ASKER CERTIFIED SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
This is just text, no images et al
Yes, I understand.  But I think the size of the text may make the same principles applicable.
Avatar of Gary

ASKER

@Dave
So basically storing it directly in the main table makes no difference as far as MySQL is concerned as it ends up being two tables anyway. This is the general concensus I found reading various sites.
That's what I understand except that the second table is not directly visible to you, it is automatically generated by the MySQL storage engine.  There are limits on what you can do with a TEXT or BLOB field.  I don't think you can index them.  But as Ray pointed out, if you "SELECT *" you will retrieve them which slow things down if you are not actually using the data.
Avatar of Gary

ASKER

They are only retrieved on one page and updated on another page, so no select * for every query
I usually set some limit on the text that can be entered thru a form.  I would say that if your data will fit in one row / 65535 bytes then I wouldn't worry about.  If it was image or PDF files, I would store them outside the database anyway and only put the file name in the database.
Maybe you should ask yourself what benefit you would hope to achieve by storing the data directly in the database versus storing it in a file, with a pointer to the file being stored in the database. Cost-benefit analysis of either approach might have bearing on which way you go with your design.