MySQL large table design question


I am using MySQL 5 and designing some tables that will hold quite a large amount of data; some will probably exceed 10,000 rows in time, and others will be over a million rows.

These tables have some optional data, such as "description", that applies to some rows but not all. I have designed the tables so that there is an INT id that links to a content table where the optional data is stored. If the row has no description, the content id is set to 0; if data has been stored for it, it is set to the primary key of the relevant row in the content table.

This means I have designed the large tables to be a bunch of INTs, linking to other tables for the different data they hold. Since much of the data is optional, this avoids storing TEXT and large VARCHAR columns in the big table. Also, because different parts of the system require different parts of the data, they aren't dealing with one big table; they just fetch data from the relevant sub-tables as needed. For example, when calculating stock, the description, images, title and so on are not required.
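As a rough sketch of the layout described above (all table and column names here are illustrative, not from the actual system; it uses the questioner's 0-means-absent convention for the link column):

```sql
-- Main "heavy" table: fixed-width INT columns only.
CREATE TABLE product (
    id          INT UNSIGNED NOT NULL AUTO_INCREMENT,
    content_id  INT UNSIGNED NOT NULL DEFAULT 0,  -- 0 = no description stored
    image_id    INT UNSIGNED NOT NULL DEFAULT 0,  -- 0 = no image stored
    stock_level INT NOT NULL DEFAULT 0,
    PRIMARY KEY (id),
    KEY idx_content (content_id)
) ENGINE=InnoDB;

-- Optional long data lives in its own table.
CREATE TABLE content (
    id   INT UNSIGNED NOT NULL AUTO_INCREMENT,
    body TEXT NOT NULL,
    PRIMARY KEY (id)
) ENGINE=InnoDB;

-- Fetch the description only when it exists:
SELECT c.body
FROM product p
JOIN content c ON c.id = p.content_id
WHERE p.id = 42;
```

A common alternative is to make `content_id` NULLable instead of using 0, so a `LEFT JOIN` can distinguish "no description" naturally; either convention works if applied consistently.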

My logic for this is to keep the heavy tables as light as possible, so that they don't become physically huge in both row size and row count.

I would like to check that this is the right approach. It will require a little more work on the server, a few extra short joins and, in some cases, a few extra queries, but I would hope that keeping the tables lean will help with performance.

Is this correct? Is this a favourable way of designing large databases?

Also, I plan on using transactions to ensure everything stays valid and reliable, and that linked rows don't get separated from their parents/children. I know I will need to use InnoDB tables for this, but should I be using InnoDB for all tables, or MyISAM for everything except the important transaction-based tables?
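For reference, the kind of transaction meant here looks like the following on InnoDB tables (names are illustrative, continuing the hypothetical `product`/`content` layout above):

```sql
START TRANSACTION;

-- Insert the optional content first...
INSERT INTO content (body) VALUES ('Long optional description...');

-- ...then link it to its parent in the same transaction, so the pair
-- is committed (or rolled back) as a single unit and the rows can
-- never become separated.
UPDATE product SET content_id = LAST_INSERT_ID() WHERE id = 42;

COMMIT;
```

On MyISAM tables `START TRANSACTION` has no effect (each statement commits immediately), which is why transactional integrity requires InnoDB.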

Is InnoDB much less efficient than MyISAM?

Many Thanks,
I don't believe that views are as efficient as you describe; if they were, there would be a huge overhead in keeping them up to date. The only caching is of the optimised queries used to "build" the view.
The main purpose of views is to make queries easier to write; optimisations are made at the query level, not at the data-storage level -- again, ensure you have appropriate indexes on the tables underlying your views.
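To illustrate the point being made: a view in MySQL 5 is a stored SELECT, not a stored result, so querying it re-runs the underlying join each time (names below are illustrative and match the hypothetical schema discussed earlier):

```sql
-- A de-normalised "products in stock" view over the underlying tables.
CREATE VIEW product_in_stock AS
SELECT p.id, p.stock_level, c.body AS description
FROM product p
LEFT JOIN content c ON c.id = p.content_id
WHERE p.stock_level > 0;

-- This executes the join above at query time; it is never pre-computed,
-- so it benefits from the same indexes as the hand-written join would.
SELECT * FROM product_in_stock WHERE id = 42;
```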

In the past I've actually gone down the route of duplicating data in order to achieve acceptable performance; however, this should be avoided wherever possible, and it was only used as a last resort.

JOINs across 8 tables will have quite a high overhead - be sure to use the most appropriate JOIN for the task at hand, and use subqueries where possible as well (they are cached separately and so can give a large performance boost).

Make sure you allow MySQL to keep a large query cache in memory and you should notice a good performance boost.
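A minimal configuration sketch for the MySQL 5 query cache mentioned above (the sizes are illustrative starting points, not recommendations; tune them to your workload):

```ini
# my.cnf -- MySQL 5.x query cache settings
[mysqld]
query_cache_type  = 1     # cache eligible SELECT result sets
query_cache_size  = 64M   # total memory reserved for cached results
query_cache_limit = 2M    # don't cache any single result larger than this
```

Note that the query cache stores whole result sets keyed on the exact query text, and every cached entry touching a table is invalidated when that table is written to, so it helps most on read-heavy workloads.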
Your approach is a valid one; however, you may find that, due to the drastically varying ways in which databases optimise storage and queries, your attempt to improve performance at a high level may not actually give any benefit -- it depends on the queries you will be executing and their volumes.

I used an equivalent table layout to yours when I built a large database with conditional relations, and it worked well.
Sticking to a normalised form is considered good practice from an information-architecture point of view --- but when accounting for database performance optimisations, your approach is debatably better.
Any large pieces of data are best kept in separate tables unless they are used almost as heavily as the rest of the data -- your example above suggests that separation is a good thing in your case.

Your main priority for performance should be ensuring that you create the relevant indexes -- this somewhat gets around the issues of variable-length records, though fixed-length records will always be the fastest.
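In concrete terms, that means indexing every column that appears in a `JOIN ... ON` clause, then checking the plan with `EXPLAIN` (the `stock` table and its columns here are hypothetical examples):

```sql
-- Index the foreign-key side of the join; the other side is already
-- covered by the joined table's PRIMARY KEY.
ALTER TABLE stock ADD INDEX idx_product (product_id);

-- EXPLAIN shows, per table, whether MySQL reads it via an index
-- (type = eq_ref/ref) or falls back to a full scan (type = ALL).
EXPLAIN SELECT p.id, s.quantity
FROM product p
JOIN stock s ON s.product_id = p.id;
```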

Unless you have large, unimportant tables which will be heavily used, I would keep everything as InnoDB to ensure concurrency throughout - the efficiency difference is very small given the increased robustness, which should be a priority in almost all business systems.
SamWitneyAuthor Commented:
Thanks for your reply, that helps a lot.

If I use views to make de-normalised versions of certain groups of tables (such as "products in stock" with all the product info, including content, stock level, etc.), will this increase performance when querying the view, rather than running large join queries each time? I.e. are views cached and updated when the main tables are updated?

That query would involve about 8 tables, as stock is quite complex, as is some of the product info.

Also, based on my design above, is MySQL 5 good at doing the joins to link the "content" to the tables, given that both tables are correctly indexed on the PK/FKs?