Large number of storage / insertion of records in MySQL - Part 1

Ajit_Kr
Ajit_Kr used Ask the Experts™
on
Please provide solution for following issues for a web based application in ASP.NET 3.5:

There is requirement for approx 10 to 20 million records entry in the database daily, or we can say that there would be around entry of 10 lakh records per table on daily basis. What would be the best Database Design, Model & Architecture in MySQL database. Need proper optimization for quick retrieval, insertion, updates & deletion. Also provide the hardware detail that would required to handle such a large database in long term.
Comment
Watch Question

Do more with

Expert Office
EXPERT OFFICE® is a registered trademark of EXPERTS EXCHANGE®
If one wants to deal with your question in detail, refer:
http://www.mysqlperformanceblog.com/2006/06/09/why-mysql-could-be-slow-with-large-tables/
http://bytes.com/groups/mysql/77103-mysql-large-table-slow-3-gig-table-3-days-reindex
http://bytes.com/groups/mysql/171643-going-large-table-any-tips
http://forums.mysql.com/read.php?21,36008,36008
http://forums.mysql.com/read.php?32,92682,92682

Now, from my experience:
Large data should be loaded using LOAD DATA INFILE. Basically without logging like what happens using TRUNCATE TABLE. Commit frequently. If it is one insert at a time from application, it is okay.  Take query execution plan using:
EXPLAIN tbl_name
Or:
EXPLAIN [EXTENDED] SELECT select_options

Create good indexes choosing from clustered/non-clustered, simple/composite, duplicate/unique. Choose columns based on high cardinality/uniqueness first in composite index. Make primary keys as clustered if they will not be updated/deleted. Create indexes on foreign keys to make joins fast since for PK MySQL will auto generate index. Choose columns in ON, WHERE, GROUP BY, ORDER BY clauses per table in correct combination and order index.

Design and model will be choose integer type columns as keys for joining. Do not use predicates which do pick index like IS NULL, IS NOT NULL, LIKE etc. For this make columns as NOT NULL with a DEFAULT. Advantage with MySQL is many constraints existing in other RDBMS are missing. But then again they need to be implemented from application if required.

Do normalization till form 3 avoiding INSERT/UPDATE/DELETE anamolies and make sure that consistency is maintained by application. Keep big tables in separate drives in windows and separate partitions in UNIX/Solaris/AIX/Linux. Keep tables in one place and indexes in another. This improves I/O parallelism.

Try in RAM, you have buffer pool size a little bigger than total size of tables + indexes on disk. Use 2/more CPU's and enable CPU parallelism. Use ANALYZE table command to keep statistics used by cost based optimizer upto date. There is no REORG in MySQL. You need to recreate the table/index from scratch using INSERT or MYSQLDUMP followed by MYSQL.

Use CHAR for fixed length data i.e. if you are going to stored > 90 % of size of column else use VARCHAR for variable length data. Use DATETIME or TIMESTAMP instead of separate DATE and TIME unless absolutely desired. Use VIEWS for complex queries to save on client side compile i.e. syntax and server side semantics i.e. existence of tables and columns & privilges.

If possible break huge table into many vertically where in the query can be easily directed to the correct split one. Keep drive/partition where MySQL is installed free from other applications. Keep enough disk free space in MySQL & OS drive/partition. Still better keep separate hard disks for big tables instead of a combined big one since hard disk head will be less loaded for disk seek and IO. Make design in such a way that master/lookup/type/read only/fact tables join with transaction/dimension tables and not other transaction tables. If it is read only database, you can go for OLAP design instead of OLTP.

Keep moving not actively required data to history/summary/archive tables in same schema/database or even different database. If data is not required at all unlike soft delete do a permanent hard delete. Hope this helps.

Author

Commented:
Thanks for the sugestion... in right direction.
The complete solution to this issue is very wide, I think it can't be descrided here. It would be more better if you can provide me more links.

Do more with

Expert Office
Submit tech questions to Ask the Experts™ at any time to receive solutions, advice, and new ideas from leading industry professionals.

Start 7-Day Free Trial