Link to home
Start Free TrialLog in
Avatar of Jim Youmans
Jim YoumansFlag for United States of America

asked on

MySQL best way to merge two large tables

MySQL 5.6 on Centos 7
I have two very large tables (table A has 65 million rows and table B has 135 million rows) that I need to merge into a third table (lets call it table C).  Table C is partitioned on the CREATED_TS column.  A and B are identical except that they come from different "sources" and so in table C there is an extra column call OrigSource that has a code in it.
I have created a stored proc that cycles through each table and inserts 50K to 100K rows at a time but my issue is that it takes forever to run.  For Table A it took 10-12 hours and for table B it is still running after about 18 hours.
Is there a better, faster way to do this?  Can I export the tables with MySQLDump or something and then reinsert them into the new table?  Is there some kind of utility that will do this in a faster way?
Any suggestions would be greatly appreciated .  Thank you.
Jim
Avatar of Tomas Helgi Johannsson
Tomas Helgi Johannsson
Flag of Iceland image

Hi,
If you are building a new table that only you are accessing at the moment then I would consider using
select ... INTO outfile to export the data into a file and then LOAD DATA INFILE .. INTO TABLE to load it back to the destination table.
That procedure is much faster approach than regular inserts.
Also there is one parameter that you need to adjust and that is max_allowed_packet to make sure that you have enough buffersize and viz. speed up the unload and load. I would set it to the maximum which is 1GB.
https://dev.mysql.com/doc/refman/8.0/en/packet-too-large.html

And I suggest you run the mysqltuner alongside both the inserts that you are doing and the SELECT INTO/LOAD actions to monitor for any configuration bottlenecks that may appear, viz. look for any configuration that you might need to adjust to optimize your system.

Best regards,
    Tomas Helgi
1) No way to guess without full schemas of all 3x tables.

As you've described the schemas, a dump of A + B, then a restore into C will likely work.

2) You mentioned, "I have created a stored proc that cycles through each table and inserts 50K to 100K rows at a time but my issue is that it takes forever to run."


You've chosen the absolute slowest speed method to accomplish your task.

If speed is required, you'll do a dump of A + B, then restore both dumps to C.

3) Is there a better, faster way to do this?

Yes.

mysqldump - create dump files

mysql - restore dump files

4) Can I export the tables with MySQLDump or something and then reinsert them into the new table?

Yes. See above.

5) Is there some kind of utility that will do this in a faster way?

mysqldump + mysql
ASKER CERTIFIED SOLUTION
Avatar of Tomas Helgi Johannsson
Tomas Helgi Johannsson
Flag of Iceland image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial