MySQL RDS Instance taking forever to load CSV files.

Hello all,

I had a large file (roughly 85 million records) that I needed to import into an Amazon MySQL RDS instance (t2.medium). I took the file from our server, zipped it, and moved it to S3. I then copied the file from S3 to a t2.micro instance and split it into multiple parts of roughly 1 million rows each.
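For reference, the split can be done along these lines (assuming a gzipped export called data.csv.gz, which is not the actual filename from my setup; the file_part prefix matches the script below):

gunzip data.csv.gz
# cut the file into ~1 million-line pieces named file_part00, file_part01, ...
split -l 1000000 -d data.csv file_part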

I then have the following shell script running:
#!/bin/bash

# Load each split file into TABLENAME on the RDS instance
for f in file_part*
do
    echo "$f"
    mysql -e "LOAD DATA LOCAL INFILE '$f' INTO TABLE TABLENAME FIELDS TERMINATED BY ',' ENCLOSED BY '\"' LINES TERMINATED BY '\n'" \
        -u user --password=password database -h hostname
done

I am running the following query to monitor progress, and it has taken over 12 hours to load nearly 12 million records. That seems extremely slow; any ideas?

SELECT TABLE_NAME, TABLE_ROWS, ROUND(((data_length + index_length) / (1024 * 1024)), 2) AS "Size in MB"
FROM information_schema.TABLES
WHERE table_schema = 'database';
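Side note: TABLE_ROWS is only an estimate for InnoDB tables. If an exact figure is needed (it can itself be slow on a table this size), something like the following would give it:

SELECT COUNT(*) FROM TABLENAME;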

Wayne
xathras1982Asked:

arnoldCommented:
Run EXPLAIN on the query and it will tell you what the issue is, i.e. missing indexes, etc.

The definition of the tables (triggers, constraints) and the available resources might also be part of the issue.
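For example (the table name is a placeholder), the full definition and any triggers can be checked with:

-- full table definition, including indexes and constraints
SHOW CREATE TABLE TABLENAME;
-- any triggers in the current schema
SHOW TRIGGERS;

That shows everything the load has to maintain or fire for every inserted row.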
xathras1982Author Commented:
Hi Arnold, the issue is the load, not a SELECT query. Running EXPLAIN against LOAD DATA just gives an error:

12:21:31    explain load data local infile 'file_part04' into table MYTABLE FIELDS TERMINATED BY ',' ENCLOSED BY '"'
Error Code: 1064. You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'load data local infile 'file_part04' into table MYTABLE FIELDS TERMINATED BY ' at line 1
0.031 sec
arnoldCommented:
How much space does each row require? Your issue could be the overhead of space allocation. If you could increase the allocated space it may speed up the load process. What other indexes, constraints or triggers do you have defined on the table?

One option to consider is using a temp/separate table on which there are minimal indexes and constraints. Once the data is loaded in, you would run an INSERT ... SELECT into the real table.
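A rough sketch of that staging-table approach (all table and index names here are placeholders, not from your setup):

-- staging table with the same columns; drop its secondary indexes before loading
CREATE TABLE TABLENAME_staging LIKE TABLENAME;
ALTER TABLE TABLENAME_staging DROP INDEX idx_example;

-- ... run the LOAD DATA LOCAL INFILE statements against TABLENAME_staging ...

-- copy everything into the real table once the load has finished
INSERT INTO TABLENAME SELECT * FROM TABLENAME_staging;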

Also try a smaller test sample: put 10-30 rows in a file, load it, and see how long that takes.
xathras1982Author Commented:
What do you mean when you refer to how much space each row requires and the overhead of space allocation? How do I check that?

As for indexes and triggers, I removed all of them to see if that was the slowdown, and it really didn't help matters.
arnoldCommented:
Each row of data has an amount of space associated with it; a column of 4 characters, for example, needs about 4 bytes (4 * 8 bits) in a single-byte character set.
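One way to see what MySQL thinks the average row size is (the schema and table names are placeholders):

SELECT TABLE_NAME, AVG_ROW_LENGTH
FROM information_schema.TABLES
WHERE table_schema = 'database' AND TABLE_NAME = 'TABLENAME';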

You say your setup is on a t2.medium, yet you transferred the file to a separate instance rather than to the same machine where your MySQL is; that could add to the processing time.

Try the following:

head -10 csv_filename > newfile.csv

Then try to load the 10 rows from newfile.csv and see how long it takes.

Instead of using the script, use the mysql client directly:

mysql -h host -u username -p databasename

You will be prompted for the password. Then run:

LOAD DATA LOCAL INFILE '/path/to/where/the/file/newfile.csv' INTO TABLE databasename.TABLENAME FIELDS TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '\n';

and see what happens.

Also, when using the script, you do not check for errors before moving on to the next file.
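If you keep the script, a small change along these lines (same placeholders as your original script, not tested against your setup) would make it stop on the first failed load:

for f in file_part*
do
    echo "$f"
    mysql -e "LOAD DATA LOCAL INFILE '$f' INTO TABLE TABLENAME FIELDS TERMINATED BY ',' ENCLOSED BY '\"' LINES TERMINATED BY '\n'" \
        -u user --password=password database -h hostname \
        || { echo "Load of $f failed, stopping." >&2; exit 1; }
done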

See if you get any error, or any indication of what might be preventing a timely load of the data.
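For example, right after the LOAD DATA statement finishes, the following will list anything MySQL complained about (truncated values, conversion problems and so on):

SHOW WARNINGS;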
xathras1982Author Commented:
Arnold, thanks for your help. I think I've found the problem though.

The RDS MySQL instance default for the insert buffer size was pretty small. I changed my script and added the following prior to the data load:
SET bulk_insert_buffer_size = 1024 * 1024 * 1024;
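For anyone checking their own instance, the current value can be confirmed with (a standard MySQL statement, nothing RDS-specific):

SHOW VARIABLES LIKE 'bulk_insert_buffer_size';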

I then made a clone of the table and ran my script again with this included. So the script looks more like this now:

#!/bin/bash

for f in file_part*
do
    echo "$f"
    mysql -e "SET bulk_insert_buffer_size = 1024 * 1024 * 1024; LOAD DATA LOCAL INFILE '$f' INTO TABLE TABLE_clone FIELDS TERMINATED BY ',' ENCLOSED BY '\"' LINES TERMINATED BY '\n'" \
        -u user --password=password database -h hostname.rds.amazonaws.com
    rm -f "$f"
done

This is averaging around 10-15 million inserts per hour now.
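At that rate the remaining ~73 million of the 85 million rows should take roughly 5-7 more hours, rather than the days the original pace implied.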

xathras1982Author Commented:
Some good suggestions here. However, the ultimate cause was the buffer size.