xathras1982 asked:

MySQL RDS instance taking forever to load CSV files

Hello all,

I had a large file, roughly 85 million records, that I needed to import into an Amazon MySQL RDS instance (t2.medium). I took the file from our server, zipped it, and moved it to S3. I then copied the file from S3 to a t2.micro instance and split it into multiple parts of about 1 million rows each.

I then have the following shell script running:

#!/bin/bash

# Load each split file into the table in turn.
# --local-infile=1 is needed so the client allows LOAD DATA LOCAL INFILE.
for f in file_part*
do
    echo "$f"
    mysql --local-infile=1 -u user --password=password -h hostname database \
      -e "LOAD DATA LOCAL INFILE '$f' INTO TABLE TABLENAME FIELDS TERMINATED BY ',' ENCLOSED BY '\"' LINES TERMINATED BY '\n'"
done

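As an editorial sketch (not from the original post): per-session settings often speed up bulk LOAD DATA on InnoDB considerably. Verify each setting is safe for your data and permitted on your RDS instance before using it:

```sql
-- Sketch: assumes an InnoDB table and that these session settings are
-- allowed on your RDS instance. TABLENAME and the file name are placeholders.
SET unique_checks = 0;       -- skip duplicate-key checking during the load
SET foreign_key_checks = 0;  -- skip foreign-key validation during the load
SET autocommit = 0;          -- batch the whole load into one commit

LOAD DATA LOCAL INFILE 'file_part01' INTO TABLE TABLENAME
  FIELDS TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '\n';

COMMIT;
SET unique_checks = 1;       -- restore normal behaviour afterwards
SET foreign_key_checks = 1;
```

Note that unique_checks = 0 only skips the check; duplicate keys already present in the data will still corrupt a unique index, so only use it when the input is known to be clean.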


I am running the following query to monitor progress, and it has taken over 12 hours to load nearly 12 million records. That seems extremely slow. Any ideas?

SELECT TABLE_NAME,TABLE_ROWS, round(((data_length + index_length) / (1024*1024)), 2) "Size in MB"
FROM information_schema.TABLES 
WHERE table_schema = "database"
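One caveat on this monitoring query (an editorial note, not from the thread): for InnoDB tables, TABLE_ROWS in information_schema is only an estimate and can be off considerably during a bulk load. For an exact figure:

```sql
-- Exact row count, but does a full scan on large InnoDB tables
-- (TABLENAME is a placeholder for the table being loaded)
SELECT COUNT(*) FROM TABLENAME;
```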



Wayne
arnold replied:

Run EXPLAIN on the query and it will tell you what the issue is, e.g. missing indexes.

The table definition (triggers, constraints) and the available instance resources might be part of the issue.
xathras1982 (asker) replied:
Hi Arnold, the issue is the LOAD, not a SELECT query. Running EXPLAIN against LOAD DATA just gives an error:

12:21:31  explain load data local infile 'file_part04' into table MYTABLE FIELDS TERMINATED BY ',' ENCLOSED BY '"'
Error Code: 1064. You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'load data local infile 'file_part04' into table MYTABLE FIELDS TERMINATED BY ' at line 1  (0.031 sec)
arnold replied:

How much space does each row require? Your issue could be the overhead of space allocation. If you could increase the allocated storage, it might speed up the load process. What other indexes, constraints, or triggers do you have defined on the table?

One option to consider is using a temp/separate table with minimal indexes and constraints set up. Once the data is loaded in, you would run an INSERT ... SELECT into the real table.

Also try a smaller test sample: put 10-30 rows in a file, load it, and see how long it takes.
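The staging-table suggestion above can be sketched roughly as follows (table names are placeholders, not from the original post, and the DROP statements assume such an index exists):

```sql
-- Staging table with the same columns but minimal indexes: cheap to load into
CREATE TABLE staging LIKE TABLENAME;
ALTER TABLE staging DROP PRIMARY KEY;     -- only if TABLENAME defines one
-- drop any secondary indexes copied over as well, e.g.:
-- ALTER TABLE staging DROP INDEX idx_name;

LOAD DATA LOCAL INFILE 'file_part01' INTO TABLE staging
  FIELDS TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '\n';

-- Copy into the real table in one pass, so indexes are maintained once
INSERT INTO TABLENAME SELECT * FROM staging;
DROP TABLE staging;
```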
xathras1982 (asker) replied:

What do you mean when you refer to how much space each row requires and the overhead of space allocation? How do I check that?

As for indexes and triggers, I removed all of them to see if that was the slowdown, and it really didn't help matters.
SOLUTION from arnold (members-only content not shown)

ASKER CERTIFIED SOLUTION (members-only content not shown)
xathras1982 (asker) concluded:

Some good suggestions here. However, the ultimate reason was the buffer size.
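The thread doesn't say which buffer, but on RDS the usual suspect for slow bulk loads is the InnoDB buffer pool. A sketch of how to inspect the relevant settings (these are standard MySQL variable names; on RDS they are changed through a DB parameter group, not SET GLOBAL):

```sql
-- Current buffer pool size in bytes
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';

-- Other settings that influence bulk-load throughput
SHOW VARIABLES LIKE 'innodb_log_file_size';
SHOW VARIABLES LIKE 'innodb_flush_log_at_trx_commit';
```

On RDS the buffer pool defaults to a fraction of instance memory (commonly expressed as {DBInstanceClassMemory*3/4} in the parameter group), and a t2.medium has only 4 GB of RAM in total, so moving to a larger instance class or tuning the parameter group are the usual remedies.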