finnegandev
asked on
MySQL LOAD DATA INFILE question
We have an extremely large database that another company supplies to us. Every day they drop around 20 different CSV files with new records and updates. I'd like to use LOAD DATA INFILE because it's so much faster. Is there a way to have it overwrite the records that already have IDs as well as add the ones that don't?
If I need to do this with PHP, is there a "universal" script I can write that will automate it without having to enter thousands of different fields? I wanted to use a full dump of each table every night, but the company won't play nice, so all I get are the updates to the table.
Enough rambling: what's the best way to get these "updates" into MySQL without writing a script to traverse 200k records every day, one field at a time?
Use "REPLACE INTO":
LOAD DATA LOCAL INFILE 'your_path/your_file.csv' REPLACE INTO TABLE your_table ......... blah blah.
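Spelled out in full, such a statement might look like the following sketch (the path, table name, and delimiters are illustrative placeholders, not from the original thread):

```sql
-- REPLACE means: input rows whose PRIMARY KEY or UNIQUE key matches an
-- existing row replace that row; all other rows are inserted as new.
LOAD DATA LOCAL INFILE '/path/to/updates.csv'
REPLACE INTO TABLE your_table
FIELDS TERMINATED BY ',' ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 LINES;
```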
Why you should not use REPLACE INTO: it is effectively a delete and insert. ON DUPLICATE KEY UPDATE lets you do the same thing without the overhead (and side effects) of a delete and insert. For example, if your table has an AUTO_INCREMENT column, a REPLACE INTO will cause those values to change, because each matched record is removed and re-inserted; this may not be desirable.
From the docs
REPLACE is a MySQL extension to the SQL standard. It either inserts, or deletes and inserts. For another MySQL extension to standard SQL—that either inserts or updates—see Section 13.2.5.3, “INSERT ... ON DUPLICATE KEY UPDATE Syntax”.
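For comparison, the insert-or-update extension the docs point to might be sketched like this (the table and column names are hypothetical):

```sql
-- Inserts the row, or, when the key (id) already exists, updates the
-- named columns in place; the existing row is never deleted and
-- re-inserted, so AUTO_INCREMENT values are preserved.
INSERT INTO your_table (id, col1, col2)
VALUES (42, 'a', 'b')
ON DUPLICATE KEY UPDATE
    col1 = VALUES(col1),
    col2 = VALUES(col2);
```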
You're 100% correct, Julian.
But, well, the author had these two phrases in his question:
- have it overwrite the records that already have IDs (very likely there's no autonumber)
- without having to enter thousands of different fields (yours requires entering those fields in the UPDATE part)
Some experts are more like educators and want to guide people into becoming better programmers. I respect them very much, but I, having been in technical support for too many years, tend to solve problems directly, one by one.
ASKER CERTIFIED SOLUTION
ASKER
Would REPLACE INTO delete all the data that's not duplicated?
REPLACE INTO does a delete and a re-insert of the data, so it will remove a record if it exists and then add the record back again from the source.
ASKER
But will it write the new data to the table also?
Would there be a problem if I did a regular LOAD DATA INFILE and then did the REPLACE after? I would assume that the regular LOAD DATA would error on the duplicate key?
Yes. However, I would recommend the process described above: the script posted is fully functional and generic. You provide it the table names and import files, and it should do the rest.
ASKER
The only problem with that is I'm writing hundreds of thousands of rows (900k+) in several of the tables, and I'm worried about the script execution time and bogging down the server while the script runs. Some of the CSV files I'm reading from are 30+ MB. I see it's using LOAD DATA INFILE, but how is MySQL at working between tables with large data sources?
ASKER
This is what I currently have, but if everyone's in agreement that the other method isn't too taxing then I'd be happy to use that way.
On a side note, that script posted is beautiful.
$mysqli = mysqli_init();
// LOCAL INFILE must be enabled on the client side for LOAD DATA LOCAL
mysqli_options($mysqli, MYSQLI_OPT_LOCAL_INFILE, true);
mysqli_real_connect($mysqli, $host, $user, $password, $database);

// $thedate is the per-day folder the CSV drops land in
$query = SetupReplace($thedate, "SO.csv", "bt_so");
$result = $mysqli->query($query);
if (!$result) {
    echo "Failed to load " . $mysqli->error . "<br/>";
}

// Build a LOAD DATA ... REPLACE statement for the given folder, file, and table
function SetupReplace($folder, $file, $table)
{
    $myquery = <<<QUERY
LOAD DATA LOCAL INFILE '/var/csv/extracted/{$folder}/{$file}' REPLACE INTO TABLE {$table} FIELDS TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '\\r\\n' IGNORE 1 LINES
QUERY;
    return $myquery;
}
"bogging down the server while the script runs"
Irrespective of which method you use, you will have to spend server processing time, so I'm not sure why this is any different.
LOAD DATA INFILE is the fastest way to bring the data in; however, you are going to have to experiment with the process to find what works best for you.
Given the size of your data, temporary tables might not work, as there is a memory limit imposed on them. You might need to use a standard table to import to first, and then import from that to the real table.
ASKER
Thanks for all the help.
You are welcome.
Use the LOAD DATA INFILE to load the data into a temporary table.
Then run an insert with ON DUPLICATE to update the records that exist or insert if they don't - something like this
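The staging-table approach described above might be sketched as follows (the staging table, column names, and date folder are hypothetical, since the original attached script is not shown in the thread):

```sql
-- Create a staging table with the same structure as the real table.
CREATE TABLE staging LIKE bt_so;

-- Load the day's CSV into the staging table.
LOAD DATA LOCAL INFILE '/var/csv/extracted/20100101/SO.csv'
INTO TABLE staging
FIELDS TERMINATED BY ',' ENCLOSED BY '"'
LINES TERMINATED BY '\r\n'
IGNORE 1 LINES;

-- Upsert from staging into the real table: new IDs are inserted,
-- existing IDs have their columns updated in place.
INSERT INTO bt_so (id, col1, col2)
SELECT id, col1, col2 FROM staging
ON DUPLICATE KEY UPDATE
    col1 = VALUES(col1),
    col2 = VALUES(col2);

DROP TABLE staging;
```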