MySQL LOAD DATA INFILE question

We have an extremely large database that another company gives to us. Every day they drop around 20 different CSV files with new records and updates. I'd like to use LOAD DATA INFILE because it's so much faster. Is there a way to have it overwrite the records that already have IDs as well as add the ones that don't?

If I need to do this with PHP, is there a "universal" script I can write that will automate it without having to enter thousands of different fields? I wanted to use a full dump of each table every night, but the company won't play nice, so all I get are the updates to each table.

Enough rambling: what's the best way to get these "updates" into MySQL without writing a script that traverses 200k records every day, one field at a time?
finnegandev asked:

Julian Hansen commented:
I would do it like this:

Use LOAD DATA INFILE to load the data into a temporary table.

Then run an INSERT with ON DUPLICATE KEY UPDATE to update the records that already exist and insert the ones that don't - something like this:
DROP TABLE IF EXISTS `_temp_import`;

-- CREATE THE TEMP TABLE
CREATE TABLE `_temp_import` (
   `client_id` int NOT NULL,
   `firstname` varchar(100),
   `surname` varchar(100),
   PRIMARY KEY (`client_id`)
);

-- LOAD THE DATA
LOAD DATA LOCAL INFILE 'path_to_file/input_file.csv'
INTO TABLE `_temp_import`
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\r\n'
IGNORE 1 LINES
(client_id, firstname, surname);

-- INSERT NEW ROWS, UPDATE EXISTING ONES
INSERT INTO `ACTUALTABLE` (client_id, firstname, surname)
SELECT client_id, firstname, surname FROM `_temp_import`
ON DUPLICATE KEY UPDATE
  firstname = VALUES(firstname),
  surname = VALUES(surname);

-- CLEAN UP
DROP TABLE `_temp_import`;

Brian Tao (Senior Business Solutions Consultant) commented:
Use "REPLACE INTO":
LOAD DATA LOCAL INFILE 'your_path/your_file.csv' REPLACE INTO TABLE your_table ......... blah blah.
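
For reference, a complete statement of that form might look like the sketch below. This is only an illustration - the file path, table name and column list are placeholders, and it assumes a comma-delimited file with a header row and a table with a primary or unique key for REPLACE to match on:

LOAD DATA LOCAL INFILE 'your_path/your_file.csv'
REPLACE INTO TABLE your_table
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\r\n'
IGNORE 1 LINES
(client_id, firstname, surname);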
Julian Hansen commented:
Here is why you should not use REPLACE INTO: it is effectively a delete plus an insert. ON DUPLICATE KEY UPDATE lets you do the same thing without the overhead (and side effects) of a delete and an insert. For example, if you have an autonumber column on your table, REPLACE INTO will cause the autonumbers of the replaced records to change - since each matching record is removed and re-inserted - which may not be desirable.

From the docs

REPLACE is a MySQL extension to the SQL standard. It either inserts, or deletes and inserts. For another MySQL extension to standard SQL—that either inserts or updates—see Section 13.2.5.3, “INSERT ... ON DUPLICATE KEY UPDATE Syntax”.
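
To illustrate the side effect, here is a minimal sketch (hypothetical table and column names) of how the two approaches treat an auto-increment id:

-- A surrogate auto-increment id plus a unique business key.
CREATE TABLE demo (
  id        INT AUTO_INCREMENT PRIMARY KEY,
  client_id INT NOT NULL UNIQUE,
  surname   VARCHAR(100)
);

INSERT INTO demo (client_id, surname) VALUES (1, 'Smith');   -- row gets id = 1

-- REPLACE matches the unique client_id, deletes the row and re-inserts it,
-- so the row comes back with a new auto-increment id.
REPLACE INTO demo (client_id, surname) VALUES (1, 'Smythe');

-- INSERT ... ON DUPLICATE KEY UPDATE updates the existing row in place,
-- so the original id is preserved.
INSERT INTO demo (client_id, surname) VALUES (1, 'Smyth')
ON DUPLICATE KEY UPDATE surname = VALUES(surname);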

Brian Tao (Senior Business Solutions Consultant) commented:
You're 100% correct, Julian.
But, well, the author had these two phrases in his question:
- have it overwrite the records that already have IDs (very likely there's no autonumber)
- without having to enter thousands of different fields (yours requires entering those fields in the update part)

Some experts are more like educators and want to guide people into becoming better programmers. I respect them very much, but I, having been in technical support for too many years, tend to solve problems directly, one by one.
Julian Hansen commented:
It is a fairly simple task to write a PHP script that builds the ON DUPLICATE KEY query.

Here is a sample:
<?php
$host = 'localhost';
$user = 'user';
$password = 'password';
$database = 'database';

// CREATE THE MYSQLI CONNECTION
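// NOTE: depending on your configuration, LOAD DATA LOCAL INFILE may also need
// local_infile enabled on the server and MYSQLI_OPT_LOCAL_INFILE on the client
// (set via mysqli_init() + mysqli_options() before connecting, as in the
// author's snippet further down).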
$mysqli = new mysqli($host, $user, $password, $database);
if ($mysqli->connect_error) {
  die('Connect Error (' . $mysqli->connect_errno . ') ' . $mysqli->connect_error);
}      

// INPUTS ARRAY - STORES THE TABLES AND CORRESPONDING
// INPUT FILE PATH
$inputs = array (
  array (
    'table' => 'table1',
    'file' => 'c:/incoming/table1.csv'
  ),
  array (
    'table' => 'table2',
    'file' => 'c:/incoming/table2.csv'
  )
);

// PROCESS EACH TABLE IN TURN
foreach($inputs as $table) {
  echo "Processing table: {$table['table']}, Data file: [{$table['file']}]<br/>";
  
  // GET THE FIELDNAMES FROM THE TABLE WE ARE PROCESSING
  $query = "SELECT * FROM {$table['table']} LIMIT 1";
  $result = $mysqli->query($query);
  if ($result) {
    $row = $result->fetch_assoc();
	
    // FIELDNAMES ARE IN THE KEYS OF THE RETURNED ARRAY
    $fieldnames = array_keys($row);
	
    // IMPORT THIS TABLE
    DoImport($mysqli, $table['table'], $table['file'], $fieldnames);
  }
  else {
    echo "There was an error with [{$query}]. Error: " . $mysql->error . "<br/>";
  }
}

/*
  DoImport($db, $table, $file, $fieldnames)
  Function to import data from CSV file. Takes as parameters
  $db - mysqli connection
  $table - table to import to
  $file - file to import from
  $fieldnames - fieldnames of table
*/
function DoImport($db, $table, $file, $fieldnames)
{
  // CREATE THE TEMPORARY TABLE - BUT FIRST 
  // DROP IF IT EXISTS ALREADY
  $temptable = "_temp_{$table}";
  echo "Droping temporary table {$temptable}<br/>";
  
  $query = "DROP TABLE IF EXISTS {$temptable}";
  $result = $db->query($query);

  // CREATE THE TEMPORARY TABLE USING SELECT
  // WITH NO ROWS RETURNED
  $query = "CREATE TEMPORARY TABLE `{$temptable}` AS (SELECT * FROM {$table} WHERE 1=2)";
  $result = $db->query($query);
  if (!$result) {
    echo "Failed to create temp table " . $db->error . "<br/>";
  }
  
  // LOAD THE DATA USING LOCAL INFILE
  // CHANGE TO SUIT YOUR CSV PROFILE
  $fields = implode(",", $fieldnames);
  $query = <<< QUERY
    LOAD DATA LOCAL INFILE '{$file}'
      INTO TABLE `{$temptable}`
      FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
      LINES TERMINATED BY '\\r\\n'
      IGNORE 1 LINES
      ({$fields})
QUERY;
  $result = $db->query($query);
  
  if (!$result) {
    echo "{$query}<br/>Failed to import data {$file} to temp table {$temptable}. Error: " . $db->error . "<br/>";
  }
  
  // INSERT INTO TABLE USING ON DUPLICATE
  // REQUIRES TARGET TABLE TO HAVE A PRIMARY KEY
  $query = "INSERT INTO `{$table}` ($fields)
    SELECT {$fields} FROM `$temptable`
    ON DUPLICATE KEY UPDATE ";
	
  // BUILD THE FIELD NAMES PART FROM
  // THE FIELDNAMES PARAMETER
  foreach($fieldnames as $f) {
    $query .= "{$f}=VALUES({$f}),";
  }
  
  // REMOVE THE TRAILING COMMA
  $query = trim($query, ",");

  $result = $db->query($query);
  if (!$result) {
    echo "{$query}<br/>Failed to insert data. Error: " . $db->error . "<br/>";
  }
  
  // CLEAN UP
  $query = "DROP TABLE `{$temptable}'";
  $db->query($query);
}

finnegandev (Author) commented:
Would REPLACE INTO delete all the data that's not duplicated?
Julian Hansen commented:
REPLACE INTO does a delete and a re-insert of the data, so it will remove a record if it already exists and then add the record back again from the source.
finnegandev (Author) commented:
But will it write the new data to the table as well?

Would there be a problem if I did a regular LOAD DATA INFILE and then did the REPLACE afterwards? I would assume that the regular load would error on the duplicate keys?
Julian Hansen commented:
Yes - however, I would recommend the process described above. The script posted is fully functional and generic: you give it the table names and import files and it should do the rest.
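
On the duplicate-key point, for what it's worth, LOAD DATA INFILE accepts an optional REPLACE or IGNORE keyword that controls what happens when an incoming row collides with an existing unique key. A rough sketch (placeholder file and table names):

-- No keyword: a duplicate key normally aborts the load with an error
--   (with LOCAL, duplicates are instead skipped with warnings, as if IGNORE were given).
-- REPLACE: existing rows are deleted and replaced by the incoming rows.
-- IGNORE:  incoming rows that duplicate an existing key are skipped.
LOAD DATA INFILE '/path/to/file.csv'
REPLACE INTO TABLE your_table
FIELDS TERMINATED BY ',' LINES TERMINATED BY '\r\n';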
finnegandev (Author) commented:
The only problem with that is that I'm writing hundreds of thousands of records (900k+) into several of the tables, and I'm worried about the script execution time and about bogging down the server while the script runs. Some of the CSV files I'm reading from are 30+ MB. I see it's using LOAD DATA INFILE, but how is MySQL at working between tables with large data sources?
finnegandev (Author) commented:
This is what I currently have, but if everyone's in agreement that the other method isn't too taxing then I'd be happy to do it that way.

$mysqli = mysqli_init();
mysqli_options($mysqli, MYSQLI_OPT_LOCAL_INFILE, true);
mysqli_real_connect($mysqli, $host, $user, $password, $database);

$query = SetupReplace($thedate, "SO.csv", "bt_so");
$result = $mysqli->query($query);
if (!$result) {
  echo "Failed to load " . $mysqli->error . "<br/>";
}

function SetupReplace($folder, $file, $table)
{
  $myquery = <<< QUERY
    LOAD DATA LOCAL INFILE '/var/csv/extracted/{$folder}/{$file}'
    REPLACE INTO TABLE {$table}
    FIELDS TERMINATED BY ',' ENCLOSED BY '"'
    LINES TERMINATED BY '\\r\\n'
    IGNORE 1 LINES
QUERY;

  return $myquery;
}

On a side note, that script you posted is beautiful.
Julian Hansen commented:
"bogging down the server while the script runs"
Irrespective of which method you use, you will have to spend server processing time, so I'm not sure why this is any different.
LOAD DATA INFILE is the fastest way to bring the data in - however, you are going to have to experiment with the process to find what works best for you.
Given the size of your data, temporary tables might not work, as there is a memory limit imposed on them. You might need to import into a standard table first and then import from that into the real table.
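
As a rough sketch of that variant - assuming the same hypothetical table and column names used earlier in this thread - a persistent staging table looks almost identical to the temporary-table version:

-- Sketch only: `ACTUALTABLE`, the columns and the file path are placeholders.
CREATE TABLE IF NOT EXISTS `_staging_import` LIKE `ACTUALTABLE`;
TRUNCATE TABLE `_staging_import`;

LOAD DATA LOCAL INFILE '/path/to/daily/file.csv'
INTO TABLE `_staging_import`
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\r\n'
IGNORE 1 LINES;

INSERT INTO `ACTUALTABLE` (client_id, firstname, surname)
SELECT client_id, firstname, surname FROM `_staging_import`
ON DUPLICATE KEY UPDATE
  firstname = VALUES(firstname),
  surname = VALUES(surname);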
finnegandev (Author) commented:
Thanks for all the help.
Julian Hansen commented:
You are welcome.