  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 1257

How to optimize MySQL import script

Hi X-perts,

I need some advice on how to optimize a text-to-MySQL import script. My problem is that I have to import data from a huge tab-delimited text file (over 100 MB, with 150,000+ records). Currently, I am using simple code like this:

<?php
   # first get a mysql connection as per the FAQ

  $fcontents = file('./spreadsheet.xls');
  # expects the data file to be in the same dir as this script

  for ($i = 0; $i < sizeof($fcontents); $i++) {
      $line = trim($fcontents[$i]);
      $arr = explode("\t", $line);
      # if your data is comma-separated instead of
      # tab-separated, change the "\t" above to ","

      $sql = "insert into TABLENAME values ('".
                  implode("','", $arr) ."')";
      mysql_query($sql);
      if (mysql_error()) {
         echo mysql_error() . "<br>\n";
      }
  }
?>

Any ideas on how to make it faster?

Thanks

andy7789 Asked:

2 Solutions
 
Raynard7Commented:
Hi, there are a few different methods:

1. it may be slow because your table is indexed - you may be better off inserting into a temp table, copying across in one go, then dropping the temp table
2. you may want to disable keys before inserting
3. you may want to save this to a file, then use the file import routines to pull the file in

or my favourite, which seems to work most of the time in combination with #2:

4. insert multiple records at a time - depending on the server you would want to do between 100 and 500 in a batch.

What this means is that you construct a list of insert values as part of one statement, i.e.

insert into tableX (field2, field3) values (2,43),(34,44),(55,3) ......

which will then put in 3 records (in the above example). This uses fewer resources and should not cause any waiting time, because these should be inserted before the next string is read by the db.
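A sketch of option #4 in PHP (the helper name, batch size, and addslashes() escaping are illustrative assumptions - real code should escape with mysql_real_escape_string() against an open connection):

```php
<?php
# Hypothetical helper: builds multi-row INSERT statements from an
# array of rows, $batch rows per statement.
function build_batched_inserts($table, $rows, $batch = 500) {
    $stmts = array();
    foreach (array_chunk($rows, $batch) as $chunk) {
        $values = array();
        foreach ($chunk as $row) {
            # addslashes() used for brevity only
            $values[] = "('" . implode("','", array_map('addslashes', $row)) . "')";
        }
        $stmts[] = "insert into $table values " . implode(',', $values);
    }
    return $stmts;
}

# Example: 3 rows with a batch size of 2 yields two statements;
# each would then be sent with mysql_query($stmt).
$rows = array(array(2, 43), array(34, 44), array(55, 3));
$stmts = build_batched_inserts('tableX', $rows, 2);
```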
0
 
ahoffmannCommented:
$sql = "LOAD DATA LOCAL INFILE './spreadsheet.xls' INTO TABLE tablename IGNORE 1 LINES";
# see MySQL's LOAD command about using delimiters and the IGNORE option
0
 
andy7789Author Commented:
Thank you guys; I like the idea of loading the file via "LOAD DATA".

Is there a way to load only selected fields into the table? 80% of the fields are not needed. What would be the best way - delete the extra fields after the table has been populated via LOAD DATA, or load only the necessary fields while adding data row-by-row?
0
 
Raynard7Commented:
no, there is no way to filter using LOAD DATA - you would have to load into a temporary table, delete the rows, and then import more data.

The beauty is, however, that if your temporary table has no indexes or other validation, the data should go in quickly.

However, if you are deleting 80% of the records, could you apply a regular expression to the text to reduce the bulk before importing it with LOAD DATA?

This combination may speed things up considerably.
0
 
andy7789Author Commented:
I am not sure how applying a regular expression to the data file would be faster than a table conversion. It is a text file with tab-delimited fields and lines terminated by '\n'.

So, to remove the extra fields I would have to extract rows (probably via explode - that would be the fastest) and remove fields by transforming each line into an array and removing certain of its elements. I suspect it would be slower than just deleting the extra columns from the temp table.

Do you have some better and faster algorithm for text conversion in mind?
0
 
Raynard7Commented:
sorry, I thought you were filtering rows, not columns.


What criteria are you using to remove these rows? Or is it just the fields you wish to remove?

If it is just the fields, then you can assign those columns dummy values. From the manual:

"You can also discard an input value by assigning it to a user variable and not assigning the variable to a table column:

LOAD DATA INFILE 'file.txt'
  INTO TABLE t1
  (column1, @dummy, column2, @dummy, column3);"

from http://dev.mysql.com/doc/refman/5.0/en/load-data.html

which will skip the columns with the @ in front of them
0
 
andy7789Author Commented:
what is column1? Is it the name of the 1st column in the table?
0
 
Raynard7Commented:
yes - it is. So if your data was like:

2007-01-01->SomeData->33.45->34839

(where -> = a tab character)

and you did

LOAD DATA INFILE 'file.txt' INTO TABLE yourTableName (yourDateColumn, @dummy, yourMoneyColumn, @dummy);

this would put 2007-01-01 into yourDateColumn and 33.45 into yourMoneyColumn.

0
 
ahoffmannCommented:
> Delete the fields after the table has been created ..
why would you feed your table with 80% useless data?
Filter your data first, then import it as Raynard7 suggested.

> I am not sure how to apply a regular expression to the data file ..
if it is always the same columns you need (or want to skip), then (g)awk is the tool to use; if you need more sophisticated pattern matches, then use perl.

Can you please post a sample line of your data, and tell us which parts to use?
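If awk or perl aren't available, the same column filtering could be sketched in PHP instead (the column indices and file names below are placeholders, not from your data):

```php
<?php
# Hypothetical preprocessing sketch: keep only selected columns of a
# tab-delimited line.  Which indices to keep depends on your data;
# 0 and 2 here are just examples.
function filter_columns($line, $keep) {
    $fields = explode("\t", rtrim($line, "\r\n"));
    $out = array();
    foreach ($keep as $i) {
        $out[] = $fields[$i];
    }
    return implode("\t", $out);
}

# Streaming line by line avoids loading the whole 100 MB into memory:
# $in  = fopen('./extr.txt', 'r');
# $out = fopen('./extr.filtered.txt', 'w');
# while (($line = fgets($in)) !== false) {
#     fwrite($out, filter_columns($line, array(0, 2)) . "\n");
# }
```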
0
 
andy7789Author Commented:
What is wrong with this syntax:

LOAD DATA INFILE LOCAL 'http://www.xxx.com/qzip/extr.txt' INTO TABLE draf2
      FIELDS TERMINATED BY '\t'
      LINES TERMINATED BY '\n'

I am running the query directly via SQLyog. It returns

Error Code : 1064
You have an error in your SQL syntax; .....
0
 
andy7789Author Commented:
The correct syntax is LOAD DATA LOCAL INFILE; however, it returns an error

File '' not found (Errcode: 2)

If I run a php script in some directory, how should I reference the input file (extr.txt) located in the same directory?
0
 
andy7789Author Commented:
Here is the code with the absolute path on the server, but it does not work either - File '' not found:

$source_file = '/usr/local/psa/home/vhosts/mydomain.com/httpdocs/qzip/extr.txt';

      $sql = "LOAD DATA LOCAL INFILE \"$source_file\" INTO TABLE draf2
      FIELDS TERMINATED BY '\t'
      LINES TERMINATED BY '\n'";
      mysql_query($sql,$myconn) or die(mysql_error());

What am I missing here?
0
 
ahoffmannCommented:
> What is wrong with this syntax:
you cannot use a URL for the file; it has to be a path to where the file is physically located on the host where you start the command

> File '' not found:
your syntax is wrong; it should be like:

  $sql = "LOAD ......" . $source_file . " ..... ";

Also note that you pass \n and \t inside a double-quoted string, which evaluates these escape sequences, but you most likely want \n and \t verbatim in your SQL statement, hence you need to write \\n and \\t in your string.
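A quick illustration of the quoting point (pure PHP, no database needed):

```php
<?php
# How PHP quoting affects what MySQL sees:
$a = "\t";   # double quotes: PHP expands this to one real tab character
$b = '\t';   # single quotes: two characters, a backslash and a t
$c = "\\t";  # escaped backslash in double quotes: also backslash + t
# $b or $c is what you want inside FIELDS TERMINATED BY, so that the
# \t escape reaches MySQL verbatim and MySQL interprets it itself.
```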
0
 
andy7789Author Commented:
OK, keeping just the main syntax:

      $sql = "LOAD DATA LOCAL INFILE '/usr/local/psa/home/vhosts/mydimain.com/httpdocs/qzip/extr.txt' INTO TABLE draf2
      FIELDS TERMINATED BY '\t'
      LINES TERMINATED BY '\n'";
      mysql_query($sql,$myconn) or die(mysql_error());

Still the same error - file not found, though the path is correct
0
 
andy7789Author Commented:
i tried adding $client_flags to db connection:

$client_flags |= 128;
$dblink = mysql_connect($dbhost, $dbuser, $dbpass, FALSE, $client_flags);

it makes no difference - file not found
0
 
ahoffmannCommented:
did you verify that the path and file exist? (just wondering about ...mydimain.com/httpdocs...)
0
 
andy7789Author Commented:
yes, it is a full path on the server. I run the php script from the same directory qzip (where the input file is located)
0
 
andy7789Author Commented:
checked again via ssh pwd. The path and the file name are correct
0
 
andy7789Author Commented:
I am puzzled - how is that possible? Running a php file named db.php:


      $sql = "LOAD DATA LOCAL INFILE 'extr.txt' INTO TABLE draf2
      FIELDS TERMINATED BY '\\t'
      LINES TERMINATED BY '\\n'";
      mysql_query($sql,$myconn) or die(mysql_error());

returns mysql error:

File 'docs/qzip/db.php' not found (Errcode: 0)

what is this? where is the reference to docs/qzip/db.php?

Please, give me some clue
0
 
ahoffmannCommented:
> .. I run the php script from the same directory qzip (where the input file is located)

then use ./ as the path, nothing else

> what is this? where is the reference to docs/qzip/db.php?
sounds like a dangling error message in php's cache for exception messages ...
0
 
andy7789Author Commented:
it does not work either - the same "file not found". Here is the full code:

<?php
set_time_limit(600);

$hostname_myconn = "localhost";
$database_myconn = "xxxx-2";
$username_myconn = "xxxxxxx";
$password_myconn = "xxxxxxx";
$myconn = mysql_connect($hostname_myconn, $username_myconn, $password_myconn, FALSE, 128) or trigger_error(mysql_error(),E_USER_ERROR);

mysql_select_db($database_myconn,$myconn);

$sql = "LOAD DATA LOCAL INFILE './extr.txt' INTO TABLE draf2
            FIELDS TERMINATED BY '\\t'
            LINES TERMINATED BY '\\n'";
mysql_query($sql,$myconn) or die(mysql_error());

?>

Any ideas?
0
 
andy7789Author Commented:
if I run it from the SQLyog command prompt as

LOAD DATA LOCAL INFILE 'c:/temp/extr.txt' INTO TABLE draf2 FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n'

it works fine, but I cannot make it work from the server, i.e. running via php
0
 
ahoffmannCommented:
> .. it works fine, ..
then you need to ask your server admin which owner and which permissions your script has and in which physical directory it is started.
0
 
andy7789Author Commented:
Finally I have managed to fix this: I removed LOCAL and set up the user's file privileges. It works perfectly - 60,000 records (a 100 MB text file) are loaded in 3-5 seconds. One more question before we close the topic:

1. When I create the table initially, it has no keys. Next, I assign one of the fields (id) as a primary key.
2. Next, when I load some updated data into the same table from another text file - how should I specify the command?

LOAD DATA INFILE 'c:/temp/extr.txt' REPLACE INTO TABLE draf2 FIELDS TERMINATED BY '\\t' LINES TERMINATED BY '\\n'

Is it correct if I just define a primary key?
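In PHP, that statement could be assembled like this (a sketch; the helper name is hypothetical, the old mysql_* API is assumed, and REPLACE is only meaningful once the primary key exists):

```php
<?php
# Hypothetical helper assembling the LOAD DATA statement discussed in
# this thread.  Note the doubled backslashes: in a double-quoted PHP
# string, "\\t" yields the two characters \t, which MySQL then
# interprets as a tab.
function build_load_data($file, $table, $replace = false) {
    return "LOAD DATA INFILE '$file' "
         . ($replace ? "REPLACE " : "")
         . "INTO TABLE $table "
         . "FIELDS TERMINATED BY '\\t' LINES TERMINATED BY '\\n'";
}

$sql = build_load_data('c:/temp/extr.txt', 'draf2', true);
# then: mysql_query($sql, $myconn) or die(mysql_error());
```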

0
 
ahoffmannCommented:
dooh, don't tell your admin that mysql is configured to read data which should have root/mysql permissions into user-space tables ... sounds like a security problem, which you can use as a feature too :-/

> .. REPLACE ..
REPLACE works only if there is a primary key or unique index on the table; I'm not sure whether LOAD DATA generates it, or whether REPLACE behaves as you expect in subsequent LOAD DATA calls to the same table. Sorry for the incomplete info - you need to dig into the MySQL docs ...
0
