How to optimize MySQL import script

Hi X-perts,

I need some advice on how to optimize a text-to-MySQL import script. My problem is that I have to import data from a huge tab-delimited text file (over 100 MB, 150,000+ records). Currently I am using simple code like this:

<?
  # first get a MySQL connection as per the FAQ

  $fcontents = file('./spreadsheet.txt');
  # expects the tab-delimited file to be in the same dir as this script

  for ($i = 0; $i < sizeof($fcontents); $i++) {
      $line = trim($fcontents[$i]);
      $arr = explode("\t", $line);
      # if your data is comma separated instead of tab separated,
      # change the "\t" above to ","

      # note: field values are not escaped here
      $sql = "insert into TABLENAME values ('" .
                 implode("','", $arr) . "')";
      mysql_query($sql);
      if (mysql_error()) {
          echo mysql_error() . "<br>\n";
      }
  }
?>

Any ideas how to make it faster?

Thanks

A
ASKER CERTIFIED SOLUTION from Raynard7 (members-only content not shown)
SOLUTION (members-only content not shown)
andy7789 (ASKER):
Thank you guys; I like the idea of loading the file via "LOAD DATA".

Is there a way to load only selected fields into the table? 80% of the fields are not needed. What would be the best way: delete the extra fields after the table has been populated via LOAD DATA, or load only the necessary fields while adding the data row by row?
No, there is no way to filter using LOAD DATA - you would have to load it into a temporary table, then delete the unwanted rows, and then import more data.

The beauty is, however, that if your temporary table has no indexes or other validation, the data should go in quickly.

However, if you are deleting 80% of the records, could you apply a regular expression to the text to reduce the bulk before importing with LOAD DATA?

This combination may speed things up considerably.
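
To make that concrete, a minimal sketch of the staging-table flow in PHP (this is not from the hidden solutions: tmp_import and the filter condition are made up, and draf2 matches the table name used later in this thread):

<?
# Hypothetical sketch: stage the raw file in an index-free table,
# filter there, then copy the survivors into the real table.
# Assumes an open connection $myconn and a file extr.txt next to
# this script.

# 1. staging table with the same layout as the target
mysql_query("CREATE TABLE tmp_import LIKE draf2", $myconn);

# 2. bulk-load the raw file into the staging table
$sql = "LOAD DATA LOCAL INFILE './extr.txt' INTO TABLE tmp_import
        FIELDS TERMINATED BY '\\t'
        LINES TERMINATED BY '\\n'";
mysql_query($sql, $myconn) or die(mysql_error());

# 3. drop unwanted rows (hypothetical filter), copy the rest, clean up
mysql_query("DELETE FROM tmp_import WHERE id IS NULL", $myconn);
mysql_query("INSERT INTO draf2 SELECT * FROM tmp_import", $myconn);
mysql_query("DROP TABLE tmp_import", $myconn);
?>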
I am not sure how to apply a regular expression to the data file in a way that would be faster than a table conversion. It is a text file with tab-delimited fields and lines terminated by '\n'.

So, to remove the extra fields I would have to split each line (probably via explode, which would be fastest) into an array and remove certain elements. I suspect that would be slower than just deleting the extra columns from the temp table.

Do you have a better and faster algorithm for text conversion in mind?
Sorry, I thought you were filtering rows, not columns.


What criteria are you using to remove these rows? Or is it just fields you wish to remove?

If it is just fields, then you can assign those columns dummy values:
You can also discard an input value by assigning it to a user variable and not assigning the variable to a table column:

LOAD DATA INFILE 'file.txt'
  INTO TABLE t1
  (column1, @dummy, column2, @dummy, column3);

(from http://dev.mysql.com/doc/refman/5.0/en/load-data.html)

This will skip the columns that have @ in front of them.
What is column1? Is it the name of the first column in the table?
Yes, it is. So if your data looked like:

2007-01-01->SomeData->33.45->34839

(-> = tab character)

and you did:

LOAD DATA INFILE 'file.txt' INTO TABLE yourTableName (yourDateColumn, @dummy, YourMoneyColumn, @dummy);

this would put 2007-01-01 into yourDateColumn and 33.45 into YourMoneyColumn.
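
Putting the column list together with the explicit delimiter clauses used elsewhere in this thread, a minimal PHP sketch (the column names are the hypothetical ones above; LOCAL and the file name are assumptions):

<?
# Hypothetical sketch: skip unwanted input columns with @dummy while
# loading; note the column list goes after the FIELDS/LINES clauses.
$sql = "LOAD DATA LOCAL INFILE './extr.txt' INTO TABLE yourTableName
        FIELDS TERMINATED BY '\\t'
        LINES TERMINATED BY '\\n'
        (yourDateColumn, @dummy, YourMoneyColumn, @dummy)";
mysql_query($sql, $myconn) or die(mysql_error());
?>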


> Delete the fields after the table has been created ..
Why would you feed your table with 80% useless data?
Filter your data first, then import it as Raynard7 suggested.

> I am not sure how to apply a regular expression to the data file ..
If it is always the same columns you need or want to skip, then (g)awk is the tool to use; if you need more sophisticated pattern matching, use perl.

Can you please post a sample line of your data and tell us which parts to use?
What is wrong with this syntax:

LOAD DATA INFILE LOCAL 'http://www.xxx.com/qzip/extr.txt' INTO TABLE draf2
      FIELDS TERMINATED BY '\t'
      LINES TERMINATED BY '\n'

I am running the query directly via SQLyog. It returns

Error Code : 1064
You have an error in your SQL syntax; .....
The correct syntax is LOAD DATA LOCAL INFILE; however, it returns an error:

File '' not found (Errcode: 2)

If I run a PHP script in some directory, how should I reference the input file (extr.txt) located in the same directory?
Here is the code with the absolute path on the server, but it does not work either - File '' not found:

$source_file = '/usr/local/psa/home/vhosts/mydomain.com/httpdocs/qzip/extr.txt';

      $sql = "LOAD DATA LOCAL INFILE \"$source_file\" INTO TABLE draf2
      FIELDS TERMINATED BY '\t'
      LINES TERMINATED BY '\n'";
      mysql_query($sql,$myconn) or die(mysql_error());

What am I missing here?
> What is wrong with this syntax:
You cannot use a URL for the file; it has to be a file path where the file is physically located on the host where you start the command.

> File '' not found:
Your syntax is wrong; it should be like:

  $sql = "LOAD ......" . $source_file . " ..... ";

Also note that you pass \n and \t inside a double-quoted string, which evaluates these escape characters, but you most likely want \n and \t verbatim in your SQL statement, hence you need to write \\n and \\t in your string.
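
A minimal sketch of the corrected call, using the path posted above (concatenation instead of interpolation, and \\t / \\n doubled so MySQL receives them verbatim); a sketch of the fix described here, not a tested drop-in:

<?
$source_file = '/usr/local/psa/home/vhosts/mydomain.com/httpdocs/qzip/extr.txt';

# single quotes around the file name are an assumption
$sql = "LOAD DATA LOCAL INFILE '" . $source_file . "' INTO TABLE draf2
        FIELDS TERMINATED BY '\\t'
        LINES TERMINATED BY '\\n'";
mysql_query($sql, $myconn) or die(mysql_error());
?>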
OK, just to leave the main syntax only:

      $sql = "LOAD DATA LOCAL INFILE '/usr/local/psa/home/vhosts/mydimain.com/httpdocs/qzip/extr.txt' INTO TABLE draf2
      FIELDS TERMINATED BY '\t'
      LINES TERMINATED BY '\n'";
      mysql_query($sql,$myconn) or die(mysql_error());

Still the same error - file not found, though the path is correct.
I tried adding $client_flags to the DB connection:

$client_flags |= 128;  # 128 = CLIENT_LOCAL_FILES
$dblink = mysql_connect($dbhost, $dbuser, $dbpass, FALSE, $client_flags);

It makes no difference - file not found.
Did you verify that the path and file exist? (just wondering about ...mydimain.com/httpdocs...)
Yes, it is the full path on the server. I run the PHP script from the same directory, qzip (where the input file is located).
Checked again via ssh pwd; the path and the file name are correct.
I am puzzled - how is that possible? Running a PHP file named db.php:


      $sql = "LOAD DATA LOCAL INFILE 'extr.txt' INTO TABLE draf2
      FIELDS TERMINATED BY '\\t'
      LINES TERMINATED BY '\\n'";
      mysql_query($sql,$myconn) or die(mysql_error());

returns a MySQL error:

File 'docs/qzip/db.php' not found (Errcode: 0)

What is this? Where does the reference to docs/qzip/db.php come from?

Please give me some clue.
> .. I run the php script from the same directory qzip (where the input file is located

Then use ./ as the path, nothing else.

> what is this? where is the reference to docs/qzip/db.php?
Sounds like a dangling error message in PHP's cache for exception messages ...
It does not work either - the same "file not found". Here is the full code:

<?
set_time_limit(600);

$hostname_myconn = "localhost";
$database_myconn = "xxxx-2";
$username_myconn = "xxxxxxx";
$password_myconn = "xxxxxxx";
$myconn = mysql_connect($hostname_myconn, $username_myconn, $password_myconn, FALSE, 128) or trigger_error(mysql_error(),E_USER_ERROR);

mysql_select_db($database_myconn,$myconn);

$sql = "LOAD DATA LOCAL INFILE './extr.txt' INTO TABLE draf2
            FIELDS TERMINATED BY '\\t'
            LINES TERMINATED BY '\\n'";
mysql_query($sql,$myconn) or die(mysql_error());

?>

Any ideas?
If I run it from the SQLyog command prompt as

LOAD DATA LOCAL INFILE 'c:/temp/extr.txt' INTO TABLE draf2 FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n'

it works fine, but I cannot make it work from the server, i.e. running via PHP.
> .. it works fine, ..
Then you need to ask your server admin which owner and which permissions your script has, and in which physical directory it is started.
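
If you cannot reach the admin, a quick diagnostic you can run yourself (standard PHP functions; the file name is the one used in this thread):

<?
# Show where the script really runs from and whether it can see the file.
echo getcwd() . "\n";                 # physical working directory
var_dump(is_readable('./extr.txt'));  # can the PHP user read the file?
echo fileowner(__FILE__) . "\n";      # numeric owner of this script
?>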
Finally I have managed to fix this: I removed LOCAL and set up the user's FILE privilege. It works perfectly - 60,000 records (a 100 MB text file) are loaded in 3-5 seconds. One more question before we close the topic:

1. When I create the first table, it has no keys. Next, I assign one of the fields (id) as a primary key.
2. Next, when I load some updated data into the same table from another text file - how should I specify the command?

LOAD DATA INFILE 'c:/temp/extr.txt' REPLACE INTO TABLE draf2 FIELDS TERMINATED BY '\\t' LINES TERMINATED BY '\\n'

Is that correct, if I just define a primary key?

Dooh, don't tell your admin that MySQL is configured to read data which should need root/mysql permissions into user-space tables ... sounds like a security problem which you can use as a feature too :-/

> .. REPLACE ..
REPLACE works only if there is a primary key or unique index on the table; I am not sure whether LOAD DATA generates one, or whether REPLACE behaves as you expect in subsequent LOAD DATA calls to the same table. Sorry for the incomplete info, you need to dig into the MySQL docs ...
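
For what it's worth, a minimal sketch of that flow (the ALTER TABLE statement and the assumption that id is unique in the file are mine, not confirmed by the docs quoted here):

<?
# Hypothetical sketch: add the primary key once, then let REPLACE
# overwrite rows whose id already exists during subsequent loads.
mysql_query("ALTER TABLE draf2 ADD PRIMARY KEY (id)", $myconn);

$sql = "LOAD DATA INFILE 'c:/temp/extr.txt' REPLACE INTO TABLE draf2
        FIELDS TERMINATED BY '\\t'
        LINES TERMINATED BY '\\n'";
mysql_query($sql, $myconn) or die(mysql_error());
?>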