stephenmp asked:
What is the best option for importing a large amount of data (700,000 rows) into a MySQL database? This has to be done daily.

I am trying to import a large XML file into my MySQL DB.  I have already written the script to FTP a zip file, unzip it, and parse the XML.

The problem I am having is that doing a while loop and inserting each row one at a time takes forever!  I am planning on creating a cron job for this, but it seems there should be a more efficient way of importing a large amount of data.

I had thought about creating an .sql file (via PHP) and then running that script file from a PHP script using shell commands, if possible.  Or maybe even a stored procedure?!

So, is running an INSERT via mysql_query in PHP for every row the best option?  I know that you cannot run multiple statements in a single mysql_query call.

What are my options?  If I just had to do this once, I would wait.  But I will be running this once a day...  It seems that it will take 2 hours the way it is now!

One last question on this: is there a more efficient way of deciding whether an item needs updating or not?  Basically, I'm updating if the row is found and inserting if not.  It seems like a waste of time to update rows that don't need updating.  Then again, wouldn't it take more time to compare every column in the row to see if it's been changed?

Thank you in advance!
Kevin Cross replied:
You could loop on the PHP side to generate the insert script... instead of INSERT INTO ... VALUES, you can use INSERT INTO ... SELECT ... UNION ALL SELECT ..., so basically you would loop the UNION SELECT portion.  Not sure how nicely this would behave with 700,000 UNION statements, but you could modify the thought by batching (i.e., split up the insert into chunks rather than 700,000 individual statements).  After this is built you can utilize MySQL's ON DUPLICATE KEY UPDATE feature to update those rows that already exist.

So this would be a piece of static SQL you append to the same SQL string, and then you simply send the large SQL string to mysql_query from PHP.
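
A rough sketch of what one generated batch could look like, using a derived table so the upsert can reference the new values without ambiguity (the table and column names are borrowed from the asker's later example; the rows are placeholders):

-- PHP would loop to append the UNION ALL SELECT lines in batches;
-- ON DUPLICATE KEY UPDATE then handles rows that already exist
INSERT INTO vehicle_main.dsi_links (VIN, Vevo)
SELECT new_vin, new_vevo FROM (
          SELECT '1G1BL52WXRR116077' AS new_vin, 'testit' AS new_vevo
UNION ALL SELECT '1G1BL52WXRR116078', 'testit2'
UNION ALL SELECT '1G1BL52WXRR116079', 'testit3'
) AS batch
ON DUPLICATE KEY UPDATE Vevo = new_vevo;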

I am on my mobile but was intrigued, so I posted; however, I apologize for not being able to post more fluidly.  For example, another approach would be to parse the XML out using a trick I wrote about.  You can find a link in my profile.  I recently had 10,000 rows of data in XML format, and I simply used LOAD_FILE() in MySQL to upload the file to a table that had a LONGBLOB field; I called the table xmltmp... you can have two fields to keep it simple: an id and the blob.  The id is so you can reference a consistent value each time your process runs and just update the XML -- otherwise you could add a new row and use LAST_INSERT_ID for the next step.
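
A minimal sketch of that staging setup, assuming the xmltmp/id/xml names described above (the file path is a placeholder; LOAD_FILE() requires the FILE privilege and a file readable by the MySQL server itself):

-- one-row staging table: a fixed id plus the blob that holds the XML
CREATE TABLE IF NOT EXISTS xmltmp (
  id  INT NOT NULL PRIMARY KEY,
  xml LONGBLOB
);

-- seed the row once; every nightly run after that just overwrites the blob
INSERT IGNORE INTO xmltmp (id, xml) VALUES (1, NULL);

UPDATE xmltmp
SET xml = LOAD_FILE('/var/lib/mysql-files/feed.xml')
WHERE id = 1;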

Next you can use my XML parse trick, but instead of the @xml variable you use the column called xml from xmltmp.  Since you are in MySQL, you can use things like NOT EXISTS to only insert new rows and then an UPDATE to amend existing ones... but with MySQL's upsert capabilities you can do it all at once... the trick makes it nice and faster than 2 hours, though it does crank a bit given all the XML parsing... but for a batch process running on a schedule you may not care.  Although it seems that is a concern, if I am reading correctly.
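
For the insert-only-new-rows half, a hedged sketch, assuming the parse step has already landed the XML rows in a hypothetical staging table parsed_rows (vin, vevo):

-- insert only the rows whose key is not already in the target table
INSERT INTO vehicle_main.dsi_links (VIN, Vevo)
SELECT p.vin, p.vevo
FROM parsed_rows p
WHERE NOT EXISTS (
  SELECT 1 FROM vehicle_main.dsi_links d WHERE d.VIN = p.vin
);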

Best regards,

Kevin
stephenmp (ASKER) replied:
OK... I might let it crawl at night...  If I have to do this for 2 tables, one for the main table and one for the photos table, will allowing the 2 scripts to run around the same time be OK?  Or should I wait until one finishes?
They can probably run together, but it is probably best to schedule them sequentially -- just to be safe!
SOLUTION by WilliamStam (South Africa)
Can I do this on updates as well?  Multiple update chunks at once?

I could hold the updates in one array and the inserts in another... and when they get too large, run the MySQL query...

So, what is a proper chunk size?  500, 1000, 10000?

Thanks!
UPDATE table
SET col1 = 'a', col2 = 'b' WHERE ID = 1,
SET col1 = 'c', col2 = 'd' WHERE ID = 2;

not too sure on block sizes... you might want to experiment a bit with it

what are you trying to do, btw?
http://www.kavoir.com/2009/05/mysql-update-multiple-rows-with-one-single-query.html

maybe try that... I'm thinking this would be a little better than the LONG update string
What version of MySQL can you do that UPDATE statement in, @WilliamStam?  It doesn't work in 5.1 or below, nor in 5.5, to my knowledge.  The query size limit will be based on the "max_allowed_packet" setting.
DROP TABLE IF EXISTS `temp_table`;

CREATE TABLE IF NOT EXISTS `temp_table` (
  `ID` int(7) NOT NULL AUTO_INCREMENT,
  `value` varchar(250) DEFAULT NULL,
  PRIMARY KEY (`ID`)
);

-- might just wanna take the auto increment stuff off, I rate... have a normal 2 columns (or more...)

INSERT INTO temp_table (ID, value) VALUES
 ('1','a'),
 ('2','c'),
 ('3','e');

UPDATE `table`, temp_table SET `table`.value = temp_table.value WHERE `table`.ID = temp_table.ID;

ASKER CERTIFIED SOLUTION
...and by "The link" I was referring to the one from http:#a33711584 ... as you will see the XML parsing trick yields a result set, so if you put that in a derived table and join it to your table you are updating, you get the same update with join syntax shown in that link.

Speaking of which, you can see the "dos and don'ts" of updates with joins here:
https://www.experts-exchange.com/Database/Miscellaneous/A_1517-UPDATES-with-JOIN-for-everybody.html
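
Putting those pieces together, a minimal sketch of the update-with-join, again assuming the hypothetical parsed_rows staging table; the inequality test also addresses the asker's wish to skip rows that have not changed (it assumes Vevo is never NULL):

UPDATE vehicle_main.dsi_links d
JOIN parsed_rows p ON p.vin = d.VIN
SET d.Vevo = p.vevo
WHERE d.Vevo <> p.vevo;  -- only touch rows whose value actually differs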
UPDATE `table` SET value = CASE
  WHEN id = 1 THEN 'a'
  WHEN id = 2 THEN 'b'
  ELSE value
END;
@mwvisa1: not too sure on the version number thing... something I saw on the net ages ago... now I can't find it again, lol... so it probably won't work (I didn't test it, just stored it in that dark place you store ideas for future projects)
The link you just posted (http:#33711660) has the syntax you mentioned in http:#a33711571, which is what I was referring to; but notice it says "you can't do multiple changes to a table in a single UPDATE query like this". *smile*

Not a big deal; I was just surprised to see that and thought I was going to learn something new about some upcoming version of MySQL.
Thanks so much for your help...  I'm still trying the ideas you have given...

The temporary table idea would work if I could use it in conjunction with the insert (on duplicate update) command...

The insert on duplicate update would work if I could figure out how to do it with multiple rows...  I don't think it would work with multiple rows...

Example:  INSERT INTO vehicle_main.dsi_links (Vevo, VIN) VALUES ('testit', '1G1BL52WXRR116077' ) ON DUPLICATE KEY UPDATE Vevo = 'testit';

works...  But I would need to be able to do this for 500-1000 rows at a time...  PHP/MySQL forces me to do one statement at a time...  We couldn't combine it with this statement, could we?

INSERT INTO temp_table (ID, value) VALUES
 ('1','a'),
 ('2','c'),
 ('3','e');


You can use ON DUPLICATE KEY UPDATE with the multi-row insert, so that should be fine.  So if you opt to build the multi-line SQL string in PHP and pass it in batches, that should work.  If you find that to be a challenge, you can upload the file to the server where it can be accessed and execute the LOAD_FILE statement from PHP.  If the data is in a MySQL temp table where you can use INSERT INTO ... SELECT, you can still use ON DUPLICATE KEY UPDATE.
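
For example, a sketch extending the asker's one-row statement to a batch; VALUES(Vevo) re-uses each row's own inserted value, and it assumes VIN is the table's unique key (the extra VINs are placeholders):

INSERT INTO vehicle_main.dsi_links (Vevo, VIN) VALUES
  ('testit',  '1G1BL52WXRR116077'),
  ('testit2', '1G1BL52WXRR116078'),
  ('testit3', '1G1BL52WXRR116079')
ON DUPLICATE KEY UPDATE Vevo = VALUES(Vevo);  -- per-row upsert in one statement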
Gatherer_Hade replied:
Another approach to speeding it up is to dump your data to a text file and use the LOAD DATA statement, which is designed for loading large numbers of records at very high speed.

Here is a link to more info on this:
http://dev.mysql.com/doc/refman/5.1/en/load-data.html
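
A hedged sketch of that approach, assuming the PHP script first writes the parsed XML out as a tab-delimited file the MySQL server can read (the path and column list are placeholders):

-- REPLACE deletes and re-inserts any row whose unique key already exists
LOAD DATA INFILE '/var/lib/mysql-files/dsi_links.txt'
REPLACE INTO TABLE vehicle_main.dsi_links
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
(VIN, Vevo);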
That is pretty much what I have been saying is an alternative.  It is just that LOAD DATA INFILE is typically used with a data dump created by SELECT ... INTO OUTFILE.  The question states the data is originally in an XML file, which is why I was suggesting LOAD_FILE, but the concept is the same: utilize the server and MySQL utilities to quickly load the data in bulk.  You still have the issue of parsing and inserting/updating the other table, but from inside MySQL the multiple-statement SQL becomes a non-concern, as a stored procedure or something similar can be used.