Solved

What is the best option for importing a large (700,000 rows) of data into a MYSQL Database?  This has to be done daily.

Posted on 2010-09-18
19
366 Views
Last Modified: 2013-12-12
I am trying to import a larger XML file into my MYSQL DB.  I have already written the script to ftp a zip file, unzip it, and parse the XML.

The problem I am having is the that doing a while loop and inserting each row one at a time takes forever!  I am planning on creating a cron job for this, but it seems that there would be a more efficient way of importing a large amount of data.  

I had thought about creating an .sql file (via PHP) and then run that script file from a php script using SHELL commands if possible.  Even maybe a stored procedure?!?!

So, is running an INSERT via mysql_query in PHP for every row the best option in PHP?  I know that you cannot run multiple statements in a PHP script.

What are my options?  If I just had to do this once, I would wait.  But, I will be running this once a day...  It seems that this will take 2 hours the way it is now!

One last question on this:  Is there a more efficient way of deciding whether an item needs updated or not?  Basically, I'm either updating if found and inserting if not found.  But, it seems that it's a waste of time to update rows that don't need updated.  But, wouldn't it take more time to compare every column in the row to see if it's been changed???

Thanks you in advance!
0
Comment
Question by:stephenmp
  • 8
  • 7
  • 3
  • +1
19 Comments
 
LVL 59

Expert Comment

by:Kevin Cross
Comment Utility
You could loop on the PHP side to generate the insert script...instead insert into ... values, you can use insert into ... select ... union all select ... so basically you would loop the union select portion.  Not sure how nice this would behave with 70,000 union statements but you could modify the thought by batching (I.e., split up the insert but less than 70,000 individual).  After this is built you can utilize mysql's on duplicate key update feature to update those rows that already exist.

So this would be a piece of static sql you append to same sql string and then you simply send the large sql string to mysql query from php.

I am on my mobile, but was intrigued so posted ; however , I apologize for not being able to post more fluidly.  For example another approach would be to parse the XML out using a trick I wrote about.  You can find a link in my profile.  I recently had 10,000 rows of data in xml format and I simply used load_file() on mysql to upload the file to a table that had a longblob field I called xmltmp ... you can have two fields to keep it simple and id and the blob.  The id is so you can reference a consistent value each time your process runs and just update the xml -- otherwise you could add a new row and use last_insert_id for next step.

Next you can use my xml parse trick but instead of @xml variable you use column called xml from xmltmp.  Since you are in mysql you can use things like not exists to only insert new rows and then update to amend existing ones...but with mysql upsert capabilities you can do it at once...the trick makes it nice and faster than 2 hours though it does crank a bit given all the xml parsing...but for a batch process running on a schedule you may not care.  Although it seems that is a concern if I am reading correctly.

Best regards,

Kevin
0
 

Author Comment

by:stephenmp
Comment Utility
ok... I might let it crawl at night...  If I have to do this for 2 tables, 1 for the main table & 1 for the photos table...  Will allowing 2 scripts to run around the same time be ok?  Or should I wait until one finishes?
0
 
LVL 59

Expert Comment

by:Kevin Cross
Comment Utility
They can probably run together, but it is probably best to schedule them sequentially -- just to be safe!
0
 
LVL 2

Assisted Solution

by:WilliamStam
WilliamStam earned 200 total points
Comment Utility
INSERT INTO table (ID. col1, col2) VALUES
 ('1','a','b'),
 ('2','c','d'),
 ('3','e','f'');


you probably better off splitting the records to go in that it does chunks instead of all the things at once
0
 

Author Comment

by:stephenmp
Comment Utility
Can I do this on updates as well?  Multiple update chunks at once?

I could hold the updates in an array, and the inserts in an array.... And when they get too large, run the mysql...

So, what is a proper chunk size?  500, 1000, 10000?

Thanks!
0
 
LVL 2

Expert Comment

by:WilliamStam
Comment Utility
UPDATE table
SET col1 = 'a', col2 = 'b' WHERE ID = 1,
SET col1 = 'c', col2 = 'd' WHERE ID = 2;

not too sure on block sizes.. you mgiht want to experiment a bit with it

what you trying to do btw?
0
 
LVL 2

Expert Comment

by:WilliamStam
Comment Utility
http://www.kavoir.com/2009/05/mysql-update-multiple-rows-with-one-single-query.html

maybe try that... im thinking this would be a little better than the LONG update string
0
 
LVL 59

Expert Comment

by:Kevin Cross
Comment Utility
What version of MySQL can you do that UPDATE statement in, @WilliamStam?  Doesn't not work for 5.1 or below nor 5.5 to my knowledge.  The query limit will be based on "max_allowed_packet" setting.
0
 
LVL 2

Expert Comment

by:WilliamStam
Comment Utility
drop table if exists

CREATE TABLE IF NOT EXISTS `temp_table` (
  `ID` int(7) NOT NULL AUTO_INCREMENT,
  `value` varchar(250) DEFAULT NULL
 PRIMARY KEY (`ID`)
)

// might just wanna take the auto incriment stuff off i rate... have a normal 2 colums (or more... )

INSERT INTO temp_table (ID, value) VALUES
 ('1','a'),
 ('2','c'),
 ('3','e');

UPDATE table, temp_table SET table.value = temp_table.value WHERE table.ID = temp_table.ID;

0
IT, Stop Being Called Into Every Meeting

Highfive is so simple that setting up every meeting room takes just minutes and every employee will be able to start or join a call from any room with ease. Never be called into a meeting just to get it started again. This is how video conferencing should work!

 
LVL 59

Accepted Solution

by:
Kevin Cross earned 300 total points
Comment Utility
The link just provided is exactly what the technique is to bring the data into MySQL either via XML or other MySQL import tools and then do your inserts / joins in mass.  If you do insert statement, you can tell it on duplicate key to update.

http://dev.mysql.com/doc/refman/5.0/en/insert-on-duplicate.html

Here is the inline XML parsing Article link:
http://www.experts-exchange.com/Database/MySQL/A_3574-A-MySQL-Tidbit-In-line-XML-Parsing.html

Here is an example create DDL for xmltmp:
CREATE TABLE `xmltmp` (
  `xml` longblob,
  `id` int(11) NOT NULL AUTO_INCREMENT,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=5 DEFAULT CHARSET=latin1$$

Example update of row containing your nightly batch XML:
UPDATE xmltmp
SET xml = LOAD_FILE('path_to_xml_file.xml')
WHERE id = 1;

You can insert (showing with INSERT INTO ... SET because easy to change back and forth from UPDATE):
INSERT INTO xmltmp
SET xml = LOAD_FILE(...)
;

SELECT LAST_INSERT_ID();

Same as:
INSERT INTO xmltmp(xml) SELECT LOAD_FILE(...);

The part of the XML trick shown that uses util.numbers to grab the XML can be altered as such:
from util.numbers, (select xml from xmltmp where id = {id of insert above}) x
where n between 1 and extractvalue(xml, 'count({xpath that identifies a row node})')
and @rn:=n

Hope that helps!
0
 
LVL 59

Expert Comment

by:Kevin Cross
Comment Utility
...and by "The link" I was referring to the one from http:#a33711584 ... as you will see the XML parsing trick yields a result set, so if you put that in a derived table and join it to your table you are updating, you get the same update with join syntax shown in that link.

Speaking of which, you can see the "dos and don'ts" of updates with joins here:
http://www.experts-exchange.com/Database/Miscellaneous/A_1517-UPDATES-with-JOIN-for-everybody.html
0
 
LVL 2

Expert Comment

by:WilliamStam
Comment Utility
UPDATE table SET value = CASE
WHEN id = 1 THEN 'a';
WHEN id = 2 THEN 'b';
ELSE value
END;
0
 
LVL 2

Expert Comment

by:WilliamStam
Comment Utility
0
 
LVL 2

Expert Comment

by:WilliamStam
Comment Utility
@mwvisa1: not too sure on the version numebr thing.. something i saw on the net ages ago... now i cant find again lol.. so it prob wont work (i didnt test it, just stored it in that dark place you store ideas for future projects)
0
 
LVL 59

Expert Comment

by:Kevin Cross
Comment Utility
In the link you just posted http:#33711660 it has the syntax you mentioned in http:#a33711571 which is what I was referring to, but notice it says "you can’t do multiple changes to a table in a single UPDATE query like this". *smile*

Not a big deal, just was shocked to see that and thought I was going to learn something new of some new version of MySQL coming out.
0
 

Author Comment

by:stephenmp
Comment Utility
Thanks so much for your help...  I'm still trying the ideas you have given...

The temporary table idea would work if I could use it in conjunction with the insert (on duplicate update) command...

The insert on duplicate update would work if I could figure out how to do it with multiple rows...  I don't think it would work with multiple rows...

Example:  INSERT INTO vehicle_main.dsi_links (Vevo, VIN) VALUES ('testit', '1G1BL52WXRR116077' ) ON DUPLICATE KEY UPDATE Vevo = 'testit';

works...  But, I would need to be able to do this for 500-1000 at a time...  php/mysql forces me to do one statement at a time...  I don't think we could combine it with this statement could we?

INSERT INTO temp_table (ID, value) VALUES
 ('1','a'),
 ('2','c'),
 ('3','e');


0
 
LVL 59

Expert Comment

by:Kevin Cross
Comment Utility
You can use ON DUPLICATE KEY update with the multi-insert, so that should be fine.  So if you opt to build the multi-line sql string in PHP and pass at once in batches that should be fine.  If you find that to be a challenge, then you can upload the file to the server where it can be accessed and execute the LOAD_FILE statement from PHP.  If the data is in MySQL temp table where you can use INSERT INTO ... SELECT you can still use ON DUPLICATE KEY.
0
 
LVL 2

Expert Comment

by:Gatherer_Hade
Comment Utility
Another approach to speeding it up is to dump your data to a text file and use the LOAD DATA statement, which is designed for loading large number of records at very high speed.

here is a link to more info on this
http://dev.mysql.com/doc/refman/5.1/en/load-data.html
0
 
LVL 59

Expert Comment

by:Kevin Cross
Comment Utility
That is pretty much what I have been saying is an alternative.  It is just that LOAD DATA INFILE is typically used with a data dump created from LOAD DATA OUTFILE.  The question states this is originally in an XML file, which is why I was suggesting LOAD_FILE but the concept is the same.  Utilize the server and MySQL utilities to quickly load the data in bulk.  Still have the issue of parsing and inserting/updating the other table, but from MySQL the multiple statement SQL becomes a non-concern as stored procedure or something similar can be used.
0

Featured Post

Highfive Gives IT Their Time Back

Highfive is so simple that setting up every meeting room takes just minutes and every employee will be able to start or join a call from any room with ease. Never be called into a meeting just to get it started again. This is how video conferencing should work!

Join & Write a Comment

Things That Drive Us Nuts Have you noticed the use of the reCaptcha feature at EE and other web sites?  It wants you to read and retype something that looks like this.Insanity!  It's not EE's fault - that's just the way reCaptcha works.  But it is …
This article describes how to use the timestamp of existing data in a database to allow Tableau to calculate the prior work day instead of relying on case statements or if statements to calculate the days of the week.
Learn how to match and substitute tagged data using PHP regular expressions. Demonstrated on Windows 7, but also applies to other operating systems. Demonstrated technique applies to PHP (all versions) and Firefox, but very similar techniques will w…
The viewer will learn how to create a basic form using some HTML5 and PHP for later processing. Set up your basic HTML file. Open your form tag and set the method and action attributes.: (CODE) Set up your first few inputs one for the name and …

772 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

10 Experts available now in Live!

Get 1:1 Help Now