Using PHP to import 600,000 records from CSV into MySQL - timing out

I have a 212MB text file containing about 600,000 records that I am trying to import into MySQL using PHP.  There is a lot of manipulation that must be done, so I cannot do a straight import into MySQL.  The import works great and handles about 8,000 records per minute.  My problem is that I am running the PHP script by calling it through a web browser.  My issues are:

1) When it gets to about the 93,000th record, it stops.  I have set set_time_limit to 0, so I'm not sure why this is happening.

2) How can I monitor the progress of the script?  I am currently sending output to the web browser with the count of the record number, but I'm thinking there has to be a better way.  Is there a way to run/monitor the script locally on the server?  If so, can you pause the execution somehow?

Marcus Bointon commented:
That would take a very long time - maybe 1 record per second if you're lucky!

You could run it from a command line on the server; that way you're not subject to HTTP overhead, timeouts or Apache memory constraints. You can have it print progress lines as it imports. It doesn't require any special preparation - just say 'php myscript.php' on a command line, and off it goes. Bear in mind that you want plain text and not HTML for output.

While it's running you could open another terminal and use the mysql client to monitor it (e.g. 'SELECT COUNT(*) FROM mytable;'). To pause it, you could set a shared memory value that it polls, or a database value, or add a signal handler and send it signals. These are not often mentioned in PHP-land, but they can work very nicely.
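The command-line loop Marcus describes could look something like this (the per-record work, table, and progress interval are placeholders; the helper name import_csv is made up for illustration):

```php
<?php
// Sketch of a CLI import loop - run as: php import.php
// The per-record manipulation and INSERT are left as placeholders.

function import_csv($fp, $progress_every = 1000) {
    $count = 0;
    while (($row = fgetcsv($fp, 1024)) !== false) {
        // ... your per-record manipulation and INSERT would go here ...
        $count++;
        if ($count % $progress_every === 0) {
            // Plain-text progress line; the trailing \n keeps each
            // update on its own line on the console.
            echo "Processed $count records\n";
        }
    }
    return $count;
}

// Usage against the real file (path quoted later in the thread):
// $fp = fopen('/home/doug/import.txt', 'r') or die("can't open file\n");
// $total = import_csv($fp);
// fclose($fp);
// echo "Done: $total records\n";
```

Because nothing here goes through Apache, set_time_limit and max_execution_time no longer apply.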
What you need to do is a script that performs one insert at a time.
The output of the script should produce a form with the next record to be processed, and this form should be submitted automatically when the page has finished loading:
<body onload="document.myForm.submit()">
<form name="myForm" action="myScript.php" method="post">
<input type="hidden" name="id" value="<?php echo $id; ?>">
</form>
<?php echo "Processing next record is " . $id; ?>

This way you only process one record at a time, and you are not limited by any timeout or size restriction in MySQL!
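The receiving side of that self-submitting form isn't shown in the thread; a minimal sketch, assuming the hidden id field carries the 1-based number of the next record (the helper name next_id is made up for illustration):

```php
<?php
// myScript.php - sketch of the receiving side of the self-submitting form.
// Assumes $_POST['id'] is the 1-based number of the next record to process.

function next_id($post) {
    $id = isset($post['id']) ? (int) $post['id'] : 1;
    // ... look up record $id, manipulate it, INSERT into MySQL ...
    return $id + 1;   // value to embed in the next form's hidden field
}

// Usage in the page:
// $id = next_id($_POST);
// ... emit the form above with value="<?php echo $id; ?>" ...
```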
doughoman (Author) commented:
Wouldn't this require the script to open the text file with every call and then read through all the previous records?  I am currently reading the text file like this:
$fp = fopen('/home/doug/import.txt','r') or die("can't open file");
while (($csv_line = fgetcsv($fp, 1024)) !== false) {
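It would not have to re-read the earlier records: the script could remember where it stopped as a byte offset (ftell) and jump straight back there (fseek) on the next call. A sketch under that assumption; read_batch and the offset-passing scheme are illustrative, not from the thread:

```php
<?php
// Resume a CSV read at a saved byte offset instead of re-reading from
// the top. $offset would come from the previous request, e.g. via the
// hidden form field suggested above.

function read_batch($path, $offset, $batch = 100) {
    $fp = fopen($path, 'r');
    if ($fp === false) {
        return null;
    }
    fseek($fp, $offset);            // jump past already-processed records
    $rows = array();
    for ($i = 0; $i < $batch; $i++) {
        $row = fgetcsv($fp, 1024);
        if ($row === false) {
            break;                  // end of file
        }
        $rows[] = $row;
    }
    $next = ftell($fp);             // offset to hand to the next request
    fclose($fp);
    return array($rows, $next);
}
```

This also avoids rewriting (trimming) the working file on every insert.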

Yes, you will need to read the file every time. However, I would suggest working with a copy of the file and, every time you do an insert, trimming the processed record from the file. That way, the beginning of your file is always the next record to process, so as your program processes more and more records, your working file becomes smaller and smaller.
I also tend to agree with Squinky that one record at a time may be too slow; however, nothing stops you from using the same logic with groups of 10, 100, or x records at a time. You are just storing the key of the next record to process, so you can do 100 inserts, store the key of the last record processed, and start again by calling itself.
There is a set_time_limit(int $seconds) function in PHP. You can check it out. If you set it to 0, the script will run with no time limit.

You can also check the value of max_execution_time in /etc/php.ini.

a few things:
- set_time_limit won't work if your server is running safe_mode
- depending on the server, you won't be able to override the default without modifying php.ini directly (as gtkfreak noted).
- I would consider doing this as a step by step process:
1. script that starts off by taking the input file, and splitting it into a bunch of output files in a particular subdir, say 25-50K records per output for safety.  then launches script 2:
2. script with a form that shows the last processed step's output (as a POST or GET param), and a GO button to do the next set.  When GO is clicked, the script finds the next 'split' file in the subdir, processes it into the DB, then DELETES it, then posts state back to itself for the next run.
3. if the script runs out of files in the subdir to process, it prints out that it's done, gives a quick SELECT count(*) from the database to prove how many records were inserted (though, you could do that every step when the form reloads...).

At 8K records/minute, if you split into 40K-record files, you'd get 'feedback' by completion of a step and the form reloading about 5 minutes after clicking GO.  Click GO 15 times, and you should be all set for 600K records.
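Step 1 of that plan (splitting the big file into chunk files) might be sketched like this; split_csv, the chunk-file naming, and the 40K default are illustrative, not from the thread:

```php
<?php
// Sketch of step 1: split a big CSV into chunk files in a subdirectory.
// Paths and the records-per-file count are placeholders.

function split_csv($src, $outdir, $per_file = 40000) {
    $in = fopen($src, 'r');
    if ($in === false) {
        return 0;
    }
    $chunk = 0;
    $count = 0;
    $out = null;
    while (($line = fgets($in)) !== false) {
        if ($count % $per_file === 0) {       // start a new chunk file
            if ($out) {
                fclose($out);
            }
            $out = fopen(sprintf('%s/chunk_%04d.csv', $outdir, $chunk++), 'w');
        }
        fwrite($out, $line);
        $count++;
    }
    if ($out) {
        fclose($out);
    }
    fclose($in);
    return $chunk;                            // number of chunk files written
}
```

Step 2's form handler would then glob() the subdir for the next chunk_*.csv, import it, and unlink() it.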

doughoman (Author) commented:
So far, I like Squinky's suggestion of running the script from the command prompt using 'php myscript.php'.  That was the answer I was looking for.  How do I get the program to display output in the command-line window?  My echo and print commands are not doing anything.  I am able to see that the script is updating the database by going into phpMyAdmin and seeing the tables growing.

The links provided to control the execution in the script using shared memory and signals were beyond my abilities.  Is there any easier way to control this or can a sample be provided for how I would set either of these up?
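Shared memory and signals aside, one genuinely easy stand-in (not from the thread's links - just a simple alternative) is to poll for a sentinel file each time around the loop: create the file from another terminal to pause, delete it to resume. The flag path and helper name are made up:

```php
<?php
// A simpler alternative to shared memory or signals: poll for a sentinel
// file on each iteration. From another terminal:
//   touch /tmp/import.pause   -> pauses the import
//   rm /tmp/import.pause      -> resumes it

function should_pause($flag = '/tmp/import.pause') {
    clearstatcache();             // don't let PHP cache the stat result
    return file_exists($flag);
}

// Inside the import loop:
// while (($row = fgetcsv($fp, 1024)) !== false) {
//     while (should_pause()) {
//         sleep(1);             // idle until the flag file is removed
//     }
//     // ... process $row ...
// }
```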

Maybe MySQL is timing out. In /etc/my.cnf, you can tweak MySQL's settings. See the MySQL manual for details.

MySQL by default can only take a 16MB file at once. I think it is 16MB, but I could be off a bit.

You could also just manually break it up into about 10 different files.

my my.cnf:

set-variable = innodb_buffer_pool_size=500M
set-variable = innodb_additional_mem_pool_size=100M
set-variable = innodb_file_io_threads=4
set-variable = innodb_lock_wait_timeout=50

for checking status, you can simply write to a file on disk as you progress through the process.  or just write a webserver page that refreshes every few seconds with a query of the number of records in the database...

echo/print should output to stdout, which should be the console.  hmmm.  wonder if you have some odd setting in your php.ini.

if you are running under linux, there should be a man page.
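The write-a-status-file idea could be as small as this; the path, the helper name, and the 1,000-record interval are placeholders:

```php
<?php
// Overwrite a small status file every N records, then watch it from
// another terminal with: watch cat /tmp/import_progress.txt

function write_progress($path, $count) {
    // LOCK_EX avoids a torn read if someone cats the file mid-write.
    file_put_contents(
        $path,
        "Processed $count records at " . date('H:i:s') . "\n",
        LOCK_EX
    );
}

// In the import loop:
// if ($count % 1000 === 0) {
//     write_progress('/tmp/import_progress.txt', $count);
// }
```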

doughoman (Author) commented:
Running the script from the command line worked perfectly and did not timeout.  So that is not an issue anymore.

The only additional thing I would like to do is be able to see the progress and hopefully be able to pause execution.  Is there any kind of debugger program where you can step through each statement of your PHP script?  I am used to programming in VBA and being able to step through each line of code and pause execution.
Marcus Bointon commented:
You can't do that directly in plain PHP, but you can in some PHP IDEs (along with breakpoints, call tracing, etc.).

It's also nice to work with xdebug in regular PHP as it gives much better error messages and call stacks.

I've had trouble in the past with some kind of output buffering for stdout, where nothing appears until the script is finished, but for the most part if you run from a command line and just use print or echo, it will just appear on the console. One thing I find is that some shells effectively hide single lines of output as they get overwritten by the prompt, so it's important to add a line break on the end of anything you print. Try this as a test:

<?php echo "Hello\n"; ?>
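If output still only appears when the script finishes, the buffering mentioned above may be in play; an explicit flush after each progress line usually cures it on the CLI. A minimal sketch (the helper name is illustrative):

```php
<?php
// Print a progress line and push it to the terminal immediately,
// rather than letting it sit in a write buffer until the script ends.

function report($i) {
    echo "Processed $i records\n";  // trailing \n so the prompt can't hide it
    flush();                        // flush PHP's output write buffer now
}

// In the loop:
// if ($count % 1000 === 0) {
//     report($count);
// }
```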
Marcus Bointon commented:
BTW, the MySQL packet size limits don't really have anything to do with the data file size; it's only the size of an individual query that matters. I've had no trouble importing multi-gigabyte SQL dumps through the mysql command-line client. In this case, it sounds like it's a CSV file that's being turned into SQL with PHP and submitted one record at a time.
Actually, I meant to ask a similar question.

Why can't you pre-process the CSV file into the 'format' it needs to be in, and then use any of the mysql tools to do the insert?  I mean, this question has 'evolved' a lot since you originally asked it... ;)

Is the objective a tool that can be used again by you, by other people?  Is it a one time tool, but you are trying to use PHP and MySQL to 'learn on the job'?

I mean, seems like there's a number of fine solutions -- but the way you originally asked the question, I don't normally assume someone processing massive datasets actually can run commandline php, or a php debugger at that point.

Anyway, sounds like you are close to finished here.

Question has a verified solution.
