
php with large mysql result set

rumblefiz asked:
Here is my situation:

I have a client (a travel agency) that has built up their member database to over 450,000 members, all of whom have requested to be e-mailed about updates, etc. (I mention this because the newsgroups wouldn't help; they insisted I was spamming.) We have a custom-built SMTP engine for large-volume e-mail that we wrote specifically for this client. The member info is stored in a MySQL database on a FreeBSD box.

We ran a test with 15,000 of the e-mails and everything was sent and received in less than 4 minutes (roughly 220,000/hour). I am opening a socket to our SMTP server and issuing SMTP commands instead of using PHP's mail() function, as I don't want to open unneeded connections. So with 15,000, everything worked fine.

Then we tried 80,000. PHP keeps timing out. I executed the SQL statement:

SELECT DISTINCT email_address FROM master_list

at the MySQL command prompt and it returned in a couple of seconds.

If I execute the following PHP code:

mysql_select_db("pvc");
$sSql = "SELECT DISTINCT email_address FROM master_list";
$iRes = mysql_query($sSql);
echo mysql_num_rows($iRes);

it times out. I guess I could break it into batches of 15,000, but I would like to grab all of the e-mail addresses in one query. Does anyone know a way around this?
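For what it's worth, a sketch of lifting PHP's own limits before a long run (assuming CLI or safe mode off; note that `mysql_query()` buffers the entire result set in PHP's memory, so with 450,000 rows the memory limit may bite before the time limit does):

```php
<?php
// A minimal sketch, assuming default php.ini limits are in play.
// mysql_query() buffers the whole result set in PHP's memory, so a
// very large SELECT can exhaust memory_limit before max_execution_time.
set_time_limit(0);               // disable the script execution timeout
ini_set('memory_limit', '256M'); // room for a large buffered result set

// Alternatively, mysql_unbuffered_query() streams rows instead of
// buffering them all, at the cost of tying up the connection until
// the result is fully read.
```

Whether raising the memory limit or switching to an unbuffered query is the right call depends on what else the connection is doing while the mail loop runs.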

Again, I am NOT spamming. If I were, I would use Majordomo or something already written instead of wasting my time coding this from scratch. However, it is critical that we code this to our client's specs so that it will function seamlessly with the rest of the site.

thanks

- rumblefiz

Author commented:
I must also mention that I have raised the timeout to over 10 minutes and it doesn't matter. It times out no matter what I set the timeout to.

- rumblefiz
Commented:
We faced a similar problem with 300,000 DVD, CD, and VHS tape titles normalized across a half-dozen tables on a shared server (don't ask!), with a somewhat more complex query. Our "magic number" was just over 10,000 results; beyond that, it timed out.

Our solution was to do a single query (eventually we made it a stored procedure), but to run it several times as we looped through the results 9,000 rows at a time:

$sql = "SELECT COUNT(id) AS tot FROM whatever WHERE whatever";
$db->query($sql);
$db->next_record();
$total = $db->f("tot");   // total row count, fetched once up front
$soFar = 0;
while ($soFar < $total) {
   // pull the next slice of up to 9,000 rows
   $sql = "SELECT whatever FROM whatever WHERE whatever LIMIT $soFar,9000";
   $db->query($sql);
   while ($db->next_record()) {
      ?>
      <TR><TD>
           <!-- ...... -->
     </TD></TR>
     <?php
   }
   $soFar += 9000;
}

There was a bit more code to verify the counts and handle the final partial batch ($total % 9000 rows), but this was the basic idea. It worked like a charm, in a weekly 2 a.m. cron job, no less.
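The leftover-batch handling mentioned above can be sketched like this (a hypothetical helper, not the actual code; `batchOffsets` and the batch size of 9,000 are just for illustration):

```php
<?php
// Hypothetical helper: compute the (offset, count) pairs to plug into
// "LIMIT $offset,$count" queries, including the final partial batch.
function batchOffsets($total, $batchSize) {
    $batches = array();
    for ($soFar = 0; $soFar < $total; $soFar += $batchSize) {
        // The last batch holds only the remainder ($total % $batchSize).
        $batches[] = array($soFar, min($batchSize, $total - $soFar));
    }
    return $batches;
}

// e.g. 20,500 rows in batches of 9,000:
print_r(batchOffsets(20500, 9000));
// offsets 0 and 9000 get full batches of 9000; offset 18000 gets 2500
```

Using min() this way means no separate remainder case is needed: the loop naturally emits a short final batch.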

Good luck!

Bob.

Commented:
The only problem I see with bowker's answer is that it creates a race condition.

If the DB is updated while he is in the middle of this query, someone will either get left out or get mailed twice (or both!). This might not have been a problem with the shared server (which was probably on a LAN somewhere), but it could be a very large problem for the travel agency, where the DB is probably in constant turmoil throughout the day.

The reason this is a race condition is that the code depends on everything in the DB staying exactly as it is throughout the entire execution; if it changes, bad stuff happens.
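One way around that race (a sketch, not code from this thread): pull the whole list into a frozen snapshot with a single query before the mail loop starts, de-duplicate it once, and work only off that copy. Mid-run inserts simply wait for the next run (though rows deleted mid-run would still be mailed once). The snapshot/de-dupe step might look like:

```php
<?php
// Hypothetical sketch: freeze and de-duplicate the address list once,
// then mail from this array instead of re-querying LIMIT slices.
// $rows stands in for the result of the single SELECT DISTINCT query.
function snapshotAddresses($rows) {
    $seen = array();
    foreach ($rows as $addr) {
        // case-insensitive de-dupe, trailing whitespace trimmed
        $seen[strtolower(trim($addr))] = true;
    }
    return array_keys($seen);
}

print_r(snapshotAddresses(array("A@x.com", "a@x.com ", "b@y.com")));
// the two A@x.com variants collapse to one entry
```

The trade-off is memory: the snapshot holds every address at once, which is exactly what the batched approach was avoiding.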

Derek
