Solved

PHP/MYSQL Problems -- Limit/Query issue?

Posted on 2012-04-09
6
340 Views
Last Modified: 2012-04-18
Hello,

I have been having a problem with my preparation of files lately when using mysql/php. Overall, it works in general, until you start using it in a heavy/often mannor.

Let me jump right into how it works:
-I have a list of emails, can be rather large.
-I have a list of suppression list emails or MD5s, aka emails/md5s that have opted out, don't want to be mailed with, etc. The list that MUST be used, as obviously we don't want to mail to people who have requested to be removed, or by the affiliates opt-out lists.


This generally is a simple thing, it works fine for most cases, but we been seeing some wierd issues. Let me give the query first:

$suppresionTablesArr etc are the tables that were found previously attached to whatever I'm preparing with in a array.

$selectInfo = 'select l.* from mta_list'.getListIDFormated($listID).' l ';

$supp = 0;
$wsupp = 0;
$msupp = 0;

foreach ($suppresionTablesArr as $suppresionTable) {
	$supp++;
	$selectInfo .= 'left join '.$suppresionTable.' s'.$supp.' on s'.$supp.'.email = l.email ';
}

foreach ($wordsuppresionTablesArr as $suppresionTable) {
	$qIery = "SELECT word from " . $suppresionTable . " LIMIT 1";
	$resultX = mysql_query($qIery);
	$wtype = 1;
	while ($rowAllMta = mysql_fetch_object($resultX)) {
		$pos = strpos($rowAllMta->word, ".");
		if ($pos === false) {
			$wtype=0;
		}
	}

	if ($wtype == 1) {
		$wsupp++;
		$selectInfo .= 'left join '.$suppresionTable.' w'.$wsupp.' on w'.$wsupp.'.word = substring_index(l.email,\'@\',-1) ';
	} else {
		$msupp++;
		$selectInfo .= 'left join '.$suppresionTable.' m'.$msupp.' on m'.$msupp.'.word = l.id2 ';
	}
}

$selectInfo .= 'WHERE l.ID > '.$lastSent.' AND l.ListDate IN ('.$campaignListDate.') ';

$firsttime = 0;
if ($supp >= 1 ||$msupp >= 1 || $wsupp >= 1) {
	$selectInfo .= 'and (';

	for ($ii = 1; $ii <= $supp; $ii++) {
			if ($firsttime == 0) {
			$selectInfo .= 's'.$ii.'.email is null ';
			$firsttime = 1;
		} else {
			$selectInfo .= 'or s'.$ii.'.email is null ';
		}
	}
	for ($ii = 1; $ii <= $msupp; $ii++) {
			if ($firsttime == 0) {
			$selectInfo .= 'm'.$ii.'.word is null ';
			$firsttime = 1;
		} else {
			$selectInfo .= 'or m'.$ii.'.word is null ';
		}
	}
	for ($ii = 1; $ii <= $wsupp; $ii++) {
			if ($firsttime == 0) {
			$selectInfo .= 'w'.$ii.'.word is null ';
			$firsttime = 1;
		} else {
			$selectInfo .= 'or w'.$ii.'.word is null ';
		}
	}
	$selectInfo .= ') ';
}

$selectInfo .= 'LIMIT '.$recordsInAFile;

$result = mysql_query($selectInfo);

while ($row = mysql_fetch_object($result)) {
...Dostuff
$lastID = $row->ID;
}

Open in new window


What it does here is this coding is called every 2000 'valid' records until there are no more records. It sends the $lastID, which is the LAST ID that is has parsed until there are no more records left to be parsed.

An example on how the query looks like when its in use

select l.*
from mta_list0510 l left join mta_suppList2296 s1 on s1.email = l.email left join mta_suppresionWordsLists0085 w1 on w1.word = substring_index(l.email,'@',-1) left join mta_suppresionWordsLists0083 m1 on m1.word = l.id2 WHERE l.ID > 0
 AND l.ListDate IN (1) and (s1.email is null or m1.word is null or w1.word is null ) LIMIT 2000

Open in new window



The following above is part of a change I did with some help by people here at EE, it works faster now, but still produces the same error it had been issuing on. The error is that with giant suppression lists (20-30 million) that SOMETIMES (randomly) it will not prepare properly. If I echo the $lastID in the while loop, you can see it going up in HIGH increments, sometimes in the 100000-300000 region per increment, almost as if the suppression lists are suppressing alot of emails/MD5s but shouldn't be. It ends up preparing only 1-5 files (2000-10000 records maybe used), and it's not due to emails/md5s being suppressed. If I set it to run more then once, sometimes it will actually work and prepare properly, preparing all the files that need to be prepared, almost like a indexing/etc issue.

the email/md5s are indexed, the IDs are primary in my database. So it should be correct as well. Obviously if they were not indexed with this amount of data it would crawl.


I am wondering if LIMIT is working as it should, its almost as if for some reason it doesn't grab the table in order (I've tried ORDER BY l.ID LIMIT 2000 before too, no change) and grabs IDs in a random fashion that meet the criteria of being greater then the last ID used.

So if I had the IDs
1
2
5
6
8
9
10
15
20
21
22

Typically it would grab this in order, but sometimes I think it is grabbing it by ANYTHING meeting the criteria, so l.ID > 6 LIMIT 3  means that 8,9,10 would be next, but maybe instead it does 15,20,22 because the criteria is still valid there in some odd case, could be wrong though, it's hard to say. I do know that it IS jumping a large amount of records for some reason.


Anyone have any ideas? Let me know if you need more to say, I'm trying to explain it but it's difficult for myself. THank you.
0
Comment
Question by:Valleriani
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 2
  • 2
  • 2
6 Comments
 
LVL 83

Expert Comment

by:Dave Baldwin
ID: 37826590
That doesn't look right to me.  To handle so many records in batches, I would be using LIMIT 0,2000 for the first batch, LIMIT 2000,2000 for the second, LIMIT 4000,2000 for the third and so on.  In that syntax, the first number after LIMIT is the starting row and the second number is the number of rows.

http://dev.mysql.com/doc/refman/5.1/en/select.html
0
 
LVL 29

Assisted Solution

by:Olaf Doschke
Olaf Doschke earned 300 total points
ID: 37826631
Dave's solution should work.

I also think your observation l.ID > 6 LIMIT 3  might not take the first 3 value 8,9,10 is correct, you would only ensure to get 8,9,10 if you also ORDER BY ID. Using both limit parameters should solve your problem, too.

Bye, Olaf.
0
 
LVL 7

Author Comment

by:Valleriani
ID: 37828805
I tried the LIMIT 2000,2000, LIMIT 4000,2000, etc but it ended up creating a tiny amount of files too, didn't seem to work as it should of. It seemed like it was more consistant but it was skipping a huge amount, even when not using any suppression lists and just a basic grab. Have to also remember that sometimes stuff is deleted from the main lists so the IDS might not be 1,2,3,4,5 .. could be 1,3,4,5,6,8,10,11,12 etc.. Maybe that affects something but I am not sure. This was consistent though and it 'never' worked. Unlike the current one which works most of the time but fails randomly with the bigger word suppression lists.

I have been using ORDER BY l.ID LIMIT 2000 as well (as stated in the first post), but that hasn't worked either. It still seems to skip randomly and didn't make a difference.
0
Free learning courses: Active Directory Deep Dive

Get a firm grasp on your IT environment when you learn Active Directory best practices with Veeam! Watch all, or choose any amount, of this three-part webinar series to improve your skills. From the basics to virtualization and backup, we got you covered.

 
LVL 83

Accepted Solution

by:
Dave Baldwin earned 200 total points
ID: 37828919
I don't know what you mean by 'skip randomly'.  The LIMIT 2000 statement means return the first 2000 records for that query.  LIMIT 2000,2000 means go to the 2000th row that matches the query and get the next 2000 rows that match after that.  The numbers in a LIMIT statement are counters and are not related to your ids.

If you are deleting matching rows each time, then you probably don't need the starting row count and LIMIT 2000 may well work fine for you because it just keeps getting 2000 matching rows starting at the first record it finds according to the rest of the query.  And yes, you need ORDER BY to enforce any kind of order.  LIMIT does not do that at all.  It just LIMITs the number of rows returned and optionally sets the count for the start of the response.

For test purposes, you may need to break your query into smaller pieces and make sure the pieces are returning the info you need.
0
 
LVL 29

Assisted Solution

by:Olaf Doschke
Olaf Doschke earned 300 total points
ID: 37831352
Also: If using the LIMIT with both parameters, don't change the query by a where clause  a la ID>minimumID, make the same query only with different starting row.

Bye, Olaf.
0
 
LVL 7

Author Closing Comment

by:Valleriani
ID: 37859362
Ah that would be it, thanks!
0

Featured Post

Webinar: Aligning, Automating, Winning

Join Dan Russo, Senior Manager of Operations Intelligence, for an in-depth discussion on how Dealertrack, leading provider of integrated digital solutions for the automotive industry, transformed their DevOps processes to increase collaboration and move with greater velocity.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

As technology users and professionals, we’re always learning. Our universal interest in advancing our knowledge of the trade is unmatched by most industries. It’s a curiosity that makes sense, given the climate of change. Within that, there lies a…
Many old projects have bad code, but the budget doesn't exist to rewrite the codebase. You can update this code to be safer by introducing contemporary input validation, sanitation, and safer database queries.
Learn how to match and substitute tagged data using PHP regular expressions. Demonstrated on Windows 7, but also applies to other operating systems. Demonstrated technique applies to PHP (all versions) and Firefox, but very similar techniques will w…
This tutorial will teach you the core code needed to finalize the addition of a watermark to your image. The viewer will use a small PHP class to learn and create a watermark.

729 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question