Solved

PHP/MYSQL Problems -- Limit/Query issue?

Posted on 2012-04-09
6
333 Views
Last Modified: 2012-04-18
Hello,

I have been having a problem with my preparation of files lately when using mysql/php. Overall, it works in general, until you start using it in a heavy/often mannor.

Let me jump right into how it works:
-I have a list of emails, can be rather large.
-I have a list of suppression list emails or MD5s, aka emails/md5s that have opted out, don't want to be mailed with, etc. The list that MUST be used, as obviously we don't want to mail to people who have requested to be removed, or by the affiliates opt-out lists.


This generally is a simple thing, it works fine for most cases, but we been seeing some wierd issues. Let me give the query first:

$suppresionTablesArr etc are the tables that were found previously attached to whatever I'm preparing with in a array.

$selectInfo = 'select l.* from mta_list'.getListIDFormated($listID).' l ';

$supp = 0;
$wsupp = 0;
$msupp = 0;

foreach ($suppresionTablesArr as $suppresionTable) {
	$supp++;
	$selectInfo .= 'left join '.$suppresionTable.' s'.$supp.' on s'.$supp.'.email = l.email ';
}

foreach ($wordsuppresionTablesArr as $suppresionTable) {
	$qIery = "SELECT word from " . $suppresionTable . " LIMIT 1";
	$resultX = mysql_query($qIery);
	$wtype = 1;
	while ($rowAllMta = mysql_fetch_object($resultX)) {
		$pos = strpos($rowAllMta->word, ".");
		if ($pos === false) {
			$wtype=0;
		}
	}

	if ($wtype == 1) {
		$wsupp++;
		$selectInfo .= 'left join '.$suppresionTable.' w'.$wsupp.' on w'.$wsupp.'.word = substring_index(l.email,\'@\',-1) ';
	} else {
		$msupp++;
		$selectInfo .= 'left join '.$suppresionTable.' m'.$msupp.' on m'.$msupp.'.word = l.id2 ';
	}
}

$selectInfo .= 'WHERE l.ID > '.$lastSent.' AND l.ListDate IN ('.$campaignListDate.') ';

$firsttime = 0;
if ($supp >= 1 ||$msupp >= 1 || $wsupp >= 1) {
	$selectInfo .= 'and (';

	for ($ii = 1; $ii <= $supp; $ii++) {
			if ($firsttime == 0) {
			$selectInfo .= 's'.$ii.'.email is null ';
			$firsttime = 1;
		} else {
			$selectInfo .= 'or s'.$ii.'.email is null ';
		}
	}
	for ($ii = 1; $ii <= $msupp; $ii++) {
			if ($firsttime == 0) {
			$selectInfo .= 'm'.$ii.'.word is null ';
			$firsttime = 1;
		} else {
			$selectInfo .= 'or m'.$ii.'.word is null ';
		}
	}
	for ($ii = 1; $ii <= $wsupp; $ii++) {
			if ($firsttime == 0) {
			$selectInfo .= 'w'.$ii.'.word is null ';
			$firsttime = 1;
		} else {
			$selectInfo .= 'or w'.$ii.'.word is null ';
		}
	}
	$selectInfo .= ') ';
}

$selectInfo .= 'LIMIT '.$recordsInAFile;

$result = mysql_query($selectInfo);

while ($row = mysql_fetch_object($result)) {
...Dostuff
$lastID = $row->ID;
}

Open in new window


What it does here is this coding is called every 2000 'valid' records until there are no more records. It sends the $lastID, which is the LAST ID that is has parsed until there are no more records left to be parsed.

An example on how the query looks like when its in use

select l.*
from mta_list0510 l left join mta_suppList2296 s1 on s1.email = l.email left join mta_suppresionWordsLists0085 w1 on w1.word = substring_index(l.email,'@',-1) left join mta_suppresionWordsLists0083 m1 on m1.word = l.id2 WHERE l.ID > 0
 AND l.ListDate IN (1) and (s1.email is null or m1.word is null or w1.word is null ) LIMIT 2000

Open in new window



The following above is part of a change I did with some help by people here at EE, it works faster now, but still produces the same error it had been issuing on. The error is that with giant suppression lists (20-30 million) that SOMETIMES (randomly) it will not prepare properly. If I echo the $lastID in the while loop, you can see it going up in HIGH increments, sometimes in the 100000-300000 region per increment, almost as if the suppression lists are suppressing alot of emails/MD5s but shouldn't be. It ends up preparing only 1-5 files (2000-10000 records maybe used), and it's not due to emails/md5s being suppressed. If I set it to run more then once, sometimes it will actually work and prepare properly, preparing all the files that need to be prepared, almost like a indexing/etc issue.

the email/md5s are indexed, the IDs are primary in my database. So it should be correct as well. Obviously if they were not indexed with this amount of data it would crawl.


I am wondering if LIMIT is working as it should, its almost as if for some reason it doesn't grab the table in order (I've tried ORDER BY l.ID LIMIT 2000 before too, no change) and grabs IDs in a random fashion that meet the criteria of being greater then the last ID used.

So if I had the IDs
1
2
5
6
8
9
10
15
20
21
22

Typically it would grab this in order, but sometimes I think it is grabbing it by ANYTHING meeting the criteria, so l.ID > 6 LIMIT 3  means that 8,9,10 would be next, but maybe instead it does 15,20,22 because the criteria is still valid there in some odd case, could be wrong though, it's hard to say. I do know that it IS jumping a large amount of records for some reason.


Anyone have any ideas? Let me know if you need more to say, I'm trying to explain it but it's difficult for myself. THank you.
0
Comment
Question by:Valleriani
  • 2
  • 2
  • 2
6 Comments
 
LVL 82

Expert Comment

by:Dave Baldwin
Comment Utility
That doesn't look right to me.  To handle so many records in batches, I would be using LIMIT 0,2000 for the first batch, LIMIT 2000,2000 for the second, LIMIT 4000,2000 for the third and so on.  In that syntax, the first number after LIMIT is the starting row and the second number is the number of rows.

http://dev.mysql.com/doc/refman/5.1/en/select.html
0
 
LVL 29

Assisted Solution

by:Olaf Doschke
Olaf Doschke earned 300 total points
Comment Utility
Dave's solution should work.

I also think your observation l.ID > 6 LIMIT 3  might not take the first 3 value 8,9,10 is correct, you would only ensure to get 8,9,10 if you also ORDER BY ID. Using both limit parameters should solve your problem, too.

Bye, Olaf.
0
 
LVL 7

Author Comment

by:Valleriani
Comment Utility
I tried the LIMIT 2000,2000, LIMIT 4000,2000, etc but it ended up creating a tiny amount of files too, didn't seem to work as it should of. It seemed like it was more consistant but it was skipping a huge amount, even when not using any suppression lists and just a basic grab. Have to also remember that sometimes stuff is deleted from the main lists so the IDS might not be 1,2,3,4,5 .. could be 1,3,4,5,6,8,10,11,12 etc.. Maybe that affects something but I am not sure. This was consistent though and it 'never' worked. Unlike the current one which works most of the time but fails randomly with the bigger word suppression lists.

I have been using ORDER BY l.ID LIMIT 2000 as well (as stated in the first post), but that hasn't worked either. It still seems to skip randomly and didn't make a difference.
0
Backup Your Microsoft Windows Server®

Backup all your Microsoft Windows Server – on-premises, in remote locations, in private and hybrid clouds. Your entire Windows Server will be backed up in one easy step with patented, block-level disk imaging. We achieve RTOs (recovery time objectives) as low as 15 seconds.

 
LVL 82

Accepted Solution

by:
Dave Baldwin earned 200 total points
Comment Utility
I don't know what you mean by 'skip randomly'.  The LIMIT 2000 statement means return the first 2000 records for that query.  LIMIT 2000,2000 means go to the 2000th row that matches the query and get the next 2000 rows that match after that.  The numbers in a LIMIT statement are counters and are not related to your ids.

If you are deleting matching rows each time, then you probably don't need the starting row count and LIMIT 2000 may well work fine for you because it just keeps getting 2000 matching rows starting at the first record it finds according to the rest of the query.  And yes, you need ORDER BY to enforce any kind of order.  LIMIT does not do that at all.  It just LIMITs the number of rows returned and optionally sets the count for the start of the response.

For test purposes, you may need to break your query into smaller pieces and make sure the pieces are returning the info you need.
0
 
LVL 29

Assisted Solution

by:Olaf Doschke
Olaf Doschke earned 300 total points
Comment Utility
Also: If using the LIMIT with both parameters, don't change the query by a where clause  a la ID>minimumID, make the same query only with different starting row.

Bye, Olaf.
0
 
LVL 7

Author Closing Comment

by:Valleriani
Comment Utility
Ah that would be it, thanks!
0

Featured Post

What Security Threats Are You Missing?

Enhance your security with threat intelligence from the web. Get trending threat insights on hackers, exploits, and suspicious IP addresses delivered to your inbox with our free Cyber Daily.

Join & Write a Comment

Load balancing is the method of dividing the total amount of work performed by one computer between two or more computers. Its aim is to get more work done in the same amount of time, ensuring that all the users get served faster.
This article discusses four methods for overlaying images in a container on a web page
The viewer will learn how to create and use a small PHP class to apply a watermark to an image. This video shows the viewer the setup for the PHP watermark as well as important coding language. Continue to Part 2 to learn the core code used in creat…
This tutorial will teach you the core code needed to finalize the addition of a watermark to your image. The viewer will use a small PHP class to learn and create a watermark.

762 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

12 Experts available now in Live!

Get 1:1 Help Now