Solved

PHP/MYSQL Problems -- Limit/Query issue?

Posted on 2012-04-09
6
334 Views
Last Modified: 2012-04-18
Hello,

I have been having a problem with my preparation of files lately when using mysql/php. Overall, it works in general, until you start using it in a heavy/often mannor.

Let me jump right into how it works:
-I have a list of emails, can be rather large.
-I have a list of suppression list emails or MD5s, aka emails/md5s that have opted out, don't want to be mailed with, etc. The list that MUST be used, as obviously we don't want to mail to people who have requested to be removed, or by the affiliates opt-out lists.


This generally is a simple thing, it works fine for most cases, but we been seeing some wierd issues. Let me give the query first:

$suppresionTablesArr etc are the tables that were found previously attached to whatever I'm preparing with in a array.

$selectInfo = 'select l.* from mta_list'.getListIDFormated($listID).' l ';

$supp = 0;
$wsupp = 0;
$msupp = 0;

foreach ($suppresionTablesArr as $suppresionTable) {
	$supp++;
	$selectInfo .= 'left join '.$suppresionTable.' s'.$supp.' on s'.$supp.'.email = l.email ';
}

foreach ($wordsuppresionTablesArr as $suppresionTable) {
	$qIery = "SELECT word from " . $suppresionTable . " LIMIT 1";
	$resultX = mysql_query($qIery);
	$wtype = 1;
	while ($rowAllMta = mysql_fetch_object($resultX)) {
		$pos = strpos($rowAllMta->word, ".");
		if ($pos === false) {
			$wtype=0;
		}
	}

	if ($wtype == 1) {
		$wsupp++;
		$selectInfo .= 'left join '.$suppresionTable.' w'.$wsupp.' on w'.$wsupp.'.word = substring_index(l.email,\'@\',-1) ';
	} else {
		$msupp++;
		$selectInfo .= 'left join '.$suppresionTable.' m'.$msupp.' on m'.$msupp.'.word = l.id2 ';
	}
}

$selectInfo .= 'WHERE l.ID > '.$lastSent.' AND l.ListDate IN ('.$campaignListDate.') ';

$firsttime = 0;
if ($supp >= 1 ||$msupp >= 1 || $wsupp >= 1) {
	$selectInfo .= 'and (';

	for ($ii = 1; $ii <= $supp; $ii++) {
			if ($firsttime == 0) {
			$selectInfo .= 's'.$ii.'.email is null ';
			$firsttime = 1;
		} else {
			$selectInfo .= 'or s'.$ii.'.email is null ';
		}
	}
	for ($ii = 1; $ii <= $msupp; $ii++) {
			if ($firsttime == 0) {
			$selectInfo .= 'm'.$ii.'.word is null ';
			$firsttime = 1;
		} else {
			$selectInfo .= 'or m'.$ii.'.word is null ';
		}
	}
	for ($ii = 1; $ii <= $wsupp; $ii++) {
			if ($firsttime == 0) {
			$selectInfo .= 'w'.$ii.'.word is null ';
			$firsttime = 1;
		} else {
			$selectInfo .= 'or w'.$ii.'.word is null ';
		}
	}
	$selectInfo .= ') ';
}

$selectInfo .= 'LIMIT '.$recordsInAFile;

$result = mysql_query($selectInfo);

while ($row = mysql_fetch_object($result)) {
...Dostuff
$lastID = $row->ID;
}

Open in new window


What it does here is this coding is called every 2000 'valid' records until there are no more records. It sends the $lastID, which is the LAST ID that is has parsed until there are no more records left to be parsed.

An example on how the query looks like when its in use

select l.*
from mta_list0510 l left join mta_suppList2296 s1 on s1.email = l.email left join mta_suppresionWordsLists0085 w1 on w1.word = substring_index(l.email,'@',-1) left join mta_suppresionWordsLists0083 m1 on m1.word = l.id2 WHERE l.ID > 0
 AND l.ListDate IN (1) and (s1.email is null or m1.word is null or w1.word is null ) LIMIT 2000

Open in new window



The following above is part of a change I did with some help by people here at EE, it works faster now, but still produces the same error it had been issuing on. The error is that with giant suppression lists (20-30 million) that SOMETIMES (randomly) it will not prepare properly. If I echo the $lastID in the while loop, you can see it going up in HIGH increments, sometimes in the 100000-300000 region per increment, almost as if the suppression lists are suppressing alot of emails/MD5s but shouldn't be. It ends up preparing only 1-5 files (2000-10000 records maybe used), and it's not due to emails/md5s being suppressed. If I set it to run more then once, sometimes it will actually work and prepare properly, preparing all the files that need to be prepared, almost like a indexing/etc issue.

the email/md5s are indexed, the IDs are primary in my database. So it should be correct as well. Obviously if they were not indexed with this amount of data it would crawl.


I am wondering if LIMIT is working as it should, its almost as if for some reason it doesn't grab the table in order (I've tried ORDER BY l.ID LIMIT 2000 before too, no change) and grabs IDs in a random fashion that meet the criteria of being greater then the last ID used.

So if I had the IDs
1
2
5
6
8
9
10
15
20
21
22

Typically it would grab this in order, but sometimes I think it is grabbing it by ANYTHING meeting the criteria, so l.ID > 6 LIMIT 3  means that 8,9,10 would be next, but maybe instead it does 15,20,22 because the criteria is still valid there in some odd case, could be wrong though, it's hard to say. I do know that it IS jumping a large amount of records for some reason.


Anyone have any ideas? Let me know if you need more to say, I'm trying to explain it but it's difficult for myself. THank you.
0
Comment
Question by:Valleriani
  • 2
  • 2
  • 2
6 Comments
 
LVL 83

Expert Comment

by:Dave Baldwin
ID: 37826590
That doesn't look right to me.  To handle so many records in batches, I would be using LIMIT 0,2000 for the first batch, LIMIT 2000,2000 for the second, LIMIT 4000,2000 for the third and so on.  In that syntax, the first number after LIMIT is the starting row and the second number is the number of rows.

http://dev.mysql.com/doc/refman/5.1/en/select.html
0
 
LVL 29

Assisted Solution

by:Olaf Doschke
Olaf Doschke earned 300 total points
ID: 37826631
Dave's solution should work.

I also think your observation l.ID > 6 LIMIT 3  might not take the first 3 value 8,9,10 is correct, you would only ensure to get 8,9,10 if you also ORDER BY ID. Using both limit parameters should solve your problem, too.

Bye, Olaf.
0
 
LVL 7

Author Comment

by:Valleriani
ID: 37828805
I tried the LIMIT 2000,2000, LIMIT 4000,2000, etc but it ended up creating a tiny amount of files too, didn't seem to work as it should of. It seemed like it was more consistant but it was skipping a huge amount, even when not using any suppression lists and just a basic grab. Have to also remember that sometimes stuff is deleted from the main lists so the IDS might not be 1,2,3,4,5 .. could be 1,3,4,5,6,8,10,11,12 etc.. Maybe that affects something but I am not sure. This was consistent though and it 'never' worked. Unlike the current one which works most of the time but fails randomly with the bigger word suppression lists.

I have been using ORDER BY l.ID LIMIT 2000 as well (as stated in the first post), but that hasn't worked either. It still seems to skip randomly and didn't make a difference.
0
Control application downtime with dependency maps

Visualize the interdependencies between application components better with Applications Manager's automated application discovery and dependency mapping feature. Resolve performance issues faster by quickly isolating problematic components.

 
LVL 83

Accepted Solution

by:
Dave Baldwin earned 200 total points
ID: 37828919
I don't know what you mean by 'skip randomly'.  The LIMIT 2000 statement means return the first 2000 records for that query.  LIMIT 2000,2000 means go to the 2000th row that matches the query and get the next 2000 rows that match after that.  The numbers in a LIMIT statement are counters and are not related to your ids.

If you are deleting matching rows each time, then you probably don't need the starting row count and LIMIT 2000 may well work fine for you because it just keeps getting 2000 matching rows starting at the first record it finds according to the rest of the query.  And yes, you need ORDER BY to enforce any kind of order.  LIMIT does not do that at all.  It just LIMITs the number of rows returned and optionally sets the count for the start of the response.

For test purposes, you may need to break your query into smaller pieces and make sure the pieces are returning the info you need.
0
 
LVL 29

Assisted Solution

by:Olaf Doschke
Olaf Doschke earned 300 total points
ID: 37831352
Also: If using the LIMIT with both parameters, don't change the query by a where clause  a la ID>minimumID, make the same query only with different starting row.

Bye, Olaf.
0
 
LVL 7

Author Closing Comment

by:Valleriani
ID: 37859362
Ah that would be it, thanks!
0

Featured Post

Microsoft Certification Exam 74-409

Veeam® is happy to provide the Microsoft community with a study guide prepared by MVP and MCT, Orin Thomas. This guide will take you through each of the exam objectives, helping you to prepare for and pass the examination.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Entity Framework is a powerful tool to help you interact with the DataBase but still doesn't help much when we have a Stored Procedure that returns more than one resultset. The solution takes some of out-of-the-box thinking; read on!
Many companies are looking to get out of the datacenter business and to services like Microsoft Azure to provide Infrastructure as a Service (IaaS) solutions for legacy client server workloads, rather than continuing to make capital investments in h…
The viewer will learn how to dynamically set the form action using jQuery.
The viewer will learn how to look for a specific file type in a local or remote server directory using PHP.

895 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

12 Experts available now in Live!

Get 1:1 Help Now