Why does this query take so long?

Here's my query:

EXPLAIN SELECT id, actor_id, actor_display_name, posted_time, display_name, geo_coords_lat, geo_coords_lon, location_name, posted_day FROM verizon WHERE (posted_day BETWEEN '2014-03-05' and '2014-03-18') and (geo_coords_lat BETWEEN '26' and '26.25') and (geo_coords_lon BETWEEN '-80.25' and '-80') order by id ASC LIMIT 10000

When I run EXPLAIN, I get this:

id | select_type | table   | type  | possible_keys                              | key     | key_len | ref  | rows
1  | SIMPLE      | Verizon | index | posted_day, geo_coords_lat, geo_coords_lon | PRIMARY | 4       | NULL | 223953

The last column was "Extra" which read "Using where."

At first glance, I'm stoked: it appears my indexes are doing exactly what they're supposed to do, taking the 250,000,000 rows and reducing them to a very manageable collection.

But the process, shown below, is taking anywhere from 20-25 minutes, which makes no sense; 223,953 rows should sing.

What am I doing that's clogging the pipes? Theoretically, everything looks great. Practically, we need some major improvement.


$crystal = "SELECT id, actor_id, actor_display_name, posted_time, display_name, geo_coords_lat, geo_coords_lon, location_name, posted_day FROM verizon WHERE (posted_day BETWEEN '$start_date' AND '$end_date') AND (geo_coords_lat BETWEEN '$latitude_1' AND '$latitude_2') AND (geo_coords_lon BETWEEN '$longitude_1' AND '$longitude_2') ORDER BY id ASC LIMIT 10000";
$crystal_query = mysqli_query($cxn, $crystal)
    or die("Crystal didn't happen.");
// Fetch each result row and copy it into twitter_csv.
while ($crystal_row = mysqli_fetch_assoc($crystal_query)) {
    $verizon_id             = mysqli_real_escape_string($cxn, $crystal_row['id']);
    $the_actor_id           = mysqli_real_escape_string($cxn, $crystal_row['actor_id']);
    $the_actor_display_name = mysqli_real_escape_string($cxn, $crystal_row['actor_display_name']);
    $the_posted_time        = mysqli_real_escape_string($cxn, $crystal_row['posted_time']);
    $the_geo_coords_lat     = mysqli_real_escape_string($cxn, $crystal_row['geo_coords_lat']);
    $the_geo_coords_lon     = mysqli_real_escape_string($cxn, $crystal_row['geo_coords_lon']);
    $the_location_name      = mysqli_real_escape_string($cxn, $crystal_row['location_name']);
    $the_posted_day         = mysqli_real_escape_string($cxn, $crystal_row['posted_day']); // was never set in the original
    $insert = "INSERT INTO twitter_csv (verizon_id, actor_id, actor_display_name, posted_time, geo_coords_lat, geo_coords_lon, location_name, posted_day)
    VALUES ('$verizon_id', '$the_actor_id', '$the_actor_display_name', '$the_posted_time', '$the_geo_coords_lat', '$the_geo_coords_lon', '$the_location_name', '$the_posted_day')";
    $insertexe = mysqli_query($cxn, $insert);
    if (!$insertexe) {
        $error = mysqli_errno($cxn) . ': ' . mysqli_error($cxn);
        die($error);
    }
}


PS: Don't be distracted by the LIMIT 10000. I did that thinking that by breaking things down into bite sized chunks, I was streamlining the process. Maybe, maybe not. But the problem is in the amount of time the initial query is taking. Once I saw the EXPLAIN, I was certain that I'm missing something.
Bruce Gust, PHP Developer, asked:

I would make sure that you have properly indexed those tables for optimal performance. Proper indexing will dramatically reduce the query time. :)
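For example, a composite index covering all three filter columns could be tried (the index name here is hypothetical, and note that MySQL can only range-scan the first column of a composite index, so the latitude/longitude parts mainly help it filter without fetching full rows):

```sql
-- Hypothetical composite index. MySQL range-scans only the leading
-- column (posted_day); the trailing columns still let the engine
-- discard non-matching entries before touching the table row.
ALTER TABLE verizon
  ADD INDEX idx_day_lat_lon (posted_day, geo_coords_lat, geo_coords_lon);
```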
Dave Baldwin, Fixer of Problems, commented:
The key to the slowness is the use of BETWEEN in three different WHERE clauses. I believe MySQL has to create three different sorted indexes and cross-reference them to find the rows that match all three clauses. I don't think creating indexes will help much either, because the BETWEEN clauses force MySQL to go through the whole table each time. Try a limit of 10 and I think you will see very little change in the amount of time it takes.
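One quick way to see the difference Dave describes is to compare EXPLAIN output for an equality test versus a range on the same column (the exact output depends on which indexes actually exist on your table):

```sql
-- Equality: the optimizer can dive straight to one index value
-- (typically type=ref in EXPLAIN).
EXPLAIN SELECT id FROM verizon WHERE posted_day = '2014-03-05';

-- Range: the optimizer must walk every index entry in the interval
-- (typically type=range, with a much larger rows estimate).
EXPLAIN SELECT id FROM verizon
WHERE posted_day BETWEEN '2014-03-05' AND '2014-03-18';
```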

David -
I only saw two BETWEENs the first time I looked :(
Good eye.


Bruce Gust, PHP Developer (Author), commented:
Dave, I've been reading while I've been waiting for some feedback and your counsel resonates with what I've discovered thus far.

I know what you're saying is correct only because as I've played with the database directly, I can see how things absolutely fly when I'm doing a specific equality as opposed to a "between."

Can you think of a creative way in which I can break things up so I can serve my user (who's going to be using a range of geo_coords as well as dates) so I can get them their answer without having to clog the pipes?
Dave Baldwin, Fixer of Problems, commented:
Nope.  You have created a pipe-clogging scenario.  What you are currently trying to do will never be quick.  Too much data combined with a slow method.
Dan Craciun, IT Consultant, commented:
Looks like it should be a bit faster in MySQL 5.6 or newer: https://dev.mysql.com/doc/refman/5.6/en/index-condition-pushdown-optimization.html

With the new "Index Condition Pushdown Optimization" it should reduce the full table scans.
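As a rough sketch, if Index Condition Pushdown kicks in for this query, the visible change is in EXPLAIN's Extra column (whether your query actually qualifies depends on the chosen index and MySQL version):

```sql
-- With ICP in effect, Extra shows "Using index condition" instead of
-- just "Using where": the BETWEEN filters are evaluated inside the
-- storage engine before full table rows are fetched.
EXPLAIN SELECT id, posted_day, geo_coords_lat, geo_coords_lon
FROM verizon
WHERE posted_day     BETWEEN '2014-03-05' AND '2014-03-18'
  AND geo_coords_lat BETWEEN 26 AND 26.25
  AND geo_coords_lon BETWEEN -80.25 AND -80;
```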

Ray Paseur commented:
You might want to do this in two queries.  Whether this is a good idea or not may depend on the number of rows you expect to get in the results set.  A sensible design might go something like this (pidgin code):

( SELECT * FROM verizon WHERE posted_day BETWEEN '$start_date' AND '$end_date' )

Now you would have a smaller table.  Not sure how much smaller, but...

See the proximity calculator in this article for a way to down-select into a temporary table.
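A minimal sketch of that two-step approach, using the table and column names from the question (the temporary table name is hypothetical):

```sql
-- Step 1: down-select on the date range alone into a temporary table.
CREATE TEMPORARY TABLE tmp_window AS
SELECT id, actor_id, actor_display_name, posted_time, display_name,
       geo_coords_lat, geo_coords_lon, location_name, posted_day
FROM verizon
WHERE posted_day BETWEEN '2014-03-05' AND '2014-03-18';

-- Step 2: apply the bounding-box filter to the much smaller table.
SELECT * FROM tmp_window
WHERE geo_coords_lat BETWEEN 26 AND 26.25
  AND geo_coords_lon BETWEEN -80.25 AND -80
ORDER BY id ASC;
```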
Bruce Gust, PHP Developer (Author), commented:
Ray, I was thinking about doing that and I was playing with the idea in phpMyAdmin and I got an error that indicated my innodb_buffer_pool_size was too small.

Does creating a temp table via php eliminate that problem?
Ray Paseur commented:
I don't know; it's a data-dependent problem and only you have the data.  It's something to test, but like I wrote in earlier questions, you're working with data at a very large scale.  You might want to break this one giant table up into tables by day, so you would be working with 365 tables, each of a more manageable size.  I'm guessing you haven't tried that yet?

You might also want to consider making up a test data set and posting it for us here.  I would recommend selecting every 1,000th row out of your big table.  That would create a test data set that contained about 250,000 rows, presumably with a more-or-less representative and well-distributed subset of the big collection.  Once you have the data uploaded, you can make reference to the uploaded file in this and future questions.  You can use the "Attach File" link below the comment box.

If I have that small test data set, I can show you tested examples of the logic for things like a down-select into a memory table or a design that uses tables per day or per month, etc.
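Assuming id is a reasonably dense AUTO_INCREMENT column, one way to pull roughly every 1,000th row for that test data set would be:

```sql
-- Hypothetical sampling query; assumes id values are dense, so taking
-- ids divisible by 1000 yields roughly 1/1000th of the table
-- (~250,000 rows out of 250,000,000).
SELECT * FROM verizon WHERE MOD(id, 1000) = 0;
```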
Are the lat and lng indexed? That could certainly slow you down. Which is why you should be using a geo spatial index.
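A sketch of that spatial approach (assumes a MySQL version with spatial index support for your storage engine; the geo_point column and index name are hypothetical, and the column must be populated before the NOT NULL constraint and index can be added):

```sql
-- Hypothetical: store each coordinate pair as a POINT, then index it.
ALTER TABLE verizon ADD COLUMN geo_point POINT;
UPDATE verizon SET geo_point = POINT(geo_coords_lat, geo_coords_lon);
ALTER TABLE verizon MODIFY geo_point POINT NOT NULL,
  ADD SPATIAL INDEX idx_geo_point (geo_point);

-- Bounding-box search via the spatial index.
SELECT id, actor_display_name, posted_day
FROM verizon
WHERE MBRContains(
        ST_GeomFromText('POLYGON((26 -80.25, 26.25 -80.25, 26.25 -80,
                                  26 -80, 26 -80.25))'),
        geo_point);
```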
Brian Tao, Senior Business Solutions Consultant, commented:
I think the bottleneck would be in the while loop with the insert statement. You were trying to insert 223,953 rows one by one, meaning the DB server has to parse and execute your INSERT that many times.
Have you tried commenting out the insert part and see how long it takes?
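If the copy does have to go through PHP, batching many rows into a single statement cuts the round trips dramatically. A sketch with made-up values (real code would build the VALUES list in a loop, escaping each field):

```sql
-- One statement inserting many rows at once, instead of one INSERT
-- per row; each round trip to the server carries fixed overhead.
INSERT INTO twitter_csv
  (verizon_id, actor_id, actor_display_name, posted_time,
   geo_coords_lat, geo_coords_lon, location_name, posted_day)
VALUES
  (1, 'a1', 'Alice', '2014-03-05 10:00:00', 26.10, -80.10, 'Miami', '2014-03-05'),
  (2, 'a2', 'Bob',   '2014-03-05 10:01:00', 26.20, -80.20, 'Davie', '2014-03-05');
```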
Ray Paseur commented:
Agree with taoyipai: there are just too many moving parts in this application, compounded by millions of rows of data that seem to get copied over and over. Check this idea and see if it can help you consolidate some things:

The query string would look something like this (untested, awaiting test data) -- not sure about display_name column.
INSERT INTO twitter_csv 
( verizon_id
, actor_id
, actor_display_name
, posted_time
, display_name
, geo_coords_lat
, geo_coords_lon
, location_name
, posted_day )
SELECT id
, actor_id
, actor_display_name
, posted_time
, display_name
, geo_coords_lat
, geo_coords_lon
, location_name
, posted_day 
FROM verizon 
WHERE (posted_day     BETWEEN '$start_date'  AND '$end_date') 
AND   (geo_coords_lat BETWEEN '$latitude_1'  AND '$latitude_2') 
AND   (geo_coords_lon BETWEEN '$longitude_1' AND '$longitude_2') 
LIMIT 10000


I'd also hope you come to understand the danger in this line of code. Don't ever use the extract() function again, or, for that matter, compact(). These functions blur the line between code and data in ways that can cause your scripts to fail without warning when a variable name collision occurs, so they constitute a code smell. You do not want that on your resume! The function is uncalled for in this context, and it causes a proliferation of variables in your symbol table. More variables means more potential failure points, so it's best to just leave it out.
// extract($crystal_row); OMIT THIS

