Solved

Why does this query take so long?

Posted on 2014-10-14
Last Modified: 2014-10-16
Here's my query:

EXPLAIN SELECT id, actor_id, actor_display_name, posted_time, display_name, geo_coords_lat, geo_coords_lon, location_name, posted_day FROM verizon WHERE (posted_day BETWEEN '2014-03-05' and '2014-03-18') and (geo_coords_lat BETWEEN '26' and '26.25') and (geo_coords_lon BETWEEN '-80.25' and '-80') order by id ASC LIMIT 10000

When I run the "explain" function, I get this:

id  select_type  table    type   possible_keys                               key      key_len  ref   rows
1   SIMPLE       Verizon  index  posted_day, geo_coords_lat, geo_coords_lon  PRIMARY  4        NULL  223953

The last column was "Extra" which read "Using where."

At first glance, I'm stoked because it appears as though my indexes are doing exactly as they're supposed to do in that they're taking the 250,000,000 rows and reducing it to a very manageable collection of rows.

But the process, which I have below, is taking anywhere from 20-25 minutes, which makes no sense in that 223953 rows should sing.

What am I doing that's clogging the pipes? Theoretically, everything looks great. Practically, we need some major improvement.

Thoughts?

$crystal="SELECT id, actor_id, actor_display_name, posted_time, display_name, geo_coords_lat, geo_coords_lon, location_name, posted_day FROM verizon WHERE (posted_day BETWEEN '$start_date' and '$end_date') and (geo_coords_lat BETWEEN '$latitude_1' and '$latitude_2') and (geo_coords_lon BETWEEN '$longitude_1' and '$longitude_2') order by id ASC LIMIT 10000";
$crystal_query=mysqli_query($cxn, $crystal)
or die("Crystal didn't happen.");
	while($crystal_row=mysqli_fetch_assoc($crystal_query))
	{
	extract($crystal_row);
	$verizon_id=mysqli_real_escape_string($cxn, $crystal_row['id']);
	$the_actor_id= mysqli_real_escape_string($cxn,$crystal_row['actor_id']);
	$the_actor_display_name= mysqli_real_escape_string($cxn,$crystal_row['actor_display_name']);
	$the_posted_time= mysqli_real_escape_string($cxn,$crystal_row['posted_time']);
	$the_geo_coords_lat= mysqli_real_escape_string($cxn,$crystal_row['geo_coords_lat']);
	$the_geo_coords_lon= mysqli_real_escape_string($cxn,$crystal_row['geo_coords_lon']);
	$the_location_name= mysqli_real_escape_string($cxn,$crystal_row['location_name']);
	$the_posted_day=$crystal_row['posted_day'];
	$insert = "insert into twitter_csv (verizon_id, actor_id, actor_display_name, posted_time, geo_coords_lat, geo_coords_lon, location_name, posted_day) 
	values ('$verizon_id', '$the_actor_id', '$the_actor_display_name', '$the_posted_time', '$the_geo_coords_lat', '$the_geo_coords_lon', '$the_location_name', '$the_posted_day')";
		$insertexe = mysqli_query($cxn, $insert);
		if(!$insertexe) {
		$error = mysqli_errno($cxn).': '.mysqli_error($cxn);
		die($error);
		}
	}



PS: Don't be distracted by the LIMIT 10000. I did that thinking that by breaking things down into bite sized chunks, I was streamlining the process. Maybe, maybe not. But the problem is in the amount of time the initial query is taking. Once I saw the EXPLAIN, I was certain that I'm missing something.
Question by:brucegust
 
LVL 22

Expert Comment

by:plusone3055
ID: 40380671
I would make sure that you have properly indexed those tables for optimal performance in your database. Indexing the tables properly will severely reduce the time :)
0
 
LVL 83

Accepted Solution

by:
Dave Baldwin earned 167 total points
ID: 40380674
The key to the slowness is the use of BETWEEN in 3 different WHERE clauses.  I believe that MySQL has to create 3 different sorted indexes and cross reference them to find the ones where there is a match for all three clauses.  I don't think that creating indexes will help much either because the BETWEEN clauses force MySQL to go thru the whole table each time.  Try a limit of 10 and I think you will see very little change in the amount of time that it takes.
0
 
LVL 22

Expert Comment

by:plusone3055
ID: 40380678
David -
I only saw 2 BETWEENs the first time I looked :(
good eye

*Bows*

 

Author Comment

by:brucegust
ID: 40380686
Dave, I've been reading while I've been waiting for some feedback and your counsel resonates with what I've discovered thus far.

I know what you're saying is correct only because as I've played with the database directly, I can see how things absolutely fly when I'm doing a specific equality as opposed to a "between."

Can you think of a creative way in which I can break things up so I can serve my user (who's going to be using a range of geo_coords as well as dates) so I can get them their answer without having to clog the pipes?
0
 
LVL 83

Expert Comment

by:Dave Baldwin
ID: 40380697
Nope.  You have created a pipe-clogging scenario.  What you are currently trying to do will never be quick.  Too much data combined with a slow method.
0
 
LVL 35

Expert Comment

by:Dan Craciun
ID: 40380710
Looks like it should be a bit faster in MySQL 5.6 or newer: https://dev.mysql.com/doc/refman/5.6/en/index-condition-pushdown-optimization.html

With the new "Index Condition Pushdown Optimization" it should limit the full table scans.
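One way to see whether the optimizer is actually applying Index Condition Pushdown is to look at the Extra column of the EXPLAIN output, where MySQL reports "Using index condition" when it is in use. The snippet below is only a sketch: the helper name explain_row_uses_icp is made up for illustration, and the sample row just restates the values from the question.

```php
<?php
// Hypothetical helper (not part of mysqli): reports whether a single
// EXPLAIN row, fetched as an associative array, mentions ICP.
function explain_row_uses_icp(array $explainRow): bool
{
    $extra = $explainRow['Extra'] ?? '';
    // MySQL writes "Using index condition" in Extra when ICP is applied.
    return stripos((string) $extra, 'Using index condition') !== false;
}

// The row reported in the question, where Extra read "Using where".
$row = ['type' => 'index', 'key' => 'PRIMARY', 'rows' => 223953, 'Extra' => 'Using where'];
var_dump(explain_row_uses_icp($row)); // bool(false)
```

In a real script the array would come from mysqli_fetch_assoc() on the result of the EXPLAIN query.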

HTH,
Dan
0
 
LVL 110

Expert Comment

by:Ray Paseur
ID: 40380714
You might want to do this in two queries.  Whether this is a good idea or not may depend on the number of rows you expect to get in the results set.  A sensible design might go something like this (pidgin code):

CREATE TEMPORARY TABLE x ENGINE=MEMORY
SELECT * FROM verizon WHERE posted_day BETWEEN '$start_date' AND '$end_date'

Now you would have a smaller table.  Not sure how much smaller, but...

See the proximity calculator in this article for a way to down-select into a temporary table.
http://www.experts-exchange.com/Programming/Languages/Scripting/PHP/A_4276-What-is-near-me-Proximity-calculations-using-PHP-and-MySQL.html
0
 

Author Comment

by:brucegust
ID: 40380774
Ray, I was thinking about doing that and I was playing with the idea in phpMyAdmin and I got an error that indicated my innodb_buffer_pool_size was too small.

Does creating a temp table via php eliminate that problem?
0
 
LVL 110

Expert Comment

by:Ray Paseur
ID: 40380877
I don't know; it's a data-dependent problem and only you have the data.  It's something to test, but like I wrote in earlier questions, you're working with data at a very large scale.  You might want to break this one giant table up into tables by day, so you would be working with 365 tables, each of a more manageable size.  I'm guessing you haven't tried that yet?

You might also want to consider making up a test data set and posting it for us here.  I would recommend selecting every 1,000th row out of your big table.  That would create a test data set that contained about 250,000 rows, presumably with a more-or-less representative and well-distributed subset of the big collection.  Once you have the data uploaded, you can make reference to the uploaded file in this and future questions.  You can use the "Attach File" link below the comment box.
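A rough sketch of how the every-1,000th-row sample could be pulled, assuming id is an integer primary key without large gaps (the EXPLAIN output in the question shows a 4-byte PRIMARY key). The function name build_sample_query is hypothetical, and the modulo trick is just one simple way to thin the table.

```php
<?php
// Hypothetical helper: builds a query that keeps roughly one row in $step.
// Assumes `id` is an integer key, so id % $step = 0 matches about 1/$step of the rows.
function build_sample_query(string $table, int $step): string
{
    if ($step < 1) {
        throw new InvalidArgumentException('step must be a positive integer');
    }
    // Table names cannot be bound as parameters, so allow only plain identifiers.
    if (!preg_match('/^[A-Za-z_][A-Za-z0-9_]*$/', $table)) {
        throw new InvalidArgumentException('unexpected table name');
    }
    return "SELECT * FROM `$table` WHERE id % $step = 0";
}

echo build_sample_query('verizon', 1000), "\n";
```

The resulting string can be passed to mysqli_query() and the rows exported to a file for upload.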

If I have that small test data set, I can show you tested examples of the logic for things like a down-select into a memory table or a design that uses tables per day or per month, etc.
0
 
LVL 58

Expert Comment

by:Gary
ID: 40380956
Are the lat and lng columns indexed? If not, that could certainly slow you down, which is why you should be using a geospatial index.
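To make the geospatial idea a bit more concrete, here is a hedged sketch of a bounding-box lookup. It assumes a POINT column (called geo_point here, which does not exist in the original table) has been added, populated from the lon/lat values, and given a SPATIAL index. Before MySQL 5.7 that kind of index needed a MyISAM table, and the exact function names vary between MySQL versions, so treat the SQL as illustrative only.

```php
<?php
// Hypothetical helper: turns two latitude and two longitude bounds into a
// WKT polygon describing the rectangle they enclose.
function bounding_box_wkt(float $lat1, float $lat2, float $lon1, float $lon2): string
{
    $south = min($lat1, $lat2);
    $north = max($lat1, $lat2);
    $west  = min($lon1, $lon2);
    $east  = max($lon1, $lon2);
    // WKT lists x (longitude) before y (latitude); the ring ends on its first point.
    return sprintf(
        'POLYGON((%F %F, %F %F, %F %F, %F %F, %F %F))',
        $west, $south, $east, $south, $east, $north, $west, $north, $west, $south
    );
}

// Assumed schema: geo_point is a POINT column with a SPATIAL index.
const BOX_SQL = 'SELECT id FROM verizon WHERE MBRContains(ST_GeomFromText(?), geo_point)';

echo bounding_box_wkt(26, 26.25, -80.25, -80), "\n";
```

The WKT string would be bound to the single placeholder in BOX_SQL, with the date range applied as an extra condition.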
0
 
LVL 9

Assisted Solution

by:Brian Tao
Brian Tao earned 167 total points
ID: 40381499
I think the bottleneck would be in the while loop with the insert statement.  You were trying to insert 223953 rows one by one, meaning that the DB server has to process your insert as many times.
Have you tried commenting out the insert part and see how long it takes?
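A small timing wrapper makes that comparison easy to run. This is a sketch only: time_call is a made-up helper name, and the workload shown is a stand-in so the snippet runs on its own.

```php
<?php
// Hypothetical helper: runs a callable and returns [its result, elapsed seconds].
function time_call(callable $fn): array
{
    $start  = microtime(true);
    $result = $fn();
    return [$result, microtime(true) - $start];
}

// Stand-in workload; in the real script one callable would wrap the
// mysqli_query() SELECT and a second one would wrap the while/insert loop.
[$sum, $seconds] = time_call(function () {
    return array_sum(range(1, 1000));
});
printf("result=%d took %.4f s\n", $sum, $seconds);
```

Because mysqli_query() buffers the whole result set by default, timing that call alone captures the cost of the SELECT, and timing the loop separately shows what the row-by-row inserts add.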
0
 
LVL 110

Assisted Solution

by:Ray Paseur
Ray Paseur earned 166 total points
ID: 40381830
Agree with taoyipai: There are just too many moving parts to this application, compounded by millions of lines of data that seem to get copied over and over.  Check this idea and see if it can help you consolidate some things:
http://dev.mysql.com/doc/refman/5.0/en/insert-select.html

The query string would look something like this (untested, awaiting test data) -- not sure about display_name column.
$insert = "
INSERT INTO twitter_csv 
( verizon_id
, actor_id
, actor_display_name
, posted_time
, display_name
, geo_coords_lat
, geo_coords_lon
, location_name
, posted_day
) 
SELECT
  id
, actor_id
, actor_display_name
, posted_time
, display_name
, geo_coords_lat
, geo_coords_lon
, location_name
, posted_day 
FROM verizon 
WHERE (posted_day     BETWEEN '$start_date'  AND '$end_date') 
AND   (geo_coords_lat BETWEEN '$latitude_1'  AND '$latitude_2') 
AND   (geo_coords_lon BETWEEN '$longitude_1' AND '$longitude_2') 
ORDER BY id ASC 
LIMIT 10000
";
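The same INSERT ... SELECT can also be run as a prepared statement, so the dates and coordinates are bound rather than pasted into the string. This is an untested sketch: the function name copy_matching_rows is hypothetical, and the column list follows the INSERT in the original question (without display_name), on the assumption that those columns exist in twitter_csv.

```php
<?php
// One statement copies the matching rows server-side; no PHP loop is involved.
const COPY_SQL = <<<'SQL'
INSERT INTO twitter_csv
  (verizon_id, actor_id, actor_display_name, posted_time,
   geo_coords_lat, geo_coords_lon, location_name, posted_day)
SELECT id, actor_id, actor_display_name, posted_time,
       geo_coords_lat, geo_coords_lon, location_name, posted_day
FROM verizon
WHERE posted_day     BETWEEN ? AND ?
  AND geo_coords_lat BETWEEN ? AND ?
  AND geo_coords_lon BETWEEN ? AND ?
ORDER BY id ASC
LIMIT 10000
SQL;

// Hypothetical wrapper: binds the six range values and returns the rows copied.
function copy_matching_rows(mysqli $cxn, string $start, string $end,
                            float $lat1, float $lat2, float $lon1, float $lon2): int
{
    $stmt = mysqli_prepare($cxn, COPY_SQL);
    if ($stmt === false) {
        throw new RuntimeException(mysqli_error($cxn));
    }
    // 's' = string (the two dates), 'd' = double (the four coordinates)
    mysqli_stmt_bind_param($stmt, 'ssdddd', $start, $end, $lat1, $lat2, $lon1, $lon2);
    if (!mysqli_stmt_execute($stmt)) {
        throw new RuntimeException(mysqli_stmt_error($stmt));
    }
    $copied = (int) mysqli_stmt_affected_rows($stmt);
    mysqli_stmt_close($stmt);
    return $copied;
}
```

Called as copy_matching_rows($cxn, '2014-03-05', '2014-03-18', 26, 26.25, -80.25, -80), it would replace both the SELECT and the per-row insert loop with a single round trip.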



I'd also hope you come to understand the danger in this line of code.  Don't ever write the extract() function again or for that matter, compact().  These functions blur the line between code and data in ways that can cause your scripts to fail without warning when a variable name collision occurs, thus they constitute a code smell.  You do not want that on your resume!  The function is uncalled for in this context and it causes a proliferation of variables in your symbol table.  More variables means more potential failure points, so it's best to just leave it out.
// extract($crystal_row); OMIT THIS
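Here is a small, self-contained illustration of the collision problem. The array and its cxn key are invented for the demo; the point is only that extract() with its default behavior overwrites any existing variable that shares a name with an array key.

```php
<?php
// Demonstrates how extract() can silently replace an existing variable.
function demo_extract_collision(): array
{
    $cxn = 'database handle';   // stands in for the mysqli connection
    $row = ['id' => 7, 'cxn' => 'value from a column named cxn'];

    extract($row);              // default EXTR_OVERWRITE replaces $cxn

    return ['cxn' => $cxn, 'id' => $id];
}

print_r(demo_extract_collision());
```

After the call, $cxn no longer holds the connection stand-in. Reading $crystal_row['id'] and friends directly, as the rest of the loop already does, avoids the problem entirely.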


0


Question has a verified solution.
