Solved

Why does this query take so long?

Posted on 2014-10-14
Last Modified: 2014-10-16
Here's my query:

EXPLAIN SELECT id, actor_id, actor_display_name, posted_time, display_name, geo_coords_lat, geo_coords_lon, location_name, posted_day FROM verizon WHERE (posted_day BETWEEN '2014-03-05' and '2014-03-18') and (geo_coords_lat BETWEEN '26' and '26.25') and (geo_coords_lon BETWEEN '-80.25' and '-80') order by id ASC LIMIT 10000

When I run the "explain" function, I get this:

id  select_type  table    type   possible_keys                               key      key_len  ref   rows
1   SIMPLE       Verizon  index  posted_day, geo_coords_lat, geo_coords_lon  PRIMARY  4        NULL  223953

The last column was "Extra" which read "Using where."

At first glance, I'm stoked because it appears as though my indexes are doing exactly as they're supposed to do in that they're taking the 250,000,000 rows and reducing it to a very manageable collection of rows.

But the process, which I have below, is taking anywhere from 20-25 minutes, which makes no sense in that 223953 rows should sing.

What am I doing that's clogging the pipes? In theory, everything looks great. In practice, we need some major improvement.

Thoughts?

$crystal = "SELECT id, actor_id, actor_display_name, posted_time, display_name, geo_coords_lat, geo_coords_lon, location_name, posted_day FROM verizon WHERE (posted_day BETWEEN '$start_date' and '$end_date') and (geo_coords_lat BETWEEN '$latitude_1' and '$latitude_2') and (geo_coords_lon BETWEEN '$longitude_1' and '$longitude_2') order by id ASC LIMIT 10000";
$crystal_query = mysqli_query($cxn, $crystal)
    or die("Crystal didn't happen.");
while ($crystal_row = mysqli_fetch_assoc($crystal_query))
{
    extract($crystal_row);
    $verizon_id             = mysqli_real_escape_string($cxn, $crystal_row['id']);
    $the_actor_id           = mysqli_real_escape_string($cxn, $crystal_row['actor_id']);
    $the_actor_display_name = mysqli_real_escape_string($cxn, $crystal_row['actor_display_name']);
    $the_posted_time        = mysqli_real_escape_string($cxn, $crystal_row['posted_time']);
    $the_geo_coords_lat     = mysqli_real_escape_string($cxn, $crystal_row['geo_coords_lat']);
    $the_geo_coords_lon     = mysqli_real_escape_string($cxn, $crystal_row['geo_coords_lon']);
    $the_location_name      = mysqli_real_escape_string($cxn, $crystal_row['location_name']);
    $the_posted_day         = $crystal_row['posted_day'];
    $insert = "insert into twitter_csv (verizon_id, actor_id, actor_display_name, posted_time, geo_coords_lat, geo_coords_lon, location_name, posted_day)
        values ('$verizon_id', '$the_actor_id', '$the_actor_display_name', '$the_posted_time', '$the_geo_coords_lat', '$the_geo_coords_lon', '$the_location_name', '$the_posted_day')";
    $insertexe = mysqli_query($cxn, $insert);
    if (!$insertexe) {
        $error = mysqli_errno($cxn).': '.mysqli_error($cxn);
        die($error);
    }
}



PS: Don't be distracted by the LIMIT 10000. I did that thinking that by breaking things down into bite sized chunks, I was streamlining the process. Maybe, maybe not. But the problem is in the amount of time the initial query is taking. Once I saw the EXPLAIN, I was certain that I'm missing something.
Question by:brucegust
12 Comments
 
LVL 22

Expert Comment

by:plusone3055
I would make sure that those tables are properly indexed for optimal performance. Proper indexing should greatly reduce the query time :)
 
LVL 82

Accepted Solution

by:
Dave Baldwin earned 167 total points
The key to the slowness is the use of BETWEEN in three different WHERE conditions.  I believe that MySQL has to build three different sorted indexes and cross-reference them to find the rows that match all three conditions.  I don't think that creating indexes will help much either, because the BETWEEN conditions force MySQL to go through the whole table each time.  Try a limit of 10 and I think you will see very little change in the amount of time that it takes.
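One way to test this is to compare EXPLAIN output for a few variations of the original query, as in the sketch below. In the EXPLAIN posted in the question, type = index with key = PRIMARY usually means MySQL is walking the primary key in id order (to satisfy ORDER BY id with a LIMIT) and applying the WHERE filter row by row, rather than using any of the three secondary indexes; the rows figure is an estimate of rows examined, not a result count. The FORCE INDEX hint assumes the index on posted_day is itself named posted_day, as the possible_keys column suggests.

```sql
-- 1. Same filter without ORDER BY / LIMIT: does the optimizer now choose a
--    secondary index (type = range) instead of scanning PRIMARY?
EXPLAIN SELECT id
FROM verizon
WHERE posted_day BETWEEN '2014-03-05' AND '2014-03-18'
  AND geo_coords_lat BETWEEN '26' AND '26.25'
  AND geo_coords_lon BETWEEN '-80.25' AND '-80';

-- 2. Original ordering and limit, but hinting the date index
--    (assumption: the index is named posted_day).
EXPLAIN SELECT id
FROM verizon FORCE INDEX (posted_day)
WHERE posted_day BETWEEN '2014-03-05' AND '2014-03-18'
  AND geo_coords_lat BETWEEN '26' AND '26.25'
  AND geo_coords_lon BETWEEN '-80.25' AND '-80'
ORDER BY id ASC
LIMIT 10000;
```

If the second plan shows type = range and a much lower row estimate, the ORDER BY id with LIMIT is the likely reason the optimizer preferred the primary key in the first place.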
 
LVL 22

Expert Comment

by:plusone3055
David -
I only saw 2 BETWEENs the first time I looked :(
Good eye.

*Bows*
 

Author Comment

by:brucegust
Dave, I've been reading while I've been waiting for some feedback and your counsel resonates with what I've discovered thus far.

I know what you're saying is correct only because as I've played with the database directly, I can see how things absolutely fly when I'm doing a specific equality as opposed to a "between."

Can you think of a creative way in which I can break things up so I can serve my user (who's going to be using a range of geo_coords as well as dates) so I can get them their answer without having to clog the pipes?
 
LVL 82

Expert Comment

by:Dave Baldwin
Nope.  You have created a pipe-clogging scenario.  What you are currently trying to do will never be quick.  Too much data combined with a slow method.
 
LVL 34

Expert Comment

by:Dan Craciun
Looks like it should be a bit faster in MySQL 5.6 or newer: https://dev.mysql.com/doc/refman/5.6/en/index-condition-pushdown-optimization.html

With the new "Index Condition Pushdown" optimization, it should limit the full table scans.

HTH,
Dan
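
A sketch of how one might check whether Index Condition Pushdown (ICP) applies here, assuming MySQL 5.6 or newer. ICP works on secondary indexes, so the sketch adds a composite index over the three filtered columns. The index name idx_day_lat_lon is made up for illustration, and building an index on a table of this size will take considerable time and disk space.

```sql
-- Hypothetical composite index: the date range uses the first key part,
-- and the lat/lon conditions can then be checked against index entries
-- before the full row is read.
ALTER TABLE verizon
  ADD INDEX idx_day_lat_lon (posted_day, geo_coords_lat, geo_coords_lon);

-- If the optimizer picks the new index, the Extra column should read
-- "Using index condition" (possibly alongside "Using filesort" because
-- of the ORDER BY id).
EXPLAIN SELECT id, actor_id, actor_display_name, posted_time, display_name,
               geo_coords_lat, geo_coords_lon, location_name, posted_day
FROM verizon
WHERE posted_day BETWEEN '2014-03-05' AND '2014-03-18'
  AND geo_coords_lat BETWEEN '26' AND '26.25'
  AND geo_coords_lon BETWEEN '-80.25' AND '-80'
ORDER BY id ASC
LIMIT 10000;
```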

 
LVL 108

Expert Comment

by:Ray Paseur
You might want to do this in two queries.  Whether this is a good idea or not may depend on the number of rows you expect to get in the result set.  A sensible design might go something like this (pidgin code):

CREATE TEMPORARY TABLE x
ENGINE=MEMORY
AS SELECT * FROM verizon WHERE posted_day BETWEEN '$start_date' AND '$end_date'

Now you would have a smaller table.  Not sure how much smaller, but...

See the proximity calculator in this article for a way to down-select into a temporary table.
http://www.experts-exchange.com/Programming/Languages/Scripting/PHP/A_4276-What-is-near-me-Proximity-calculations-using-PHP-and-MySQL.html
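The second of the two queries might then look like the sketch below, which filters the temporary table x by the coordinate ranges. It assumes x was created in the same connection (temporary tables are visible only to the session that created them) and that it fits within max_heap_table_size. MEMORY tables also cannot hold TEXT or BLOB columns, so the SELECT * in the first query may need to be narrowed to the columns actually used. The $latitude and $longitude placeholders are the PHP variables from the question.

```sql
-- Step 2: filter the (hopefully much smaller) date subset by coordinates.
SELECT id, actor_id, actor_display_name, posted_time, display_name,
       geo_coords_lat, geo_coords_lon, location_name, posted_day
FROM x
WHERE geo_coords_lat BETWEEN '$latitude_1' AND '$latitude_2'
  AND geo_coords_lon BETWEEN '$longitude_1' AND '$longitude_2'
ORDER BY id ASC;

-- Free the memory once the results have been read.
DROP TEMPORARY TABLE IF EXISTS x;
```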
 

Author Comment

by:brucegust
Ray, I was thinking about doing that and I was playing with the idea in phpMyAdmin and I got an error that indicated my innodb_buffer_pool_size was too small.

Does creating a temp table via php eliminate that problem?
 
LVL 108

Expert Comment

by:Ray Paseur
I don't know; it's a data-dependent problem and only you have the data.  It's something to test, but like I wrote in earlier questions, you're working with data at a very large scale.  You might want to break this one giant table up into tables by day, so you would be working with 365 tables, each of a more manageable size.  I'm guessing you haven't tried that yet?

You might also want to consider making up a test data set and posting it for us here.  I would recommend selecting every 1,000th row out of your big table.  That would create a test data set that contained about 250,000 rows, presumably with a more-or-less representative and well-distributed subset of the big collection.  Once you have the data uploaded, you can make reference to the uploaded file in this and future questions.  You can use the "Attach File" link below the comment box.

If I have that small test data set, I can show you tested examples of the logic for things like a down-select into a memory table or a design that uses tables per day or per month, etc.
 
LVL 58

Expert Comment

by:Gary
Are the lat and lon columns indexed? That could certainly slow you down, which is why you should be using a geospatial index.
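A rough sketch of what a geospatial index could look like, under several assumptions: MySQL 5.7 (InnoDB supports SPATIAL indexes from 5.7; in 5.6 they require MyISAM), numeric values in geo_coords_lat and geo_coords_lon, and no rows with missing coordinates. The column name geo_point and index name idx_geo_point are hypothetical. On MySQL 8.0 the column would also need an SRID attribute for the optimizer to use the index. Each ALTER and the UPDATE will be slow on a table this large.

```sql
-- Add a POINT column and fill it from the existing coordinates (x = lon, y = lat).
ALTER TABLE verizon ADD COLUMN geo_point POINT NULL;

UPDATE verizon
SET geo_point = POINT(geo_coords_lon, geo_coords_lat);

-- A SPATIAL index requires a NOT NULL column.
ALTER TABLE verizon
  MODIFY geo_point POINT NOT NULL,
  ADD SPATIAL INDEX idx_geo_point (geo_point);

-- Bounding-box search: the polygon is the lat/lon rectangle from the question.
SELECT id, actor_id, actor_display_name, posted_time, display_name,
       geo_coords_lat, geo_coords_lon, location_name, posted_day
FROM verizon
WHERE MBRContains(
        ST_GeomFromText('POLYGON((-80.25 26, -80 26, -80 26.25, -80.25 26.25, -80.25 26))'),
        geo_point)
  AND posted_day BETWEEN '2014-03-05' AND '2014-03-18'
ORDER BY id ASC
LIMIT 10000;
```

The spatial index handles both coordinate ranges in a single lookup, leaving only the date range to be filtered afterwards.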
 
LVL 9

Assisted Solution

by:Brian Tao
Brian Tao earned 167 total points
I think the bottleneck is the while loop with the INSERT statement.  You are inserting the rows one by one, so the DB server has to process a separate INSERT for every row.
Have you tried commenting out the insert part to see how long the SELECT alone takes?
 
LVL 108

Assisted Solution

by:Ray Paseur
Ray Paseur earned 166 total points
Agree with taoyipai: there are just too many moving parts to this application, compounded by millions of lines of data that seem to get copied over and over.  Check this idea and see if it can help you consolidate some things:
http://dev.mysql.com/doc/refman/5.0/en/insert-select.html

The query string would look something like this (untested, awaiting test data) -- not sure about display_name column.
$insert = "
INSERT INTO twitter_csv
( verizon_id
, actor_id
, actor_display_name
, posted_time
, display_name
, geo_coords_lat
, geo_coords_lon
, location_name
, posted_day
)
SELECT
  id
, actor_id
, actor_display_name
, posted_time
, display_name
, geo_coords_lat
, geo_coords_lon
, location_name
, posted_day
FROM verizon
WHERE (posted_day     BETWEEN '$start_date'  AND '$end_date')
AND   (geo_coords_lat BETWEEN '$latitude_1'  AND '$latitude_2')
AND   (geo_coords_lon BETWEEN '$longitude_1' AND '$longitude_2')
ORDER BY id ASC
LIMIT 10000
";


I'd also hope you come to understand the danger in this line of code.  Don't ever write the extract() function again or for that matter, compact().  These functions blur the line between code and data in ways that can cause your scripts to fail without warning when a variable name collision occurs, thus they constitute a code smell.  You do not want that on your resume!  The function is uncalled for in this context and it causes a proliferation of variables in your symbol table.  More variables means more potential failure points, so it's best to just leave it out.
// extract($crystal_row); OMIT THIS
