Solved

Why does this query take so long?

Posted on 2014-10-14
Last Modified: 2014-10-16
Here's my query:

EXPLAIN SELECT id, actor_id, actor_display_name, posted_time, display_name, geo_coords_lat, geo_coords_lon, location_name, posted_day FROM verizon WHERE (posted_day BETWEEN '2014-03-05' and '2014-03-18') and (geo_coords_lat BETWEEN '26' and '26.25') and (geo_coords_lon BETWEEN '-80.25' and '-80') order by id ASC LIMIT 10000

When I run the "explain" function, I get this:

id  select_type  table    type   possible_keys                               key      key_len  ref   rows
1   SIMPLE       Verizon  index  posted_day, geo_coords_lat, geo_coords_lon  PRIMARY  4        NULL  223953

The last column was "Extra" which read "Using where."

At first glance, I'm stoked because it appears as though my indexes are doing exactly what they're supposed to do: taking the 250,000,000 rows and reducing them to a very manageable collection of rows.

But the process, which I have below, is taking anywhere from 20 to 25 minutes, which makes no sense given that 223953 rows should sing.

What am I doing that's clogging the pipes? Theoretically, everything looks great. Practically, we need some major improvement.

Thoughts?

$crystal="SELECT id, actor_id, actor_display_name, posted_time, display_name, geo_coords_lat, geo_coords_lon, location_name, posted_day FROM verizon WHERE (posted_day BETWEEN '$start_date' and '$end_date') and (geo_coords_lat BETWEEN '$latitude_1' and '$latitude_2') and (geo_coords_lon BETWEEN '$longitude_1' and '$longitude_2') order by id ASC LIMIT 10000";
$crystal_query=mysqli_query($cxn, $crystal)
	or die("Crystal didn't happen.");
while($crystal_row=mysqli_fetch_assoc($crystal_query))
{
	extract($crystal_row);
	$verizon_id=mysqli_real_escape_string($cxn, $crystal_row['id']);
	$the_actor_id= mysqli_real_escape_string($cxn,$crystal_row['actor_id']);
	$the_actor_display_name= mysqli_real_escape_string($cxn,$crystal_row['actor_display_name']);
	$the_posted_time= mysqli_real_escape_string($cxn,$crystal_row['posted_time']);
	$the_geo_coords_lat= mysqli_real_escape_string($cxn,$crystal_row['geo_coords_lat']);
	$the_geo_coords_lon= mysqli_real_escape_string($cxn,$crystal_row['geo_coords_lon']);
	$the_location_name= mysqli_real_escape_string($cxn,$crystal_row['location_name']);
	$the_posted_day=$crystal_row['posted_day'];
	$insert = "insert into twitter_csv (verizon_id, actor_id, actor_display_name, posted_time, geo_coords_lat, geo_coords_lon, location_name, posted_day) 
	values ('$verizon_id', '$the_actor_id', '$the_actor_display_name', '$the_posted_time', '$the_geo_coords_lat', '$the_geo_coords_lon', '$the_location_name', '$the_posted_day')";
	$insertexe = mysqli_query($cxn, $insert);
	if(!$insertexe) {
		$error = mysqli_errno($cxn).': '.mysqli_error($cxn);
		die($error);
	}
}



PS: Don't be distracted by the LIMIT 10000. I did that thinking that by breaking things down into bite sized chunks, I was streamlining the process. Maybe, maybe not. But the problem is in the amount of time the initial query is taking. Once I saw the EXPLAIN, I was certain that I'm missing something.
Question by:brucegust
12 Comments
 
LVL 22

Expert Comment

by:plusone3055
ID: 40380671
I would make sure that you have properly indexed those tables for optimal performance in your database. Indexing the tables properly will significantly reduce the time :)
 
LVL 83

Accepted Solution

by:Dave Baldwin
Dave Baldwin earned 167 total points
ID: 40380674
The key to the slowness is the use of BETWEEN in three different conditions of the WHERE clause.  I believe that MySQL has to create three different sorted indexes and cross-reference them to find the rows that match all three conditions.  I don't think that creating indexes will help much either, because the BETWEEN conditions force MySQL to go through the whole table each time.  Try a limit of 10 and I think you will see very little change in the amount of time that it takes.
 
LVL 22

Expert Comment

by:plusone3055
ID: 40380678
David -
I only saw 2 BETWEENs the first time I looked :(
Good eye.

*Bows*

 

Author Comment

by:brucegust
ID: 40380686
Dave, I've been reading while I've been waiting for some feedback and your counsel resonates with what I've discovered thus far.

I know what you're saying is correct only because as I've played with the database directly, I can see how things absolutely fly when I'm doing a specific equality as opposed to a "between."

Can you think of a creative way in which I can break things up so I can serve my user (who's going to be using a range of geo_coords as well as dates) so I can get them their answer without having to clog the pipes?
 
LVL 83

Expert Comment

by:Dave Baldwin
ID: 40380697
Nope.  You have created a pipe-clogging scenario.  What you are currently trying to do will never be quick.  Too much data combined with a slow method.
 
LVL 34

Expert Comment

by:Dan Craciun
ID: 40380710
Looks like it should be a bit faster in MySQL 5.6 or newer: https://dev.mysql.com/doc/refman/5.6/en/index-condition-pushdown-optimization.html

With the new "Index Condition Pushdown Optimization" it should limit the full table scans.

HTH,
Dan
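
A rough sketch of how Index Condition Pushdown could come into play here, offered as something to test rather than a guaranteed fix. It assumes MySQL 5.6 or newer and a composite index across the three filtered columns; the index name idx_day_lat_lon is made up for illustration. The range on the leading column bounds the index scan, and the conditions on the remaining columns can then be checked inside the index before the full rows are read. Building an index on a table of this size will itself take a long time, so try it on a copy or a sample first.

```sql
-- Sketch only: idx_day_lat_lon is a hypothetical index name.
ALTER TABLE verizon
  ADD INDEX idx_day_lat_lon (posted_day, geo_coords_lat, geo_coords_lon);

-- Re-run EXPLAIN with the literals from the original question.
EXPLAIN
SELECT id, actor_id, actor_display_name, posted_time, display_name,
       geo_coords_lat, geo_coords_lon, location_name, posted_day
FROM verizon
WHERE posted_day     BETWEEN '2014-03-05' AND '2014-03-18'
  AND geo_coords_lat BETWEEN '26' AND '26.25'
  AND geo_coords_lon BETWEEN '-80.25' AND '-80'
ORDER BY id ASC
LIMIT 10000;
```

If the optimizer chooses the new index and pushes the conditions down, the Extra column should mention "Using index condition". Because of the ORDER BY id ... LIMIT, it may still prefer walking the PRIMARY key, so the plan has to be checked against the real data.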
 
LVL 109

Expert Comment

by:Ray Paseur
ID: 40380714
You might want to do this in two queries.  Whether this is a good idea or not may depend on the number of rows you expect to get in the result set.  A sensible design might go something like this (pidgin code):

CREATE TEMPORARY TABLE x
ENGINE=MEMORY
SELECT * FROM verizon WHERE posted_day BETWEEN '$start_date' AND '$end_date'

Now you would have a smaller table.  Not sure how much smaller, but...

See the proximity calculator in this article for a way to down-select into a temporary table.
http://www.experts-exchange.com/Programming/Languages/Scripting/PHP/A_4276-What-is-near-me-Proximity-calculations-using-PHP-and-MySQL.html
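
To round out the two-query idea, the second query might look something like the sketch below. It assumes the temporary table x from the pidgin code already exists in the same connection and holds only the rows for the chosen date range; the literal values are the ones from the original question.

```sql
-- Step 2 (sketch): apply the bounding box to the smaller temporary table x.
SELECT id, actor_id, actor_display_name, posted_time, display_name,
       geo_coords_lat, geo_coords_lon, location_name, posted_day
FROM x
WHERE geo_coords_lat BETWEEN '26' AND '26.25'
  AND geo_coords_lon BETWEEN '-80.25' AND '-80'
ORDER BY id ASC
LIMIT 10000;
```

Two things to keep in mind: a temporary table is visible only to the connection that created it, and a MEMORY table cannot hold BLOB or TEXT columns and is capped by max_heap_table_size, so the down-select may need a different engine if the date range is wide.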
 

Author Comment

by:brucegust
ID: 40380774
Ray, I was thinking about doing that and I was playing with the idea in phpMyAdmin and I got an error that indicated my innodb_buffer_pool_size was too small.

Does creating a temp table via php eliminate that problem?
 
LVL 109

Expert Comment

by:Ray Paseur
ID: 40380877
I don't know; it's a data-dependent problem and only you have the data.  It's something to test, but like I wrote in earlier questions, you're working with data at a very large scale.  You might want to break this one giant table up into tables by day, so you would be working with 365 tables, each of a more manageable size.  I'm guessing you haven't tried that yet?

You might also want to consider making up a test data set and posting it for us here.  I would recommend selecting every 1,000th row out of your big table.  That would create a test data set that contained about 250,000 rows, presumably with a more-or-less representative and well-distributed subset of the big collection.  Once you have the data uploaded, you can make reference to the uploaded file in this and future questions.  You can use the "Attach File" link below the comment box.

If I have that small test data set, I can show you tested examples of the logic for things like a down-select into a memory table or a design that uses tables per day or per month, etc.
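
One possible way to pull the every-1,000th-row sample described above, as a sketch only. It assumes id is an integer key with reasonably even coverage (gaps in the ids would make the sample slightly uneven), and verizon_sample is a made-up table name.

```sql
-- Sketch: copy roughly every 1,000th row into a hypothetical sample table.
-- This still reads the whole source table once.
CREATE TABLE verizon_sample AS
SELECT *
FROM verizon
WHERE MOD(id, 1000) = 0;
```

CREATE TABLE ... AS SELECT copies the data but not the indexes, so any indexes you want to test against would need to be added to the sample table afterwards.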
 
LVL 58

Expert Comment

by:Gary
ID: 40380956
Are the lat and lng columns indexed? That could certainly slow you down, which is why you should be using a geospatial index.
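
A minimal sketch of what a spatial lookup could look like, under some loud assumptions: geo_point is a hypothetical NOT NULL POINT column built as POINT(longitude, latitude) with a SPATIAL index on it, and neither exists in the schema shown in the question. Before MySQL 5.7, SPATIAL indexes are only available on MyISAM tables, and servers older than 5.6 spell the function GeomFromText rather than ST_GeomFromText.

```sql
-- Sketch: bounding-box search against a hypothetical indexed POINT column.
-- The polygon is the lon/lat rectangle from the question, closed on its first corner.
SELECT id, actor_id, actor_display_name, posted_time, display_name,
       geo_coords_lat, geo_coords_lon, location_name, posted_day
FROM verizon
WHERE MBRContains(
        ST_GeomFromText('POLYGON((-80.25 26, -80 26, -80 26.25, -80.25 26.25, -80.25 26))'),
        geo_point)
  AND posted_day BETWEEN '2014-03-05' AND '2014-03-18'
ORDER BY id ASC
LIMIT 10000;
```

Points lying exactly on the rectangle's edge may be treated differently from the inclusive BETWEEN in the original query, so compare the results on a test data set before relying on it.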
 
LVL 9

Assisted Solution

by:Brian Tao
Brian Tao earned 167 total points
ID: 40381499
I think the bottleneck would be in the while loop with the insert statement.  You were trying to insert 223953 rows one by one, meaning that the DB server has to process your insert that many times.
Have you tried commenting out the insert part to see how long it takes?
 
LVL 109

Assisted Solution

by:Ray Paseur
Ray Paseur earned 166 total points
ID: 40381830
Agree with taoyipai: There are just too many moving parts to this application, compounded by millions of lines of data that seem to get copied over and over.  Check this idea and see if it can help you consolidate some things:
http://dev.mysql.com/doc/refman/5.0/en/insert-select.html

The query string would look something like this (untested, awaiting test data) -- not sure about display_name column.
$insert = "
INSERT INTO twitter_csv 
( verizon_id
, actor_id
, actor_display_name
, posted_time
, display_name
, geo_coords_lat
, geo_coords_lon
, location_name
, posted_day
) 
SELECT
  id
, actor_id
, actor_display_name
, posted_time
, display_name
, geo_coords_lat
, geo_coords_lon
, location_name
, posted_day 
FROM verizon 
WHERE (posted_day     BETWEEN '$start_date'  AND '$end_date') 
AND   (geo_coords_lat BETWEEN '$latitude_1'  AND '$latitude_2') 
AND   (geo_coords_lon BETWEEN '$longitude_1' AND '$longitude_2') 
ORDER BY id ASC 
LIMIT 10000
";



I'd also hope you come to understand the danger in this line of code.  Don't ever write the extract() function again or for that matter, compact().  These functions blur the line between code and data in ways that can cause your scripts to fail without warning when a variable name collision occurs, thus they constitute a code smell.  You do not want that on your resume!  The function is uncalled for in this context and it causes a proliferation of variables in your symbol table.  More variables means more potential failure points, so it's best to just leave it out.
// extract($crystal_row); OMIT THIS



Question has a verified solution.
