Solved

Slow SQL query

Posted on 2014-02-25
18
290 Views
Last Modified: 2014-02-26
Hi,

I have the following query that's being generated by a PHP script based on a search form, the problem is however that the query is taking a long time (10+ seconds) to execute.

Can anyone give me some pointers into how I can make this query quicker?

SELECT event.id,event.category_id, date.id AS date_id , 6371.04 * acos( cos( pi( ) /2 - radians( 90 - venue.lat) ) * cos( pi( ) /2 - radians( 90 - '52.486243' ) ) * cos( radians( venue.lon) - radians( '-1.890401' ) ) + sin( pi( ) /2 - radians( 90 - venue.lat) ) * sin( pi( ) /2 - radians( 90 - '52.486243' ) ) ) AS distance FROM events AS event, venues AS venue,event_dates AS date WHERE event.venue_id = venue.id AND ((event.cost_from >= '10.00' OR event.cost_from <= '75.00') OR (event.cost_to >= '10.00' OR event.cost_to <= '75.00')) GROUP BY event.id HAVING distance < '25' ORDER BY distance ASC LIMIT 0,20

Open in new window

0
Comment
Question by:SheppardDigital
  • 6
  • 5
  • 5
  • +1
18 Comments
 
LVL 109

Expert Comment

by:Ray Paseur
ID: 39886954
It looks like some kind of Haversine distance calculation.  Please give us a little more to go on.  How many rows in the data set?  What columns are indexed?  Etc.

SELECT 
  event.id
, event.category_id
, date.id AS date_id 
, 6371.04 * acos( cos( pi( ) /2 - radians( 90 - venue.lat) ) * cos( pi( ) /2 - radians( 90 - '52.486243' ) ) * cos( radians( venue.lon) - radians( '-1.890401' ) ) + sin( pi( ) /2 - radians( 90 - venue.lat) ) * sin( pi( ) /2 - radians( 90 - '52.486243' ) ) ) AS distance 
FROM events AS event
, venues AS venue
, event_dates AS date 
WHERE event.venue_id = venue.id 
AND ((event.cost_from >= '10.00' OR event.cost_from <= '75.00') OR (event.cost_to >= '10.00' OR event.cost_to <= '75.00')) 
GROUP BY event.id HAVING distance < '25' 
ORDER BY distance ASC 
LIMIT 0,20

Open in new window

0
 

Author Comment

by:SheppardDigital
ID: 39886987
Hi Ray,

The database is a list of events, and events are allocated to venues.

Each venue is given a latitude and longitude, and we are using this to perform a query on the database to return venues/events within a radius of a given location.

The only columns that are indexed are the primary key of each table.

There's currently;
Venues = 138 rows
Events = 2,293 rows

Is that any help?

Is it worthwhile applying indexes to fields such as latitude/longitude?
0
 
LVL 109

Assisted Solution

by:Ray Paseur
Ray Paseur earned 166 total points
ID: 39887027
The short answer is "yes," indexes on any column used in a WHERE, GROUP, ORDER, HAVING, JOIN, etc., are a good thing.

Have a look at this article, then come back to the question.
http://www.experts-exchange.com/Web_Development/Web_Languages-Standards/PHP/A_4276-What-is-near-me-Proximity-calculations-using-PHP-and-MySQL.html
0
Complete VMware vSphere® ESX(i) & Hyper-V Backup

Capture your entire system, including the host, with patented disk imaging integrated with VMware VADP / Microsoft VSS and RCT. RTOs is as low as 15 seconds with Acronis Active Restore™. You can enjoy unlimited P2V/V2V migrations from any source (even from a different hypervisor)

 
LVL 58

Expert Comment

by:Gary
ID: 39887046
You would probably be better using spatial indexing, I use this on a table with over 100,000 rows and it takes millisecs to do an area search
Just trying to find a good tutorial on it..
0
 

Author Comment

by:SheppardDigital
ID: 39887052
Thanks Cathal.

Ray, I'll take a look over your link in the morning.
0
 
LVL 109

Expert Comment

by:Ray Paseur
ID: 39887094
The strategy used in the proximity calculations article is a "down-select" that extracts the nearby points from the large collection of locations.  With this smaller temporary table, we can run a very fast set of queries and distance calculations.

You might try it with something like this...  If your geocode is 52.486243,-1.890401 your starting point is more-or-less in the middle of the UK.  You might try your down-select coordinates like this:
51.486243,-0.890401
53.486243,-2.890401

Expect to have to tinker with them a little bit to get to the optimum offsets from the initial Lat,Lon pair in the geocode.  In my California demo script, I found 0.3 to be the right number.  If you're getting too many proximal results with an offset of 1.0, try reducing it a little bit.  You will not know the distance in the initial query to load the temporary table, just the fact that the location is within the boundary box, and with speed being an objective, decreasing the size of the boundary box is a worthwhile approach.

Also, though not directly related to the issues here, the article is a bit old and uses the obsolete MySQL extension.  I have another article here showing how to get off MySQL.  I've found that it's easiest to make the conversion if you use MySQLi in the object-oriented format.
http://www.experts-exchange.com/Web_Development/Web_Languages-Standards/PHP/PHP_Databases/A_11177-PHP-MySQL-Deprecated-as-of-PHP-5-5-0.html
0
 
LVL 58

Accepted Solution

by:
Gary earned 167 total points
ID: 39887104
0
 
LVL 109

Expert Comment

by:Ray Paseur
ID: 39887177
You can test the script from the Proximity Calculations article here:
http://www.laprbass.com/RAY_EE_proximity_calculator.php?z=94111

It's probably worth noting that if your calculations are looking for points less than about 100 miles apart, and you're not very close to the poles, the Haversine formula will not give meaningfully different values than you would get from plane geometry.  I've never tested it but I expect that plane geometry would be faster.
0
 
LVL 48

Assisted Solution

by:PortletPaul
PortletPaul earned 167 total points
ID: 39887346
How is the table event_dates related?

Seems to me you are creating a cartesian product as I cannot see how you have joined that table by any other means.
SELECT
      event.id
    , event.category_id
    , DATE.id                                                                          AS date_id
    , 6371.04 * ACOS(COS(PI() / 2 - RADIANS(90 - venue.lat)) * COS(PI() / 2 - RADIANS(90 -
      '52.486243')) * COS(RADIANS(venue.lon) - RADIANS('-1.890401')) + SIN(
      PI() / 2 - RADIANS(90 - venue.lat)) * SIN(PI() / 2 - RADIANS(90 - '52.486243'))) AS
      distance
FROM events AS event
   , venues AS venue
   , event_dates AS DATE /* how is this joined ??????? */
WHERE event.venue_id = venue.id
  AND (
         (event.cost_from >= '10.00' OR event.cost_from <= '75.00')
      OR (event.cost_to >= '10.00'OR event.cost_to <= '75.00')
      )
GROUP BY
      event.id
HAVING distance < '25'
ORDER BY
      distance

Open in new window


I do wish folk would use ANSI join syntax...
0
 
LVL 109

Expert Comment

by:Ray Paseur
ID: 39887428
@PortletPaul:  Smart observation and excellent point.  But I think if there is a way to get to work on a smaller data set, it would help.  In the existing query, thousands of retrievals and calculations have to be done to get "distance" so you can process the HAVING and ORDER clauses.

The "event.cost_from" and "event.cost_to" could be collected with BETWEEN; I think that would reduce the number of SQL calculations in that part of WHERE.

Maybe EXPLAIN SELECT would be useful, too.  My sense is that this should be a sub-10-millisecond query, certainly not something that takes a second or more.
0
 
LVL 58

Expert Comment

by:Gary
ID: 39887454
And that's why a spatial indexed column is a good way to do this
0
 
LVL 109

Expert Comment

by:Ray Paseur
ID: 39887479
@GaryC123:  One of our favorite Cathal's in and around Washington, DC.
http://myirishtable.com/cathal-armstrong/
0
 

Author Comment

by:SheppardDigital
ID: 39888164
Just to try and explain how the table work.

Venues - This table contains information about each venue, it's name, latitude/longitude
Events - This table contains a list of events, each event is linked to a venue.
Event Dates - This table contains the dates of each event, as events can have multiple dates. These are linked to the event.

In some cases the user will select a date range in the search on the website, and also specify a location and a price range. So in this instance we have to link all three tables together to get the desired results.

I'm going to look through some of the comments above now and see how I get on.
0
 

Author Comment

by:SheppardDigital
ID: 39888196
Hi Cathal,

I've applied a spartial index to a new latlng field, that seems to have increase the query speed from 13 seconds to 6 seconds.

I need to order the results by distance, is there a way using this method to return the distance?

Portlet Paul,

Thanks for pointing out that the event_dates table wasn't linked. When I remove the reference to that table the search results come back almost instantly, a huge improvement. I'll re-work that part of the query to get dates working and linked to other tables correctly.
0
 
LVL 58

Expert Comment

by:Gary
ID: 39888824
6 seconds is very slow.
Have you indexed the spatial columns?
0
 

Author Comment

by:SheppardDigital
ID: 39888983
Hi Cathal,

Yes, the spatial columns are indexed.

The reason the query was slow was because I was referencing the event_dates table without having any way to link dates to events, so it was pulling in all 2300 rows from that table for event event.
0
 
LVL 58

Expert Comment

by:Gary
ID: 39889308
What version of MySQL are you using.  MySQL doesn't really have any good spatial support.
v5.6 is better but not perfect. The method I pointed you to above uses a box and finds anything within the box but there is no distance.
Some reading how it works
http://www.scribd.com/doc/2569355/Geo-Distance-Search-with-MySQL
http://www.mysqlperformanceblog.com/2013/10/21/using-the-new-spatial-functions-in-mysql-5-6-for-geo-enabled-applications/
0
 
LVL 48

Expert Comment

by:PortletPaul
ID: 39890457
Although I've already mentioned it I'd just like to again point out that adopting ANSI join syntax will ensure you avoid the Cartesian trap, e.g.
FROM events AS event
INNER JOIN venues AS venue ON event.venue_id = venue.id
INNER JOIN event_dates AS date ON  

Open in new window

it's impossible to leave line 3 like that (the query would just fail) and hence you are forced to provide the relationship(s).

Thanks & cheers, Paul
0

Featured Post

What is SQL Server and how does it work?

The purpose of this paper is to provide you background on SQL Server. It’s your self-study guide for learning fundamentals. It includes both the history of SQL and its technical basics. Concepts and definitions will form the solid foundation of your future DBA expertise.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Load balancing is the method of dividing the total amount of work performed by one computer between two or more computers. Its aim is to get more work done in the same amount of time, ensuring that all the users get served faster.
Since pre-biblical times, humans have sought ways to keep secrets, and share the secrets selectively.  This article explores the ways PHP can be used to hide and encrypt information.
Learn how to match and substitute tagged data using PHP regular expressions. Demonstrated on Windows 7, but also applies to other operating systems. Demonstrated technique applies to PHP (all versions) and Firefox, but very similar techniques will w…
The viewer will learn how to create a basic form using some HTML5 and PHP for later processing. Set up your basic HTML file. Open your form tag and set the method and action attributes.: (CODE) Set up your first few inputs one for the name and …

831 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question