Calculating 'nearest' 10 locations to a postcode


I am putting together a website that needs to be able to show the nearest X locations to a given postcode.

In order to do so, I have imported a very large table of 1.7 million records, which is a complete list of all postcodes in the UK. My intention was to use MySQL to calculate the nearest locations, through using the Haversine formula in conjunction with the latitude and longitude, taken from the postcode table.

However, I have only just started test this and have come to realise the massive resource implication, given that in order to find the 20 nearest, I would need to individually calculate the distance from X of all 1.7 million records, then sort by distance...

Here is the MySQL that i was originally playing around with:
SELECT type_place.postcode, type_place.place_title, postcodelatlng.latitude, postcodelatlng.longitude,
       111.045*haversine(latitude,longitude,latpoint, longpoint) AS distance_in_km
 FROM postcodelatlng
 INNER JOIN type_place
 ON type_place.postcode=postcodelatlng.postcode
     SELECT  54.97  AS latpoint,  -1.607 AS longpoint
   ) AS  p
   WHERE postcodelatlng.latitude < 56 AND postcodelatlng.latitude > 54 AND postcodelatlng.longitude<-1.5 AND postcodelatlng.longitude>-1.7
 ORDER BY distance_in_km

Open in new window

As youi can see, I added a WHERE clause to limit the search to +/- 1 point on lat/long. However, when i ran the test it took 30 secs to complete..

Could someone suggest a better approach to the problem?

LVL 12
Dean OBrienAsked:
Who is Participating?

[Product update] Infrastructure Analysis Tool is now available with Business Accounts.Learn More

I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

Dan CraciunIT ConsultantCommented:
You could use a denormalized table.

Just calculate the nearest 10 locations and store them in a new table. It probably will take days, but it can be done.

That way, when you need them, you just need a SELECT statement, no calculations whatsoever.

You could also use memcached or similar and cache the query results. This way, only the first query takes 30 seconds, the next will be served out of RAM.

Dean OBrienAuthor Commented:
Hi Dan,

Thanks for your comment.

I think i perhaps wasnt clear enough. When the system is in use, I would like to be able to calculate the closest 10 locations to any defined location. Therefore given the location could change, it would need calculating at runtime.

Dan CraciunIT ConsultantCommented:
From what I understood, you have 1.7 million possible locations.
Create a table with 1.7 million records, each line containing the location and it's 10 neighbors.
Need More Insight Into What’s Killing Your Network

Flow data analysis from SolarWinds NetFlow Traffic Analyzer (NTA), along with Network Performance Monitor (NPM), can give you deeper visibility into your network’s traffic.

Dave BaldwinFixer of ProblemsCommented:
Google maps has a Store Locator app that might be of use.
So you're saying even with the +1/-1 lat lng restriction, it still took >30 seconds to run?
Done similar things before, with pretty good results.

Are all your items in UK or is this a global program?  If just UK, realize that +1/-1 full point still results in a pretty large area.
Regardless, this should still do an initial limit on the query and then do the haversine on a much smaller result list.

Another add on (though not perfectly precise), is to take your initial query WITHOUT the haversine (you can still use the where limitation), and get a "rough" sorting based on abs difference between lat and lng.

ie: order by (lat1 - lat2) + (lng1 - lng2)

And take a larger result set, like top 50.  Then use that as a subquery, to do the actual haversine on 50 records to return the "real" closest 10.

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
Ray PaseurCommented:
Here is how I've solved the problem.  Code example are included.  The general theory is a "down-select" to choose only the nearby locations.  These are put into a temporary table, and the distances are calculated for only the nearby location.  Then a SELECT from the temporary table can ORDER BY distance.
Dean OBrienAuthor Commented:
Thanks to everyone for their comments, apologies for the massive delay responding. I have been meaning too, but wanted to run some test before commenting, and never found the time...

I suspect one of the reasons for the poor performance is that the Db is stored on a shared host, so im competing for processing power. I have ran some tests and it looks like the main reason for the delay is simply querying the large postcode table (1.7 million records).

I performed the haversine calc on table of 50,100,150 and 200 records only and it was pretty much instant.
SELECT post4.latitude, post4.longitude, 111.045*haversine(latitude,longitude,latpoint, longpoint) AS distance_in_km FROM post4 INNER JOIN ( SELECT 57.144165160000000 AS latpoint, -2.114847768000000 longpoint ) AS p ORDER BY distance_in_km

Open in new window

Yet, simply joining my place table to the postcode table to get the lat/lng info took 6 -7 secs...
SELECT * FROM type_place AS p JOIN postcodelatlng AS pc ON p.postcode=pc.postcode

Open in new window

That increased to 14 secs when i added a WHERE clause
SELECT * FROM type_place AS p JOIN postcodelatlng AS pc ON p.postcode=pc.postcode WHERE pc.latitude>54 AND pc.latitude<56 AND pc.longitude<-1.5 AND pc.longitude>-1.7

Open in new window

Finally joining all records and performing the haversine calc took 180secs.....
SELECT 111.045*haversine(latitude,longitude,latpoint, longpoint) AS distance_in_km FROM type_place AS p JOIN postcodelatlng AS pc ON p.postcode=pc.postcode JOIN ( SELECT 57.144165160000000 AS latpoint, -2.114847768000000 longpoint ) AS c ORDER BY distance_in_km

Open in new window

I realised that i dont actually need to access the large table, as i only need to know what places are near other places in my smaller table of 1000 rows. Therefore i plan to just add the lat/lng information to the place table and perform the select and calc on that instead.

Thanks again
Ray PaseurCommented:
Just curious - did you read the article and try the pattern with CREATE TEMPORARY TABLE?  I'd be interested to know what the speed was like.  My guess is that if the LAT and LON are indexed columns, you would see nearly instantaneous results.
Dean OBrienAuthor Commented:
Hi Ray,

Yes i had read you article, but to be honest i kind of discounted the temporary table to begin with, because i thought if the initial JOIN was taking 6+ seconds then it would surely have similar results... lazy i know.

I have been trying this morning to see if that would have indeed been the case, but the script im using doesnt seem to like creating the temporary table with the JOIN.

Im using the PHP below:
ini_set("display_errors",1<wbr ></wbr>);

$alpha_time = microtime(TRUE);

/* Open a connection */
$mysqli = new mysqli("localhost", "xxx", "xxx", "xxx");

if (mysqli_connect_errno()) {
    printf("Connect failed: %s\n", mysqli_connect_error());

  temp_table ( INDEX(place_id) ) 
AS (
SELECT * FROM type_place 

if ($result = $mysqli->query("SELECT * FROM temp_table")) {
    printf("Select returned %d rows.\n", $result->num_rows);
    /* free result set */


	$omega_time = microtime(TRUE);
    $lapse_time = $omega_time - $alpha_time;
    $lapse_msec = $lapse_time * 1000.0;
    $lapse_echo = number_format($lapse_msec,<wbr ></wbr> 1);
    echo "<br/>SCRIPT TIME: $lapse_echo MILLISECONDS <br/>";

Open in new window

The script works fine when simply selecting all from the type_place table(returns 750 rows). However, when i swap that for the following sql:
SELECT p.place_title, pc.latitude, pc.longitude FROM type_place AS p JOIN postcodelatlng AS pc ON p.postcode=pc.postcode WHERE pc.latitude>55 AND pc.latitude<58 AND pc.longitude<-1.5 AND pc.longitude>-2.2

Open in new window

It doesnt return a result, even though the same sql returns 26 records when i run it directly in phpmyadmin.

Finally, not sure if it works in the same way as when called from script, but when i run the create temporary table script in phpmyadmin (making the join to the large table) it also takes 18secs:
CREATE TEMPORARY TABLE nearby ( distance DECIMAL(6,1) ) ENGINE=MEMORY SELECT p.place_title, pc.latitude, pc.longitude FROM type_place AS p JOIN postcodelatlng AS pc ON p.postcode=pc.postcode WHERE pc.latitude>55 AND pc.latitude<58 AND pc.longitude<-1.5 AND pc.longitude>-2.2

Open in new window

Ray PaseurCommented:
I think the general idea of the temporary table is to winnow down the number of possible results to some number that is small enough to run fast.  Since I don't have your data, I'm not going to be able to tell you why this is taking so long, but the output of EXPLAIN SELECT might be helpful.  Also, you want to have indexes and matching data types on any columns that are used in WHERE, JOIN, GROUP, ORDER, etc.
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
MySQL Server

From novice to tech pro — start learning today.