Avatar of Dean OBrien
Dean OBrien
Flag for United Kingdom of Great Britain and Northern Ireland asked on

Calculating 'nearest' 10 locations to a postcode

Experts,

I am putting together a website that needs to be able to show the nearest X locations to a given postcode.

In order to do so, I have imported a very large table of 1.7 million records, which is a complete list of all postcodes in the UK. My intention was to use MySQL to calculate the nearest locations, through using the Haversine formula in conjunction with the latitude and longitude, taken from the postcode table.

However, I have only just started test this and have come to realise the massive resource implication, given that in order to find the 20 nearest, I would need to individually calculate the distance from X of all 1.7 million records, then sort by distance...

Here is the MySQL that i was originally playing around with:
SELECT type_place.postcode, type_place.place_title, postcodelatlng.latitude, postcodelatlng.longitude,
       111.045*haversine(latitude,longitude,latpoint, longpoint) AS distance_in_km
 FROM postcodelatlng
 INNER JOIN type_place
 ON type_place.postcode=postcodelatlng.postcode
 INNER JOIN (
     SELECT  54.97  AS latpoint,  -1.607 AS longpoint
   ) AS  p
   WHERE postcodelatlng.latitude < 56 AND postcodelatlng.latitude > 54 AND postcodelatlng.longitude<-1.5 AND postcodelatlng.longitude>-1.7
 ORDER BY distance_in_km
 LIMIT 15

Open in new window


As youi can see, I added a WHERE clause to limit the search to +/- 1 point on lat/long. However, when i ran the test it took 30 secs to complete..

Could someone suggest a better approach to the problem?

Regards
Dean
MySQL ServerPHPWeb DevelopmentGoogle WorkspaceDatabases

Avatar of undefined
Last Comment
Ray Paseur

8/22/2022 - Mon
SOLUTION
Dan Craciun

THIS SOLUTION ONLY AVAILABLE TO MEMBERS.
View this solution by signing up for a free trial.
Members can start a 7-Day free trial and enjoy unlimited access to the platform.
See Pricing Options
Start Free Trial
GET A PERSONALIZED SOLUTION
Ask your own question & get feedback from real experts
Find out why thousands trust the EE community with their toughest problems.
Dean OBrien

ASKER
Hi Dan,

Thanks for your comment.

I think i perhaps wasnt clear enough. When the system is in use, I would like to be able to calculate the closest 10 locations to any defined location. Therefore given the location could change, it would need calculating at runtime.

Regards
Dean
Dan Craciun

From what I understood, you have 1.7 million possible locations.
Create a table with 1.7 million records, each line containing the location and it's 10 neighbors.
SOLUTION
Dave Baldwin

THIS SOLUTION ONLY AVAILABLE TO MEMBERS.
View this solution by signing up for a free trial.
Members can start a 7-Day free trial and enjoy unlimited access to the platform.
See Pricing Options
Start Free Trial
GET A PERSONALIZED SOLUTION
Ask your own question & get feedback from real experts
Find out why thousands trust the EE community with their toughest problems.
ASKER CERTIFIED SOLUTION
Snarf0001

THIS SOLUTION ONLY AVAILABLE TO MEMBERS.
View this solution by signing up for a free trial.
Members can start a 7-Day free trial and enjoy unlimited access to the platform.
See Pricing Options
Start Free Trial
⚡ FREE TRIAL OFFER
Try out a week of full access for free.
Find out why thousands trust the EE community with their toughest problems.
SOLUTION
Ray Paseur

THIS SOLUTION ONLY AVAILABLE TO MEMBERS.
View this solution by signing up for a free trial.
Members can start a 7-Day free trial and enjoy unlimited access to the platform.
See Pricing Options
Start Free Trial
⚡ FREE TRIAL OFFER
Try out a week of full access for free.
Find out why thousands trust the EE community with their toughest problems.
Dean OBrien

ASKER
Thanks to everyone for their comments, apologies for the massive delay responding. I have been meaning too, but wanted to run some test before commenting, and never found the time...

I suspect one of the reasons for the poor performance is that the Db is stored on a shared host, so im competing for processing power. I have ran some tests and it looks like the main reason for the delay is simply querying the large postcode table (1.7 million records).

I performed the haversine calc on table of 50,100,150 and 200 records only and it was pretty much instant.
SELECT post4.latitude, post4.longitude, 111.045*haversine(latitude,longitude,latpoint, longpoint) AS distance_in_km FROM post4 INNER JOIN ( SELECT 57.144165160000000 AS latpoint, -2.114847768000000 longpoint ) AS p ORDER BY distance_in_km

Open in new window


Yet, simply joining my place table to the postcode table to get the lat/lng info took 6 -7 secs...
 
SELECT * FROM type_place AS p JOIN postcodelatlng AS pc ON p.postcode=pc.postcode

Open in new window


That increased to 14 secs when i added a WHERE clause
SELECT * FROM type_place AS p JOIN postcodelatlng AS pc ON p.postcode=pc.postcode WHERE pc.latitude>54 AND pc.latitude<56 AND pc.longitude<-1.5 AND pc.longitude>-1.7

Open in new window


Finally joining all records and performing the haversine calc took 180secs.....
SELECT 111.045*haversine(latitude,longitude,latpoint, longpoint) AS distance_in_km FROM type_place AS p JOIN postcodelatlng AS pc ON p.postcode=pc.postcode JOIN ( SELECT 57.144165160000000 AS latpoint, -2.114847768000000 longpoint ) AS c ORDER BY distance_in_km

Open in new window


I realised that i dont actually need to access the large table, as i only need to know what places are near other places in my smaller table of 1000 rows. Therefore i plan to just add the lat/lng information to the place table and perform the select and calc on that instead.

Thanks again
Dean
This is the best money I have ever spent. I cannot not tell you how many times these folks have saved my bacon. I learn so much from the contributors.
rwheeler23
Ray Paseur

Just curious - did you read the article and try the pattern with CREATE TEMPORARY TABLE?  I'd be interested to know what the speed was like.  My guess is that if the LAT and LON are indexed columns, you would see nearly instantaneous results.
Dean OBrien

ASKER
Hi Ray,

Yes i had read you article, but to be honest i kind of discounted the temporary table to begin with, because i thought if the initial JOIN was taking 6+ seconds then it would surely have similar results... lazy i know.

I have been trying this morning to see if that would have indeed been the case, but the script im using doesnt seem to like creating the temporary table with the JOIN.

Im using the PHP below:
<?php
ini_set("display_errors",1<wbr ></wbr>);
error_reporting(E_ALL);

// A SCRIPT TIMER
$alpha_time = microtime(TRUE);

/* Open a connection */
$mysqli = new mysqli("localhost", "xxx", "xxx", "xxx");

if (mysqli_connect_errno()) {
    printf("Connect failed: %s\n", mysqli_connect_error());
    exit();
}

$mysqli->query("
CREATE TEMPORARY TABLE IF NOT EXISTS 
  temp_table ( INDEX(place_id) ) 
ENGINE=MEMORY
AS (
SELECT * FROM type_place 
)
");

if ($result = $mysqli->query("SELECT * FROM temp_table")) {
    printf("Select returned %d rows.\n", $result->num_rows);
    /* free result set */
    $result->close();
}

$mysqli->close();

    // A SCRIPT TIMER
	$omega_time = microtime(TRUE);
    $lapse_time = $omega_time - $alpha_time;
    $lapse_msec = $lapse_time * 1000.0;
    $lapse_echo = number_format($lapse_msec,<wbr ></wbr> 1);
    echo "<br/>SCRIPT TIME: $lapse_echo MILLISECONDS <br/>";
?>

Open in new window


The script works fine when simply selecting all from the type_place table(returns 750 rows). However, when i swap that for the following sql:
SELECT p.place_title, pc.latitude, pc.longitude FROM type_place AS p JOIN postcodelatlng AS pc ON p.postcode=pc.postcode WHERE pc.latitude>55 AND pc.latitude<58 AND pc.longitude<-1.5 AND pc.longitude>-2.2

Open in new window


It doesnt return a result, even though the same sql returns 26 records when i run it directly in phpmyadmin.

Finally, not sure if it works in the same way as when called from script, but when i run the create temporary table script in phpmyadmin (making the join to the large table) it also takes 18secs:
CREATE TEMPORARY TABLE nearby ( distance DECIMAL(6,1) ) ENGINE=MEMORY SELECT p.place_title, pc.latitude, pc.longitude FROM type_place AS p JOIN postcodelatlng AS pc ON p.postcode=pc.postcode WHERE pc.latitude>55 AND pc.latitude<58 AND pc.longitude<-1.5 AND pc.longitude>-2.2

Open in new window


Regards
Dean
Ray Paseur

I think the general idea of the temporary table is to winnow down the number of possible results to some number that is small enough to run fast.  Since I don't have your data, I'm not going to be able to tell you why this is taking so long, but the output of EXPLAIN SELECT might be helpful.  Also, you want to have indexes and matching data types on any columns that are used in WHERE, JOIN, GROUP, ORDER, etc.
⚡ FREE TRIAL OFFER
Try out a week of full access for free.
Find out why thousands trust the EE community with their toughest problems.