Link to home
Start Free TrialLog in
Avatar of wwarby
wwarby

asked on

SQL - Populate Column with Closest Lat/Lng From another Table

I have a SQL Server (2000) table containing 120,000 addresses, each with latitude/longitude co-ordinates. I have several other tables for different places of interest - as an example, I'll use museums. Each record in museums also has latitude/longitude co-ordinates. Every night, I want to populate a column in the addresses table with the name of the closest museum by distance, as the crow flies. I'll be doing this with at least 10 columns so I need a fairly efficient method. I am stumped about the best way to implement this, but supposing it were as a user-defined function, it would look something like this:

UPDATE Addresses SET NearestMuseum = GetNearest('MuseumsTable', 'MuseumNameField', LatColumn, LngColumn)

Where it would be possible to use one user defined function that could accept the name of the table you want to query, the field you want to return and the Lat/Lng you want to find the nearest record to.

I have triggers set up on all of the tables to create columns ProximityX, ProximityY and ProximityZ, the purpose of which is to pre-calculate trigonometric functions so that spatial queries can be faster at runtime. I have a stored procedure that gets the closest n records to a given lat/lng which uses them. The code sample is an example of one of these triggers.

I need a good solution to this problem and quickly, so I'm assigning the maximum points I have at my disposal.
CREATE TRIGGER [ProximityColumnsUpdate_Addresses] ON GazetteerUser.Properties
FOR INSERT, UPDATE
AS
 
UPDATE Addresses
    SET ProximityX = (COS(RADIANS(i.Latitude)) * COS(RADIANS(i.Longitude))),
        ProximityY = (COS(RADIANS(i.Latitude)) * SIN(RADIANS(i.Longitude))),
        ProximityZ = (SIN(RADIANS(i.Latitude)))
    FROM inserted i LEFT JOIN Addresses a ON i.id = p.id

Open in new window

Avatar of Guy Hengel [angelIII / a3]
Guy Hengel [angelIII / a3]
Flag of Luxembourg image

do you plan to include a "limit" to the "closest"?
I mean, when you want to say "max=100km/miles", then you can add, on the update limit on the brute difference of Lat and Long.

now:
>UPDATE Addresses SET NearestMuseum = GetNearest('MuseumsTable', 'MuseumNameField', LatColumn, LngColumn)

means it would be some dynamic sql, which is not possible in functions.
so, you will need to create 1 function per source table...
Avatar of wwarby
wwarby

ASKER

Angellll,

There are two scenarios for the columns I need to populate and I have only articulated the type for which I don't care about the radius limit. The other type is:

UPDATE Addresses SET NearbyStreetworksCount = CountWithinRadius('StreetworksTable', LatColumn, LngColumn, 1 [mile])

where I want the field in the addresses to be populated with the count of records from the streetworks table that fall within a specified search radius. I figured if somebody could help me solve the first problem I'd be able to solve the second on my own, but judging by what you've said, this second scenario makes a difference to the overall solution.

If necessary I can live with a one function per source table solution although obviously I'd rather avoid that scenario. The other way I thought could work was with a stored procedure that loops through every record in the addresses table and does it's business.
the "distance" is something to care about when speaking performance.

that say, you are not going yet the sql 2008 way?
that version has the geometry data types built-in (including indexes on geometry data types), and your query would be a plain "1-liner" ...

so, as this is a new development, you should go that way, actually.
there has been plenty of efforts made for that data type, you should use it (instead of reinventing the wheel) ...
Avatar of wwarby

ASKER

Angellll,

I could provide a distance if required but the datasets are all confined to a London borough of about 40 square miles and in most cases I'm talking very small data sets. We have 2 museums, 13 libraries about 20 parks... you get the idea. The number of addresses is the problem - 120,000 of them.

There's a good reason why we're not going the 2008 route just yet. Corporately, we use MapInfo for GIS, and MapInfo Professional 9.5 only supports SQL Server 2008's new geometry data types in read-only mode. Since we have to have users maintaining this information in a desktop GIS, we can't go there yet. Also, I have to go live with this project in November. There's no way I could upgrade our main SQL Server in time.

I can either find a solution in T-SQL or I can do it with MapInfo MapXtreme 2008 which I am using for other purposes in my data aggregation program that runs on a nightly schedule. I've spent about 16 hours flipping between both approaches so far and come up completely empty handed.
ok. the syntax would be this:
UPDATE Addresses 
   SET NearestMuseum = ( SELECT TOP 1 m.MuseumNameField
               FROM MuseumsTable m
               ORDER BY <expression_for_distance> DESC
         )

Open in new window

Avatar of wwarby

ASKER

Angellll,

Thanks, but I've kind of got that far on my own. I tried something very similar to this but couldn't work out exactly what to put in <expression_for_distance>. I'll have another go at working out <expression_for_distance> this evening.
so, you just need the formula for the distance?
see here:
http://forums.microsoft.com/TechNet/ShowPost.aspx?PostID=2671587&SiteID=17

create function dbo.Distance( @lat1 float , @long1 float , @lat2 float , @long2 float)
returns float
 
as
 
begin
 
declare @DegToRad as float
declare @Ans as float
declare @Miles as float
 
set @DegToRad = 57.29577951
set @Ans = 0
set @Miles = 0
 
if @lat1 is null or @lat1 = 0 or @long1 is null or @long1 = 0 or @lat2 is
null or @lat2 = 0 or @long2 is null or @long2 = 0
 
begin
 
return ( @Miles )
 
end
 
set @Ans = SIN(@lat1 / @DegToRad) * SIN(@lat2 / @DegToRad) + COS(@lat1 / @DegToRad ) * COS( @lat2 / @DegToRad ) * COS(ABS(@long2 - @long1 )/@DegToRad)
 
set @Miles = 3959 * ATAN(SQRT(1 - SQUARE(@Ans)) / @Ans)
 
set @Miles = CEILING(@Miles)
 
return ( @Miles )
 
end

Open in new window

ASKER CERTIFIED SOLUTION
Avatar of Guy Hengel [angelIII / a3]
Guy Hengel [angelIII / a3]
Flag of Luxembourg image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of wwarby

ASKER

Angellll,

Thanks very much for your efforts so far. I've implemented the code you provided and on the face of it, everything seemed to work. However, something isn't right. I have 120,000 addresses that cover an area of some 40 square miles and dotted around inside that area are 13 libraries. I ran the code against my libraries table to populate a "NearestLibrary" field but when I then do "SELECT DISTINCT NearestLibrary FROM Addresses", there are only three records. There should be 13 because every library is surrounded by residential properties.

I need to work out where the error is which I will do tomorrow because it's getting late in England. I'll use MapInfo Professional to check the locations of my Lat/Lng values in both tables and perform some spot checks on the distance calculations. I'll get back to you with the results of these tests.
run this query to see the output of the distances:
SELECT a.*, m.*, dbo.Distance( a.latitude, a.longitude, m.latitude , m.longitude ) distance
FROM MuseumsTable m
CROSS JOIN Addresses a
ORDER BY m.MuseumNameField

Open in new window

Avatar of wwarby

ASKER

I figured it out. the DESC keyword in your ORDER BY clause shouldn't have been there. I was finding the FURTHEST rather than the NEAREST library in each case. Once I removed that keyword it worked flawlessly. Incidentally, when run against 13 libraries on 120,000 properties it takes less than 90 seconds. Another approach I was working on using MapXtreme was running into 6 hours!

Thanks ever so much for your help with this - you've got me out of a real bind!
Grading Comments:
I figured it out. the DESC keyword in your ORDER BY clause shouldn't have been there. I was finding the FURTHEST rather than the NEAREST library in each case. Once I removed that keyword it worked flawlessly. Incidentally, when run against 13 libraries on 120,000 properties it takes less than 90 seconds. Another approach I was working on using MapXtreme was running into 6 hours!

Thanks ever so much for your help with this - you've got me out of a real bind!



glad I could help. sorry for the DESC :...