T-SQL Query Optimization Minimum Distance

Posted on 2009-05-11
Last Modified: 2012-06-27
I have a table built by cross joining a table to itself to find the distance between every pair of points. The structure is basically:

initial table: Parcels
ParcelID, X, Y

The derived distance table is built via the attached code, and its structure is:
ParcelID1, ParcelID2, Distance.

I now need a table that has the following structure:
ParcelID, ClosestNeighbor1, ClosestNeighbor2, ... ClosestNeighbor5, Distance1, Distance2, ... Distance5

There are 37 million records in the derived table. There are 6130 ParcelIDs, and for each one I need its 5 closest neighbors. How should I approach building the final output table?

Select	p.parcelID as ParcelID,
		l.parcelID as ParcelID2,
		-- (distance expression not preserved in the source)
		) as Distance
into DistanceTable
from liberty p
cross join liberty l
where p.parcelID <> l.parcelID

Question by:Paulconsulting

Expert Comment

ID: 24360853
Can you provide the actual structure of the table along with the datatypes of each column? And could you provide some rows from the table for me to better understand your needs?

Accepted Solution

FVER earned 500 total points
ID: 24362182
The query below should do it.
To boost performance, consider storing the row number in the distance table itself, computing it with the ROW_NUMBER function in the query that fills the table.

NB: when two neighbors are at the same distance, the query ranks the one with the lower ParcelID first.
--query to retrieve the 5 closest using the current DistanceTable
with RANK_TABLE as (
select ParcelID, ParcelID2, Distance, ROW_NUMBER() over (partition by ParcelID order by Distance, ParcelID2) as RN
 from DistanceTable
)
SELECT R1.ParcelID,
       R1.ParcelID2 as ClosestNeighbor1, R2.ParcelID2 as ClosestNeighbor2, R3.ParcelID2 as ClosestNeighbor3,
       R4.ParcelID2 as ClosestNeighbor4, R5.ParcelID2 as ClosestNeighbor5,
       R1.Distance as Distance1, R2.Distance as Distance2, R3.Distance as Distance3,
       R4.Distance as Distance4, R5.Distance as Distance5
  from RANK_TABLE R1
       left join RANK_TABLE R2 on R1.ParcelID=R2.ParcelID and R2.RN=2
       left join RANK_TABLE R3 on R1.ParcelID=R3.ParcelID and R3.RN=3
       left join RANK_TABLE R4 on R1.ParcelID=R4.ParcelID and R4.RN=4
       left join RANK_TABLE R5 on R1.ParcelID=R5.ParcelID and R5.RN=5
 where R1.RN=1
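
As a possible alternative to the five self-joins, the same ranked rows can be pivoted with conditional aggregation, which reads the ranked set once. This is only a sketch: the table variable @DistanceTable and its sample rows are made up for illustration and stand in for the real DistanceTable, and the output column names follow the structure requested in the question.

```sql
-- Hypothetical sample data standing in for the real DistanceTable.
DECLARE @DistanceTable TABLE (ParcelID int, ParcelID2 int, Distance float);

INSERT INTO @DistanceTable (ParcelID, ParcelID2, Distance)
VALUES (1, 2, 5.0), (1, 3, 2.0), (1, 4, 9.0),
       (2, 1, 5.0), (2, 3, 4.0), (2, 4, 4.0);

WITH Ranked AS (
    -- Same ranking as in the answer: nearest first, ties broken by ParcelID2.
    SELECT ParcelID, ParcelID2, Distance,
           ROW_NUMBER() OVER (PARTITION BY ParcelID
                              ORDER BY Distance, ParcelID2) AS RN
    FROM @DistanceTable
)
SELECT ParcelID,
       -- Each CASE matches at most one row per parcel, so MAX just picks it.
       MAX(CASE WHEN RN = 1 THEN ParcelID2 END) AS ClosestNeighbor1,
       MAX(CASE WHEN RN = 2 THEN ParcelID2 END) AS ClosestNeighbor2,
       MAX(CASE WHEN RN = 3 THEN ParcelID2 END) AS ClosestNeighbor3,
       MAX(CASE WHEN RN = 4 THEN ParcelID2 END) AS ClosestNeighbor4,
       MAX(CASE WHEN RN = 5 THEN ParcelID2 END) AS ClosestNeighbor5,
       MAX(CASE WHEN RN = 1 THEN Distance END)  AS Distance1,
       MAX(CASE WHEN RN = 2 THEN Distance END)  AS Distance2,
       MAX(CASE WHEN RN = 3 THEN Distance END)  AS Distance3,
       MAX(CASE WHEN RN = 4 THEN Distance END)  AS Distance4,
       MAX(CASE WHEN RN = 5 THEN Distance END)  AS Distance5
FROM Ranked
WHERE RN <= 5
GROUP BY ParcelID
ORDER BY ParcelID;
-- With the sample rows: parcel 1 gets neighbors 3, 2, 4 and parcel 2 gets 3, 4, 1;
-- the 4th and 5th slots are NULL because each parcel has only three neighbors here.
```

Parcels with fewer than five neighbors get NULL in the unused slots, which matches what the left joins produce.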


--query to compute RN directly in the DistanceTable at build time
Select ParcelID, ParcelID2, Distance,
       ROW_NUMBER() over (partition by ParcelID order by Distance, ParcelID2) as RN
into DistanceTable
from (
Select	p.parcelID as ParcelID,
		l.parcelID as ParcelID2,
		-- (distance expression not preserved in the source)
		) as Distance
from liberty p
cross join liberty l
where p.parcelID <> l.parcelID) T
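
If only the five nearest neighbors are ever needed, one option is to avoid storing all 37 million pairs and keep just the top five per parcel at build time. The sketch below does this with CROSS APPLY and TOP (5). It rests on assumptions that are not in the thread: the table variable @Parcels and its rows are hypothetical, and the distance is taken to be plain Euclidean distance on X and Y, since the original distance expression is not shown.

```sql
-- Hypothetical stand-in for the Parcels table (ParcelID, X, Y) from the question.
DECLARE @Parcels TABLE (ParcelID int PRIMARY KEY, X float, Y float);

INSERT INTO @Parcels (ParcelID, X, Y)
VALUES (1, 0, 0), (2, 3, 4), (3, 1, 0), (4, 10, 10);

SELECT p.ParcelID,
       n.ParcelID2,
       n.Distance,
       ROW_NUMBER() OVER (PARTITION BY p.ParcelID
                          ORDER BY n.Distance, n.ParcelID2) AS RN
FROM @Parcels AS p
CROSS APPLY (
    -- For each outer parcel, keep only its 5 nearest other parcels.
    SELECT TOP (5)
           l.ParcelID AS ParcelID2,
           -- ASSUMPTION: Euclidean distance; the thread's own formula is not shown.
           SQRT(SQUARE(l.X - p.X) + SQUARE(l.Y - p.Y)) AS Distance
    FROM @Parcels AS l
    WHERE l.ParcelID <> p.ParcelID
    ORDER BY Distance, l.ParcelID
) AS n
ORDER BY p.ParcelID, RN;
```

Every pairwise distance is still computed, but for 6130 parcels the result holds at most 6130 x 5 = 30,650 rows, and the RN column can feed the five-neighbor query directly.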

Expert Comment

ID: 24365001
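Ok... I wouldn't use scalar functions to create your table - that must take a long time in itself. A scalar function is typically evaluated once per row, so over 37 million rows the overhead adds up, and there's lots of information out there about why this might be a bad idea.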


Why not use SQL Server 2008 with a geometry type, and put some spatial indexes in place? Then it should be much easier to find the five closest items, because the indexing helps the system narrow down nearby points, and the query above should work better as a result.
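
A rough sketch of what the spatial suggestion could look like follows. Everything named here is an assumption for illustration: the table dbo.ParcelPoints, the index name, SRID 0, the sample coordinates, and the bounding box, which would need to match the real extent of the parcel coordinates.

```sql
-- Hypothetical table; a spatial index requires a clustered primary key.
CREATE TABLE dbo.ParcelPoints (
    ParcelID int NOT NULL PRIMARY KEY,
    Location geometry NOT NULL
);

-- geometry::Point(x, y, srid) builds a point; SRID 0 is assumed for planar X/Y data.
INSERT INTO dbo.ParcelPoints (ParcelID, Location)
VALUES (1, geometry::Point(0, 0, 0)),
       (2, geometry::Point(3, 4, 0)),
       (3, geometry::Point(1, 0, 0)),
       (4, geometry::Point(10, 10, 0));

-- ASSUMPTION: the bounding box below is illustrative and must cover the real data.
CREATE SPATIAL INDEX SIX_ParcelPoints_Location
ON dbo.ParcelPoints (Location)
USING GEOMETRY_GRID
WITH (BOUNDING_BOX = (xmin = 0, ymin = 0, xmax = 100, ymax = 100));

-- Five nearest neighbors per parcel, using STDistance instead of a scalar function.
SELECT p.ParcelID, n.ParcelID2, n.Distance
FROM dbo.ParcelPoints AS p
CROSS APPLY (
    SELECT TOP (5)
           l.ParcelID AS ParcelID2,
           l.Location.STDistance(p.Location) AS Distance
    FROM dbo.ParcelPoints AS l
    WHERE l.ParcelID <> p.ParcelID
    ORDER BY l.Location.STDistance(p.Location), l.ParcelID
) AS n
ORDER BY p.ParcelID, n.Distance, n.ParcelID2;
```

Whether the optimizer actually uses the spatial index for this shape of query depends on the SQL Server version and the plan it chooses. Restricting the candidates with a distance predicate, for example l.Location.STDistance(p.Location) < some radius, is a common way to give the index something to work with.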

Expert Comment

by:Anthony Perkins
ID: 24365176
This is what Rob is referring to:
Investigating the new Spatial Types in SQL Server 2008 - Part 1
Investigating the new Spatial Types in SQL Server 2008 - Part 2

Author Closing Comment

ID: 31580381
Excellent! Exactly what I needed, thank you.
