T-SQL Query Optimization Minimum Distance

I have a table built by cross joining a table with itself to find the distance between all pairs of points.  The structure is basically:

initial table: Parcels
ParcelID, X, Y

The derived distance table is built via the attached code, and its structure is:
ParcelID1, ParcelID2, Distance

I now need a table that has the following structure:
ParcelID, ClosestNeighbor1, ClosestNeighbor2, ... ClosestNeighbor5, Distance1, Distance2, ... Distance5

There are 37 million records in the derived table.  There are 6130 ParcelIDs for which I need to find the 5 closest neighbors.  How should I approach building the final output table?

Select	p.parcelID as ParcelID,
		l.parcelID as ParcelID2, 
		dbo.distance(
			(dbo.x(p.parcelid))
			,(dbo.x(l.parcelid))
			,(dbo.y(p.parcelid))
			,(dbo.y(l.parcelid))
		) as Distance
into DistanceTable
from liberty p
cross join liberty l 
where p.parcelID <> l.parcelID


Asked by: Paulconsulting
 
FVER commented:
The query below should do it.
To boost performance, consider storing the row number in the distance table itself by adding the ROW_NUMBER function to the query that fills it.

NB: when two neighbors are at equal distance, the query picks the one with the lower ParcelID2.
--query to retrieve the 5 closest using the current DistanceTable
WITH RANK_TABLE AS (
select ParcelID, ParcelID2, Distance, ROW_NUMBER() over (partition by ParcelID order by Distance, ParcelID2) as RN
 from DistanceTable
)
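SELECT R1.ParcelID,
       R1.ParcelID2 as ClosestNeighbor1, R2.ParcelID2 as ClosestNeighbor2, R3.ParcelID2 as ClosestNeighbor3,
       R4.ParcelID2 as ClosestNeighbor4, R5.ParcelID2 as ClosestNeighbor5,
       R1.Distance as Distance1, R2.Distance as Distance2, R3.Distance as Distance3, R4.Distance as Distance4, R5.Distance as Distance5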
  FROM RANK_TABLE R1
       left join RANK_TABLE R2 on R1.ParcelID=R2.ParcelID and R2.RN=2
       left join RANK_TABLE R3 on R1.ParcelID=R3.ParcelID and R3.RN=3
       left join RANK_TABLE R4 on R1.ParcelID=R4.ParcelID and R4.RN=4
       left join RANK_TABLE R5 on R1.ParcelID=R5.ParcelID and R5.RN=5
WHERE R1.RN=1
 
 
--query to compute RN directly in the DistanceTable at build time
Select ParcelID, ParcelID2, Distance,
       ROW_NUMBER() over (partition by ParcelID order by Distance, ParcelID2) as RN
into DistanceTable
from (
Select	p.parcelID as ParcelID,
		l.parcelID as ParcelID2, 
		dbo.distance(
			(dbo.x(p.parcelid))
			,(dbo.x(l.parcelid))
			,(dbo.y(p.parcelid))
			,(dbo.y(l.parcelid))
		) as Distance
from liberty p
cross join liberty l 
where p.parcelID <> l.parcelID) T
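
As a possible variation on the first query above, the five self-joins can be replaced by a single pass that pivots the ranked rows with conditional aggregation. This is only a sketch: it assumes DistanceTable has the columns ParcelID, ParcelID2 and Distance as produced by the build query, and the output column names are taken from the structure requested in the question.

```sql
-- Sketch: pivot the 5 nearest neighbors per parcel in one scan of the ranked rows.
WITH RANK_TABLE AS (
    SELECT ParcelID, ParcelID2, Distance,
           ROW_NUMBER() OVER (PARTITION BY ParcelID
                              ORDER BY Distance, ParcelID2) AS RN
    FROM DistanceTable
)
SELECT ParcelID,
       MAX(CASE WHEN RN = 1 THEN ParcelID2 END) AS ClosestNeighbor1,
       MAX(CASE WHEN RN = 2 THEN ParcelID2 END) AS ClosestNeighbor2,
       MAX(CASE WHEN RN = 3 THEN ParcelID2 END) AS ClosestNeighbor3,
       MAX(CASE WHEN RN = 4 THEN ParcelID2 END) AS ClosestNeighbor4,
       MAX(CASE WHEN RN = 5 THEN ParcelID2 END) AS ClosestNeighbor5,
       MAX(CASE WHEN RN = 1 THEN Distance END)  AS Distance1,
       MAX(CASE WHEN RN = 2 THEN Distance END)  AS Distance2,
       MAX(CASE WHEN RN = 3 THEN Distance END)  AS Distance3,
       MAX(CASE WHEN RN = 4 THEN Distance END)  AS Distance4,
       MAX(CASE WHEN RN = 5 THEN Distance END)  AS Distance5
FROM RANK_TABLE
WHERE RN <= 5          -- keep only the 5 closest rows per parcel
GROUP BY ParcelID;
```

Each CASE expression matches at most one row per parcel, so MAX simply picks that value; a parcel with fewer than five neighbors gets NULL in the remaining columns, as with the left joins.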


 
bull_rider commented:
Can you provide the actual structure of the table along with the datatypes of each column? And could you provide some rows from the table for me to better understand your needs?
 
rob_farley commented:
Ok... I wouldn't use scalar functions to create your table. They get called for every row, so with 37 million rows that must take a long time in itself. There's lots of information out there about why scalar functions can be a bad idea for this kind of work.
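
A rough sketch of what building the table without scalar functions might look like. It assumes that liberty carries the X and Y columns described for the Parcels table in the question, and that dbo.distance computes plain Euclidean distance; neither is confirmed in the thread.

```sql
-- Sketch: compute the distance inline instead of calling dbo.x, dbo.y and dbo.distance.
-- Assumptions: liberty has numeric columns X and Y; distance is Euclidean.
SELECT p.parcelID AS ParcelID,
       l.parcelID AS ParcelID2,
       SQRT(SQUARE(p.X - l.X) + SQUARE(p.Y - l.Y)) AS Distance
INTO DistanceTable
FROM liberty p
CROSS JOIN liberty l
WHERE p.parcelID <> l.parcelID;
```

Reading the coordinates straight from the joined rows lets the engine evaluate the expression as part of the set-based plan rather than through per-row function calls.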

But...

Why not use SQL Server 2008, store the points in a geometry type, and put a spatial index in place? Then it should be much easier to find the five closest items, because the indexing will help. The query above should then work better, because the system will be better able to find the top 5 closest items.

Rob
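
A minimal sketch of the spatial approach, under several assumptions that are not in the thread: liberty has numeric X and Y columns and a clustered primary key (required for a spatial index), the new column name Geom and the index name SIX_liberty_Geom are made up for illustration, and the bounding box values are placeholders to be replaced with the real extent of the data.

```sql
-- Assumed/hypothetical names: Geom, SIX_liberty_Geom. GO is the SSMS/sqlcmd batch separator.
ALTER TABLE liberty ADD Geom geometry;
GO

-- Build a point from each parcel's coordinates (SRID 0 for planar data).
UPDATE liberty SET Geom = geometry::Point(X, Y, 0);
GO

-- Placeholder bounding box: replace with the actual min/max of X and Y.
CREATE SPATIAL INDEX SIX_liberty_Geom ON liberty (Geom)
WITH (BOUNDING_BOX = (0, 0, 100000, 100000));
GO

-- Five closest neighbors per parcel, one row per (parcel, neighbor) pair.
SELECT p.parcelID AS ParcelID, n.ParcelID2, n.Distance
FROM liberty p
CROSS APPLY (
    SELECT TOP (5)
           l.parcelID AS ParcelID2,
           p.Geom.STDistance(l.Geom) AS Distance
    FROM liberty l
    WHERE l.parcelID <> p.parcelID
    ORDER BY p.Geom.STDistance(l.Geom), l.parcelID
) n;
```

Whether the optimizer actually uses the spatial index for this nearest-neighbor pattern depends on the SQL Server version, so it is worth checking the execution plan. Even without index support, this avoids materializing the 37 million row distance table.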
 
Anthony Perkins commented:
This is what Rob is referring to:
Investigating the new Spatial Types in SQL Server 2008 - Part 1
http://www.sqlservercentral.com/articles/SQL+Server+2008/64601/
Investigating the new Spatial Types in SQL Server 2008 - Part 2
http://www.sqlservercentral.com/articles/Spatial+Data/64734/
 
Paulconsulting (author) commented:
Excellent!  Exactly what I needed, thank you.
Question has a verified solution.