Solved

Spatial Columns and Indexing; what do the Levels / Cells_per_Object mean and what are good setting?

Posted on 2011-09-08
15
742 Views
Last Modified: 2012-06-21
I have historically treated location data the brute force way, but as the database has grown the calculations are becoming slow. I have a table containing Latitude, Longitude, and Road_Name, where the specific region has been carved up for me into a row every 50m (i.e. there is a Name, and location every 50 meters along the road).
When someone reports in, there is a lookup that takes their current Latitude/Longitude and "finds" the closest road so the name can be displayed. Very crude.

The starting point to improving this is to store the Spatial data (representing Lat/Lon), and instead of looking for a "nearest" on the Latitude, and the Longitude, to instead do a lookup against the Spatial column.

This created the need for a Spatial Index, but while creating the index I realised I had no idea what the index options meant. I settled on the index below, but would appreciate someone explaining to me the various parameters in the index, and suggesting how I decide on (specifically) the values for LEVEL, and the CELLS_PER_OBJECT.

My "Names" table contains 2,000,000 rows, and I perform the query (to find the Road Name) every 100 ms.

Table:[cRoadNames]
iKey Int IDENTITY(1,1) NOT NULL
fLatitude [decimal](11, 7)
fLongitude [decimal](11, 7)
cRoadName [varchar](80)
geogRd [geography]

CREATE SPATIAL INDEX [geogRd_SIdx] ON [dbo].[cRoadNames]
(
      [geogRd]
)USING  GEOGRAPHY_GRID
WITH (
GRIDS =(LEVEL_1 = LOW,LEVEL_2 = MEDIUM,LEVEL_3 = HIGH,LEVEL_4 = MEDIUM),
CELLS_PER_OBJECT = 1024, PAD_INDEX  = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON)
GO
0
Comment
Question by:Bird757
  • 8
  • 3
  • 3
15 Comments
 
LVL 59

Expert Comment

by:Kevin Cross
Comment Utility
Until you get some responses, here are some references of note on Technet. First the BOL for CREATE SPATIAL INDEX.

-- Spatial Indexing Overview
-- SQL Server 2008 Spatial Index Performance
0
 
LVL 14

Accepted Solution

by:
rob_farley earned 500 total points
Comment Utility
Spatial indexes are really useful, and for 2M rows, you'll want to consider your density levels. Try using high density to give you some more differentiation.

But for "nearest" queries, there's a significant problem, but there's a straight forward solution.

If you have:

SELECT TOP (1) ...
ORDER BY @place1.STDistance(geogRd);

...then it will try to work out all the STDistances.

But a spatial index works really well for STIntersects.

If you expect that your nearest road is within 100m, then include a WHERE clause that uses:

WHERE @place.STBuffer(100).STIntersects(geogRd) = 1

Use a variable instead of 100, and increase it if you find nothing.

Hope this helps get around your sargability problem. :)

Rob
0
 

Author Closing Comment

by:Bird757
Comment Utility
The real question was what the parameters mean, and that remains unclear (although the suggestion was to use HIGH density, the implication of this is unclear). There was also no mention of Cells_Per_ (which by default is 16 but allowed values are up to 8192).
0
 
LVL 59

Expert Comment

by:Kevin Cross
Comment Utility
Bird757, might I suggest in the future, instead of waiting two days and then awarding solution w/o full details you needed that you post feedback right away as you may get additional insight. It just so happened that the Expert who commented above was a Microsoft MVP for SQL Server that I asked personally to comment in the question because he is extremely knowledgeable. I think you need to take a look again at the suggestion and you will see his point is that the index actually won't help you unless you are using the correct function. What will is using a technique as he showed to keep the column sargable (search argument-able).

Anyway, not a big deal. Just helping you for future questions.

Best regards and happy coding,

Kevin
0
 
LVL 14

Expert Comment

by:rob_farley
Comment Utility
Not a big deal Kevin. I think you deserved points for providing the information he was after - a link to the details about what the various options do.

Anyway - Bird757, please post back here to let us know how you went with your problem.
0
 

Author Comment

by:Bird757
Comment Utility
Hi Guys,

My query is using the syntax suggested and for completeness I am posting the function below. The concern I am having is using Spatial relations is resulting in a query no faster than a brute force Nearest (where I select a sub-set of latitudes / longitudes that bracket my point of interest, and then look for the nearest). I believe this is because my spatial index is not optimal. I was hoping someone knew what the relevance of the CELLS_PER_GRID parameter is because that may be influencing my query.

What I did do was some trials and I found setting CELLS_PER_GRID to 16 resulted in a 200ms query, and running the same query with the index re-built using CELLS_PER_GRID of 8192 the execution time was 25ms. Using different latitude/longitude resulted in less difference between these 2 index extremes.

So with all respect to Rob, and his input is appreciated, I was looking into insight on the design of a spatial index.

Regards,
Robert

ALTER FUNCTION [dbo].[func_LatLon_RoadKey]
(
@Lat Numeric(11, 7),
@Lon Numeric(11, 7)
)
RETURNS Int

AS
/*******************************************************************************
Select TOP 100 * From [dbo].[iRoadsX100] Order By iKey DESC
Select [dbo].[func_LatLong_RoadKey] (-25.275615, 25.7139817)
Select [dbo].[func_LatLon_RoadKey] (-25.275615, 25.7139817)
Select [dbo].[func_LatLong_RoadKey] (-33.9679100, 22.4502833)
Select [dbo].[func_LatLon_RoadKey] (-33.9679100, 22.4502833)
Select [dbo].[func_RoadKey_to_cPlace](1618855)
*******************************************************************************/
BEGIN
Declare
@iRdKey      Int,
@gPt            Geography,
@geoPt      Geography,
@iRadius      Int

IF @Lat IS NULL Or @Lon IS NULL
RETURN 2409999

Select @geoPt = [Geography]::Point(@Lat, @Lon, (4326))

Set @iRadius = 75
Select @gPt = @geoPt.STBuffer(@iRadius)
/*
Select 'A', iKey AS iRdKey, geogRd.STDistance(@geoPt) AS STDist From [dbo].[cRoadNames] WHERE geogRd.STIntersects(@gPt) = 1 Order By STDist
*/
Set @iRdKey = isnull((
Select TOP 1 iKey AS iRdKey From [dbo].[cRoadNames] WHERE geogRd.STIntersects(@gPt) = 1 Order By geogRd.STDistance(@geoPt)      -- STDist
), 0)
IF @iRdKey = 0
BEGIN
Set @iRadius = 1500
Select @gPt = @geoPt.STBuffer(@iRadius)

Set @iRdKey =  isnull((
Select TOP 1 iKey AS iRdKey From [dbo].[cRoadNames] WHERE geogRd.STIntersects(@gPt) = 1 Order By geogRd.STDistance(@geoPt)      --STDist
), 0)
IF @iRdKey = 0
BEGIN
Set @iRadius = 25000
Select @gPt = @geoPt.STBuffer(@iRadius)

Set @iRdKey =  isnull((
Select TOP 1 iKey AS iRdKey From [dbo].[cRoadNames] WHERE [RD_NAME_F] IS NULL And [TYPE] <> 'roads' And geogRd.STIntersects(@gPt) = 1 Order By geogRd.STDistance(@geoPt)      --STDist
), 0)
END
IF @iRdKey = 0
BEGIN
Set @iRadius = 75000
Select @gPt = @geoPt.STBuffer(@iRadius)

Set @iRdKey =  isnull((
Select TOP 1 iKey AS iRdKey From [dbo].[cRoadNames] WHERE [RD_NAME_F] IS NULL And [TYPE] <> 'roads' And geogRd.STIntersects(@gPt) = 1 Order By geogRd.STDistance(@geoPt)      --STDist
), 0)
IF @iRdKey = 0
Set @iRdKey = 2409999
END
END

RETURN isnull(@iRdKey, 2409999)
END
0
 
LVL 59

Expert Comment

by:Kevin Cross
Comment Utility
That was in the links I provided. It showed all four GRIDS at HIGH as suggested by Rob and CELLS_PER_OBJECT = 64. Others match what you already have. Anyway, it appears you still have ORDER BY geogRd.STDistance() in code as well as a mixture of the STIntersects() suggestion combined with local parameters within a function which will probably experience some performance issues as SQL will not be able to sniff parameter value as with stored procedure. That is not necessarily a bad thing, but in your case I think your @radius is going to be somewhat consistent, right?

Anyway, I didn't mean to come across rude earlier. My comment was geared towards enlightening you that it would have been better to post your code as you just did and it wait for feedback. There is no rush to close questions if you still have doubts.

I can tell you that, though I am not strong in spatial data, I can spot there are some areas of concern in what you have in your code that may be more of your concern than the spatial index itself. Again, I might be wrong, it just looks very inefficient.
0
Complete Microsoft Windows PC® & Mac Backup

Backup and recovery solutions to protect all your PCs & Mac– on-premises or in remote locations. Acronis backs up entire PC or Mac with patented reliable disk imaging technology and you will be able to restore workstations to a new, dissimilar hardware in minutes.

 
LVL 59

Expert Comment

by:Kevin Cross
Comment Utility
By the way, for the CELLS_PER_OBJECT, there is a good explanation at the Spatial Index Overview link. You may want to read that again.
0
 
LVL 59

Expert Comment

by:Kevin Cross
Comment Utility
Never mind about the @iRadius variable for now. I must be too sleepy, I thought that was coming from parameters of the function. Let's focus on the subtle difference in this:

-- rob's suggestion
@place.STBuffer(100).STIntersects(geogRd) = 1

versus:
-- what you have currently
Set @iRadius = 75
Select @gPt = @geoPt.STBuffer(@iRadius)
geogRd.STIntersects(@gPt) = 1
or simplified:
==> geogRd.STIntersects(@place.STBuffer(75)) = 1

Do you see difference?

Think you want this:
Set @iRadius = 75
@geoPt.STBuffer(@iRadius).STIntersects(geogRd) = 1

Additionally, I read the suggestion that if using this method you wouldn't need Top(1) with Order By geogRd.STDistance(@geoPt); however, you still have both. Try it w/o those pieces---at least the ORDER BY. I think the premise is you are zeroing in on a direct intersect.

And @gPt is the local variable I am think you can just drop. You already have @geoPt, just continue to use that. The convenience having the shorter "@gPt." string to type might be costing you performance.

Once you get past this part, I would take a look at maybe doing this in a WHILE loop. Not really for performance, but for maintenance as you have a lot of repeated code.
0
 
LVL 14

Expert Comment

by:rob_farley
Comment Utility
Hi Robert,

In AdventureWorks2008, I do this to create a table with almost 2M locations.

create table dbo.rf_locations (id int identity(1,1) primary key, SpatialLocation geography);
go
insert dbo.rf_locations select SpatialLocation from person.Address;
go 100
create spatial index rf_spatial on dbo.rf_locations(SpatialLocation);
go

This next bit of code takes (on my machine), about 40 seconds, scanning the table.
declare @loc geography = geography::STGeomFromText('POINT (-1.57 55)',4326);
select top (1) a.ID
from dbo.rf_locations as a
order by a.SpatialLocation.STDistance(@loc);

Whereas, this bit of code takes well under a second.
declare @loc geography = geography::STGeomFromText('POINT (-1.57 55)',4326);
select top 1 a.ID
from dbo.rf_locations as a
where a.SpatialLocation.STIntersects(@loc.STBuffer(1000)) = 1
order by a.SpatialLocation.STDistance(@loc);

So there's definite evidence for using a WHERE clause for these kinds of queries.

It's also worth noting that the upcoming PASS Summit has a session on tuning spatial queries. These sessions will be made available for streaming a month or so after the Summit.

Rob
0
 
LVL 59

Expert Comment

by:Kevin Cross
Comment Utility
Rob, I should have caught that there was a session in Summit about spatial queries. Robert, I some remotely for 24 hours of PASS and even remote you can get wonderful benefits. I will second that recommendation.

Additionally, Rob, so the order of the SpatialLocation and @loc.STBuffer(1000) actually doesn't matter. That is very interesting. I see now the key is filtering to only the results that intersect first, so ORDER BY is not going to calculate a distance for all 2M rows only to get the top 1. Originally read it as only needing TOP 1 with the WHERE, but I guess you need to get the most relevant match which is the closest.

Robert, sorry for my lack of understanding. Still think code can be cleaned up, but will let you get the technical side of spatial index and manipulations from Rob first. ;)
0
 
LVL 59

Expert Comment

by:Kevin Cross
Comment Utility
I just tried on my system which is clearly slower than Rob's. First query 69 seconds. Second still <1 second.
0
 

Author Comment

by:Bird757
Comment Utility
Thanks very much for the posts above. Yes, I agree the coding design I have used will benefit from re-writing in line with insights you. This is my first foray into SPATIAL data-type!

I have added personal comments below, but before that let me explain my application; I process 100+ spatial data points a second. Because of the speed of determining Road Name for each point I replicate the co-ordinate data onto a 2nd server, and there I block-process (500 at a time) to determine a Road-Name (and store the Road Key against the co-ordinate). Historically I have done this brute-force, but am trying to improve the efficiency using SPATIAL. There are 2 design concerns I need to answer (for myself); Cursor / loop through the data to do the Road-Name determination 1-co-ord at a time? Or put a block of co-ords into an array and do the determination on a set at a time. It may make no difference which I use.
At present I process between 64 and 512 at a time using an array and distance (with row_number() and TOP to get the keys) - the process taking about a second.
With the spatial query above I average 25ms to 350ms per row; which appears longer than my old brute force. That is why I think the index, and now the actual query structure as has been pointed out, is in need of refining.

And again I must add I appreciate the time taken by yourselves to respond. I am unfortunately developer and company owner on this project and days can pass between me getting to look at my dev projects. Please undersand that is not a reflectio on how much I depend on the insights you and all the contributors make.

I will only get to work on this in a day again and will post comment as soon as I have re-written test code to incorporate these.

Thanks
Robert
0
 
LVL 59

Expert Comment

by:Kevin Cross
Comment Utility
Good luck!

Here is a look at what I meant by a loop by the way. And if the change in WHERE conditions was not intentional above 1500 radius, then you can consolidate that part too. I used 100 as starting point versus 75 as 100 * 15 gets to 1500, then 1500 * 15 gets to 22500, to it held closely to your pattern. 22500 * 15 jumps to 337500. If that is too far, you can change that piece of the code to maybe do a CASE WHEN logic and pick the radii you want in sequence. My point is, you can avoid having to rewrite the same code over.

Little easier to maintain if you have a rewrite. :)

And you are most welcome. It is why we are here.
ALTER FUNCTION [dbo].[func_LatLon_RoadKey] 
(
   @Lat Numeric(11, 7),
   @Lon Numeric(11, 7)
)
RETURNS Int

AS
/*******************************************************************************
Select TOP 100 * From [dbo].[iRoadsX100] Order By iKey DESC
Select [dbo].[func_LatLong_RoadKey] (-25.275615, 25.7139817)
Select [dbo].[func_LatLon_RoadKey] (-25.275615, 25.7139817)
Select [dbo].[func_LatLong_RoadKey] (-33.9679100, 22.4502833)
Select [dbo].[func_LatLon_RoadKey] (-33.9679100, 22.4502833)
Select [dbo].[func_RoadKey_to_cPlace](1618855)
*******************************************************************************/
BEGIN
	Declare
	@iRdKey      Int,
	@geoPt      Geography,
	@iRadius      Int

	IF @Lat IS NULL Or @Lon IS NULL
	RETURN 2409999

	Select @geoPt = [Geography]::Point(@Lat, @Lon, (4326))

	Set @iRadius = 100
	While (@iRadius <= 2409999) 
	   Begin
	   
	   -- check for spatial match
	   If (@iRadius <= 1500)
	      Begin
	      Select TOP 1 @iRdKey = iKey 
	      From [dbo].[cRoadNames] 
	      Where geogRd.STIntersects(@geoPt.STBuffer(@iRadius)) = 1 
	      Order By geogRd.STDistance(@geoPt);
		  End
	   Else
	      Begin
		  Select TOP 1 @iRdKey = iKey 
	      From [dbo].[cRoadNames] 
	      Where geogRd.STIntersects(@geoPt.STBuffer(@iRadius)) = 1 
		  And [RD_NAME_F] IS NULL And [TYPE] <> 'roads'
	      Order By geogRd.STDistance(@geoPt);
		  End

	   If (@iRdKey > 0)
	      Begin
	      -- early escape
		  Return @iRdKey;
	      End
	   Else
	      Begin
		  -- next iteration
		  Set @iRadius = (@iRadius * 15);
	      End
       End

    RETURN 2409999;
END

Open in new window

0

Featured Post

How your wiki can always stay up-to-date

Quip doubles as a “living” wiki and a project management tool that evolves with your organization. As you finish projects in Quip, the work remains, easily accessible to all team members, new and old.
- Increase transparency
- Onboard new hires faster
- Access from mobile/offline

Join & Write a Comment

Suggested Solutions

Title # Comments Views Activity
SQL Query Syntax Error 9 29
Ranking Based On Value 3 27
Copy Database Wizard Error 3 18
IN with @variable 5 14
If you have heard of RFC822 date formats, they can be quite a challenge in SQL Server. RFC822 is an Internet standard format for email message headers, including all dates within those headers. The RFC822 protocols are available in detail at:   ht…
International Data Corporation (IDC) prognosticates that before the current the year gets over disbursing on IT framework products to be sent in cloud environs will be $37.1B.
Using examples as well as descriptions, and references to Books Online, show the documentation available for date manipulation functions and by using a select few of these functions, show how date based data can be manipulated with these functions.
Viewers will learn how to use the UPDATE and DELETE statements to change or remove existing data from their tables. Make a table: Update a specific column given a specific row using the UPDATE statement: Remove a set of values using the DELETE s…

728 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

10 Experts available now in Live!

Get 1:1 Help Now