need help indexing my database

I run this query repeatedly, with different zip codes.

SELECT Address, City, State, Zip, Longitude, Latitude, Host, URL, Price, 
ImgSrc,PropertyType, Bedrooms, Bathrooms, Squarefeet, Acres, Yearbuilt FROM 
tblProperties WHERE ZIP IN (SELECT h.zip FROM zip g JOIN zip h on g.zip <> h.zip AND 
g.zip = '32244' AND h.zip <> '32244' WHERE g.GeogCol1.STDistance(h.GeogCol1)<=(10 * 
1609.344) ) UNION SELECT Address, City, State, Zip, Longitude, Latitude, Host, URL, Price, 
ImgSrc,PropertyType, Bedrooms, Bathrooms, Squarefeet, Acres, Yearbuilt FROM 
tblProperties WHERE ZIP = '32244';

Open in new window


How can I index my table to increase loading time. This query takes 55 seconds to return results, it's insane wait time.

The geogCol1 is geography with spatial index. On tblProperties, vvery column is a varchar and none are indexed yet. I'm not very good with indexing in sql. Can someone help me out?

Thanks.
LVL 6
poweraddictAsked:
Who is Participating?
 
Scott PletcherSenior DBACommented:
You don't need UNION [ALL] at all, just use an OR in the SELECT, like so:

SELECT Address, City, State, Zip, Longitude, Latitude, Host, URL, Price,
ImgSrc,PropertyType, Bedrooms, Bathrooms, Squarefeet, Acres, Yearbuilt
FROM tblProperties
WHERE ZIP = '32244' OR ZIP IN (SELECT h.zip FROM zip g JOIN zip h on g.zip <> h.zip AND
g.zip = '32244' AND h.zip <> '32244' WHERE g.GeogCol1.STDistance(h.GeogCol1)<=(10 *
1609.344) )

The zip table should be clustered on zip.

The tblProperties table should have at least a nonclus index on zip; don't have enough info to know if it should be clus on zip or not.

Since
    g.GeogCol1.STDistance(h.GeogCol1)
wouldn't seem to change, pre-compute it and store it in the zip table.  You can use a computed, persisted column or a native SQL column that you manually update/recompute periodically, if needed (? doesn't seem likely but could be needed).

For example, for a computed, persisted column:

ALTER TABLE zip
ADD std_distance AS GeogCol1.STDistance(GeogCol1) PERSISTED;

Then:

SELECT Address, City, State, Zip, Longitude, Latitude, Host, URL, Price,
ImgSrc,PropertyType, Bedrooms, Bathrooms, Squarefeet, Acres, Yearbuilt
FROM tblProperties
WHERE ZIP = '32244' OR ZIP IN (SELECT h.zip FROM zip g JOIN zip h on g.zip <> h.zip AND
g.zip = '32244' AND h.zip <> '32244' WHERE g.std_distance <=(10 * 1609.344) )
0
 
ianmills2002Commented:
Firstly, put an index on ZIP

Secondly, I don't think you need to use the sub query
...
zip in (SELECT h.zip FROM zip g JOIN zip h on g.zip <> h.zip AND 
g.zip = '32244' AND h.zip <> '32244' WHERE g.GeogCol1.STDistance(h.GeogCol1)<=(10 * 
1609.344)
)

Open in new window



try the following to remove the what looks like an unrequired table join

... zip in (SELECT h.zip FROM zip h where g.GeogCol1.STDistance(h.GeogCol1)<=(10 * 1609.344) and h.zip <> '32244')

Open in new window


Hopefully this helps.

Ian
0
 
Evan CutlerVolunteer Chief Information OfficerCommented:
Here is a training on indexes I have found useful:
http://sqlserverplanet.com/indexes/sql-server-indexes-tutorial

I would look into indexes on the ZIP column on those tables.
Try testing your execution plan over there....
0
Ultimate Tool Kit for Technology Solution Provider

Broken down into practical pointers and step-by-step instructions, the IT Service Excellence Tool Kit delivers expert advice for technology solution providers. Get your free copy now.

 
poweraddictAuthor Commented:
SHOULD IT BE CLUSTERED OR NON CLUSTERED?

Thanks
0
 
Evan CutlerVolunteer Chief Information OfficerCommented:
Here's another article on it.
http://itknowledgeexchange.techtarget.com/sql-server/back-to-basics-clustered-vs-nonclustered-indexes-whats-the-difference/

http://shannonlowder.com/2009/01/sql-102-clustered-vs-nonclustered-indexes/

It depends on the rest of your application and how it interfaces with your SQL Server.  I hope that you find your answer here.
0
 
ianmills2002Commented:
Non Clustered Index should be fine.
0
 
Anthony PerkinsCommented:
Also, you should change UNION to UNION ALL.  There is no need for the SQL Server Engine to sort all the resulting data to remove duplicates, as there should not be any.
0
 
Anthony PerkinsCommented:
But actually your biggest bottleneck is the UDF GeogCol1.STDistance().  As it stands now it will execute once for practically all rows in your zip file (50K ?), and you need to find a way around that.
0
 
poweraddictAuthor Commented:
Thanks for the replies everyone.

Scott:


ALTER TABLE zip
ADD std_distance AS GeogCol1.STDistance(GeogCol1) PERSISTED;

The zip code is never the same, so I don't think this solution will work. It pulls all the zip codes close to the zip code specified. It's not always 32244...
0
 
poweraddictAuthor Commented:
I cluster indexed the zip codes in the zip table, it made a performance difference but still 3 seconds on my local server and 13 seconds on my web server to search 100k rows.

Still slow.
0
 
ianmills2002Commented:
What about an index on the ZIP column in the tblProperties table?
0
 
Anthony PerkinsCommented:
but still 3 seconds on my local server and 13 seconds on my web server to search 100k rows.
Right.  As I stated your problem is the UDF and you are right there is no way you can store that.  You could try pre-calculating it in an appropriately indexed temp table and then join against the temp table to see if that makes any difference.  Also UNION ALL should outperform a WHERE clause with an OR, but test it for yourself to see if there is any improvement.
0
 
poweraddictAuthor Commented:
ian, I did that.
0
 
ianmills2002Commented:
Sorry, I read your comment previously that you only added an index to the ZIP table, not to the tblProperties table.
0
 
poweraddictAuthor Commented:
I did it to both
0
 
poweraddictAuthor Commented:
Using just this takes 24 seconds

SELECT Address, City, State, Zip, Longitude, Latitude, Host, URL, Price,
ImgSrc,PropertyType, Bedrooms, Bathrooms, Squarefeet, Acres, Yearbuilt
FROM tblProperties WHERE zip in (SELECT zip FROM ZIP WHERE CITY='JACKSONVILLE' AND ST='FL') order by zip

This is insane...
0
 
ianmills2002Commented:
Is there an index that includes CITY and ST columns?

Check where the bottleneck is. Is it in the subquery

SELECT zip FROM ZIP WHERE CITY='JACKSONVILLE' AND ST='FL'

OR in the outer query

SELECT Address, City, State, Zip, Longitude, Latitude, Host, URL, Price,
ImgSrc,PropertyType, Bedrooms, Bathrooms, Squarefeet, Acres, Yearbuilt
FROM tblProperties WHERE zip in ('32244') order by zip
0
 
poweraddictAuthor Commented:
Its the outer query

I've attached the execution plan
execution plan
0
 
ianmills2002Commented:
Can you try running it without the ORDER BY ZIP

My understanding of a Clustered index is that the data is sorted in the order of the clustered index. If the index is on ZIP, then the order will already be correct.
0
 
poweraddictAuthor Commented:
ian, I just did it, same execution time pretty much
0
 
Anthony PerkinsCommented:
Using just this takes 24 seconds
How many rows are returned?  If it is thousands of rows, there may not be much you can do about that, at least not with SQL Server.

Assuming that you have an index on City+ST try it this way:
SELECT  Address,
        City,
        State,
        Zip,
        Longitude,
        Latitude,
        Host,
        URL,
        Price,
        ImgSrc,
        PropertyType,
        Bedrooms,
        Bathrooms,
        Squarefeet,
        Acres,
        Yearbuilt
FROM    tblProperties
WHERE   EXISTS ( SELECT 1
                 FROM   ZIP
                 WHERE  CITY = 'JACKSONVILLE'
                        AND ST = 'FL' )
ORDER BY zip

Open in new window


Or this:
SELECT  p.Address,
        p.City,
        p.State,
        p.Zip,
        p.Longitude,
        p.Latitude,
        p.Host,
        p.URL,
        p.Price,
        p.ImgSrc,
        p.PropertyType,
        p.Bedrooms,
        p.Bathrooms,
        p.Squarefeet,
        p.Acres,
        p.Yearbuilt
FROM    tblProperties p
        INNER JOIN (
                    SELECT  zip
                    FROM    ZIP
                    WHERE   CITY = 'JACKSONVILLE'
                            AND ST = 'FL'
                    GROUP BY zip
                   ) z ON p.zip = z.zip
ORDER BY p.zip

Open in new window

0
 
Anthony PerkinsCommented:
My understanding of a Clustered index is that the data is sorted in the order of the clustered index
You cannot and should not rely on that.
0
 
Scott PletcherSenior DBACommented:
>>
ALTER TABLE zip
ADD std_distance AS GeogCol1.STDistance(GeogCol1) PERSISTED;

The zip code is never the same, so I don't think this solution will work. It pulls all the zip codes close to the zip code specified. It's not always 32244...
<<

Sorry, I mis-coded the use of it!

But I'm still confused.

It looks to me like only 32244 is passed to the STDistance function ... is the other zip involved in the function somehow?

Even then, couldn't you create a table with from zip and to zip and the distances precalculated, then lookup from that table?
0
 
poweraddictAuthor Commented:
32244 was just used to show the code, in reality its a user entered parameter
0
 
Scott PletcherSenior DBACommented:
Right, but couldn't you create a table with from zip and to zip and the STDistance already computed?

Yes, the table would be large, but when clustered on from zip and to zip, the lookups should still be very fast.
0
 
poweraddictAuthor Commented:
well 48000 zip codes compared to 48000 zip codes every every combination would be enormous. Not even sure how to go about that.
0
 
Anthony PerkinsCommented:
well 48000 zip codes compared to 48000 zip codes every every combination would be enormous.
DECLARE @ZIPs integer = 48000,
      @CurrZIP integer = 1,
      @ZIPXRef integer = 0

WHILE @CurrZIP < @ZIPs
      SELECT @ZIPXRef += @ZIPs - @CurrZIP,
            @CurrZIP += 1

SELECT @ZIPXRef
0
 
Scott PletcherSenior DBACommented:
Hmm, technically can we only store from the lower zip to the higher zip?

That is, is 32244-->90210 the same value as 90210-->32244?

Ooh, do you use the 9 digit zip or just the 5 digit zips for this??
0
 
poweraddictAuthor Commented:
well it's find every zip within so many miles of a specified zip code.

5 digit zips are used.

This finds all zip codes within 10 miles of 32244

SELECT Address, City, State, Zip, Longitude, Latitude, Host, URL, Price,
ImgSrc,PropertyType, Bedrooms, Bathrooms, Squarefeet, Acres, Yearbuilt
FROM tblProperties
WHERE ZIP = '32244' OR ZIP IN (SELECT h.zip FROM zip g JOIN zip h on g.zip <> h.zip AND
g.zip = '32244' AND h.zip <> '32244' WHERE g.std_distance <=(10 * 1609.344) )
0
 
Anthony PerkinsCommented:
That is, is 32244-->90210 the same value as 90210-->32244?
Right it is a Combination.  I only included one pair in my calculation and of course did not include 32244 to 32244.
0
 
Anthony PerkinsCommented:
This finds all zip codes within 10 miles of 32244
That is good.  That means you can create a table like Scott is suggesting, but only include those rows that have a distance of 10 miles or less.  I suspect you can reduce the table down quite significantly from the 1 billion+ rows when combining them all.
0
 
Anthony PerkinsCommented:
I just read the given definition of a Combination in Wiki and in your case it would be:
48000! / 2! (48000 - 2)!

However, I would not suggest you try and calculate 48000! using a calculator or even T-SQL as you will overflow very quickly!  So the iterative method I suggested earlier would be the best option.

Incidentally, adding those ZIP codes to the table that only are within 10 miles could be an interesting script on its own, for the reason that it could take quite a long time to execute.  But once done should prove very performant if indexed appropriately.  

The other problem you will have is if you only add 32244 - 90210 (and not the reverse) that means that you will have to make sure to search on the first and not the second.  Perhaps it would be easier and not too expensive to include both.

Perhaps it is time to take a second look at those spatial data types...
0
 
Anthony PerkinsCommented:
I guess I should have read the entire article. :)

The formula can be simplified in your case to:
48000 * 47999 / 2

Which is a lot simpler than the T-SQL code I posted earlier!
0
 
Scott PletcherSenior DBACommented:
Exactly -- it's over a billion rows, but when properly indexed, that won't matter.  And each row would not be that large.

I would precompute all distances, so that you don't have to worry about what a missing value means -- does it mean it's less than nn miles or that it wasn't computed to begin with?

Since it's only two values, it's easy to check for the lesser one to make the first key.

If, however, you want to store both, you would still only have to compute one of them -- the rest could be inserted based on existing values.

That is, compute 32244-->90210, then insert the value from that computation into 90210-->32244.

Given that the values won't change, or will change only rarely (you could recompute them every week if you wanted to), I would pre-calc and store them.

Then you can do ALL distance lookups from zip to zip very quickly.
0
 
ianmills2002Commented:
I have missed so much while I have been asleep.

I go back to what acperkins wrote...
"How many rows are returned?  If it is thousands of rows, there may not be much you can do about that..."

...and back to your last post to me about the performance is with the Select on the tblProperties table.

You got the SQL to run in 3 seconds and on the Web server as 13 seconds.
When other columns might you need to search on? Especially from the Web side of things. Your original query contains many columns that could possible be filtered on by any web search. e.g. Price, PropertyType, Bedrooms, Bathrooms, Squarefeet, Acres.

If any of these others will be used to filter the search, then these columns could be added to either, the index containing ZIP that you have already created, or have other indexes built on the table containing any of the filtered columns. Exactly how many you might need depends on what filters may be required on the data.

Some examples, at my work we have a transaction table containing more than 14M rows. All reports and queries on this table contain at least APPROVED_DATE. This is one index. Each query may also filter on at least one other column. So I other indexes to improve response times for these queries. Other indexes are STATE and APPROVED_DATE, CUSTOMER and APPROVED_DATE and COST_CENTRE, CONTRACT and APPROVED_DATE. All indexes added to the table weren't setup initially, but found over time as the reporting requirements changed.
0
 
poweraddictAuthor Commented:
thanks, with all your help I was able to speed it up dramatically
0
Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.

All Courses

From novice to tech pro — start learning today.