[Webinar] Streamline your web hosting managementRegister Today

x
  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 338
  • Last Modified:

Advice on select statement

I have following table [ZipCodes]. Last column with values of 1 and 2 is [ZipCodeID Int Identity(1,1)] which is [Unique Clustered Index]
                                                                                                           
07009       S      Cedar Grove      D      New Jersey      NJ      973      40.854334999999999      -74.231713999999997    1
07009       S      Overbrook         N      New Jersey      NJ      973      40.854334999999999      -74.231713999999997    2
......
......
......
(70,500 Records Total)

then I have table [Jobs], when somebody places a new job they fill in the form with [Address1, Address2, City, State, ZipCode], but table [Jobs] doesn't really have all this columns, it only has [Address1, Address2] and  [ZipCodeID] which is [Nonclustered Index] Foreign Key referenced to [dbo.ZipCodes(ZipCodeID)], in order to find jobs I use following select statement:

SELECT j.JobID     --j.JobID is Unique Clustered Index
FROM
      CONTAINSTABLE(Jobs, JobDetails, 'Keyword') as j
      JOIN ZipCodes z ON j.ZipCodeID=z.ZipCodeID
      WHERE
            j.Datein between GETDATE() and '10/17/2004'
            and z.ZipCode='07009' and z.State='NJ' and z.City='Cedar Grove'


now [j.Datein] is Unique Clustered Index and [z.ZipCode, z.State and z.City] are Nonclustered Indexes, so what I'm trying to do in in first place is that, I want to save some space because my hosting company allows me to have 250MB space for database and my empty database right not has reached up to 25MB and I would like to ask you 2 questions regarding my select statement

1) when I join tables like this [JOIN ZipCodes z ON j.ZipCodeID=z.ZipCodeID] and then match [z.ZipCode='07009' and z.State='NJ' and z.City='Cedar Grove'] is this gonna scan whole [ZipCodess] table for matching values in [z.ZipCode, z.State, z.City] or is it gonna go straight to the referenced record?

2) and after all is it worth saving space? Otherwize I have to store [City State and ZipCode] in five seperate tables, this way I have to store only [ZipCodeID] in every table which is Foreign Key to [ZipCodes]

sorry if all this sounds confusing, but that's the problem I have right now...  :)
0
davidlars99
Asked:
davidlars99
  • 10
  • 8
  • 3
9 Solutions
 
sigmaconCommented:
1) If you have indexes on z.ZipCode, z.State and z.City, SQL Server is going to attempt an index seek ('going directly there' - which is good) - not a table scan. If you have a separate index for each, and you are specifying all, then the query plan may deteriorate, because it will have to join the results from each seek - if you are as specific as in your example above, that's ok because it will only have to join VERY few rows. It's always a good idea to match indexes to span the same columns that are going to show up in the where clause. That most often guarantees good results. So, if space allows it, you would want one index covering all the columns in the WHERE clause. I am assuming that ZipCodeID is the clustered index, so you are good there. Of course, space constraints require you to be picky about the indexes, but even if you specify two out of three columns covered by and index, the resulting index scan may be cheap if the WHERE clause had high specificity - much cheaper than a table scan anyway.

Before you change anything, test performance and look at query plans the way your indicies are in right now. If performance is not satisfactory, then reorganize indexes.

2) In your model, City and State are repeated many times for each ZIP Code or one ZIP Code may be repeated for more than one town. Your table stores an additional piece of data: The association between a Zip Code, a Town, and a State - and it is storing this association in a denormalized form. In the long term, when the number of Jobs approaches the number of rows in the ZipCodes table and your selection of ZipCodeIDs is wide, then you MAY save some space using this model. If you really need to safe space, you may want to normalize further, but your data retrieval operation may suffer:

a) store zip codes as int instead of char(5) -> save three(?) bytes -- lets not do that unless you need to deal with 9 digit strings, then you save 8 bytes * 70000 = 560,000 bytes ~ 0.5 MB still not worth it.
b) store all cities in their own table and give them a number - adding a new city every time a user enters one you don't know, then creating a mapping table between zip codes and cities ... mmh - if we ignore the increased maintenance, this might save a few MB, but you can test that quickly locally if you decompose the table as described.
c) storing states seperately ... why not, this is more a matter of taste - I've seen too many flame wars about this to make a statement. If you never care about the long state name, you might not have to do this. If your app's supposed to be international, then you want to use ISO 3166-1 and -2 codes properly put in respective tables.

If you want to maintain the association, you may not save much space with the decomposition because of the mapping tables between ZIP Codes, States and Cities. You also will not be able to use an index covering multiple columns and thus most of your queries will involve joins. Doing it the usual way, storing ZIP, City and State in some address record will cost less space until you have 70,000 jobs. And although I am a big fan of total normalization, in this case I would say: Leave everything the way it is. Your only problem is going to be when people try to enter ZIP, City and State combinations that you don't have in your table. If you are going to allow all those (including misspellings) then you are not going to save space. This is not a data base issue, this is an application issue where a too-rigit data model might bite you later.

I hope some other experts can share their opinion on this.
0
 
Brendt HessSenior DBACommented:
You also should look at how the data matches the real world.

If you have a zipcode, you can automatically get the state.  You need not join on the state, since any given zip code is wholly within one state.
Many people claim that there is only once City per zip code.  However, your sample data removes that as a possibility.  A zipcode can be within two or more city jurisdictions.

To speed things up, you should seek only on Zipcode + City, and have an index on those two fields.  Drop the state from the lookup.

Also, you may want a table for looking up the state separately.  A table structured as:

    State char(2) not null
    StateName Varchar(20) Not Null
    MinimumZipCode char(5) not null
    MaximumZipCode char(5) not null

With an index (MinimumZipCode, MaximumZipCode), and an index (State) [note - possible clustered key]

To find the state for a given zip code,

SELECT State FROM StateZipRange WHERE MinimumZipCode <= '07009' AND MaximumZipCode >='07009'

(32 bytes max) * (51 states + DC) is less than 2K data space total.  However, keeping the State and State Name in the individual Zip code records is (conservatively) an average of 7 characters for the state name + 2 for the abbreviation + a bit for overhead.  10 bytes * ... 100000 records?  200000 records?  is a lot more data space taken up.
0
 
sigmaconCommented:
I agree with bhess1 suggestions above, but I would put the clustered key for the other lookup table on MinimumZipCode and MaximumZipCode, because clustered indicies are perfect for range lookups. Of course all this will make your database even bigger.

Considering todays storage costs, it seems anachronistic of your host to limit DB size to such a small amount. Maybe you can find a better deal somewhere?
0
Learn to develop an Android App

Want to increase your earning potential in 2018? Pad your resume with app building experience. Learn how with this hands-on course.

 
davidlars99Author Commented:
I removed some records from ZipCodes database and I have 29564 rows (total before was 70500) of unique ZipCodes, I removed only those which are not used by US Postal Service and guess what, that's all I will ever need to use, so I have clean database now and I think my select statement's goal looks brighter. I can put Unique Clustered index on ZipCode itself...  :)

BTW, another question: what is the diference between how I create my index


>> ZipCode Char(5) Not null Primary Key Clustered
AND
>> Create Unique Clustered Index on dbo.ZipCodes(ZipCode)


especialy in my case? what if I do it this way, I mean not specifying the Primary key:


>> ZipCode Char(5) Not null
and then create index, it still going to be unique anyway, or is it all about [Primary Key]..?
>> Create Unique Clustered Index on dbo.ZipCodes(ZipCode)




0
 
davidlars99Author Commented:
have you ever worked with the database like that? I mean US zipcode database, it is very strange that two same zipcodes have two diferent Cities THAT IS NOT RIGHT, businesswize of course   :)
0
 
Brendt HessSenior DBACommented:
The clustered index on Zip Code should not be UNIQUE -- as noted in your data, there can be more than one city per zip code.

davidlars99 -- that is a very common misconception that is not true.  Many zip codes contain more than one named jurisdiction, especially in rural areas where a zip might take in 500 square miles containing a half dozen small (30 to 300 person) communities.  In fact, there is a Zip in Portland, OR right now that is partially within the city of Gresham as well.  That's how I know.
0
 
sigmaconCommented:
Of course, it's perfectly valid for two separate towns to share the same ZIP code. I think there are over 100,000 cities and towns incorporated in the US. If each of them would have it's own zip code, than big office buildings in downtown LA wouldn't be able to get the two ZIP codes they need ;-) ( I am exaggerating here! )
0
 
davidlars99Author Commented:
0
 
sigmaconCommented:
With the exaggeration I meant the TWO! I know of many buildings that have an entire ZIP code of their own. I don't know of any with more than one, but it's plausible, I guess.
0
 
davidlars99Author Commented:
I can assure you that all zipcodes are unique across the country and now in my database as well  :)
0
 
davidlars99Author Commented:
I guess US postal service cares about their database as much as we do  :)
0
 
sigmaconCommented:
From the product description:

>> With over 70,000 records – just under 43,000 ZIP Codes - you’ll be glad you have ...

If you have 43,000 ZIP codes, but 70,000 records, then mathematically speaking you are going to have some ZIP Codes in there more than once ... I take it your kidding?
0
 
davidlars99Author Commented:
S-  Standard ZIP Code
P - PO Box Only
U - Used for Organizations such as large businesses or buildings
M - Military, often used for military vessels, but mostly for military bases
 

D - Default City Name - recommended by the USPS
A - Alternate City Name, also used by the USPS, although not the primary name of that ZIP Code.
N - Not-recommended City Name, identifies the name of the ZIP Code - often used by locals as a place name. Not used, nor accepted, by the USPS.
 


these are all type of zipcodes and cityes that US has and I guess to setisfy my business needs S and U are more then enough. As for cities D is doing just fine, all that makes 29,564 of unique total rows in my database
0
 
sigmaconCommented:
ok, now it makes sense how you reduced the number of rows, but what if you get a client that has a different ZIP/City combination then the those you kept? What if someone from Gresham with that ZIP code that bhess1 mentioned is a client - and you only have that ZIP code for Portland?
0
 
davidlars99Author Commented:
>> if someone from Gresham with that ZIP code that bhess1 mentioned is a cli...
in this case of mistake I will display all possible cities coresponding that particular zipcode or whatever they enter, it is only matter of little SQL select statement...  :)


and my select statement should do well without join

declare @ZipCode Char(5)
select @ZipCode=ZipCode from ZipCodes where City='Cedar Grove' and State='NJ'

SELECT JobID
FROM
     CONTAINSTABLE(Jobs, JobDetails, 'Keyword')
     WHERE
          Datein between GETDATE() and '10/17/2004'  
          and ZipCode=@ZipCode


in this case I will make Index on (City, State) in my ZipCodes table
0
 
sigmaconCommented:
Well, I guess your question is answered then. Good luck.
0
 
davidlars99Author Commented:
>> in this case of mistake I will display all possible cities coresponding that particular zipcode or whatever they enter, it is only matter of little SQL select statement...  :)


I mean, that search will not be performed, instead there will be links like:


Please choose one or more specific area:
==========================

Cedar Grove, NJ
Cedar Grove, CA
Cedar Grove, WA
...
...
...

and then they are more then welcome to search :)
there are 9 cities most with the same name in US so I don't think it's gonna hurt performance and I don't either that 100% of the visitors are going to make this mistakes and even they do that should not produse any noticableoverhead
0
 
davidlars99Author Commented:
0
 
Brendt HessSenior DBACommented:
Personal opinion: The only substantive difference is that Primary Key automatically disallows NULL values in the underlying field(s), and that there can be only one Primary Key.  If the underlying data field(s) are defined as Not Null, then there is no substantive difference.
0
 
davidlars99Author Commented:
as long as there is no diference performancewize I don't care...  :)

thank you very much guys
0
 
sigmaconCommented:
You will not be able to create a foreign key unless it points to a primary key. This is one of those dualities that gets easily missed: SQL Server treats primary keys both as constraints from a conceptual model point a view and as indexes from a physical model point of view. A primary key (if nothing else is specified) is by default a unique clustered index, and a unique constraint. But if you don't call it a primary key, you will not be able to create a foreign key to it.
0

Featured Post

Free Tool: Site Down Detector

Helpful to verify reports of your own downtime, or to double check a downed website you are trying to access.

One of a set of tools we are providing to everyone as a way of saying thank you for being a part of the community.

  • 10
  • 8
  • 3
Tackle projects and never again get stuck behind a technical roadblock.
Join Now