Import Question & SQL Calculation

Posted on 2004-08-18
Last Modified: 2012-06-27

I am importing a table with the geographic coordinates of about 5.5 million places around the globe. The table is rather large: 5.5 million records and about 30 data fields. The file is a tab-delimited text file. I have two questions about this data:

How can I best import this data into my SQL Server 2000 database? What I did was upload the file to my server and import it as a new table. The problem is that while I can easily drop the columns I don't need, all remaining fields are set to CHAR(8000), which is much too big. Changing these columns to, for instance, CHAR(200) takes a very long time... in fact it seems my server never finishes changing even one field! Would it be a better idea to create an empty table first with the specs that I need and then import into it?
Also: what is the SQL Server data type for decimal numbers? I need a data type that will support values like 51.4166667, so at least 6-8 decimal places.

Once I have this table inserted, I want to use it to look up the coordinates of locations on the globe. So for instance, if something is linked to Washington, then I know the coordinates of Washington. Now I want to be able to search our database for things within a 10-mile radius of Washington. For this I need to query a table with a calculation included. Is this possible? Perhaps in a stored procedure?

An example of those calculations:

Simplest Calculation:
Distance = sqrt(x * x + y * y)
where x = 69.1 * (lat2 - lat1)
and y = 53.0 * (lon2 - lon1)

More Precise Calculation:
Distance = sqrt(x * x + y * y)
where x = 69.1 * (lat2 - lat1)
and y = 69.1 * (lon2 - lon1) * cos(lat1/57.3)

Most Precise Calculation:
Distance = 3963.0 * arctan[sqrt(1 - x^2) / x]
where x = [sin(lat1/57.2958) * sin(lat2/57.2958)] + [cos(lat1/57.2958) * cos(lat2/57.2958) * cos(lon2/57.2958 - lon1/57.2958)]

I am fine if only the simplest calculation is feasible. But how would I put this in a query? Let's say that Washington is at latitude = x and longitude = y; how would I query for places 10 miles from there?

SELECT * FROM tblX WHERE sqrt(69.1 * (LATITUDE - x) * 69.1 * (LATITUDE - x) + 53.0 * (LONGITUDE - y) * 53.0 * (LONGITUDE - y)) < 10

Can you query like this? If so, does SQL Server support sin, cos, tan, arcsin, arccos and arctan? What would be the expected impact of such a query? I suppose it would be a very heavy query to run, right?

Question by:pgkooijman

Expert Comment

ID: 11830249
Regarding the import: definitely create a table with the correct columns rather than strings. A subsequent ALTER TABLE on 5.5 million rows will take a long time.
Regarding decimals: in banking we use NUMERIC because it allows you to specify the precision (unlike FLOAT), e.g. NUMERIC(20,12).
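Following that advice, here is a minimal sketch of a pre-created target table. The table and column names (and the choice of which fields to keep) are assumptions for illustration, not from the thread:

```sql
-- Hypothetical target table created before the import; trim/extend the
-- column list to match the ~30 fields actually kept from the source file.
CREATE TABLE tblPlaces (
    PlaceName   VARCHAR(200)   NOT NULL,
    CountryCode CHAR(2)        NULL,
    Latitude    NUMERIC(10, 7) NOT NULL,  -- holds values like 51.4166667
    Longitude   NUMERIC(10, 7) NOT NULL
)
```

NUMERIC(10, 7) gives 7 decimal places, which covers the 6-8 digits of precision asked for above.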


Assisted Solution

JaffaKREE earned 100 total points
ID: 11830288
When first importing the table, you should control the columns more precisely.  char(8000) is a monster.

REAL and FLOAT are SQL Server data types that support decimals.

SQL Server has built-in functions for COS, TAN, etc.
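For reference, the trig functions are built in and work in radians; a quick sanity check (results in comments are approximate):

```sql
-- SQL Server math functions; all trig arguments are in radians
SELECT SIN(PI() / 2),     -- ~1.0
       COS(0),            -- 1.0
       TAN(PI() / 4),     -- ~1.0
       ACOS(1),           -- 0.0
       ATAN(1),           -- ~0.7854 (pi/4)
       RADIANS(57.2958e0) -- ~1.0, instead of dividing by 57.2958 by hand
```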

Yes, this query should work if the math is right.  5 million rows really isn't all that much, but I'm not going to wager a guess as to how long that will run for.  

If this is something you are going to run regularly, I would consider making extra columns with precomputed distances from major cities (assuming those would be the majority of the queries). You could have about a dozen or so extra columns such as DIST_FROM_NYC, DIST_FROM_PHILA. This would cut the calculation time for those queries to seconds.
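A sketch of that precomputation, using the "simplest calculation" from the question. The table/column names and the NYC coordinates are assumptions for illustration:

```sql
-- One precomputed distance column per hub city (hypothetical names).
-- NYC is at roughly latitude 40.71, longitude -74.01 (assumed values).
ALTER TABLE tblPlaces ADD Dist_From_NYC FLOAT

-- Approximate miles: 69.1 mi per degree latitude, 53.0 mi per degree longitude
UPDATE tblPlaces
SET Dist_From_NYC = SQRT(SQUARE(69.1 * (Latitude - 40.71))
                       + SQUARE(53.0 * (Longitude - (-74.01))))
```

After that, `WHERE Dist_From_NYC < 10` needs no per-row trig at query time.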

Let me know how it goes.

Expert Comment

ID: 11830337
Regarding locations: I'm no cartographer, but if you're storing coordinates of fixed places, then using your formulae above you can find all sample points in your table within a certain tolerance (say 10%) of 10 miles.

LVL 17

Accepted Solution

BillAn1 earned 250 total points
ID: 11830344
The best way to import the file would be to use DTS.
You probably should create your target table yourself because, as you found, with a text file SQL Server is not able to detect the correct data types automatically.

The data type you want is probably FLOAT, but you could possibly use DECIMAL(10,8)

You would be best off creating a function to do your trig - and you can do the most precise calculation quite easily:

CREATE FUNCTION dbo.calc_distance (@lon1 float, @lat1 float, @lon2 float, @lat2 float)
RETURNS float
AS
BEGIN
    declare @x float
    set @x = (sin(@lat1/57.2958) * sin(@lat2/57.2958)) + (cos(@lat1/57.2958) * cos(@lat2/57.2958) * cos(@lon2/57.2958 - @lon1/57.2958))
    -- 3963.0 * acos(@x) equals 3963.0 * arctan(sqrt(1 - @x^2) / @x): distance in miles
    return 3963.0 * acos(@x)
END

then you can query your table as :
WHERE dbo.calc_distance(longitude, latitude, X, Y) < 10.0

substituting X,Y with the actual values for Washington or wherever you want....

If your data is very big, this will take a long time.
One possible way to speed this up would be to add some extra WHERE clauses, e.g.
AND longitude between X-10 AND X+10
AND latitude between Y - 10 AND Y + 10

or some such thing, i.e. a filter without any calculations or functions.
Then you could put an index on these columns, so the query can narrow down to this effective square very quickly, and use the accurate calculation only for the small number of remaining candidate places.
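Putting the pieces of that answer together, a sketch (the index name, table name, and degree margins are assumptions; @lat/@lon stand in for the search point):

```sql
-- Index the coordinate columns so the bounding-box filter is a range seek
CREATE INDEX IX_tblPlaces_LatLon ON tblPlaces (Latitude, Longitude)

DECLARE @lat FLOAT, @lon FLOAT
SET @lat = 38.9    -- roughly Washington, DC (assumed values)
SET @lon = -77.0

-- Coarse square first (cheap, indexable), exact distance second.
-- Note the margins are in degrees: 1 degree of latitude is ~69 miles,
-- so 10 miles is ~0.145 degrees; the longitude margin is padded wider
-- because longitude degrees shrink away from the equator.
SELECT *
FROM tblPlaces
WHERE Latitude  BETWEEN @lat - 0.145 AND @lat + 0.145
  AND Longitude BETWEEN @lon - 0.3   AND @lon + 0.3
  AND dbo.calc_distance(Longitude, Latitude, @lon, @lat) < 10.0
```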

LVL 69

Assisted Solution

by:Scott Pletcher
Scott Pletcher earned 50 total points
ID: 11830367
>> How can I best import this data into my SQL Server 2000 database? What I did now was to import it by uploading it to my server and importing it as a new table. <<

Use BULK INSERT since that is the fastest way to import data.
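A minimal BULK INSERT sketch for the tab-delimited file; the table name and file path are placeholders, and the column order in the table must match the file:

```sql
-- Load the tab-delimited text file into the pre-created table
BULK INSERT tblPlaces
FROM 'C:\import\places.txt'       -- hypothetical path on the server
WITH (
    FIELDTERMINATOR = '\t',       -- source file is tab-delimited
    ROWTERMINATOR   = '\n',
    TABLOCK                       -- table lock for a faster bulk load
)
```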

Agree with others on data typing: *unless there is non-numeric data* in the input, use a numeric data type if the data is numeric.  And I too recommend using DECIMAL(nn, n) rather than float because of the exactness.
LVL 11

Assisted Solution

Otana earned 100 total points
ID: 11830387
About the calculations:

You should start with a select statement that finds the points in a square around the search point. If you place an index on the LONGITUDE and LATITUDE columns, this should go reasonably fast. After that, you can select a calculated field containing the distance and filter on that field.

In a stored procedure, you could get something like:

CREATE PROCEDURE spSearchPoint @X float, @Y float
AS
SELECT * FROM tblCoordinates
WHERE LATITUDE  BETWEEN @X - 10 AND @X + 10
  AND LONGITUDE BETWEEN @Y - 10 AND @Y + 10
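Called, for example, like this (the coordinates are assumed illustrative values for Washington, DC):

```sql
-- Rows in the square around latitude 38.9, longitude -77.0
EXEC spSearchPoint @X = 38.9, @Y = -77.0
```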

Author Comment

ID: 11830567
Wow, lot of good ideas. Will let you know how that works out.....
First things first... I imported the text file again and all went very well. The table contains place names and their coordinates, so the queries run against it would mostly be: SELECT Lat,Lon FROM tblX WHERE PlaceName LIKE '%placename%' or SELECT Lat,Lon FROM tblX WHERE PlaceName='placename'

How can I speed these queries up? I tried making PlaceName a unique key, but that took forever and then gave a duplicate key error. What other tricks are there? I am just not great at SQL Server... I suppose placing an index would do the trick, but how do I best do that? How do I do it at all?

Expert Comment

ID: 11830823
It's a good idea to use a unique key; it sounds like a composite key of the coordinates fits the bill here, since no two locations can have the same coordinates (I don't know the implementation, but I assume as much).

You can then create a separate nonunique (nonclustered) index on PlaceName
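A sketch of both suggestions (constraint and index names are assumptions, and the composite key assumes no duplicate coordinate pairs exist in the data):

```sql
-- Coordinates as the unique, clustered key of the table
ALTER TABLE tblPlaces
    ADD CONSTRAINT PK_tblPlaces PRIMARY KEY CLUSTERED (Latitude, Longitude)

-- Separate non-unique index to speed up PlaceName = '...' lookups
CREATE NONCLUSTERED INDEX IX_tblPlaces_PlaceName ON tblPlaces (PlaceName)
```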
LVL 143

Expert Comment

by:Guy Hengel [angelIII / a3]
ID: 11831254
>> WHERE PlaceName LIKE '%placename%'
You cannot speed that query up unless you remove the first %.
Indexes are of no help here, unless you start using full-text indexing.
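For completeness, a hedged sketch of what full-text indexing would involve on SQL Server 2000 (requires the Full-Text Search service; the catalog name is an assumption, and it assumes a unique index on the table, here called PK_tblPlaces):

```sql
-- One-time full-text setup (SQL Server 2000 style, via stored procedures)
EXEC sp_fulltext_database 'enable'
EXEC sp_fulltext_catalog 'PlaceCatalog', 'create'
EXEC sp_fulltext_table  'tblPlaces', 'create', 'PlaceCatalog', 'PK_tblPlaces'
EXEC sp_fulltext_column 'tblPlaces', 'PlaceName', 'add'
EXEC sp_fulltext_table  'tblPlaces', 'start_full'   -- populate the index

-- CONTAINS can use the full-text index, unlike LIKE '%...%'
SELECT Lat, Lon FROM tblPlaces WHERE CONTAINS(PlaceName, '"washington"')
```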

>> I tried making PlaceName a unique key but
Indeed, don't use a UNIQUE key; use a simple index (without the unique property). To locate (after the index creation) which entries are duplicates:
select PlaceName, count(*)
from YourTable
group by PlaceName
having count(*)>1



Author Comment

ID: 11838500
Have divided the points according to the answers that gave the solutions I needed. Will start a new question with the remaining open issues. This thread is getting a bit chaotic. Thanks for all the help!
