Solved

Import Question & SQL Calculation

Posted on 2004-08-18
10
797 Views
Last Modified: 2012-06-27
Hi,

I am importing a table with geographic co-ordinates of about 5.5 million places on the globe. The table is rather large with 5.5 M records and about 30 data fields. The file is in text file format delimited by a tab. Now I have two questions regarding this data:

IMPORT
How can I best import this data into my SQL Server 2000 database? What I did now was to import it by uploading it to my server and importing it as a new table. Problem now is that I can easily drop the columns that I do not need but all other fields are set to CHAR 8000. Which is much too big. Changing these columns to for instance CHAR 200 takes a very long time... in fact it seems my server is never getting done with changing 1 field! Would it be a better idea to create an empty table first with the specs that I need and then import it?
Also: what is the SQL Server data type for decimal numbers? I need a data type that will support values like: 51.4166667 so at least 6-8 decimals.

CALCULATIONS
The moment I have this table inserted I will then want to be able to use it to look up the coordinates of locations on the globe. So for instance if I have something that is linked to Washington then I know the coordinates of Washington. Now I want to be able to search in our database for things that are in a 10 mile radius from washington. For this I need to be able to query a table including a calculation is this possible? Perhaps in a stored procedure?

An example of those calculations:

Simplest Calculation:
Distance = sqrt(x * x + y * y)
where x = 69.1 * (lat2 - lat1)
and y = 53.0 * (lon2 - lon1)

More Precise Calculation:
Distance = sqrt(x * x + y * y)
where x = 69.1 * (lat2 - lat1)
and y = 69.1 * (lon2 - lon1) * cos(lat1/57.3)

Most Precise Calculation:
Distance = 3963.0 * arctan[sqrt(1-x^2))/x]
where x = [sin(lat1/57.2958) * sin(lat2/57.2958)] + [cos(lat1/57.2958) * cos(lat2/57.2958) * cos(lon2/57.2958 - lon1/57.2958)]

I am fine if only the simplest calculation is feasible. But how would I put this in a query? Let say that Washington is at latitude=x and longtitude=y how would I query for places 10 miles from there?

SELECT * FROM tblX WHERE (sqrt(69.1 * (LATITUDE- x) * 69.1 * (LATITUDE- x) + y = 53.0 * (LONGITUDE- y)  * y = 53.0 * (LONGITUDE- y)) < 10)

Can you query like this? If so: does SQL Server support sin, cos, tan, arcsin, arccos, arctan? What would be the expected impact of such a query? I suppose this would be a very heavy query to run, right?




0
Comment
Question by:pgkooijman
10 Comments
 
LVL 5

Expert Comment

by:hkamal
ID: 11830249
Regarding Import: Definitely create a table with the correct columns and not STRINGs . A subsequent alter table on 5.5 million rows will take a long time
Regarding decimals, in banking we use NUMERIC because it allows you to specify the precision (unlike FLOAT); eg NUMERIC(20,12)


0
 
LVL 6

Assisted Solution

by:JaffaKREE
JaffaKREE earned 100 total points
ID: 11830288
When first importing the table, you should control the columns more precisely.  char(8000) is a monster.

Real and Float are SQL server data types which support decimals.
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/tsqlref/ts_fa-fz_6r3g.asp

Here's the functions for COS, TAN, etc : http://www.samspublishing.com/articles/article.asp?p=31453&seqNum=2

Yes, this query should work if the math is right.  5 million rows really isn't all that much, but I'm not going to wager a guess as to how long that will run for.  

If this is something you are going to run regularly, I would consider making an extra column with computed ranges for distances from major cities (assuming this would be the majority of the queries).  You could have about a dozen or so extra columns such as DIST_FROM_NYC, DIST_FROM_PHILA.  This would cut the calculation time for those queries to seconds.

Let me know how it goes.
0
 
LVL 5

Expert Comment

by:hkamal
ID: 11830337
Regarding locations, I'm no cartographer, but if you're storing coordinates of set places, then using your formulae above, you can find all sample points in your table within a certain tolerance limit (say 10%) of 10 miles
0
NFR key for Veeam Backup for Microsoft Office 365

Veeam is happy to provide a free NFR license (for 1 year, up to 10 users). This license allows for the non‑production use of Veeam Backup for Microsoft Office 365 in your home lab without any feature limitations.

 
LVL 17

Accepted Solution

by:
BillAn1 earned 250 total points
ID: 11830344
The best way to import the file would be to use a DTS.
YOu probably should create your target table yourself, because, as you found, with a text file SQL Server is not able to detect the correct data types automatically.

The data type you want is probably FLOAT, but you could possibly use DECIMAL(10,8)

You would be best off creating a function to do your trug - and you can do the most precise quite easily :

CREATE FUNCTION dbo.calc_distance (@lon1 float, @lat1 float, @lon2 float, @lat2 float)
RETURNS float
AS
begin
declare @dist float
set @dist =  
(sin(@lat1/57.2958) * sin(@lat2/57.2958)) + (cos(@lat1/57.2958) * cos(@lat2/57.2958) * cos(@lon2/57.2958 - @lon1/57.2958))
return @dist
end

then you can query your table as :
SELECT * FROM MYTABLE
WHERE dbo.calc_distance (longitude, latitide, X,Y) < 10.0

substituting X,Y with the actual values for Washington or wherever you want....

If your data is very big, this will take a long time.
One possible way to speed this up would be to put some extra where clauses, eg
AND longitude between X-10 AND X+10
AND latitude between Y - 10 AND Y + 10

or some such thing, i.e. without any calculations, or functions.
Then you could put an index on these columns, so it would be able to narrow down to this effective square, very quickly, then use the accurate calc only for those small number of potential places




0
 
LVL 69

Assisted Solution

by:Scott Pletcher
Scott Pletcher earned 50 total points
ID: 11830367
>> How can I best import this data into my SQL Server 2000 database? What I did now was to import it by uploading it to my server and importing it as a new table. <<

Use BULK INSERT since that is the fastest way to import data.

Agree with others on data typing: *unless there is non-numeric data* in the input, use a numeric data type if the data is numeric.  And I too recommend using DECIMAL(nn, n) rather than float because of the exactness.
0
 
LVL 11

Assisted Solution

by:Otana
Otana earned 100 total points
ID: 11830387
About the calculations:

You should start with a select statement to find the points in a square around the search point. If you place an index on the LONGITUDE and LATITUDE tables, this should go reasonably fast. After that, you can select a calculated field which contains the distance, and perform a check on that field.

In a stored procedure, you could get something like:

CREATE PROCEDURE spSearchPoint @X bigint, @Y bigint

AS

SELECT sqrt((LATITUDE-@X)*(LATITUDE-@X) + (LONGITUDE-@Y)*(LONGITUDE-@Y)) DISTANCE
FROM
(SELECT * FROM tblCoordinates WHERE LATITUDE >= @X -10 AND LATITUDE <= @X + 10 AND LONGITUDE >= @Y - 10 AND LONGITUDE <= @Y + 10) t1
WHERE DISTANCE <= 10
0
 

Author Comment

by:pgkooijman
ID: 11830567
Wow, lot of good ideas. Will let you know how that works out.....
 
First things first.... I imported the text file again and all went very well. The table I imported contains place names and their co-ordinates so the queries that will be ran against it would mostly be: SELECT Lat,Lon FROM tblX WHERE PlaceName LIKE '%placename%' or SELECT Lat,Lon FROM tblX WHERE PlaceName='placename'

How can I speed these queries up? I tried making PlaceName a unique key but that took forever and then gave a duplicate key error. What are other tricks... I am just not great at SQL Server..... I suppose placing an index would do the trick but how do that best? How do I do that at all?
0
 
LVL 5

Expert Comment

by:hkamal
ID: 11830823
It's a good idea to use a unique key; sound like a composite key of coordiantes fits the bill here since no two locations can have the same coordinates (I don't know the implementation, but I assume as much)

You can then create a separate nonunique (nonclustered) index on PlaceName
0
 
LVL 143

Expert Comment

by:Guy Hengel [angelIII / a3]
ID: 11831254
>>WHERE PlaceName LIKE '%placename%
you cannot speed that query up unless you remove the first %.
Indexes are not of any help here, unless you start using full-text indexing.

>> I tried making PlaceName a unique key but
Of course, don't use a UNIQUE key, but a simple key (without unique property). To locate (after the index creation) which entries are duplicate:
select PlaceName, count(*)
from YourTable
group by PlaceName
having count(*)>1

CHeers

0
 

Author Comment

by:pgkooijman
ID: 11838500
Have divided the points according to the answers that gave the solutions I needed. Will start a new question with the remaining open issues. This thread is getting a bit chaotic. Thanks for all the help!
0

Featured Post

Microsoft Certification Exam 74-409

Veeam® is happy to provide the Microsoft community with a study guide prepared by MVP and MCT, Orin Thomas. This guide will take you through each of the exam objectives, helping you to prepare for and pass the examination.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Having an SQL database can be a big investment for a small company. Hardware, setup and of course, the price of software all add up to a big bill that some companies may not be able to absorb.  Luckily, there is a free version SQL Express, but does …
The Delta outage: 650 cancelled flights, more than 1200 delayed flights, thousands of frustrated customers, tens of millions of dollars in damages – plus untold reputational damage to one of the world’s most trusted airlines. All due to a catastroph…
Familiarize people with the process of utilizing SQL Server functions from within Microsoft Access. Microsoft Access is a very powerful client/server development tool. One of the SQL Server objects that you can interact with from within Microsoft Ac…
Familiarize people with the process of retrieving data from SQL Server using an Access pass-thru query. Microsoft Access is a very powerful client/server development tool. One of the ways that you can retrieve data from a SQL Server is by using a pa…

809 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question