Solved

Import Question & SQL Calculation

Posted on 2004-08-18
10
805 Views
Last Modified: 2012-06-27
Hi,

I am importing a table with geographic co-ordinates of about 5.5 million places on the globe. The table is rather large with 5.5 M records and about 30 data fields. The file is in text file format delimited by a tab. Now I have two questions regarding this data:

IMPORT
How can I best import this data into my SQL Server 2000 database? What I did was upload the file to my server and import it as a new table. The problem is that while I can easily drop the columns I do not need, all the remaining fields are set to CHAR(8000), which is much too big. Changing these columns to, for instance, CHAR(200) takes a very long time... in fact my server never seems to finish altering even one field! Would it be a better idea to create an empty table first with the specs that I need and then import into it?
Also: what is the SQL Server data type for decimal numbers? I need a data type that will support values like 51.4166667, so at least 6-8 decimal places.

CALCULATIONS
Once I have this table inserted I will want to use it to look up the coordinates of locations on the globe. So for instance if I have something that is linked to Washington, then I know the coordinates of Washington. Now I want to be able to search our database for things that are within a 10 mile radius of Washington. For this I need to be able to query a table including a calculation; is this possible? Perhaps in a stored procedure?

An example of those calculations:

Simplest Calculation:
Distance = sqrt(x * x + y * y)
where x = 69.1 * (lat2 - lat1)
and y = 53.0 * (lon2 - lon1)

More Precise Calculation:
Distance = sqrt(x * x + y * y)
where x = 69.1 * (lat2 - lat1)
and y = 69.1 * (lon2 - lon1) * cos(lat1/57.3)

Most Precise Calculation:
Distance = 3963.0 * arctan[sqrt(1 - x^2) / x]
where x = [sin(lat1/57.2958) * sin(lat2/57.2958)] + [cos(lat1/57.2958) * cos(lat2/57.2958) * cos(lon2/57.2958 - lon1/57.2958)]

I am fine if only the simplest calculation is feasible. But how would I put this in a query? Let's say that Washington is at latitude=x and longitude=y; how would I query for places 10 miles from there?

SELECT * FROM tblX WHERE sqrt((69.1 * (LATITUDE - x)) * (69.1 * (LATITUDE - x)) + (53.0 * (LONGITUDE - y)) * (53.0 * (LONGITUDE - y))) < 10

Can you query like this? If so: does SQL Server support sin, cos, tan, arcsin, arccos, arctan? What would be the expected impact of such a query? I suppose this would be a very heavy query to run, right?




Question by:pgkooijman
10 Comments
 
LVL 5

Expert Comment

by:hkamal
ID: 11830249
Regarding the import: definitely create a table with the correct columns up front, not strings. A subsequent ALTER TABLE on 5.5 million rows will take a long time.
Regarding decimals: in banking we use NUMERIC because it allows you to specify the precision (unlike FLOAT); e.g. NUMERIC(20,12)
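A minimal sketch of pre-creating the target table along those lines (table name, column names, and widths here are assumptions; match them to the actual file):

```sql
-- Hypothetical layout; adjust names and widths to the real file
CREATE TABLE tblX (
    PlaceName VARCHAR(200)  NOT NULL,   -- instead of the imported CHAR(8000)
    LATITUDE  NUMERIC(10,7) NOT NULL,   -- e.g. 51.4166667
    LONGITUDE NUMERIC(11,7) NOT NULL
    -- ... the remaining columns you actually need ...
)
```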


 
LVL 6

Assisted Solution

by:JaffaKREE
JaffaKREE earned 100 total points
ID: 11830288
When first importing the table, you should control the columns more precisely.  char(8000) is a monster.

REAL and FLOAT are SQL Server data types which support decimals.
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/tsqlref/ts_fa-fz_6r3g.asp

Here's the functions for COS, TAN, etc : http://www.samspublishing.com/articles/article.asp?p=31453&seqNum=2

Yes, this query should work if the math is right.  5 million rows really isn't all that much, but I'm not going to wager a guess as to how long that will run for.  

If this is something you are going to run regularly, I would consider making an extra column with computed ranges for distances from major cities (assuming this would be the majority of the queries).  You could have about a dozen or so extra columns such as DIST_FROM_NYC, DIST_FROM_PHILA.  This would cut the calculation time for those queries to seconds.
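A sketch of that precomputation idea, using the simplest formula from the question (the column name follows the suggestion above; the NYC coordinates 40.7128 / -74.0060 are illustrative assumptions):

```sql
-- Hypothetical precomputed-distance column
ALTER TABLE tblX ADD DIST_FROM_NYC FLOAT

-- Simplest formula from the question, evaluated once per row
UPDATE tblX
SET DIST_FROM_NYC = SQRT(SQUARE(69.1 * (LATITUDE - 40.7128))
                       + SQUARE(53.0 * (LONGITUDE - (-74.0060))))
```

Later radius queries against that city then reduce to a plain comparison, e.g. SELECT * FROM tblX WHERE DIST_FROM_NYC < 10.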

Let me know how it goes.
 
LVL 5

Expert Comment

by:hkamal
ID: 11830337
Regarding locations, I'm no cartographer, but if you're storing coordinates of set places, then using your formulae above, you can find all sample points in your table within a certain tolerance limit (say 10%) of 10 miles
 
LVL 17

Accepted Solution

by:BillAn1
BillAn1 earned 250 total points
ID: 11830344
The best way to import the file would be to use DTS.
You probably should create your target table yourself because, as you found, with a text file SQL Server is not able to detect the correct data types automatically.

The data type you want is probably FLOAT, but you could possibly use DECIMAL(10,8)

You would be best off creating a function to do your trig - and you can do the most precise calculation quite easily:

CREATE FUNCTION dbo.calc_distance (@lon1 float, @lat1 float, @lon2 float, @lat2 float)
RETURNS float
AS
begin
declare @x float
set @x =
(sin(@lat1/57.2958) * sin(@lat2/57.2958)) + (cos(@lat1/57.2958) * cos(@lat2/57.2958) * cos(@lon2/57.2958 - @lon1/57.2958))
-- convert to miles; acos(@x) equals arctan(sqrt(1 - @x*@x)/@x) here
return 3963.0 * acos(@x)
end

then you can query your table as :
SELECT * FROM MYTABLE
WHERE dbo.calc_distance (longitude, latitude, X, Y) < 10.0

substituting X,Y with the actual values for Washington or wherever you want....

If your data is very big, this will take a long time.
One possible way to speed this up would be to put some extra where clauses, eg
AND longitude between X-10 AND X+10
AND latitude between Y - 10 AND Y + 10

or some such thing, i.e. without any calculations or functions.
Then you could put an index on these columns, so SQL Server can narrow down to this effective square very quickly, and use the accurate calculation only for the small number of remaining candidate places.
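A sketch of that combination, using the calc_distance function above (the index name, the Washington coordinates -77.0 / 38.9, and the 0.2-degree box are illustrative assumptions; 0.2 degrees comfortably over-covers 10 miles at that latitude):

```sql
-- Hypothetical index to support the coarse pre-filter
CREATE INDEX IX_Coords_LatLon ON MYTABLE (latitude, longitude)

-- Coarse, index-friendly square first; exact calculation only on the survivors
SELECT *
FROM MYTABLE
WHERE latitude  BETWEEN 38.9 - 0.2 AND 38.9 + 0.2
  AND longitude BETWEEN -77.0 - 0.2 AND -77.0 + 0.2
  AND dbo.calc_distance(longitude, latitude, -77.0, 38.9) < 10.0
```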




 
LVL 69

Assisted Solution

by:Scott Pletcher
Scott Pletcher earned 50 total points
ID: 11830367
>> How can I best import this data into my SQL Server 2000 database? What I did now was to import it by uploading it to my server and importing it as a new table. <<

Use BULK INSERT since that is the fastest way to import data.

Agree with others on data typing: *unless there is non-numeric data* in the input, use a numeric data type if the data is numeric. And I too recommend DECIMAL(nn, n) rather than FLOAT because of its exactness.
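A minimal BULK INSERT sketch for a tab-delimited file (the file path and table name are assumptions):

```sql
-- Hypothetical path; the source file is tab-delimited text, one place per line
BULK INSERT tblX
FROM 'C:\import\places.txt'
WITH (FIELDTERMINATOR = '\t', ROWTERMINATOR = '\n')
```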
 
LVL 11

Assisted Solution

by:Otana
Otana earned 100 total points
ID: 11830387
About the calculations:

You should start with a SELECT statement that finds the points in a square around the search point. If you place an index on the LONGITUDE and LATITUDE columns, this should go reasonably fast. After that, you can select a calculated field containing the distance and filter on that field.

In a stored procedure, you could get something like:

CREATE PROCEDURE spSearchPoint @X float, @Y float

AS

-- the column alias DISTANCE cannot be referenced in the WHERE clause of the
-- same SELECT, so the filter repeats the expression
SELECT *, sqrt((LATITUDE-@X)*(LATITUDE-@X) + (LONGITUDE-@Y)*(LONGITUDE-@Y)) AS DISTANCE
FROM
(SELECT * FROM tblCoordinates WHERE LATITUDE >= @X - 10 AND LATITUDE <= @X + 10 AND LONGITUDE >= @Y - 10 AND LONGITUDE <= @Y + 10) t1
WHERE sqrt((LATITUDE-@X)*(LATITUDE-@X) + (LONGITUDE-@Y)*(LONGITUDE-@Y)) <= 10
 

Author Comment

by:pgkooijman
ID: 11830567
Wow, lots of good ideas. Will let you know how that works out...
 
First things first... I imported the text file again and everything went very well. The table I imported contains place names and their coordinates, so the queries that will be run against it would mostly be: SELECT Lat,Lon FROM tblX WHERE PlaceName LIKE '%placename%' or SELECT Lat,Lon FROM tblX WHERE PlaceName='placename'

How can I speed these queries up? I tried making PlaceName a unique key, but that took forever and then gave a duplicate key error. What other tricks are there? I am just not great at SQL Server... I suppose placing an index would do the trick, but how do I do that best? How do I do that at all?
 
LVL 5

Expert Comment

by:hkamal
ID: 11830823
It's a good idea to use a unique key; sounds like a composite key of coordinates fits the bill here, since no two locations can have the same coordinates (I don't know the implementation, but I assume as much)

You can then create a separate nonunique (nonclustered) index on PlaceName
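A sketch of those two indexes (constraint and index names, and the table/column names, are assumptions):

```sql
-- Hypothetical composite unique key on the coordinate pair
ALTER TABLE tblX ADD CONSTRAINT UQ_tblX_LatLon UNIQUE (LATITUDE, LONGITUDE)

-- Nonunique, nonclustered index to speed up PlaceName lookups
CREATE NONCLUSTERED INDEX IX_tblX_PlaceName ON tblX (PlaceName)
```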
 
LVL 143

Expert Comment

by:Guy Hengel [angelIII / a3]
ID: 11831254
>> WHERE PlaceName LIKE '%placename%'
You cannot speed that query up unless you remove the leading %.
Indexes are not of any help here unless you start using full-text indexing.

>> I tried making PlaceName a unique key but
Of course: don't use a UNIQUE key, but a simple key (without the unique property). To locate (after the index creation) which entries are duplicates:
select PlaceName, count(*)
from YourTable
group by PlaceName
having count(*)>1

Cheers

 

Author Comment

by:pgkooijman
ID: 11838500
Have divided the points according to the answers that gave the solutions I needed. Will start a new question with the remaining open issues. This thread is getting a bit chaotic. Thanks for all the help!
