Solved

Import Question & SQL Calculation

Posted on 2004-08-18
10
794 Views
Last Modified: 2012-06-27
Hi,

I am importing a table with geographic co-ordinates of about 5.5 million places on the globe. The table is rather large with 5.5 M records and about 30 data fields. The file is in text file format delimited by a tab. Now I have two questions regarding this data:

IMPORT
How can I best import this data into my SQL Server 2000 database? What I did now was to import it by uploading it to my server and importing it as a new table. Problem now is that I can easily drop the columns that I do not need but all other fields are set to CHAR 8000. Which is much too big. Changing these columns to for instance CHAR 200 takes a very long time... in fact it seems my server is never getting done with changing 1 field! Would it be a better idea to create an empty table first with the specs that I need and then import it?
Also: what is the SQL Server data type for decimal numbers? I need a data type that will support values like: 51.4166667 so at least 6-8 decimals.

CALCULATIONS
The moment I have this table inserted I will then want to be able to use it to look up the coordinates of locations on the globe. So for instance if I have something that is linked to Washington then I know the coordinates of Washington. Now I want to be able to search in our database for things that are in a 10 mile radius from washington. For this I need to be able to query a table including a calculation is this possible? Perhaps in a stored procedure?

An example of those calculations:

Simplest Calculation:
Distance = sqrt(x * x + y * y)
where x = 69.1 * (lat2 - lat1)
and y = 53.0 * (lon2 - lon1)

More Precise Calculation:
Distance = sqrt(x * x + y * y)
where x = 69.1 * (lat2 - lat1)
and y = 69.1 * (lon2 - lon1) * cos(lat1/57.3)

Most Precise Calculation:
Distance = 3963.0 * arctan[sqrt(1-x^2))/x]
where x = [sin(lat1/57.2958) * sin(lat2/57.2958)] + [cos(lat1/57.2958) * cos(lat2/57.2958) * cos(lon2/57.2958 - lon1/57.2958)]

I am fine if only the simplest calculation is feasible. But how would I put this in a query? Let say that Washington is at latitude=x and longtitude=y how would I query for places 10 miles from there?

SELECT * FROM tblX WHERE (sqrt(69.1 * (LATITUDE- x) * 69.1 * (LATITUDE- x) + y = 53.0 * (LONGITUDE- y)  * y = 53.0 * (LONGITUDE- y)) < 10)

Can you query like this? If so: does SQL Server support sin, cos, tan, arcsin, arccos, arctan? What would be the expected impact of such a query? I suppose this would be a very heavy query to run, right?




0
Comment
Question by:pgkooijman
10 Comments
 
LVL 5

Expert Comment

by:hkamal
Comment Utility
Regarding Import: Definitely create a table with the correct columns and not STRINGs . A subsequent alter table on 5.5 million rows will take a long time
Regarding decimals, in banking we use NUMERIC because it allows you to specify the precision (unlike FLOAT); eg NUMERIC(20,12)


0
 
LVL 6

Assisted Solution

by:JaffaKREE
JaffaKREE earned 100 total points
Comment Utility
When first importing the table, you should control the columns more precisely.  char(8000) is a monster.

Real and Float are SQL server data types which support decimals.
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/tsqlref/ts_fa-fz_6r3g.asp

Here's the functions for COS, TAN, etc : http://www.samspublishing.com/articles/article.asp?p=31453&seqNum=2

Yes, this query should work if the math is right.  5 million rows really isn't all that much, but I'm not going to wager a guess as to how long that will run for.  

If this is something you are going to run regularly, I would consider making an extra column with computed ranges for distances from major cities (assuming this would be the majority of the queries).  You could have about a dozen or so extra columns such as DIST_FROM_NYC, DIST_FROM_PHILA.  This would cut the calculation time for those queries to seconds.

Let me know how it goes.
0
 
LVL 5

Expert Comment

by:hkamal
Comment Utility
Regarding locations, I'm no cartographer, but if you're storing coordinates of set places, then using your formulae above, you can find all sample points in your table within a certain tolerance limit (say 10%) of 10 miles
0
 
LVL 17

Accepted Solution

by:
BillAn1 earned 250 total points
Comment Utility
The best way to import the file would be to use a DTS.
YOu probably should create your target table yourself, because, as you found, with a text file SQL Server is not able to detect the correct data types automatically.

The data type you want is probably FLOAT, but you could possibly use DECIMAL(10,8)

You would be best off creating a function to do your trug - and you can do the most precise quite easily :

CREATE FUNCTION dbo.calc_distance (@lon1 float, @lat1 float, @lon2 float, @lat2 float)
RETURNS float
AS
begin
declare @dist float
set @dist =  
(sin(@lat1/57.2958) * sin(@lat2/57.2958)) + (cos(@lat1/57.2958) * cos(@lat2/57.2958) * cos(@lon2/57.2958 - @lon1/57.2958))
return @dist
end

then you can query your table as :
SELECT * FROM MYTABLE
WHERE dbo.calc_distance (longitude, latitide, X,Y) < 10.0

substituting X,Y with the actual values for Washington or wherever you want....

If your data is very big, this will take a long time.
One possible way to speed this up would be to put some extra where clauses, eg
AND longitude between X-10 AND X+10
AND latitude between Y - 10 AND Y + 10

or some such thing, i.e. without any calculations, or functions.
Then you could put an index on these columns, so it would be able to narrow down to this effective square, very quickly, then use the accurate calc only for those small number of potential places




0
 
LVL 69

Assisted Solution

by:ScottPletcher
ScottPletcher earned 50 total points
Comment Utility
>> How can I best import this data into my SQL Server 2000 database? What I did now was to import it by uploading it to my server and importing it as a new table. <<

Use BULK INSERT since that is the fastest way to import data.

Agree with others on data typing: *unless there is non-numeric data* in the input, use a numeric data type if the data is numeric.  And I too recommend using DECIMAL(nn, n) rather than float because of the exactness.
0
Highfive + Dolby Voice = No More Audio Complaints!

Poor audio quality is one of the top reasons people don’t use video conferencing. Get the crispest, clearest audio powered by Dolby Voice in every meeting. Highfive and Dolby Voice deliver the best video conferencing and audio experience for every meeting and every room.

 
LVL 11

Assisted Solution

by:Otana
Otana earned 100 total points
Comment Utility
About the calculations:

You should start with a select statement to find the points in a square around the search point. If you place an index on the LONGITUDE and LATITUDE tables, this should go reasonably fast. After that, you can select a calculated field which contains the distance, and perform a check on that field.

In a stored procedure, you could get something like:

CREATE PROCEDURE spSearchPoint @X bigint, @Y bigint

AS

SELECT sqrt((LATITUDE-@X)*(LATITUDE-@X) + (LONGITUDE-@Y)*(LONGITUDE-@Y)) DISTANCE
FROM
(SELECT * FROM tblCoordinates WHERE LATITUDE >= @X -10 AND LATITUDE <= @X + 10 AND LONGITUDE >= @Y - 10 AND LONGITUDE <= @Y + 10) t1
WHERE DISTANCE <= 10
0
 

Author Comment

by:pgkooijman
Comment Utility
Wow, lot of good ideas. Will let you know how that works out.....
 
First things first.... I imported the text file again and all went very well. The table I imported contains place names and their co-ordinates so the queries that will be ran against it would mostly be: SELECT Lat,Lon FROM tblX WHERE PlaceName LIKE '%placename%' or SELECT Lat,Lon FROM tblX WHERE PlaceName='placename'

How can I speed these queries up? I tried making PlaceName a unique key but that took forever and then gave a duplicate key error. What are other tricks... I am just not great at SQL Server..... I suppose placing an index would do the trick but how do that best? How do I do that at all?
0
 
LVL 5

Expert Comment

by:hkamal
Comment Utility
It's a good idea to use a unique key; sound like a composite key of coordiantes fits the bill here since no two locations can have the same coordinates (I don't know the implementation, but I assume as much)

You can then create a separate nonunique (nonclustered) index on PlaceName
0
 
LVL 142

Expert Comment

by:Guy Hengel [angelIII / a3]
Comment Utility
>>WHERE PlaceName LIKE '%placename%
you cannot speed that query up unless you remove the first %.
Indexes are not of any help here, unless you start using full-text indexing.

>> I tried making PlaceName a unique key but
Of course, don't use a UNIQUE key, but a simple key (without unique property). To locate (after the index creation) which entries are duplicate:
select PlaceName, count(*)
from YourTable
group by PlaceName
having count(*)>1

CHeers

0
 

Author Comment

by:pgkooijman
Comment Utility
Have divided the points according to the answers that gave the solutions I needed. Will start a new question with the remaining open issues. This thread is getting a bit chaotic. Thanks for all the help!
0

Featured Post

How to run any project with ease

Manage projects of all sizes how you want. Great for personal to-do lists, project milestones, team priorities and launch plans.
- Combine task lists, docs, spreadsheets, and chat in one
- View and edit from mobile/offline
- Cut down on emails

Join & Write a Comment

Nowadays, some of developer are too much worried about data. Who is using data, who is updating it etc. etc. Because, data is more costlier in term of money and information. So security of data is focusing concern in days. Lets' understand the Au…
Introduction In my previous article (http://www.experts-exchange.com/Microsoft/Development/MS-SQL-Server/SSIS/A_9150-Loading-XML-Using-SSIS.html) I showed you how the XML Source component can be used to load XML files into a SQL Server database, us…
Viewers will learn how to use the SELECT statement in SQL and will be exposed to the many uses the SELECT statement has.
Viewers will learn how to use the UPDATE and DELETE statements to change or remove existing data from their tables. Make a table: Update a specific column given a specific row using the UPDATE statement: Remove a set of values using the DELETE s…

772 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

15 Experts available now in Live!

Get 1:1 Help Now