qinyan
asked on
better way than a cursor on a >700k row?
In my zips table i have type 'CA' with 765344 rows, i need to loop through each row and see whether in the table with type 'CA-25' there is a zip which is less than 25 miles from the zip in the cursor, if not insert into the table the record in the cursor but as a type 'CA-25'. When the cursor loops to the next record from the 765344 list, it will evaluate any inserted record from previous iteration. But the only problem with the cursor is that the performance is not good and it takes about 9 hours to run maybe. so I wonder whether there are other better way to handle this in T-SQL?
Thanks a lot!
Thanks a lot!
declare @c_id numeric(18,0)
declare @c_lat float
declare @c_lng float
declare @c_state varchar(20)
declare zc CURSOR FOR
select id, latitude, longitude, state from zips (nolock) where zip_type = 'CA'
open zc
fetch next from zc into @c_id, @c_lat, @c_lng, @c_state
-- iterate over "main" zip code list
while @@FETCH_STATUS = 0
begin
declare @t_id numeric(18,0)
-- check if there are any zip codes in the new list that are <= 25 miles away from the current iterator zip
set @t_id = (select top 1 id from zips (nolock)
where zip_type = 'CA-25' and state = @c_state and
ROUND( SQRT( POWER((69.1 * (@c_lat - latitude)), 2) + POWER((53 * (@c_lng - longitude)), 2)), 1) <= 25)
-- add the iterator zip into the new list if no matches are found
if @t_id is null
begin
insert into zips (zip, city, state, latitude, longitude, zip_type, wban_latitude, wban_longitude)
select zip, city, state, latitude, longitude, 'CA-25', wban_latitude, wban_longitude from zips where id = @c_id
end
fetch next from zc into @c_id, @c_lat, @c_lng, @c_state
end
close zc
deallocate zc
ASKER
Run the query but update ALL that meet the criteria at once for the data that exists in the zip table at that time.
This would copy all 765344 'CA' type records to be 'CA-25' zips, since at the beginning there is no 'CA-25' records in the table, correct? This is not what i want.
also by update you mean insert?
Thanks for the reply!
This would copy all 765344 'CA' type records to be 'CA-25' zips, since at the beginning there is no 'CA-25' records in the table, correct? This is not what i want.
also by update you mean insert?
Thanks for the reply!
Try the following. Please excuse any syntax errors; I don't have SSMS installed on this machine.
Also, make sure that you have a compound index on zip_type and state.
Also, make sure that you have a compound index on zip_type and state.
insert into zips (zip,
city,
state,
latitude,
longitude,
zip_type,
wban_latitude,
wban_longitude)
select z.zip,
z.city,
z.state,
z.latitude,
z.longitude,
zip_type = 'CA-25',
z.wban_latitude,
z.wban_longitude
from zips z (nolock)
where z.zip_type = 'CA'
not exists (select 1
from zips z2 (nolock)
where z2.zip_type = 'CA-25'
and z2.state = z.state
and ROUND( SQRT( POWER((69.1 * (z.latitude - z2.latitude)), 2) +
POWER((53 * (z.longitude - z2.longitude)), 2)), 1) <= 25)
ASKER
Thanks for the help! I ran it but it still give me 765344 rows so it's still copying the 765344 CA type records to become 'CA-25'.
Hmmm... Try with the following change.
and z2.zip <> z.zip
If this doesn't work, can you post some example data (10 rows or more would suffice). This will give me something to test against.
One other thing...are there any CA-25 records in the table when you first run the query. If not, then it will always copy every CA record the first time.
and z2.zip <> z.zip
If this doesn't work, can you post some example data (10 rows or more would suffice). This will give me something to test against.
One other thing...are there any CA-25 records in the table when you first run the query. If not, then it will always copy every CA record the first time.
insert into zips (zip,
city,
state,
latitude,
longitude,
zip_type,
wban_latitude,
wban_longitude)
select z.zip,
z.city,
z.state,
z.latitude,
z.longitude,
zip_type = 'CA-25',
z.wban_latitude,
z.wban_longitude
from zips z (nolock)
where z.zip_type = 'CA'
not exists (select 1
from zips z2 (nolock)
where z2.zip_type = 'CA-25'
and z2.zip <> z.zip
and z2.state = z.state
and ROUND( SQRT( POWER((69.1 * (z.latitude - z2.latitude)), 2) +
POWER((53 * (z.longitude - z2.longitude)), 2)), 1) <= 25)
ASKER
there are no records in the table with type 'CA-25' initially, that's why i use a cursor to do row by row processing. A straight query will always copy the whole 765344 records as it executes only once.
I understand now. Is this a SQL 2005 database? I'll get you an answer tonight.
I think I got it. This worked on my tests.
A couple things...
1) This query assumes your cursor loops through the records in the order of the id field
declare zc CURSOR FOR
select id, latitude, longitude, state from zips (nolock) where zip_type = 'CA'
There is no order by field, so I am assuming the ID field.
2) An index on zip_type, state, and id would be good for this query.
Sorry for the syntax error in my previous example. I fixed it here.
A couple things...
1) This query assumes your cursor loops through the records in the order of the id field
declare zc CURSOR FOR
select id, latitude, longitude, state from zips (nolock) where zip_type = 'CA'
There is no order by field, so I am assuming the ID field.
2) An index on zip_type, state, and id would be good for this query.
Sorry for the syntax error in my previous example. I fixed it here.
insert into zips (zip,
city,
state,
latitude,
longitude,
zip_type,
wban_latitude,
wban_longitude)
select z.zip,
z.city,
z.state,
z.latitude,
z.longitude,
zip_type = 'CA-25',
z.wban_latitude,
z.wban_longitude
from zips z (nolock)
where z.zip_type = 'CA'
and not exists (select 1
from zips z2 (nolock)
where z2.zip_type in ('CA','CA-25')
and z2.state = z.state
and z2.id < z.id -- check against previous rows.
and ROUND( SQRT( POWER((69.1 * (z.latitude - z2.latitude)), 2) +
POWER((53 * (z.longitude - z2.longitude)), 2)), 1) <= 25)
ASKER
i tried this and it doesn't finish after more than one hour so i stopped it.
Could you provide me the indexes that are defined on the table?
ASKER
clustered on id and non-clustered on state, zip and zip_type.
This is a tough one. Would be possible to alter the indexes?
See the code changes below.
Also, how long is acceptable for this query to run>
See the code changes below.
Also, how long is acceptable for this query to run>
--create an index on zip_type,state,zip,latitude,longitude
--Include the index in an index hint.
and not exists (select 1
from zips z2 with INDEX(NEWINDEX)
where z2.zip_type in ('CA','CA-25')
and z2.state = z.state
and z2.id < z.id -- check against previous rows.
and ROUND( SQRT( POWER((69.1 * (z.latitude - z2.latitude)), 2) +
POWER((53 * (z.longitude - z2.longitude)), 2)), 1) <= 25)
ASKER
ok i did this but this query is still running now after over 40 mins.
How quickly are you looking for it to run? How many minutes?
ASKER
I would say no more than 60 mins.
well, the result is different than the cursor. By running the cursor i got 970 rows for ca-25 but got 695 such rows from the query.
well, the result is different than the cursor. By running the cursor i got 970 rows for ca-25 but got 695 such rows from the query.
The difference between the # of rows returned from cursor versus the query is likely attributed to no order by clause in the cursor definition.
Did you run the query to full execution? How long did it run?
Would you be able to paste explain plans for the following queries?
Did you run the query to full execution? How long did it run?
Would you be able to paste explain plans for the following queries?
select z.zip,
z.city,
z.state,
z.latitude,
z.longitude,
zip_type = 'CA-25',
z.wban_latitude,
z.wban_longitude
from zips z (nolock)
where z.zip_type = 'CA'
and not exists (select 1
from zips z2 (nolock)
where z2.zip_type in ('CA','CA-25')
and z2.state = z.state
and z2.id < z.id -- check against previous rows.
and ROUND( SQRT( POWER((69.1 * (z.latitude - z2.latitude)), 2) +
POWER((53 * (z.longitude - z2.longitude)), 2)), 1) <= 25)
select top 1 id from zips (nolock)
where zip_type = 'CA-25' and state = @c_state and ROUND( SQRT( POWER((69.1 * (@c_lat - latitude)), 2) + POWER((53 * (@c_lng - longitude)), 2)), 1)
ASKER
i remember the query run for about 3 hrs. then how can i get the same result from both cursor and query?
actually to my suprise the cursor was fast, only couple minutes, if no ca-25 records in the table to begin with. if there are alreay several ca-25 records in the table, then the cursor runs for several hours too and got different result than if there are no ca-25 records in the table, don't know why.
actually to my suprise the cursor was fast, only couple minutes, if no ca-25 records in the table to begin with. if there are alreay several ca-25 records in the table, then the cursor runs for several hours too and got different result than if there are no ca-25 records in the table, don't know why.
I would it is the performance of the following statement.
set @t_id = (select top 1 id from zips (nolock)
where zip_type = 'CA-25' and state = @c_state and
ROUND( SQRT( POWER((69.1 * (@c_lat - latitude)), 2) + POWER((53 * (@c_lng - longitude)), 2)), 1) <= 25)
By the way is this a stored procedure?
set @t_id = (select top 1 id from zips (nolock)
where zip_type = 'CA-25' and state = @c_state and
ROUND( SQRT( POWER((69.1 * (@c_lat - latitude)), 2) + POWER((53 * (@c_lng - longitude)), 2)), 1) <= 25)
By the way is this a stored procedure?
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Could you get an explain plan for the select statement. I wonder what indexes it is using.
ASKER
you mean the execution plan for
set @t_id = (select top 1 id from zips (nolock)
where zip_type = 'CA-25' and state = @c_state and
ROUND( SQRT( POWER((69.1 * (@c_lat - latitude)), 2) + POWER((53 * (@c_lng - longitude)), 2)), 1) <= 25)
it's in a cursor how can i get the execution plan?
set @t_id = (select top 1 id from zips (nolock)
where zip_type = 'CA-25' and state = @c_state and
ROUND( SQRT( POWER((69.1 * (@c_lat - latitude)), 2) + POWER((53 * (@c_lng - longitude)), 2)), 1) <= 25)
it's in a cursor how can i get the execution plan?
Try SQLCLR instead, it can be faster when dealing with cursor:
http://msdn.microsoft.com/ en-us/libr ary/ms3451 36.aspx
http://www.sqlskills.com/r esources/W hitepapers /SQL%20Ser ver%20DBA% 20Guide%20 to%20SQLCL R.htm
http://msdn.microsoft.com/
http://www.sqlskills.com/r
Create a while loop that checks if a variable is set or not. This variable will be set if any records were updated and you continue to loop until there are no records updated.
Run the query but update ALL that meet the criteria at once for the data that exists in the zip table at that time. If any updates occur set variable = "Y"
Then the loop (explained above) will loop again if variable = "Y". And then re-run the query again but this time running on all the existing data that was just inserted for the loop above.
Then if any are updated the second loop set the variable = "Y". Continue until no updates are made anymore and exit the loop.
This may still take a bunch of loops but not nearly as many as you have above plus it will do them in bulk inserts vs 1 offs.
Let me know if you have any questions.