asked on

better way than a cursor on a >700k row?

In my zips table i have type 'CA' with 765344 rows, i need to loop through each row and see whether in the table with type 'CA-25' there is a zip which is less than 25 miles from the zip in the cursor, if not insert into the table the record in the cursor but as a type 'CA-25'. When the cursor loops to the next record from the 765344 list, it will evaluate any inserted record from previous iteration. But the only problem with the cursor is that the performance is not good and it takes about 9 hours to run maybe. so I wonder whether there are other better way to handle this in T-SQL?
Thanks a lot!

declare @c_id numeric(18,0)
declare @c_lat float
declare @c_lng float
declare @c_state varchar(20)
declare zc CURSOR FOR
	select id, latitude, longitude, state from zips (nolock) where zip_type = 'CA'
open zc
 
fetch next from zc into @c_id, @c_lat, @c_lng, @c_state
 
-- iterate over "main" zip code list
while @@FETCH_STATUS = 0
begin
	declare @t_id numeric(18,0)
 
	-- check if there are any zip codes in the new list that are <= 25 miles away from the current iterator zip
	set @t_id = (select top 1 id from zips (nolock)
				where zip_type = 'CA-25' and state = @c_state and 
				ROUND( SQRT( POWER((69.1 * (@c_lat - latitude)), 2) + POWER((53 * (@c_lng - longitude)), 2)), 1) <= 25)
 
	-- add the iterator zip into the new list if no matches are found
	if @t_id is null
	begin
		insert into zips (zip, city, state, latitude, longitude, zip_type, wban_latitude, wban_longitude)
		select zip, city, state, latitude, longitude, 'CA-25', wban_latitude, wban_longitude from zips where id = @c_id
    end
 
	fetch next from zc into @c_id, @c_lat, @c_lng, @c_state 
end 
close zc 
deallocate zc

Open in new window

brad2575

Could you do it something like this (logically):

Create a while loop that checks if a variable is set or not. This variable will be set if any records were updated and you continue to loop until there are no records updated.

Run the query but update ALL that meet the criteria at once for the data that exists in the zip table at that time. If any updates occur set variable = "Y"

Then the loop (explained above) will loop again if variable = "Y". And then re-run the query again but this time running on all the existing data that was just inserted for the loop above.

Then if any are updated the second loop set the variable = "Y". Continue until no updates are made anymore and exit the loop.

This may still take a bunch of loops but not nearly as many as you have above plus it will do them in bulk inserts vs 1 offs.

Let me know if you have any questions.

qinyan

ASKER

Run the query but update ALL that meet the criteria at once for the data that exists in the zip table at that time.

This would copy all 765344 'CA' type records to be 'CA-25' zips, since at the beginning there is no 'CA-25' records in the table, correct? This is not what i want.

also by update you mean insert?

Thanks for the reply!

lundnak

Try the following. Please excuse any syntax errors; I don't have SSMS installed on this machine.
Also, make sure that you have a compound index on zip_type and state.

insert into zips (zip,
                  city, 
                  state,
                  latitude,
                  longitude,
                  zip_type,
                  wban_latitude,
                  wban_longitude)
select z.zip,
       z.city,
       z.state,
       z.latitude,
       z.longitude,
       zip_type = 'CA-25',
       z.wban_latitude,
       z.wban_longitude
from zips z (nolock)
where z.zip_type = 'CA'
  not exists (select 1
              from zips z2 (nolock)
              where z2.zip_type = 'CA-25'
                and z2.state = z.state
                and ROUND( SQRT( POWER((69.1 * (z.latitude - z2.latitude)), 2) + 
                                 POWER((53 * (z.longitude - z2.longitude)), 2)), 1) <= 25)

Open in new window

qinyan

ASKER

Thanks for the help! I ran it but it still give me 765344 rows so it's still copying the 765344 CA type records to become 'CA-25'.

lundnak

Hmmm... Try with the following change.
and z2.zip <> z.zip

If this doesn't work, can you post some example data (10 rows or more would suffice). This will give me something to test against.

One other thing...are there any CA-25 records in the table when you first run the query. If not, then it will always copy every CA record the first time.

insert into zips (zip,
                  city, 
                  state,
                  latitude,
                  longitude,
                  zip_type,
                  wban_latitude,
                  wban_longitude)
select z.zip,
       z.city,
       z.state,
       z.latitude,
       z.longitude,
       zip_type = 'CA-25',
       z.wban_latitude,
       z.wban_longitude
from zips z (nolock)
where z.zip_type = 'CA'
  not exists (select 1
              from zips z2 (nolock)
              where z2.zip_type = 'CA-25'
                and z2.zip <> z.zip
                and z2.state = z.state
                and ROUND( SQRT( POWER((69.1 * (z.latitude - z2.latitude)), 2) + 
                                 POWER((53 * (z.longitude - z2.longitude)), 2)), 1) <= 25)

Open in new window

qinyan

ASKER

there are no records in the table with type 'CA-25' initially, that's why i use a cursor to do row by row processing. A straight query will always copy the whole 765344 records as it executes only once.

lundnak

I understand now. Is this a SQL 2005 database? I'll get you an answer tonight.

lundnak

I think I got it. This worked on my tests.
A couple things...
1) This query assumes your cursor loops through the records in the order of the id field
declare zc CURSOR FOR
select id, latitude, longitude, state from zips (nolock) where zip_type = 'CA'
There is no order by field, so I am assuming the ID field.

2) An index on zip_type, state, and id would be good for this query.

Sorry for the syntax error in my previous example. I fixed it here.

insert into zips (zip,
                  city, 
                  state,
                  latitude,
                  longitude,
                  zip_type,
                  wban_latitude,
                  wban_longitude)
select z.zip,
       z.city,
       z.state,
       z.latitude,
       z.longitude,
       zip_type = 'CA-25',
       z.wban_latitude,
       z.wban_longitude
from zips z (nolock)
where z.zip_type = 'CA'
  and not exists (select 1
              from zips z2 (nolock)
              where z2.zip_type in ('CA','CA-25')
                and z2.state = z.state
                and z2.id < z.id -- check against previous rows.
                and ROUND( SQRT( POWER((69.1 * (z.latitude - z2.latitude)), 2) + 
                                 POWER((53 * (z.longitude - z2.longitude)), 2)), 1) <= 25)

Open in new window

qinyan

ASKER

i tried this and it doesn't finish after more than one hour so i stopped it.

lundnak

Could you provide me the indexes that are defined on the table?

qinyan

ASKER

clustered on id and non-clustered on state, zip and zip_type.

lundnak

This is a tough one. Would be possible to alter the indexes?

See the code changes below.

Also, how long is acceptable for this query to run>

--create an index on zip_type,state,zip,latitude,longitude
--Include the index in an index hint.
 
  and not exists (select 1
                  from zips z2 with INDEX(NEWINDEX)
                  where z2.zip_type in ('CA','CA-25')
                    and z2.state = z.state
                    and z2.id < z.id -- check against previous rows.
                   and ROUND( SQRT( POWER((69.1 * (z.latitude - z2.latitude)), 2) + 
                                     POWER((53 * (z.longitude - z2.longitude)), 2)), 1) <= 25)

Open in new window

qinyan

ASKER

ok i did this but this query is still running now after over 40 mins.

lundnak

How quickly are you looking for it to run? How many minutes?

qinyan

ASKER

I would say no more than 60 mins.
well, the result is different than the cursor. By running the cursor i got 970 rows for ca-25 but got 695 such rows from the query.

lundnak

The difference between the # of rows returned from cursor versus the query is likely attributed to no order by clause in the cursor definition.

Did you run the query to full execution? How long did it run?

Would you be able to paste explain plans for the following queries?

select z.zip,
       z.city,
       z.state,
       z.latitude,
       z.longitude,
       zip_type = 'CA-25',
       z.wban_latitude,
       z.wban_longitude
from zips z (nolock)
where z.zip_type = 'CA'
  and not exists (select 1
              from zips z2 (nolock)
              where z2.zip_type in ('CA','CA-25')
                and z2.state = z.state
                and z2.id < z.id -- check against previous rows.
                and ROUND( SQRT( POWER((69.1 * (z.latitude - z2.latitude)), 2) + 
                                 POWER((53 * (z.longitude - z2.longitude)), 2)), 1) <= 25)
 
 
 
 
select top 1 id from zips (nolock)
where zip_type = 'CA-25' and state = @c_state and ROUND( SQRT( POWER((69.1 * (@c_lat - latitude)), 2) + POWER((53 * (@c_lng - longitude)), 2)), 1)

Open in new window

qinyan

ASKER

i remember the query run for about 3 hrs. then how can i get the same result from both cursor and query?
actually to my suprise the cursor was fast, only couple minutes, if no ca-25 records in the table to begin with. if there are alreay several ca-25 records in the table, then the cursor runs for several hours too and got different result than if there are no ca-25 records in the table, don't know why.

lundnak

I would it is the performance of the following statement.

set @t_id = (select top 1 id from zips (nolock)
where zip_type = 'CA-25' and state = @c_state and
ROUND( SQRT( POWER((69.1 * (@c_lat - latitude)), 2) + POWER((53 * (@c_lng - longitude)), 2)), 1) <= 25)

By the way is this a stored procedure?

ASKER CERTIFIED SOLUTION

qinyan

membership

This solution is only available to members.

To access this solution, you must be a member of Experts Exchange.

Start Free Trial

lundnak

Could you get an explain plan for the select statement. I wonder what indexes it is using.

qinyan

ASKER

you mean the execution plan for
set @t_id = (select top 1 id from zips (nolock)
where zip_type = 'CA-25' and state = @c_state and
ROUND( SQRT( POWER((69.1 * (@c_lat - latitude)), 2) + POWER((53 * (@c_lng - longitude)), 2)), 1) <= 25)

it's in a cursor how can i get the execution plan?

RHADMIN

Try SQLCLR instead, it can be faster when dealing with cursor:
http://msdn.microsoft.com/en-us/library/ms345136.aspx
http://www.sqlskills.com/resources/Whitepapers/SQL%20Server%20DBA%20Guide%20to%20SQLCLR.htm

better way than a cursor on a &gt;700k row?

better way than a cursor on a >700k row?