better way than a cursor on a >700k row?

In my zips table i have type 'CA' with 765344 rows, i need to loop through each row and see whether in the table with type 'CA-25' there is a zip which is less than 25 miles from the zip in the cursor, if not insert into the table the record in the cursor but as a type 'CA-25'. When the cursor loops to the next record from the 765344 list, it will evaluate any inserted record from previous iteration. But the only problem with the cursor is that the performance is not good and it takes about 9 hours to run maybe. so I wonder whether there are other better way to handle this in T-SQL?
Thanks a lot!


declare @c_id numeric(18,0)
declare @c_lat float
declare @c_lng float
declare @c_state varchar(20)
declare zc CURSOR FOR
	select id, latitude, longitude, state from zips (nolock) where zip_type = 'CA'
open zc
 
fetch next from zc into @c_id, @c_lat, @c_lng, @c_state
 
-- iterate over "main" zip code list
while @@FETCH_STATUS = 0
begin
	declare @t_id numeric(18,0)
 
	-- check if there are any zip codes in the new list that are <= 25 miles away from the current iterator zip
	set @t_id = (select top 1 id from zips (nolock)
				where zip_type = 'CA-25' and state = @c_state and 
				ROUND( SQRT( POWER((69.1 * (@c_lat - latitude)), 2) + POWER((53 * (@c_lng - longitude)), 2)), 1) <= 25)
 
	-- add the iterator zip into the new list if no matches are found
	if @t_id is null
	begin
		insert into zips (zip, city, state, latitude, longitude, zip_type, wban_latitude, wban_longitude)
		select zip, city, state, latitude, longitude, 'CA-25', wban_latitude, wban_longitude from zips where id = @c_id
    end
 
	fetch next from zc into @c_id, @c_lat, @c_lng, @c_state 
end 
close zc 
deallocate zc

Open in new window

qinyanAsked:
Who is Participating?
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

brad2575Commented:
Could you do it something like this (logically):

Create a while loop that checks if a variable is set or not.  This variable will be set if any records were updated and you continue to loop until there are no records updated.

Run the query but update ALL that meet the criteria at once for the data that exists in the zip table at that time.  If any updates occur set variable = "Y"

Then the loop (explained above) will loop again if variable = "Y".  And then re-run the query again but this time running on all the existing data that was just inserted for the loop above.

Then if any are updated the second loop set the variable = "Y".  Continue until no updates are made anymore and exit the loop.


This may still take a bunch of loops but not nearly as many as you have above plus it will do them in bulk inserts vs 1 offs.

Let me know if you have any questions.

0
qinyanAuthor Commented:
Run the query but update ALL that meet the criteria at once for the data that exists in the zip table at that time.
 
This would copy all 765344 'CA' type records to be 'CA-25' zips, since at the beginning there is no 'CA-25' records in the table, correct? This is not what i want.

also by update you mean insert?

Thanks for the reply!
0
lundnakCommented:
Try the following.  Please excuse any syntax errors; I don't have SSMS installed on this machine.
Also, make sure that you have a compound index on zip_type and state.
insert into zips (zip,
                  city, 
                  state,
                  latitude,
                  longitude,
                  zip_type,
                  wban_latitude,
                  wban_longitude)
select z.zip,
       z.city,
       z.state,
       z.latitude,
       z.longitude,
       zip_type = 'CA-25',
       z.wban_latitude,
       z.wban_longitude
from zips z (nolock)
where z.zip_type = 'CA'
  not exists (select 1
              from zips z2 (nolock)
              where z2.zip_type = 'CA-25'
                and z2.state = z.state
                and ROUND( SQRT( POWER((69.1 * (z.latitude - z2.latitude)), 2) + 
                                 POWER((53 * (z.longitude - z2.longitude)), 2)), 1) <= 25)
 

Open in new window

0
Ultimate Tool Kit for Technology Solution Provider

Broken down into practical pointers and step-by-step instructions, the IT Service Excellence Tool Kit delivers expert advice for technology solution providers. Get your free copy now.

qinyanAuthor Commented:
Thanks for the help! I ran it but it still give me 765344 rows so it's still copying the 765344 CA type records to become 'CA-25'.
0
lundnakCommented:
Hmmm...  Try with the following change.
                and z2.zip <> z.zip

If this doesn't work, can you post some example data (10 rows or more would suffice).  This will give me something to test against.

One other thing...are there any CA-25 records in the table when you first run the query.  If not, then it will always copy every CA record the first time.
insert into zips (zip,
                  city, 
                  state,
                  latitude,
                  longitude,
                  zip_type,
                  wban_latitude,
                  wban_longitude)
select z.zip,
       z.city,
       z.state,
       z.latitude,
       z.longitude,
       zip_type = 'CA-25',
       z.wban_latitude,
       z.wban_longitude
from zips z (nolock)
where z.zip_type = 'CA'
  not exists (select 1
              from zips z2 (nolock)
              where z2.zip_type = 'CA-25'
                and z2.zip <> z.zip
                and z2.state = z.state
                and ROUND( SQRT( POWER((69.1 * (z.latitude - z2.latitude)), 2) + 
                                 POWER((53 * (z.longitude - z2.longitude)), 2)), 1) <= 25)
 

Open in new window

0
qinyanAuthor Commented:
there are no records in the table with type 'CA-25' initially, that's why i use a cursor to do row by row processing. A straight query will always copy the whole 765344 records as it executes only once.
0
lundnakCommented:
I understand now.  Is this a SQL 2005 database? I'll get you an answer tonight.
0
lundnakCommented:
I think I got it.  This worked on my tests.
A couple things...
1) This query assumes your cursor loops through the records in the order of the id field
       declare zc CURSOR FOR
            select id, latitude, longitude, state from zips (nolock) where zip_type = 'CA'
    There is no order by field, so I am assuming the ID field.

2) An index on zip_type, state, and id would be good for this query.

Sorry for the syntax error in my previous example.  I fixed it here.

insert into zips (zip,
                  city, 
                  state,
                  latitude,
                  longitude,
                  zip_type,
                  wban_latitude,
                  wban_longitude)
select z.zip,
       z.city,
       z.state,
       z.latitude,
       z.longitude,
       zip_type = 'CA-25',
       z.wban_latitude,
       z.wban_longitude
from zips z (nolock)
where z.zip_type = 'CA'
  and not exists (select 1
              from zips z2 (nolock)
              where z2.zip_type in ('CA','CA-25')
                and z2.state = z.state
                and z2.id < z.id -- check against previous rows.
                and ROUND( SQRT( POWER((69.1 * (z.latitude - z2.latitude)), 2) + 
                                 POWER((53 * (z.longitude - z2.longitude)), 2)), 1) <= 25)
 

Open in new window

0
qinyanAuthor Commented:
i tried this and it doesn't finish after more than one hour so i stopped it.
0
lundnakCommented:
Could you provide me the indexes that are defined on the table?
0
qinyanAuthor Commented:
clustered on id and non-clustered on state, zip and zip_type.
0
lundnakCommented:
This is a tough one.  Would be possible to alter the indexes?

See the code changes below.

Also, how long is acceptable for this query to run>
--create an index on zip_type,state,zip,latitude,longitude
--Include the index in an index hint.
 
  and not exists (select 1
                  from zips z2 with INDEX(NEWINDEX)
                  where z2.zip_type in ('CA','CA-25')
                    and z2.state = z.state
                    and z2.id < z.id -- check against previous rows.
                   and ROUND( SQRT( POWER((69.1 * (z.latitude - z2.latitude)), 2) + 
                                     POWER((53 * (z.longitude - z2.longitude)), 2)), 1) <= 25)
 

Open in new window

0
qinyanAuthor Commented:
ok i did this but this query is still running now after over 40 mins.
0
lundnakCommented:
How quickly are you looking for it to run?  How many minutes?
0
qinyanAuthor Commented:
I would say no more than 60 mins.
well, the result is different than the cursor. By running the cursor i got 970 rows for ca-25 but got 695 such rows from the query.
0
lundnakCommented:
The difference between the # of rows returned from cursor versus the query is likely attributed to no order by clause in the cursor definition.

Did you run the query to full execution?  How long did it run?

Would you be able to paste explain plans for the following queries?

select z.zip,
       z.city,
       z.state,
       z.latitude,
       z.longitude,
       zip_type = 'CA-25',
       z.wban_latitude,
       z.wban_longitude
from zips z (nolock)
where z.zip_type = 'CA'
  and not exists (select 1
              from zips z2 (nolock)
              where z2.zip_type in ('CA','CA-25')
                and z2.state = z.state
                and z2.id < z.id -- check against previous rows.
                and ROUND( SQRT( POWER((69.1 * (z.latitude - z2.latitude)), 2) + 
                                 POWER((53 * (z.longitude - z2.longitude)), 2)), 1) <= 25)
 
 
 
 
select top 1 id from zips (nolock)
where zip_type = 'CA-25' and state = @c_state and ROUND( SQRT( POWER((69.1 * (@c_lat - latitude)), 2) + POWER((53 * (@c_lng - longitude)), 2)), 1)

Open in new window

0
qinyanAuthor Commented:
i remember the query run for about 3 hrs. then how can i get the same result from both cursor and query?
actually to my suprise the cursor was fast, only couple minutes, if no ca-25 records in the table to begin with. if there are alreay several ca-25 records in the table, then the cursor runs for several hours too and got different result than if there are no ca-25 records in the table, don't know why.
0
lundnakCommented:
I would it is the performance of the following statement.

       set @t_id = (select top 1 id from zips (nolock)
                                where zip_type = 'CA-25' and state = @c_state and
                                ROUND( SQRT( POWER((69.1 * (@c_lat - latitude)), 2) + POWER((53 * (@c_lng - longitude)), 2)), 1) <= 25)


By the way is this a stored procedure?
0
qinyanAuthor Commented:
No it's just a one-off thing. ok, is there a way to modify that piece of code to improve the performance?
0

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
lundnakCommented:
Could you get an explain plan for the select statement.  I wonder what indexes it is using.
0
qinyanAuthor Commented:
you mean the execution plan for
       set @t_id = (select top 1 id from zips (nolock)
                                where zip_type = 'CA-25' and state = @c_state and
                                ROUND( SQRT( POWER((69.1 * (@c_lat - latitude)), 2) + POWER((53 * (@c_lng - longitude)), 2)), 1) <= 25)

it's in a cursor how can i get the execution plan?
0
RHADMINCommented:
0
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
Microsoft SQL Server

From novice to tech pro — start learning today.

Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.