?
Solved

better way than a cursor on a >700k row?

Posted on 2008-11-11
22
Medium Priority
?
260 Views
Last Modified: 2012-05-05
In my zips table i have type 'CA' with 765344 rows, i need to loop through each row and see whether in the table with type 'CA-25' there is a zip which is less than 25 miles from the zip in the cursor, if not insert into the table the record in the cursor but as a type 'CA-25'. When the cursor loops to the next record from the 765344 list, it will evaluate any inserted record from previous iteration. But the only problem with the cursor is that the performance is not good and it takes about 9 hours to run maybe. so I wonder whether there are other better way to handle this in T-SQL?
Thanks a lot!


declare @c_id numeric(18,0)
declare @c_lat float
declare @c_lng float
declare @c_state varchar(20)
declare zc CURSOR FOR
	select id, latitude, longitude, state from zips (nolock) where zip_type = 'CA'
open zc
 
fetch next from zc into @c_id, @c_lat, @c_lng, @c_state
 
-- iterate over "main" zip code list
while @@FETCH_STATUS = 0
begin
	declare @t_id numeric(18,0)
 
	-- check if there are any zip codes in the new list that are <= 25 miles away from the current iterator zip
	set @t_id = (select top 1 id from zips (nolock)
				where zip_type = 'CA-25' and state = @c_state and 
				ROUND( SQRT( POWER((69.1 * (@c_lat - latitude)), 2) + POWER((53 * (@c_lng - longitude)), 2)), 1) <= 25)
 
	-- add the iterator zip into the new list if no matches are found
	if @t_id is null
	begin
		insert into zips (zip, city, state, latitude, longitude, zip_type, wban_latitude, wban_longitude)
		select zip, city, state, latitude, longitude, 'CA-25', wban_latitude, wban_longitude from zips where id = @c_id
    end
 
	fetch next from zc into @c_id, @c_lat, @c_lng, @c_state 
end 
close zc 
deallocate zc

Open in new window

0
Comment
Question by:qinyan
  • 10
  • 10
  • +1
22 Comments
 
LVL 16

Expert Comment

by:brad2575
ID: 22932447
Could you do it something like this (logically):

Create a while loop that checks if a variable is set or not.  This variable will be set if any records were updated and you continue to loop until there are no records updated.

Run the query but update ALL that meet the criteria at once for the data that exists in the zip table at that time.  If any updates occur set variable = "Y"

Then the loop (explained above) will loop again if variable = "Y".  And then re-run the query again but this time running on all the existing data that was just inserted for the loop above.

Then if any are updated the second loop set the variable = "Y".  Continue until no updates are made anymore and exit the loop.


This may still take a bunch of loops but not nearly as many as you have above plus it will do them in bulk inserts vs 1 offs.

Let me know if you have any questions.

0
 

Author Comment

by:qinyan
ID: 22933564
Run the query but update ALL that meet the criteria at once for the data that exists in the zip table at that time.
 
This would copy all 765344 'CA' type records to be 'CA-25' zips, since at the beginning there is no 'CA-25' records in the table, correct? This is not what i want.

also by update you mean insert?

Thanks for the reply!
0
 
LVL 7

Expert Comment

by:lundnak
ID: 22936153
Try the following.  Please excuse any syntax errors; I don't have SSMS installed on this machine.
Also, make sure that you have a compound index on zip_type and state.
insert into zips (zip,
                  city, 
                  state,
                  latitude,
                  longitude,
                  zip_type,
                  wban_latitude,
                  wban_longitude)
select z.zip,
       z.city,
       z.state,
       z.latitude,
       z.longitude,
       zip_type = 'CA-25',
       z.wban_latitude,
       z.wban_longitude
from zips z (nolock)
where z.zip_type = 'CA'
  not exists (select 1
              from zips z2 (nolock)
              where z2.zip_type = 'CA-25'
                and z2.state = z.state
                and ROUND( SQRT( POWER((69.1 * (z.latitude - z2.latitude)), 2) + 
                                 POWER((53 * (z.longitude - z2.longitude)), 2)), 1) <= 25)
 

Open in new window

0
Efficient way to get backups off site to Azure

This user guide provides instructions on how to deploy and configure both a StoneFly Scale Out NAS Enterprise Cloud Drive virtual machine and Veeam Cloud Connect in the Microsoft Azure Cloud.

 

Author Comment

by:qinyan
ID: 22940475
Thanks for the help! I ran it but it still give me 765344 rows so it's still copying the 765344 CA type records to become 'CA-25'.
0
 
LVL 7

Expert Comment

by:lundnak
ID: 22946457
Hmmm...  Try with the following change.
                and z2.zip <> z.zip

If this doesn't work, can you post some example data (10 rows or more would suffice).  This will give me something to test against.

One other thing...are there any CA-25 records in the table when you first run the query.  If not, then it will always copy every CA record the first time.
insert into zips (zip,
                  city, 
                  state,
                  latitude,
                  longitude,
                  zip_type,
                  wban_latitude,
                  wban_longitude)
select z.zip,
       z.city,
       z.state,
       z.latitude,
       z.longitude,
       zip_type = 'CA-25',
       z.wban_latitude,
       z.wban_longitude
from zips z (nolock)
where z.zip_type = 'CA'
  not exists (select 1
              from zips z2 (nolock)
              where z2.zip_type = 'CA-25'
                and z2.zip <> z.zip
                and z2.state = z.state
                and ROUND( SQRT( POWER((69.1 * (z.latitude - z2.latitude)), 2) + 
                                 POWER((53 * (z.longitude - z2.longitude)), 2)), 1) <= 25)
 

Open in new window

0
 

Author Comment

by:qinyan
ID: 22950076
there are no records in the table with type 'CA-25' initially, that's why i use a cursor to do row by row processing. A straight query will always copy the whole 765344 records as it executes only once.
0
 
LVL 7

Expert Comment

by:lundnak
ID: 22956479
I understand now.  Is this a SQL 2005 database? I'll get you an answer tonight.
0
 
LVL 7

Expert Comment

by:lundnak
ID: 22956814
I think I got it.  This worked on my tests.
A couple things...
1) This query assumes your cursor loops through the records in the order of the id field
       declare zc CURSOR FOR
            select id, latitude, longitude, state from zips (nolock) where zip_type = 'CA'
    There is no order by field, so I am assuming the ID field.

2) An index on zip_type, state, and id would be good for this query.

Sorry for the syntax error in my previous example.  I fixed it here.

insert into zips (zip,
                  city, 
                  state,
                  latitude,
                  longitude,
                  zip_type,
                  wban_latitude,
                  wban_longitude)
select z.zip,
       z.city,
       z.state,
       z.latitude,
       z.longitude,
       zip_type = 'CA-25',
       z.wban_latitude,
       z.wban_longitude
from zips z (nolock)
where z.zip_type = 'CA'
  and not exists (select 1
              from zips z2 (nolock)
              where z2.zip_type in ('CA','CA-25')
                and z2.state = z.state
                and z2.id < z.id -- check against previous rows.
                and ROUND( SQRT( POWER((69.1 * (z.latitude - z2.latitude)), 2) + 
                                 POWER((53 * (z.longitude - z2.longitude)), 2)), 1) <= 25)
 

Open in new window

0
 

Author Comment

by:qinyan
ID: 22962177
i tried this and it doesn't finish after more than one hour so i stopped it.
0
 
LVL 7

Expert Comment

by:lundnak
ID: 22963626
Could you provide me the indexes that are defined on the table?
0
 

Author Comment

by:qinyan
ID: 22964124
clustered on id and non-clustered on state, zip and zip_type.
0
 
LVL 7

Expert Comment

by:lundnak
ID: 22964342
This is a tough one.  Would be possible to alter the indexes?

See the code changes below.

Also, how long is acceptable for this query to run>
--create an index on zip_type,state,zip,latitude,longitude
--Include the index in an index hint.
 
  and not exists (select 1
                  from zips z2 with INDEX(NEWINDEX)
                  where z2.zip_type in ('CA','CA-25')
                    and z2.state = z.state
                    and z2.id < z.id -- check against previous rows.
                   and ROUND( SQRT( POWER((69.1 * (z.latitude - z2.latitude)), 2) + 
                                     POWER((53 * (z.longitude - z2.longitude)), 2)), 1) <= 25)
 

Open in new window

0
 

Author Comment

by:qinyan
ID: 22999141
ok i did this but this query is still running now after over 40 mins.
0
 
LVL 7

Expert Comment

by:lundnak
ID: 23000299
How quickly are you looking for it to run?  How many minutes?
0
 

Author Comment

by:qinyan
ID: 23007603
I would say no more than 60 mins.
well, the result is different than the cursor. By running the cursor i got 970 rows for ca-25 but got 695 such rows from the query.
0
 
LVL 7

Expert Comment

by:lundnak
ID: 23013226
The difference between the # of rows returned from cursor versus the query is likely attributed to no order by clause in the cursor definition.

Did you run the query to full execution?  How long did it run?

Would you be able to paste explain plans for the following queries?

select z.zip,
       z.city,
       z.state,
       z.latitude,
       z.longitude,
       zip_type = 'CA-25',
       z.wban_latitude,
       z.wban_longitude
from zips z (nolock)
where z.zip_type = 'CA'
  and not exists (select 1
              from zips z2 (nolock)
              where z2.zip_type in ('CA','CA-25')
                and z2.state = z.state
                and z2.id < z.id -- check against previous rows.
                and ROUND( SQRT( POWER((69.1 * (z.latitude - z2.latitude)), 2) + 
                                 POWER((53 * (z.longitude - z2.longitude)), 2)), 1) <= 25)
 
 
 
 
select top 1 id from zips (nolock)
where zip_type = 'CA-25' and state = @c_state and ROUND( SQRT( POWER((69.1 * (@c_lat - latitude)), 2) + POWER((53 * (@c_lng - longitude)), 2)), 1)

Open in new window

0
 

Author Comment

by:qinyan
ID: 23017549
i remember the query run for about 3 hrs. then how can i get the same result from both cursor and query?
actually to my suprise the cursor was fast, only couple minutes, if no ca-25 records in the table to begin with. if there are alreay several ca-25 records in the table, then the cursor runs for several hours too and got different result than if there are no ca-25 records in the table, don't know why.
0
 
LVL 7

Expert Comment

by:lundnak
ID: 23026732
I would it is the performance of the following statement.

       set @t_id = (select top 1 id from zips (nolock)
                                where zip_type = 'CA-25' and state = @c_state and
                                ROUND( SQRT( POWER((69.1 * (@c_lat - latitude)), 2) + POWER((53 * (@c_lng - longitude)), 2)), 1) <= 25)


By the way is this a stored procedure?
0
 

Accepted Solution

by:
qinyan earned 0 total points
ID: 23042304
No it's just a one-off thing. ok, is there a way to modify that piece of code to improve the performance?
0
 
LVL 7

Expert Comment

by:lundnak
ID: 23046359
Could you get an explain plan for the select statement.  I wonder what indexes it is using.
0
 

Author Comment

by:qinyan
ID: 23078069
you mean the execution plan for
       set @t_id = (select top 1 id from zips (nolock)
                                where zip_type = 'CA-25' and state = @c_state and
                                ROUND( SQRT( POWER((69.1 * (@c_lat - latitude)), 2) + POWER((53 * (@c_lng - longitude)), 2)), 1) <= 25)

it's in a cursor how can i get the execution plan?
0
 
LVL 1

Expert Comment

by:RHADMIN
ID: 23151756
0

Featured Post

Get your Disaster Recovery as a Service basics

Disaster Recovery as a Service is one go-to solution that revolutionizes DR planning. Implementing DRaaS could be an efficient process, easily accessible to non-DR experts. Learn about monitoring, testing, executing failovers and failbacks to ensure a "healthy" DR environment.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

In part one, we reviewed the prerequisites required for installing SQL Server vNext. In this part we will explore how to install Microsoft's SQL Server on Ubuntu 16.04.
Ready to get certified? Check out some courses that help you prepare for third-party exams.
Via a live example, show how to shrink a transaction log file down to a reasonable size.
Viewers will learn how to use the INSERT statement to insert data into their tables. It will also introduce the NULL statement, to show them what happens when no value is giving for any given column.
Suggested Courses

840 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question