Want to win a PS4? Go Premium and enter to win our High-Tech Treats giveaway. Enter to Win

x
?
Solved

UPDATING 300 Million Rows With A Cursor Is There A Better Method? 500 POINTS!

Posted on 2004-08-30
13
Medium Priority
?
509 Views
Last Modified: 2006-11-17
I have to update a table with 300 million rows, and there is no unique ID on the table. I have written the cursor below to do the update:

CREATE PROCEDURE dbo.usp_CleanADFTradeline

AS


DECLARE @dynCR_ID INT
DECLARE Curs_tblTasks_Update CURSOR FOR
 SELECT DISTINCT CR_ID AS dynCR_ID FROM dbo.tblTasks
       
OPEN Curs_tblTasks_Update
FETCH NEXT FROM Curs_tblTasks_Update INTO @dynCR_ID

WHILE (@@FETCH_STATUS = 0)
     BEGIN
     UPDATE tblTasks
      SET Task_Num = CASE
      WHEN DATALENGTH(Task_Num)= 9 THEN  SUBSTRING(Task_Num,1,6)
      WHEN DATALENGTH(Task_Num)> 9 THEN  SUBSTRING(Task_Num,1,8)
      ELSE Task_Num
      END
      WHERE CR_ID = @dynCR_ID

     FETCH NEXT FROM Curs_tblTasks_Update INTO @dynCR_ID
     END
CLOSE Curs_tblTasks_Update
DEALLOCATE Curs_tblTasks_Update

GO

Is there a better code that will do this without the performance headache of cursors?

DeMyu
0
Comment
Question by:DeMyu
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 5
  • 4
  • 3
  • +1
13 Comments
 
LVL 8

Expert Comment

by:MartinCMS
ID: 11934306
I don't understand why you're using a cursor for what you're trying to do!

you should be able to just run you Update query straightup.

    UPDATE tblTasks
     SET Task_Num = CASE
     WHEN DATALENGTH(Task_Num)= 9 THEN  SUBSTRING(Task_Num,1,6)
     WHEN DATALENGTH(Task_Num)> 9 THEN  SUBSTRING(Task_Num,1,8)
     ELSE Task_Num
     END

0
 

Author Comment

by:DeMyu
ID: 11934379
I am trying to run the update in batches and each CR_ID  is a batch. Out of the 300 million records there are 13 million distinct CR_ID.

DeMyu
0
 
LVL 8

Expert Comment

by:MartinCMS
ID: 11934454
The Cursor you're using run the same as an Update Statement without the Where Clause.  Unless you want to run one distinct CR_ID at the time.... then

 UPDATE tblTasks
     SET Task_Num = CASE
     WHEN DATALENGTH(Task_Num)= 9 THEN  SUBSTRING(Task_Num,1,6)
     WHEN DATALENGTH(Task_Num)> 9 THEN  SUBSTRING(Task_Num,1,8)
     ELSE Task_Num
     END
Where CR_ID = InputyourCR_ID


I guest I am not 100% understanding what you're trying to accomplish here.  Further clarity would be helpful.

0
What does it mean to be "Always On"?

Is your cloud always on? With an Always On cloud you won't have to worry about downtime for maintenance or software application code updates, ensuring that your bottom line isn't affected.

 
LVL 70

Expert Comment

by:Scott Pletcher
ID: 11934463
Since cursors are so much overhead, personally I would use SET ROWCOUNT to do the batching.  Also, if there are 13M distinct CR_IDs, then your batches will be way too small (avg. 24 rows each):


SET ROWCOUNT 100000  --or whatever other batch size you want
SELECT '1'
-- WARNING: as coded, this will run until all rows are updated!!
WHILE @@ROWCOUNT > 0
BEGIN
     UPDATE tblTasks
     SET Task_Num = SUBSTRING(Task_Num, 1,
         CASE DATALENGTH(Task_Num) WHEN 9 THEN 6 ELSE 8 END)
     WHERE DATALENGTH(Task_Num) >= 9
END --WHILE
0
 
LVL 8

Expert Comment

by:MartinCMS
ID: 11934505
Or you can do something like this....

UPDATE tblTasks
SET Task_Num = CASE
          WHEN DATALENGTH(Task_Num)= 9 THEN  SUBSTRING(Task_Num,1,6)
          WHEN DATALENGTH(Task_Num)> 9 THEN  SUBSTRING(Task_Num,1,8)
          ELSE Task_Num END
WHERE CR_ID in (SELECT DISTINCT CR_ID AS dynCR_ID FROM dbo.tblTasks)
0
 
LVL 10

Expert Comment

by:Jay Toops
ID: 11934522
Setup a control table of
CRID for doing a
WHERE CRID > 0 and CRID < 20000000

make your main loop select these values
and have your update statement update more rows at one time.
This will minimize your update statments to whatever level of
granularity you wish

Id do a MAX(CRID) and Min(crid) and divide it by the number of batches you want.

(IF YOU want to get fancy) You can even have a boolean value in that table
so that you can select the particular range for that processing execution.

JAY
0
 
LVL 70

Expert Comment

by:Scott Pletcher
ID: 11934571
NOTE:

That code will tend to run very long near the end since it will have to scan most of the table to find qualifying rows.  Although not generally preferred, in this situation you might want to consider an updateable cursor that does the change, for example:


DECLARE Curs_tblTasks_Update CURSOR FOR
SELECT task_num
FROM dbo.tblTasks
WHERE DATALENGTH(task_num) >= 9
FOR UPDATE OF task_num
       
OPEN Curs_tblTasks_Update
FETCH NEXT FROM Curs_tblTasks_Update INTO @task_num

WHILE (@@FETCH_STATUS = 0)
     BEGIN
     UPDATE tblTasks
     SET Task_Num = SUBSTRING(Task_Num, 1,
         CASE DATALENGTH(Task_Num) WHEN 9 THEN 6 ELSE 8 END)
     WHERE CURRENT OF  Curs_tblTasks_Update
     FETCH NEXT FROM Curs_tblTasks_Update INTO @task_num
     END
CLOSE Curs_tblTasks_Update
DEALLOCATE Curs_tblTasks_Update
GO
0
 

Author Comment

by:DeMyu
ID: 11934656
ScottPletcher will it update the table in batches of 100000 as you stated above. I guess at the end of the last record @@ROWCOUNT equals 0 and the loop will be exited. Is this right?
0
 
LVL 70

Expert Comment

by:Scott Pletcher
ID: 11934664
Sorry, but this code:

UPDATE tblTasks
SET Task_Num = CASE
          WHEN DATALENGTH(Task_Num)= 9 THEN  SUBSTRING(Task_Num,1,6)
          WHEN DATALENGTH(Task_Num)> 9 THEN  SUBSTRING(Task_Num,1,8)
          ELSE Task_Num END
WHERE CR_ID in (SELECT DISTINCT CR_ID AS dynCR_ID FROM dbo.tblTasks)

is a waste of time.  The cr_id will *always* be present in the list of distinct cr_id's in the same table, so the whole table will always be updated.  You might as well just code:

UPDATE tblTasks
SET Task_Num = CASE
          WHEN DATALENGTH(Task_Num)= 9 THEN  SUBSTRING(Task_Num,1,6)
          WHEN DATALENGTH(Task_Num)> 9 THEN  SUBSTRING(Task_Num,1,8)
          ELSE Task_Num END


0
 
LVL 70

Accepted Solution

by:
Scott Pletcher earned 2000 total points
ID: 11934724
>> ScottPletcher will it update the table in batches of 100000 as you stated above. I guess at the end of the last record @@ROWCOUNT equals 0 and the loop will be exited. Is this right? <<

Exactly.  When all of them are finally <= a length of nine, then the rowcount will be 0, since no rows were updated in that pass, and the loop will exit.


Honestly, just to keep an eye on the progress, I would probably do something like this:

DECLARE @rowcount INT
DECLARE @loopcount INT
SET @rowcount = 100000  --or whatever other batch size you want
SET ROWCOUNT @rowcount
SET @loopCount = 1
SELECT '1'  -- make sure @@ROWCOUNT is > 0
-- WARNING: as coded, this will run until all rows are updated!!
WHILE @@ROWCOUNT > 0
BEGIN
     UPDATE tblTasks
     SET Task_Num = SUBSTRING(Task_Num, 1,
         CASE DATALENGTH(Task_Num) WHEN 9 THEN 6 ELSE 8 END)
     WHERE DATALENGTH(Task_Num) >= 9
     PRINT 'Completed #' + CAST(@loopCount AS VARCHAR(10)) + ' group of ' + CAST(@rowcount AS VARCHAR(10)) + ' rows'
     SET @loopCount = @loopCount + 1
END --WHILE
0
 

Author Comment

by:DeMyu
ID: 11935373
ScottPletcher, the @rowcount variable in the code above is never incremented. My goal would be to update 100000 rows at a time. Any suggestions?
0
 
LVL 70

Expert Comment

by:Scott Pletcher
ID: 11935451
The @rowcount is how many rows will be *updated*, not just read, so the first UPDATE should do the first 100K rows, the next UPDATE a different 100000 rows, and so on.
0
 

Author Comment

by:DeMyu
ID: 11935568
Thank you.
0

Featured Post

Independent Software Vendors: We Want Your Opinion

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Ever wondered why sometimes your SQL Server is slow or unresponsive with connections spiking up but by the time you go in, all is well? The following article will show you how to install and configure a SQL job that will send you email alerts includ…
This article shows gives you an overview on SQL Server 2016 row level security. You will also get to know the usages of row-level-security and how it works
Via a live example, show how to backup a database, simulate a failure backup the tail of the database transaction log and perform the restore.
Via a live example, show how to set up a backup for SQL Server using a Maintenance Plan and how to schedule the job into SQL Server Agent.

604 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question