DeMyu
asked on
UPDATING 300 Million Rows With A Cursor Is There A Better Method? 500 POINTS!
I have to update a table with 300 million rows, and there is no unique ID on the table. I have written the cursor below to do the update:
CREATE PROCEDURE dbo.usp_CleanADFTradeline
AS
DECLARE @dynCR_ID INT
DECLARE Curs_tblTasks_Update CURSOR FOR
SELECT DISTINCT CR_ID AS dynCR_ID FROM dbo.tblTasks
OPEN Curs_tblTasks_Update
FETCH NEXT FROM Curs_tblTasks_Update INTO @dynCR_ID
WHILE (@@FETCH_STATUS = 0)
BEGIN
UPDATE tblTasks
SET Task_Num = CASE
WHEN DATALENGTH(Task_Num)= 9 THEN SUBSTRING(Task_Num,1,6)
WHEN DATALENGTH(Task_Num)> 9 THEN SUBSTRING(Task_Num,1,8)
ELSE Task_Num
END
WHERE CR_ID = @dynCR_ID
FETCH NEXT FROM Curs_tblTasks_Update INTO @dynCR_ID
END
CLOSE Curs_tblTasks_Update
DEALLOCATE Curs_tblTasks_Update
GO
Is there a better code that will do this without the performance headache of cursors?
DeMyu
CREATE PROCEDURE dbo.usp_CleanADFTradeline
AS
DECLARE @dynCR_ID INT
DECLARE Curs_tblTasks_Update CURSOR FOR
SELECT DISTINCT CR_ID AS dynCR_ID FROM dbo.tblTasks
OPEN Curs_tblTasks_Update
FETCH NEXT FROM Curs_tblTasks_Update INTO @dynCR_ID
WHILE (@@FETCH_STATUS = 0)
BEGIN
UPDATE tblTasks
SET Task_Num = CASE
WHEN DATALENGTH(Task_Num)= 9 THEN SUBSTRING(Task_Num,1,6)
WHEN DATALENGTH(Task_Num)> 9 THEN SUBSTRING(Task_Num,1,8)
ELSE Task_Num
END
WHERE CR_ID = @dynCR_ID
FETCH NEXT FROM Curs_tblTasks_Update INTO @dynCR_ID
END
CLOSE Curs_tblTasks_Update
DEALLOCATE Curs_tblTasks_Update
GO
Is there a better code that will do this without the performance headache of cursors?
DeMyu
ASKER
I am trying to run the update in batches and each CR_ID is a batch. Out of the 300 million records there are 13 million distinct CR_ID.
DeMyu
DeMyu
The Cursor you're using run the same as an Update Statement without the Where Clause. Unless you want to run one distinct CR_ID at the time.... then
UPDATE tblTasks
SET Task_Num = CASE
WHEN DATALENGTH(Task_Num)= 9 THEN SUBSTRING(Task_Num,1,6)
WHEN DATALENGTH(Task_Num)> 9 THEN SUBSTRING(Task_Num,1,8)
ELSE Task_Num
END
Where CR_ID = InputyourCR_ID
I guest I am not 100% understanding what you're trying to accomplish here. Further clarity would be helpful.
UPDATE tblTasks
SET Task_Num = CASE
WHEN DATALENGTH(Task_Num)= 9 THEN SUBSTRING(Task_Num,1,6)
WHEN DATALENGTH(Task_Num)> 9 THEN SUBSTRING(Task_Num,1,8)
ELSE Task_Num
END
Where CR_ID = InputyourCR_ID
I guest I am not 100% understanding what you're trying to accomplish here. Further clarity would be helpful.
Since cursors are so much overhead, personally I would use SET ROWCOUNT to do the batching. Also, if there are 13M distinct CR_IDs, then your batches will be way too small (avg. 24 rows each):
SET ROWCOUNT 100000 --or whatever other batch size you want
SELECT '1'
-- WARNING: as coded, this will run until all rows are updated!!
WHILE @@ROWCOUNT > 0
BEGIN
UPDATE tblTasks
SET Task_Num = SUBSTRING(Task_Num, 1,
CASE DATALENGTH(Task_Num) WHEN 9 THEN 6 ELSE 8 END)
WHERE DATALENGTH(Task_Num) >= 9
END --WHILE
SET ROWCOUNT 100000 --or whatever other batch size you want
SELECT '1'
-- WARNING: as coded, this will run until all rows are updated!!
WHILE @@ROWCOUNT > 0
BEGIN
UPDATE tblTasks
SET Task_Num = SUBSTRING(Task_Num, 1,
CASE DATALENGTH(Task_Num) WHEN 9 THEN 6 ELSE 8 END)
WHERE DATALENGTH(Task_Num) >= 9
END --WHILE
Or you can do something like this....
UPDATE tblTasks
SET Task_Num = CASE
WHEN DATALENGTH(Task_Num)= 9 THEN SUBSTRING(Task_Num,1,6)
WHEN DATALENGTH(Task_Num)> 9 THEN SUBSTRING(Task_Num,1,8)
ELSE Task_Num END
WHERE CR_ID in (SELECT DISTINCT CR_ID AS dynCR_ID FROM dbo.tblTasks)
UPDATE tblTasks
SET Task_Num = CASE
WHEN DATALENGTH(Task_Num)= 9 THEN SUBSTRING(Task_Num,1,6)
WHEN DATALENGTH(Task_Num)> 9 THEN SUBSTRING(Task_Num,1,8)
ELSE Task_Num END
WHERE CR_ID in (SELECT DISTINCT CR_ID AS dynCR_ID FROM dbo.tblTasks)
Setup a control table of
CRID for doing a
WHERE CRID > 0 and CRID < 20000000
make your main loop select these values
and have your update statement update more rows at one time.
This will minimize your update statments to whatever level of
granularity you wish
Id do a MAX(CRID) and Min(crid) and divide it by the number of batches you want.
(IF YOU want to get fancy) You can even have a boolean value in that table
so that you can select the particular range for that processing execution.
JAY
CRID for doing a
WHERE CRID > 0 and CRID < 20000000
make your main loop select these values
and have your update statement update more rows at one time.
This will minimize your update statments to whatever level of
granularity you wish
Id do a MAX(CRID) and Min(crid) and divide it by the number of batches you want.
(IF YOU want to get fancy) You can even have a boolean value in that table
so that you can select the particular range for that processing execution.
JAY
NOTE:
That code will tend to run very long near the end since it will have to scan most of the table to find qualifying rows. Although not generally preferred, in this situation you might want to consider an updateable cursor that does the change, for example:
DECLARE Curs_tblTasks_Update CURSOR FOR
SELECT task_num
FROM dbo.tblTasks
WHERE DATALENGTH(task_num) >= 9
FOR UPDATE OF task_num
OPEN Curs_tblTasks_Update
FETCH NEXT FROM Curs_tblTasks_Update INTO @task_num
WHILE (@@FETCH_STATUS = 0)
BEGIN
UPDATE tblTasks
SET Task_Num = SUBSTRING(Task_Num, 1,
CASE DATALENGTH(Task_Num) WHEN 9 THEN 6 ELSE 8 END)
WHERE CURRENT OF Curs_tblTasks_Update
FETCH NEXT FROM Curs_tblTasks_Update INTO @task_num
END
CLOSE Curs_tblTasks_Update
DEALLOCATE Curs_tblTasks_Update
GO
That code will tend to run very long near the end since it will have to scan most of the table to find qualifying rows. Although not generally preferred, in this situation you might want to consider an updateable cursor that does the change, for example:
DECLARE Curs_tblTasks_Update CURSOR FOR
SELECT task_num
FROM dbo.tblTasks
WHERE DATALENGTH(task_num) >= 9
FOR UPDATE OF task_num
OPEN Curs_tblTasks_Update
FETCH NEXT FROM Curs_tblTasks_Update INTO @task_num
WHILE (@@FETCH_STATUS = 0)
BEGIN
UPDATE tblTasks
SET Task_Num = SUBSTRING(Task_Num, 1,
CASE DATALENGTH(Task_Num) WHEN 9 THEN 6 ELSE 8 END)
WHERE CURRENT OF Curs_tblTasks_Update
FETCH NEXT FROM Curs_tblTasks_Update INTO @task_num
END
CLOSE Curs_tblTasks_Update
DEALLOCATE Curs_tblTasks_Update
GO
ASKER
ScottPletcher will it update the table in batches of 100000 as you stated above. I guess at the end of the last record @@ROWCOUNT equals 0 and the loop will be exited. Is this right?
Sorry, but this code:
UPDATE tblTasks
SET Task_Num = CASE
WHEN DATALENGTH(Task_Num)= 9 THEN SUBSTRING(Task_Num,1,6)
WHEN DATALENGTH(Task_Num)> 9 THEN SUBSTRING(Task_Num,1,8)
ELSE Task_Num END
WHERE CR_ID in (SELECT DISTINCT CR_ID AS dynCR_ID FROM dbo.tblTasks)
is a waste of time. The cr_id will *always* be present in the list of distinct cr_id's in the same table, so the whole table will always be updated. You might as well just code:
UPDATE tblTasks
SET Task_Num = CASE
WHEN DATALENGTH(Task_Num)= 9 THEN SUBSTRING(Task_Num,1,6)
WHEN DATALENGTH(Task_Num)> 9 THEN SUBSTRING(Task_Num,1,8)
ELSE Task_Num END
UPDATE tblTasks
SET Task_Num = CASE
WHEN DATALENGTH(Task_Num)= 9 THEN SUBSTRING(Task_Num,1,6)
WHEN DATALENGTH(Task_Num)> 9 THEN SUBSTRING(Task_Num,1,8)
ELSE Task_Num END
WHERE CR_ID in (SELECT DISTINCT CR_ID AS dynCR_ID FROM dbo.tblTasks)
is a waste of time. The cr_id will *always* be present in the list of distinct cr_id's in the same table, so the whole table will always be updated. You might as well just code:
UPDATE tblTasks
SET Task_Num = CASE
WHEN DATALENGTH(Task_Num)= 9 THEN SUBSTRING(Task_Num,1,6)
WHEN DATALENGTH(Task_Num)> 9 THEN SUBSTRING(Task_Num,1,8)
ELSE Task_Num END
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER
ScottPletcher, the @rowcount variable in the code above is never incremented. My goal would be to update 100000 rows at a time. Any suggestions?
The @rowcount is how many rows will be *updated*, not just read, so the first UPDATE should do the first 100K rows, the next UPDATE a different 100000 rows, and so on.
ASKER
Thank you.
you should be able to just run you Update query straightup.
UPDATE tblTasks
SET Task_Num = CASE
WHEN DATALENGTH(Task_Num)= 9 THEN SUBSTRING(Task_Num,1,6)
WHEN DATALENGTH(Task_Num)> 9 THEN SUBSTRING(Task_Num,1,8)
ELSE Task_Num
END