Solved

Mass updating a large set of data with less overhead

Posted on 2013-12-03
8
220 Views
Last Modified: 2013-12-10
I have a large table with about 30 million rows.

There is one column in the table that I have to update to a new value (currently 0, needs to be null).

When I run the normal update table set column = null where column = 0 the query takes for ever, blows up the size of tempdb, etc, etc.

Is there a better way to do this with less overhead?
0
Comment
Question by:ccleebelt
  • 4
  • 4
8 Comments
 
LVL 24

Expert Comment

by:chaau
ID: 39694272
I think you need to update in batches, like this:
set noCount on
set rowCount 10000
While 1=1 begin
  Begin Transaction
  update Table1 Set Column1 = NULL WHERE Column1 = 0
  COMMIT
  if @@rowCount = 0 break
end
set rowCount 0
set noCount off

Open in new window

0
 

Author Comment

by:ccleebelt
ID: 39710261
Chaau - this works great.  

How would I adjust the query above to just keep looping through chunks of 10,000 so I don't have to run the query over and over again on a large data set?

Also, would it kill performance to put more than on update clause in the transaction?  I need to update several columns in the table to null if they are 0.

ex. update table1 set column2 = null where column2 = 0

Thanks again
0
 
LVL 24

Expert Comment

by:chaau
ID: 39710277
I expected that one to work. Now I see where I've made a mistake. Please use the following script. That will run in a loop until all records updated:
set noCount on
set rowCount 10000
declare @LastUpdated int
While 1=1 begin
  Begin Transaction
  update Table1 Set Column1 = NULL WHERE Column1 = 0
  SET @LastUpdated = @@rowCount
  COMMIT
  if @LastUpdated = 0 break
end
set rowCount 0
set noCount off

Open in new window

As for your other question. You need to estimate how many records are updated in your other tables. If you think that the number of records is not big, you can add more updates as well. Otherwise you need to adjust the ROWCOUNT value to a smaller number
0
Efficient way to get backups off site to Azure

This user guide provides instructions on how to deploy and configure both a StoneFly Scale Out NAS Enterprise Cloud Drive virtual machine and Veeam Cloud Connect in the Microsoft Azure Cloud.

 

Author Comment

by:ccleebelt
ID: 39710290
Thanks.  This is exactly what I needed.

Last follow up.  If I put indexes on the columns to be updated first would it go faster?
0
 
LVL 24

Expert Comment

by:chaau
ID: 39710306
It will help if you put indexes on the columns appearing in your where clause. In your case it is the same column, but imagine if you had a statement like this:
Update Table1 Set Column1 = 1 Where Column2 = 1

Open in new window

In this case the index on column1 will do nothing (in fact it will decrease the update speed, the the index will be required to be re-indexed by the RDBMS)
However, if you create an index for Column2 the update will be faster
0
 

Author Comment

by:ccleebelt
ID: 39710317
yeah - the query is update table1 set column1 = null where column1 = 0

So I am updating the same column that is in the where.   Would that slow it down?
0
 
LVL 24

Accepted Solution

by:
chaau earned 500 total points
ID: 39710328
It is hard to say. My best bet is that it will improve the performance. As you see, it is a catch22 situation. The index will help to locate the required records in your where clause. At the same time, the actual update will be a bit slower, because of the re-indexing overhead.

You really need to measure the performance yourself using your own data. Most likely, the index will help, as the modern RDBMSes have "smartness" in how the indexes are re-generated during updates. In your case perhaps the re-indexing overhead will be minimal.
0
 

Author Closing Comment

by:ccleebelt
ID: 39710613
Thanks for your help.  Very useful.
0

Featured Post

Back Up Your Microsoft Windows Server®

Back up all your Microsoft Windows Server – on-premises, in remote locations, in private and hybrid clouds. Your entire Windows Server will be backed up in one easy step with patented, block-level disk imaging. We achieve RTOs (recovery time objectives) as low as 15 seconds.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

Title # Comments Views Activity
stored procedures times out after 2 hours in sql server 2014 16 38
Sql Server group by 10 30
Flattening heirachies 3 31
divide by zero error 23 16
Nowadays, some of developer are too much worried about data. Who is using data, who is updating it etc. etc. Because, data is more costlier in term of money and information. So security of data is focusing concern in days. Lets' understand the Au…
When you hear the word proxy, you may become apprehensive. This article will help you to understand Proxy and when it is useful. Let's talk Proxy for SQL Server. (Not in terms of Internet access.) Typically, you'll run into this type of problem w…
Via a live example, show how to extract insert data into a SQL Server database table using the Import/Export option and Bulk Insert.
This videos aims to give the viewer a basic demonstration of how a user can query current session information by using the SYS_CONTEXT function

832 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question