Solved

Mass updating a large set of data with less overhead

Posted on 2013-12-03
8
212 Views
Last Modified: 2013-12-10
I have a large table with about 30 million rows.

There is one column in the table that I have to update to a new value (currently 0, needs to be null).

When I run the normal update table set column = null where column = 0 the query takes for ever, blows up the size of tempdb, etc, etc.

Is there a better way to do this with less overhead?
0
Comment
Question by:ccleebelt
  • 4
  • 4
8 Comments
 
LVL 24

Expert Comment

by:chaau
ID: 39694272
I think you need to update in batches, like this:
set noCount on
set rowCount 10000
While 1=1 begin
  Begin Transaction
  update Table1 Set Column1 = NULL WHERE Column1 = 0
  COMMIT
  if @@rowCount = 0 break
end
set rowCount 0
set noCount off

Open in new window

0
 

Author Comment

by:ccleebelt
ID: 39710261
Chaau - this works great.  

How would I adjust the query above to just keep looping through chunks of 10,000 so I don't have to run the query over and over again on a large data set?

Also, would it kill performance to put more than on update clause in the transaction?  I need to update several columns in the table to null if they are 0.

ex. update table1 set column2 = null where column2 = 0

Thanks again
0
 
LVL 24

Expert Comment

by:chaau
ID: 39710277
I expected that one to work. Now I see where I've made a mistake. Please use the following script. That will run in a loop until all records updated:
set noCount on
set rowCount 10000
declare @LastUpdated int
While 1=1 begin
  Begin Transaction
  update Table1 Set Column1 = NULL WHERE Column1 = 0
  SET @LastUpdated = @@rowCount
  COMMIT
  if @LastUpdated = 0 break
end
set rowCount 0
set noCount off

Open in new window

As for your other question. You need to estimate how many records are updated in your other tables. If you think that the number of records is not big, you can add more updates as well. Otherwise you need to adjust the ROWCOUNT value to a smaller number
0
 

Author Comment

by:ccleebelt
ID: 39710290
Thanks.  This is exactly what I needed.

Last follow up.  If I put indexes on the columns to be updated first would it go faster?
0
How your wiki can always stay up-to-date

Quip doubles as a “living” wiki and a project management tool that evolves with your organization. As you finish projects in Quip, the work remains, easily accessible to all team members, new and old.
- Increase transparency
- Onboard new hires faster
- Access from mobile/offline

 
LVL 24

Expert Comment

by:chaau
ID: 39710306
It will help if you put indexes on the columns appearing in your where clause. In your case it is the same column, but imagine if you had a statement like this:
Update Table1 Set Column1 = 1 Where Column2 = 1

Open in new window

In this case the index on column1 will do nothing (in fact it will decrease the update speed, the the index will be required to be re-indexed by the RDBMS)
However, if you create an index for Column2 the update will be faster
0
 

Author Comment

by:ccleebelt
ID: 39710317
yeah - the query is update table1 set column1 = null where column1 = 0

So I am updating the same column that is in the where.   Would that slow it down?
0
 
LVL 24

Accepted Solution

by:
chaau earned 500 total points
ID: 39710328
It is hard to say. My best bet is that it will improve the performance. As you see, it is a catch22 situation. The index will help to locate the required records in your where clause. At the same time, the actual update will be a bit slower, because of the re-indexing overhead.

You really need to measure the performance yourself using your own data. Most likely, the index will help, as the modern RDBMSes have "smartness" in how the indexes are re-generated during updates. In your case perhaps the re-indexing overhead will be minimal.
0
 

Author Closing Comment

by:ccleebelt
ID: 39710613
Thanks for your help.  Very useful.
0

Featured Post

Highfive Gives IT Their Time Back

Highfive is so simple that setting up every meeting room takes just minutes and every employee will be able to start or join a call from any room with ease. Never be called into a meeting just to get it started again. This is how video conferencing should work!

Join & Write a Comment

This article explains how to reset the password of the sa account on a Microsoft SQL Server.  The steps in this article work in SQL 2005, 2008, 2008 R2, 2012, 2014 and 2016.
Load balancing is the method of dividing the total amount of work performed by one computer between two or more computers. Its aim is to get more work done in the same amount of time, ensuring that all the users get served faster.
Via a live example, show how to extract information from SQL Server on Database, Connection and Server properties
Via a live example, show how to backup a database, simulate a failure backup the tail of the database transaction log and perform the restore.

762 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

22 Experts available now in Live!

Get 1:1 Help Now