Solved

Mass updating a large set of data with less overhead

Posted on 2013-12-03
8
227 Views
Last Modified: 2013-12-10
I have a large table with about 30 million rows.

There is one column in the table that I have to update to a new value (currently 0, needs to be null).

When I run the normal update table set column = null where column = 0 the query takes for ever, blows up the size of tempdb, etc, etc.

Is there a better way to do this with less overhead?
0
Comment
Question by:ccleebelt
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 4
  • 4
8 Comments
 
LVL 25

Expert Comment

by:chaau
ID: 39694272
I think you need to update in batches, like this:
set noCount on
set rowCount 10000
While 1=1 begin
  Begin Transaction
  update Table1 Set Column1 = NULL WHERE Column1 = 0
  COMMIT
  if @@rowCount = 0 break
end
set rowCount 0
set noCount off

Open in new window

0
 

Author Comment

by:ccleebelt
ID: 39710261
Chaau - this works great.  

How would I adjust the query above to just keep looping through chunks of 10,000 so I don't have to run the query over and over again on a large data set?

Also, would it kill performance to put more than on update clause in the transaction?  I need to update several columns in the table to null if they are 0.

ex. update table1 set column2 = null where column2 = 0

Thanks again
0
 
LVL 25

Expert Comment

by:chaau
ID: 39710277
I expected that one to work. Now I see where I've made a mistake. Please use the following script. That will run in a loop until all records updated:
set noCount on
set rowCount 10000
declare @LastUpdated int
While 1=1 begin
  Begin Transaction
  update Table1 Set Column1 = NULL WHERE Column1 = 0
  SET @LastUpdated = @@rowCount
  COMMIT
  if @LastUpdated = 0 break
end
set rowCount 0
set noCount off

Open in new window

As for your other question. You need to estimate how many records are updated in your other tables. If you think that the number of records is not big, you can add more updates as well. Otherwise you need to adjust the ROWCOUNT value to a smaller number
0
Technology Partners: We Want Your Opinion!

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

 

Author Comment

by:ccleebelt
ID: 39710290
Thanks.  This is exactly what I needed.

Last follow up.  If I put indexes on the columns to be updated first would it go faster?
0
 
LVL 25

Expert Comment

by:chaau
ID: 39710306
It will help if you put indexes on the columns appearing in your where clause. In your case it is the same column, but imagine if you had a statement like this:
Update Table1 Set Column1 = 1 Where Column2 = 1

Open in new window

In this case the index on column1 will do nothing (in fact it will decrease the update speed, the the index will be required to be re-indexed by the RDBMS)
However, if you create an index for Column2 the update will be faster
0
 

Author Comment

by:ccleebelt
ID: 39710317
yeah - the query is update table1 set column1 = null where column1 = 0

So I am updating the same column that is in the where.   Would that slow it down?
0
 
LVL 25

Accepted Solution

by:
chaau earned 500 total points
ID: 39710328
It is hard to say. My best bet is that it will improve the performance. As you see, it is a catch22 situation. The index will help to locate the required records in your where clause. At the same time, the actual update will be a bit slower, because of the re-indexing overhead.

You really need to measure the performance yourself using your own data. Most likely, the index will help, as the modern RDBMSes have "smartness" in how the indexes are re-generated during updates. In your case perhaps the re-indexing overhead will be minimal.
0
 

Author Closing Comment

by:ccleebelt
ID: 39710613
Thanks for your help.  Very useful.
0

Featured Post

Comprehensive Backup Solutions for Microsoft

Acronis protects the complete Microsoft technology stack: Windows Server, Windows PC, laptop and Surface data; Microsoft business applications; Microsoft Hyper-V; Azure VMs; Microsoft Windows Server 2016; Microsoft Exchange 2016 and SQL Server 2016.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Everyone has problem when going to load data into Data warehouse (EDW). They all need to confirm that data quality is good but they don't no how to proceed. Microsoft has provided new task within SSIS 2008 called "Data Profiler Task". It solve th…
For both online and offline retail, the cross-channel business is the most recent pattern in the B2C trade space.
Via a live example, show how to extract information from SQL Server on Database, Connection and Server properties
Via a live example, show how to setup several different housekeeping processes for a SQL Server.

735 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question