Solved

Mass updating a large set of data with less overhead

Posted on 2013-12-03
Medium Priority
234 Views
Last Modified: 2013-12-10
I have a large table with about 30 million rows.

There is one column in the table that I have to update to a new value (currently 0, needs to be null).

When I run the normal UPDATE table SET column = NULL WHERE column = 0, the query takes forever, blows up the size of tempdb, and so on.

Is there a better way to do this with less overhead?
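For reference, this is the shape of the statement I'm running now (real table and column names changed):

UPDATE dbo.BigTable
SET StatusColumn = NULL
WHERE StatusColumn = 0

It hits all 30 million rows in one pass, which seems to be where the overhead comes from.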
Question by:ccleebelt
8 Comments
 
LVL 25

Expert Comment

by:chaau
ID: 39694272
I think you need to update in batches, like this:
SET NOCOUNT ON
SET ROWCOUNT 10000
WHILE 1 = 1 BEGIN
  BEGIN TRANSACTION
  UPDATE Table1 SET Column1 = NULL WHERE Column1 = 0
  COMMIT
  IF @@ROWCOUNT = 0 BREAK
END
SET ROWCOUNT 0
SET NOCOUNT OFF

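As an aside, SET ROWCOUNT is deprecated for INSERT/UPDATE/DELETE, so on SQL Server 2005 and later you may prefer to cap each batch with UPDATE TOP (n) instead. A rough equivalent of the loop above, using the same example table and column names:

SET NOCOUNT ON;

WHILE 1 = 1
BEGIN
    -- TOP caps how many rows each pass touches
    UPDATE TOP (10000) Table1
    SET Column1 = NULL
    WHERE Column1 = 0;

    -- @@ROWCOUNT here reflects the UPDATE that just ran
    IF @@ROWCOUNT = 0 BREAK;
END

SET NOCOUNT OFF;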

 

Author Comment

by:ccleebelt
ID: 39710261
Chaau - this works great.  

How would I adjust the query above to just keep looping through chunks of 10,000 so I don't have to run the query over and over again on a large data set?

Also, would it kill performance to put more than one update statement in the transaction?  I need to update several columns in the table to null if they are 0.

ex. update table1 set column2 = null where column2 = 0

Thanks again
 
LVL 25

Expert Comment

by:chaau
ID: 39710277
I expected that one to keep looping. Now I see where I made a mistake: the @@ROWCOUNT check comes after the COMMIT, and COMMIT resets @@ROWCOUNT to 0, so the loop stopped after the first batch. Please use the following script instead; it captures the row count right after the UPDATE and loops until all records are updated:
SET NOCOUNT ON
SET ROWCOUNT 10000
DECLARE @LastUpdated int
WHILE 1 = 1 BEGIN
  BEGIN TRANSACTION
  UPDATE Table1 SET Column1 = NULL WHERE Column1 = 0
  -- capture the batch size here, before COMMIT resets @@ROWCOUNT
  SET @LastUpdated = @@ROWCOUNT
  COMMIT
  IF @LastUpdated = 0 BREAK
END
SET ROWCOUNT 0
SET NOCOUNT OFF


As for your other question: estimate how many records each extra update will touch. If the number is not big, you can add more UPDATE statements to the same transaction; otherwise, reduce the ROWCOUNT value so each transaction stays small.
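For example, if Column2 needs the same treatment, the loop might look like this (a sketch only; the second column name is just the example from your comment):

SET NOCOUNT ON
SET ROWCOUNT 10000
DECLARE @LastUpdated int

WHILE 1 = 1 BEGIN
  BEGIN TRANSACTION

  UPDATE Table1 SET Column1 = NULL WHERE Column1 = 0
  SET @LastUpdated = @@ROWCOUNT

  UPDATE Table1 SET Column2 = NULL WHERE Column2 = 0
  SET @LastUpdated = @LastUpdated + @@ROWCOUNT

  COMMIT

  -- stop only when neither column had anything left to update
  IF @LastUpdated = 0 BREAK
END

SET ROWCOUNT 0
SET NOCOUNT OFF

Keep in mind that with two updates per pass the transaction can touch up to 20,000 rows, so drop the ROWCOUNT value if that is too heavy.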
 

Author Comment

by:ccleebelt
ID: 39710290
Thanks.  This is exactly what I needed.

Last follow-up: if I put indexes on the columns to be updated first, would it go faster?
 
LVL 25

Expert Comment

by:chaau
ID: 39710306
It will help if you put indexes on the columns appearing in your where clause. In your case it is the same column, but imagine if you had a statement like this:
UPDATE Table1 SET Column1 = 1 WHERE Column2 = 1


In this case an index on Column1 will do nothing to help find the rows (in fact it will slow the update down, because the RDBMS has to maintain that index as the column values change).
However, if you create an index on Column2, the update will be faster.
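For the statement above, the index that would help is simply one on Column2, e.g. (the index name is just an example):

CREATE NONCLUSTERED INDEX IX_Table1_Column2 ON Table1 (Column2)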
 

Author Comment

by:ccleebelt
ID: 39710317
yeah - the query is update table1 set column1 = null where column1 = 0

So I am updating the same column that is in the WHERE clause. Would that slow it down?
 
LVL 25

Accepted Solution

by:
chaau earned 2000 total points
ID: 39710328
It is hard to say. My best bet is that it will improve performance. As you can see, it is a catch-22 situation: the index will help locate the records matched by your WHERE clause, but at the same time the actual update will be a bit slower because of the index-maintenance overhead.

You really need to measure the performance yourself using your own data. Most likely the index will help, as modern RDBMSes are fairly smart about how indexes are updated during modifications, so in your case the maintenance overhead may well be minimal.
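One straightforward way to measure is to time a single batch with and without the index and compare the statistics output, for example:

SET STATISTICS IO ON
SET STATISTICS TIME ON

-- run one batch and compare logical reads / elapsed time in the Messages pane
SET ROWCOUNT 10000
UPDATE Table1 SET Column1 = NULL WHERE Column1 = 0
SET ROWCOUNT 0

SET STATISTICS TIME OFF
SET STATISTICS IO OFF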
 

Author Closing Comment

by:ccleebelt
ID: 39710613
Thanks for your help.  Very useful.
