

Update Query again 1.4 million records

Posted on 2003-12-05
Medium Priority
Last Modified: 2012-06-27
I am wondering if the performance I am getting is normal. I have a DB with 1.4 million records and a row size of 1288 bytes as stated by Query Analyzer. I am running an UPDATE against one column, SDVR01, attempting to remove trailing spaces. The query is still running 21 minutes later. Should it take this long on a Win 2K server with 1 GB of memory? I am using SQL Server 2000. Any ideas?? The WHERE clause is below:

WHERE     RIGHT(SDVR01, 1) = ' '

p.s. - decreasing the row size will do nothing because I am updating a specific column (SDVR01). Or am I wrong? I could get rid of numerous columns.
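For context, the full statement in question presumably looks something like the sketch below (the table name F42119 is taken from later comments in this thread; the rest is an assumption, since only the WHERE clause was posted):

```sql
-- Hypothetical reconstruction of the running statement.
-- Table name F42119 comes from later comments; everything else is assumed.
UPDATE F42119
SET    SDVR01 = RTRIM(SDVR01)
WHERE  RIGHT(SDVR01, 1) = ' '
```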
Question by:timokeeffe
Expert Comment

ID: 9883564
Is it mandatory for you to do this update? You could simply use RTRIM(SDVR01) in your queries and not update anything at all.
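A sketch of that approach, trimming at read time instead of rewriting the table (PrimeKeyCol is a placeholder name borrowed from a later comment):

```sql
-- Trim at query time instead of updating 1.4M rows.
-- PrimeKeyCol is a placeholder; substitute the table's actual key column.
SELECT PrimeKeyCol,
       RTRIM(SDVR01) AS SDVR01
FROM   F42119
```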
LVL 42

Expert Comment

ID: 9883773
You might try excluding the "where" clause because performing a rtrim won't hurt anything even if the records don't have trailing spaces.  This may help somewhat because I believe "right" is an expensive operation in terms of processing.

This will take a while because you're almost certainly doing a full table scan on all 1.4 million records; however it doesn't seem like it should take 20 minutes.  Is the processor pegged at 100% when this is executing?  Have you checked the logs to see if there are any problems being reported?

Also, lausz isn't wrong with his/her comment either.  You are far better off to trim the spaces before inserting them if that is possible.

LVL 50

Expert Comment

ID: 9884176
have you considered just BCPing the table out and back in
to reduce the column lengths?

use a SELECT query to RTRIM the column on the output statement,
then just reload the shrunk results...

should be much, much faster...


LVL 22

Accepted Solution

_TAD_ earned 2000 total points
ID: 9886034

I personally would elect to perform the execution like frodoman suggested.... simply remove the where clause.   RTrim() won't hurt any fields, it will only drop blank spaces.

However, it is good to note for future processes that it is usually best to update against a primary key field.

I am not certain about the efficiency, but I was always under the impression that updating by use of a primary key is always best.

WHERE   f42119.PrimeKeyCol in (select PrimaryKeyCol from F42119 where  RIGHT(SDVR01, 1) = ' ' )

This way the SQL Server gets a subset of data to update instead of scanning the entire table.
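Put together, that WHERE clause would sit inside an UPDATE along these lines (a sketch; PrimeKeyCol/PrimaryKeyCol is a placeholder for the table's actual key):

```sql
-- Restrict the update to rows identified by primary key.
-- PrimeKeyCol is a placeholder; substitute the table's actual key column.
UPDATE F42119
SET    SDVR01 = RTRIM(SDVR01)
WHERE  PrimeKeyCol IN (SELECT PrimeKeyCol
                       FROM   F42119
                       WHERE  RIGHT(SDVR01, 1) = ' ')
```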
LVL 50

Expert Comment

ID: 9887521
don't see your point
unless you have determined the primary key set somehow
(via a relationship) it's going to be a full table scan....

which is why BCP should be considered, you'll end up with a much neater/optimized table...

you could specify the primary key and split the process up into a series of ranges, which may be useful if you're trying to
do this whilst allowing normal processes to continue...

Expert Comment

ID: 9888174
Go the temp table route:

select primarykeyid, rtrim(sdvr01) as sdvr01
into #temp
from f42119 with (index(indsdvr01))
where rtrim(sdvr01) <> sdvr01

update a
set sdvr01 = b.sdvr01
from f42119 as a inner join #temp as b on a.primarykeyid = b.primarykeyid

drop table #temp

This has the benefits of (a) the SELECT INTO #temp is minimally logged, and so will be fast; (b) sdvr01 is already in memory, so the query processor doesn't have to do any additional lookups or that RIGHT() calculation; (c) the update operation is optimised against the primary key; and (d) the update will be "in place" because you can guarantee that sdvr01 will never get bigger.

Leave off the index hint if there isn't one in existence (inequality comparisons may force the query processor to not use the index anyway).

For best results run each query separately, so the optimiser doesn't compile the first before the temp table exists. Definitely don't call them all from one stored procedure.

LVL 15

Expert Comment

ID: 9893948
What is the datatype for column SDVR01? Char or varchar?

1) If this column is defined as char, this update will not work to eliminate trailing spaces, because the char datatype will always have trailing spaces appended.

2) If your column is defined as varchar and the ANSI_PADDING option was set to ON when the column was created, then you can do what you are trying to do.

3) If your column is defined as varchar and the ANSI_PADDING option was set to OFF, then trailing spaces will always be trimmed.

Reading your question carefully, I see that you are trying to change the row size. But the row size depends on the datatype, not on the data itself. I think even after you perform this update the row size will not change. You can try this yourself to verify.


Namasi Navaretnam
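A quick way to check the point above for yourself (a hedged sketch; run it in a scratch database, column widths are arbitrary):

```sql
-- Demonstrate trailing-space behaviour for char vs varchar.
CREATE TABLE #pad_test (
    c CHAR(10),     -- always padded to full width
    v VARCHAR(10)   -- padding depends on ANSI_PADDING at creation time
)

INSERT INTO #pad_test VALUES ('abc  ', 'abc  ')

-- DATALENGTH shows the stored byte count:
-- c is always 10; v keeps the trailing spaces (5) with ANSI_PADDING ON,
-- and trims them (3) with ANSI_PADDING OFF.
SELECT DATALENGTH(c) AS c_len, DATALENGTH(v) AS v_len
FROM   #pad_test

DROP TABLE #pad_test
```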
LVL 70

Expert Comment

by:Scott Pletcher
ID: 9897156
I disagree with others about dropping the WHERE clause.  That should make things *worse*, because updates will be done for all rows vs. just the needed rows.  A little CPU overhead to do the RIGHT() check is a lot less total overhead than an actual update of a row with the resulting I/O.

This sounds as if it may be a log size issue, or at least I've had similar results in that situation.  If the log is initially small (say 1-2M) and it is set to grow at 10% (the default), the log will likely need dozens of dynamic extensions to complete the UPDATE, which can be very slow.  Prior to the command, change the log size to a significantly larger amount (say 200M) and the growth to a fixed amount, say 40M, using, for example, EM; NOTE: this will take some time, maybe a couple to a few minutes.  However, preformatted log space should now be available and the UPDATE should run much faster.  I trimmed a 1M row update from > 10 mins. (I cancelled after 10 mins.) to less than 1 min. after changing the log as above.  No guarantee it will work for you, of course, but I think it may be worth a shot.
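The log change described above can also be scripted rather than done through Enterprise Manager; a sketch (the database and logical file names are placeholders, so check sysfiles for the real ones):

```sql
-- Pre-size the log and set a fixed growth increment before the big UPDATE.
-- 'MyDb' and 'MyDb_Log' are placeholder names.
ALTER DATABASE MyDb
MODIFY FILE (NAME = MyDb_Log, SIZE = 200MB, FILEGROWTH = 40MB)
```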
LVL 18

Expert Comment

ID: 9904174
On a daily basis I receive in excess of 15 million records and perform numerous bits and pieces on the data for cleansing.  I do not use UPDATE because it is slow.

From my experience I order the potential approaches like this (fastest first):

1) BCP out from a view, TRUNCATE, then BCP in
2) SELECT INTO NewTable, drop the old table, rename the new table to the old name.
3) UPDATE with no WHERE clause (assuming there is no index on the field in question).  (Scott, I have benchmarked this on 40% updates over 10 million records and it was faster in my case, since there was no index on the field I was cleaning and thus a table scan was required anyway.)
4) UPDATE with a WHERE clause

I would also use locking hints.  Using WITH (TABLOCK) can save about 20% of the time due to the number of records locked before escalation.

If you are doing the UPDATE, as Scott suggests you ought to do your best to minimise log expansion.
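Option 2 above might look like this (a sketch under stated assumptions: object names are placeholders, and any indexes, constraints, and permissions would need to be recreated on the new table):

```sql
-- Build a cleaned copy, then swap it in.  SELECT INTO is minimally logged.
SELECT PrimeKeyCol,
       RTRIM(SDVR01) AS SDVR01   -- plus the table's remaining columns
INTO   F42119_new
FROM   F42119

DROP TABLE F42119
EXEC sp_rename 'F42119_new', 'F42119'
```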

Author Comment

ID: 9904302
could you go into the BCP option in more detail? What is BCP and how do I implement it?

sorry if this is a truly dumb question...
LVL 18

Expert Comment

ID: 9904369
BCP is the original method of getting data in and out of SQL Server.  It is a command-line utility which can export to or import from text files (or native SQL format files).

If you create a view on your table using the RTRIM(...) function, e.g.:

CREATE VIEW vw AS
SELECT ID, RTRIM(MyText) AS MyNewText
FROM MyTable

then at the command line, or using xp_cmdshell, execute:

BCP "MyDb.dbo.vw" OUT "c:\mytempfile.txt" -SMyServer -N

It will create a file in native SQL format.


Then BCP your data back in:

BCP "MyDb.dbo.vw" IN "c:\mytempfile.txt" -SMyServer -N

There is more on the BCP command in BOL; search for "bcp Utility".

LVL 70

Expert Comment

by:Scott Pletcher
ID: 9904716
Personally, I have found that BULK INSERT is a much faster method of loading data.  Since there is no BULK OUTPUT command, I too recommend BCP to get the data out of the db, but BULK INSERT rather than BCP to load it back in.
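A hedged sketch of that BULK INSERT alternative (file path and table name are placeholders carried over from the BCP example above; the file must have been exported in a matching native format):

```sql
-- Load a previously BCP'd native-format file with BULK INSERT.
-- TABLOCK helps the load qualify for minimal logging.
BULK INSERT MyDb.dbo.MyTable
FROM 'c:\mytempfile.txt'
WITH (DATAFILETYPE = 'native', TABLOCK)
```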
LVL 18

Expert Comment

ID: 9904735
Must admit I haven't ever benchmarked BCP as opposed to BULK INSERT.
LVL 18

Expert Comment

ID: 10021189
Has your question been answered?  If so, could you please close it.
