Avatar of timokeeffe
timokeeffe

asked on

Update query against 1.4 million records

I am wondering if the performance I am getting is normal. I have a DB with 1.4 million records and a row size of 1288 as stated by the query analyzer. I am running a query against one column, SDVR01, attempting to remove trailing spaces. The query is still running 21 minutes later. Should it take this long on a Win 2K server with 1 GB of memory? I am using SQL Server 2000. Any ideas? Query below:

UPDATE F42119
SET SDVR01 = RTRIM(SDVR01)
WHERE     RIGHT(SDVR01, 1) = ' '

p.s. - decreasing the row size will do nothing because I am updating a specific column (SDVR01). Or am I wrong? I could rid myself of numerous columns.
Avatar of lausz
lausz

Is it mandatory for you to do this update? You could just use RTRIM(SDVR01) in your queries and not update the table at all.
 
You might try excluding the WHERE clause, because performing an RTRIM won't hurt anything even if the records don't have trailing spaces. This may help somewhat, because I believe RIGHT() is an expensive operation in terms of processing.

This will take a while because you're almost certainly doing a full table scan of all 1.4 million records; however, it doesn't seem like it should take 20 minutes. Is the processor pegged at 100% while this is executing? Have you checked the logs to see if there are any problems being reported?

Also, lausz isn't wrong with his/her comment either. You are far better off trimming the spaces before inserting the data, if that is possible.

Avatar of Lowfatspread
Have you considered just BCP'ing the table out and back in to reduce the column lengths?

Use a SELECT query to RTRIM the column on the output, then just reload the shrunk results...

Should be much, much faster...


ASKER CERTIFIED SOLUTION
Avatar of _TAD_
_TAD_

I don't see your point. Unless you have determined the primary key set somehow (via a relationship), it's going to be a full table scan...

which is why BCP should be considered; you'll end up with a much neater/more optimized table...

You could specify the primary key and split the process up into a series of ranges, which may be useful if you're trying to do this whilst allowing normal processing to continue...
 
Go the temp table route

SELECT primarykeyid, RTRIM(sdvr01) AS sdvr01
INTO #temp
FROM f42119 (INDEX = indsdvr01)
WHERE RTRIM(sdvr01) <> sdvr01
GO

UPDATE a
SET sdvr01 = b.sdvr01
FROM f42119 AS a
INNER JOIN #temp AS b ON a.primarykeyid = b.primarykeyid

DROP TABLE #temp
GO

This has the benefits that (a) the SELECT INTO #temp is minimally logged, and so will be fast; (b) sdvr01 is already in memory, so the query processor doesn't have to do any additional lookups or that RIGHT() calculation; (c) the update operation is optimised against the primary key; and (d) the update will be "in place", because you can guarantee that sdvr01 will never get bigger.

Leave off the index hint if there isn't one in existence (inequality comparisons may force the query processor not to use the index anyway).

For best results, run each query separately, so the optimiser doesn't compile the UPDATE before the temp table exists. Definitely don't call them all from one stored procedure.

What is the datatype for column SDVR01? Char or varchar?

1) If this column is defined as char, this update will not work to eliminate trailing spaces, because the char datatype will always have trailing spaces appended.

2) If your column is defined as varchar and the ANSI_PADDING option is set to ON, then you can do what you are trying to do.

3) If your column is defined as varchar and the ANSI_PADDING option is set to OFF, then trailing spaces will always be trimmed.

Reading your question carefully, I see that you are trying to change the row size. But the row size depends on the datatypes, not on the data itself. I think that even after you perform this update, the row size will not change. You can try this yourself to verify.
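To see the ANSI_PADDING behaviour for yourself, here is a small throwaway sketch (the temp table and column names are made up purely for the demo):

```sql
SET ANSI_PADDING ON
CREATE TABLE #pad_demo (v VARCHAR(10))
INSERT INTO #pad_demo VALUES ('abc   ')
-- With ANSI_PADDING ON at column-creation time, the trailing
-- spaces are stored, so DATALENGTH(v) returns 6, not 3.
SELECT DATALENGTH(v) FROM #pad_demo
DROP TABLE #pad_demo
```

Repeat with SET ANSI_PADDING OFF before the CREATE TABLE and the trailing spaces are trimmed on insert.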

HTH

Namasi Navaretnam
I disagree with others about dropping the WHERE clause.  That should make things *worse*, because updates will be done for all rows vs. just the needed rows.  A little CPU overhead to do the RIGHT() check is a lot less total overhead than an actual update of a row with the resulting I/O.

This sounds as if it may be a log size issue, or at least I've had similar results in that situation.  If the log is initially small (say 1-2 MB) and it is set to grow at 10% (the default), the log will likely need dozens of dynamic extensions to complete the UPDATE, which can be very slow.  Prior to the command, change the log size to a significantly larger amount (say 200 MB) and the growth to a fixed amount, say 40 MB, using, for example, Enterprise Manager; NOTE: this will take some time, maybe a couple to a few minutes.  However, preformatted log space should then be available and the UPDATE should run much faster.  I trimmed a 1M row update from > 10 mins. (I cancelled after 10 mins.) to less than 1 min. after changing the log as above.  No guarantee it will work for you, of course, but I think it may be worth a shot.
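If you would rather do the log pre-sizing in T-SQL than in Enterprise Manager, a sketch follows. The database and logical log file names are placeholders; check sysfiles for your actual logical name.

```sql
-- Placeholder names: replace MyDb / MyDb_Log with your own.
ALTER DATABASE MyDb
MODIFY FILE (NAME = MyDb_Log, SIZE = 200MB, FILEGROWTH = 40MB)
```

Run it before the big UPDATE so the log space is already formatted.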
On a daily basis I receive in excess of 15 million records and perform numerous cleansing operations on the data.  I do not use UPDATE, because it is slow.

From my experience, I would order the potential approaches like this (fastest first):

1) BCP out from a view, TRUNCATE, then BCP in
2) SELECT INTO NewTable, drop the old table, rename the new table to the old name
3) UPDATE with no WHERE clause (assuming there is no index on the field in question).  (Scott, I have benchmarked this on 40% updates over 10 million records and it was faster in my case, since there was no index on the field I was cleaning and thus a table scan was required anyway.)
4) UPDATE with a WHERE clause
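Option 2 might look like this in outline. This is a sketch only: the column list is abbreviated, F42119_new is a made-up name, and the sp_rename step assumes no dependent objects (indexes, constraints, permissions) need recreating.

```sql
-- Sketch: in a real run, list every column of F42119 explicitly.
SELECT primarykeyid, RTRIM(SDVR01) AS SDVR01 /* , ...other columns... */
INTO F42119_new
FROM F42119

DROP TABLE F42119
EXEC sp_rename 'F42119_new', 'F42119'
```

The SELECT INTO is minimally logged, which is where the speed comes from; the price is rebuilding indexes and constraints afterwards.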

I would also use locking hints.  Using WITH (TABLOCK) can save about 20% of the time, due to the number of records locked before lock escalation kicks in.
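Applied to the original statement, the hint would look like this:

```sql
UPDATE F42119 WITH (TABLOCK)
SET SDVR01 = RTRIM(SDVR01)
WHERE RIGHT(SDVR01, 1) = ' '
```

Taking one table lock up front avoids acquiring thousands of row locks and then escalating anyway, but it will block other users of the table for the duration.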

If you are doing the UPDATE, as Scott suggests, you ought to do your best to minimise log expansion.
Avatar of timokeeffe

ASKER

ShogunWade
Could you go into the BCP option in more detail? What is BCP and how do I implement it?

Sorry if this is a truly dumb question...
BCP is the original method of getting data in and out of SQL Server.  It is a command-line utility which can export to or import from text files (or native SQL format files).

If you create a view on your table using the RTRIM(...) function, e.g.:

CREATE VIEW vw AS
    SELECT ID, RTRIM(MyText) AS MyNewText
    FROM MyTable


then, at the command line or using xp_cmdshell, execute:

BCP "MyDb.dbo.vw" OUT "c:\mytempfile.txt" -SMyServer -N

It will create a file in native SQL format.

then TRUNCATE TABLE MyTable

Then BCP your data back in -- note that the import targets the table, not the view:


BCP "MyDb.dbo.MyTable" IN "c:\mytempfile.txt" -SMyServer -N


There is more on the BCP command in BOL; search for "bcp Utility".

Personally, I have found that BULK INSERT is a much faster method of loading data.  Since there is no BULK OUTPUT command, I too recommend BCP to get the data out of the database, but BULK INSERT rather than BCP to load it back in.
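A sketch of the load step using BULK INSERT against the same native-format file as the earlier BCP example (MyDb, MyTable, and the file path are placeholders carried over from that example):

```sql
BULK INSERT MyDb.dbo.MyTable
FROM 'c:\mytempfile.txt'
WITH (DATAFILETYPE = 'native', TABLOCK)
```

BULK INSERT runs inside the server process rather than as an external utility, which is where the speed advantage over BCP on the load side comes from.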
I must admit I haven't ever benchmarked BCP against BULK INSERT.
Has your question been answered?  If so, could you please close it?