Update query against 1.4 million records

I am wondering if the performance I am getting is correct. I have a DB with 1.4 million records and a row size of 1288 as reported by Query Analyzer. I am running a query against one column, SDVR01, attempting to remove trailing spaces. The query is still running 21 minutes later. Should it take this long on a Win 2K server with 1 GB of memory? I am using SQL Server 2000. Any ideas? Query below:

UPDATE F42119
SET SDVR01 = RTRIM(SDVR01)
WHERE     RIGHT(SDVR01, 1) = ' '

p.s. - decreasing the row size will do nothing because I am only updating a specific column (SDVR01). Or am I wrong? I could rid myself of numerous columns.
timokeeffeAsked:

lauszCommented:
Is it mandatory for you to do this update? You could just use RTRIM(SDVR01) in your queries instead and avoid the update altogether.
 
frodomanCommented:
You might try excluding the WHERE clause, because performing an RTRIM won't hurt anything even if the records don't have trailing spaces. This may help somewhat because I believe RIGHT() is an expensive operation in terms of processing.

This will take a while because you're almost certainly doing a full table scan of all 1.4 million records; however, it doesn't seem like it should take 20 minutes. Is the processor pegged at 100% while this is executing? Have you checked the logs to see if there are any problems being reported?

Also, lausz isn't wrong with his/her comment either. You are far better off to trim the spaces before inserting them, if that is possible.

LowfatspreadCommented:
Have you considered just BCP'ing the table out and back in to reduce the column lengths?

Use a SELECT query to RTRIM the column on the output statement, then just reload the shrunken results...

Should be much, much faster...



_TAD_Commented:

I personally would perform the update as frodoman suggested and simply remove the WHERE clause. RTRIM() won't hurt any fields; it will only drop trailing spaces.

However, it is good to note for future processes that it is usually best to update against a primary key field. I am not certain about the efficiency, but I was always under the impression that updating by way of a primary key is best:


UPDATE F42119
SET SDVR01 = RTRIM(SDVR01)
WHERE F42119.PrimaryKeyCol IN (SELECT PrimaryKeyCol FROM F42119 WHERE RIGHT(SDVR01, 1) = ' ')


This way the SQL Server gets a subset of data to update instead of scanning the entire table.

LowfatspreadCommented:
I don't see your point. Unless you have determined the primary key set somehow (via a relationship), it's going to be a full table scan...

which is why BCP should be considered; you'll end up with a much neater/more optimized table...

You could specify the primary key and split the process up into a series of ranges, which may be useful if you're trying to do this whilst allowing normal processing to continue...
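
The range-splitting idea above could be sketched like this (a hypothetical batch loop; PrimaryKeyCol and the 50,000-row batch size are assumptions, not details from the original table):

```sql
-- Hypothetical: update in primary-key ranges so each batch commits
-- separately, keeping lock counts and log usage small (SQL Server 2000).
DECLARE @lo int, @hi int, @max int
SELECT @lo = MIN(PrimaryKeyCol), @max = MAX(PrimaryKeyCol) FROM F42119

WHILE @lo <= @max
BEGIN
    SET @hi = @lo + 49999            -- 50,000-key ranges; tune as needed

    UPDATE F42119
    SET SDVR01 = RTRIM(SDVR01)
    WHERE PrimaryKeyCol BETWEEN @lo AND @hi
      AND RIGHT(SDVR01, 1) = ' '

    SET @lo = @hi + 1
END
```

Each iteration is its own implicit transaction, so normal processing can interleave between batches.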
 
xassetsCommented:
Go the temp table route

select primarykeyid, rtrim(sdvr01) as sdvr01
into #temp
from f42119 (index=indsdvr01)
where rtrim(sdvr01) <> sdvr01
go

update a
set sdvr01 = b.sdvr01
from f42119 as a inner join #temp as b on a.primarykeyid = b.primarykeyid

drop table #temp
go

This has the benefits of: (a) the insert into #temp is minimally logged, and so will be fast; (b) sdvr01 is already in memory, so the query processor doesn't have to do any additional lookups or that RIGHT() calculation; (c) the update operation is optimised against the primary key; and (d) the update will be "in place", because you can guarantee that sdvr01 will never get bigger.

Leave off the index hint if there isn't one in existence (inequality comparisons may force the QP to skip the index anyway).

For best results run each query separately, so the optimiser doesn't compile the update before the temp table exists. Definitely don't call them all from one stored procedure.

namasi_navaretnamCommented:
What is the datatype of column SDVR01, char or varchar?

1) If the column is defined as char, this update will not work to eliminate trailing spaces, because a char datatype always has trailing spaces appended to pad it to full width.

2) If the column is defined as varchar and the ANSI_PADDING option was ON when the column was created, then you can do what you are trying to do.

3) If the column is defined as varchar and ANSI_PADDING was OFF, then trailing spaces are always trimmed on insert anyway.

Reading your question carefully, I see that you are trying to change the row size. But the row size depends on the datatype, not on the data itself. I think even after you perform this update the row size will not change. You can try this yourself to verify.

HTH

Namasi Navaretnam
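
The char vs varchar behaviour above can be checked with a quick test (a hypothetical temp-table demo; DATALENGTH reports the stored byte count):

```sql
-- Hypothetical demo: char columns are always blank-padded to full width,
-- so RTRIM in an UPDATE cannot remove their trailing spaces.
CREATE TABLE #pad_test (c char(10), v varchar(10))
INSERT INTO #pad_test VALUES ('abc', 'abc   ')

SELECT DATALENGTH(c) AS char_bytes,    -- 10: padded back out to char(10)
       DATALENGTH(v) AS varchar_bytes  -- 6 if ANSI_PADDING was ON, 3 if OFF
FROM #pad_test

DROP TABLE #pad_test
```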
Scott PletcherSenior DBACommented:
I disagree with others about dropping the WHERE clause. That should make things *worse*, because updates will be done for all rows instead of just the rows that need them. A little CPU overhead for the RIGHT() check is a lot less total overhead than an actual update of a row with the resulting I/O.

This sounds as if it may be a log size issue, or at least I've had similar results in that situation. If the log is initially small (say 1-2 MB) and set to grow at 10% (the default), the log will likely need dozens of dynamic extensions to complete the UPDATE, which can be very slow. Prior to the command, change the log size to a significantly larger amount (say 200 MB) and the growth to a fixed amount, say 40 MB, using, for example, EM. NOTE: this will take some time, maybe a couple to a few minutes. However, preformatted log space will then be available and the UPDATE should run much faster. I trimmed a 1M-row update from > 10 mins. (I cancelled after 10 mins.) to less than 1 min. after changing the log as above. No guarantee it will work for you, of course, but I think it may be worth a shot.
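
The log change described above can also be done in T-SQL rather than in EM (a sketch; the database name MyDb and logical log file name MyDb_Log are assumptions you would replace with your own, found via sp_helpfile):

```sql
-- Hypothetical: pre-grow the transaction log and set a fixed growth
-- increment before running the big UPDATE (SQL Server 2000).
ALTER DATABASE MyDb
MODIFY FILE (NAME = MyDb_Log, SIZE = 200MB, FILEGROWTH = 40MB)
```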
ShogunWadeCommented:
On a daily basis I receive in excess of 15 million records and perform numerous bits and pieces on the data for cleansing. I do not use UPDATE because it is slow.

From my experience I order the potential approaches like this (fastest first):

1) BCP out from a view, TRUNCATE, then BCP back in.
2) SELECT INTO a new table, drop the old table, rename the new table to the old name.
3) UPDATE with no WHERE clause (assuming there is no index on the field in question). (Scott, I have benchmarked this on 40% updates over 10 million records and it was faster in my case, since there was no index on the field I was cleaning and thus a table scan was required anyway.)
4) UPDATE with a WHERE clause.

I would also use locking hints. Using WITH (TABLOCK) can save about 20% of the time due to the number of records locked before escalation.

If you are doing the update, as Scott suggests you ought to do your best to minimise log expansion.
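
Option 2 above might look like this (a sketch; the column list and PrimaryKeyCol are assumptions, and indexes, constraints, and permissions are not copied by SELECT INTO, so they would have to be recreated on the new table):

```sql
-- Hypothetical: rebuild the table with trimmed values instead of updating.
SELECT PrimaryKeyCol,
       RTRIM(SDVR01) AS SDVR01   -- ...plus every other column of F42119
INTO F42119_new
FROM F42119

DROP TABLE F42119
EXEC sp_rename 'F42119_new', 'F42119'
```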
timokeeffeAuthor Commented:
ShogunWade
could you go into the BCP option in more detail? What is BCP and how do I implement it?

Sorry if this is a truly dumb question...
ShogunWadeCommented:
BCP is the original method of getting data in and out of SQL Server. It is a command-line utility which can export or import text files (or native SQL files).

If you create a view on your table using the RTRIM(...) function, e.g.:

CREATE VIEW vw AS
    SELECT ID, RTRIM(MyText) AS MyNewText
    FROM MyTable

then at the command line, or using xp_cmdshell, execute:

BCP "MyDb.dbo.vw" OUT "c:\mytempfile.txt" -SMyServer -N

It will create a file in native SQL format.

Then TRUNCATE TABLE MyTable.

Then BCP your data back in, targeting the table this time (the view is not updatable because of the RTRIM):

BCP "MyDb.dbo.MyTable" IN "c:\mytempfile.txt" -SMyServer -N

There is more on the BCP command in BOL; search for "bcp Utility".

Scott PletcherSenior DBACommented:
Personally, I have found that BULK INSERT is a much faster method of loading data. Since there is no BULK OUTPUT command, I too recommend BCP to get the data out of the database, but BULK INSERT rather than BCP to load it back in.
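
The reload step might then look like this (a sketch; it assumes the native-format file produced by the BCP OUT step described above):

```sql
-- Hypothetical: reload the trimmed data with BULK INSERT instead of BCP IN.
BULK INSERT MyDb.dbo.MyTable
FROM 'c:\mytempfile.txt'
WITH (DATAFILETYPE = 'native', TABLOCK)
```

TABLOCK also allows the load to be minimally logged when the database is in the bulk-logged or simple recovery model.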
ShogunWadeCommented:
Must admit I haven't ever benchmarked BCP as opposed to BULK INSERT.
ShogunWadeCommented:
Has your question been answered? If so, could you please close it.