Solved

Update Query again 1.4 million records

Posted on 2003-12-05
14
1,597 Views
Last Modified: 2012-06-27
I am wondering if the performance I am getting is correct. I have a DB with 1.4 million reords and row size of 1288 as stated by the query analyzer. I am running a query against 1 column SDVR01 attempting to trailing spaces. This query is still running 21 mins later. should take this long on a Win 2K server with 1 gig of memory? I am using sql server 2000. any ideas?? query below:

UPDATE F42119
SET SDVR01 = RTRIM(SDVR01)
WHERE     RIGHT(SDVR01, 1) = ' '

p.s. - decreasing row sze will do nothing becuase I am updating specific column (SDVR01). Or am I wrong? I can rid myself of numerous columns.
0
Comment
Question by:timokeeffe
  • 4
  • 2
  • 2
  • +6
14 Comments
 
LVL 6

Expert Comment

by:lausz
Comment Utility
Is mandatory for you to do this update ...because you can use .. RTRIM(SDVR01) .. in your queries and do not do nothing..
 
0
 
LVL 42

Expert Comment

by:frodoman
Comment Utility
You might try excluding the "where" clause because performing a rtrim won't hurt anything even if the records don't have trailing spaces.  This may help somewhat because I believe "right" is an expensive operation in terms of processing.

This will take a while because you're almost certainly doing a full table scan on all 1.4 million records; however it doesn't seem like it should take 20 minutes.  Is the processor pegged at 100% when this is executing?  Have you checked the logs to see if there are any problems being reported?

Also, lausz isn't wrong with his/her comment either.  You are far better off to trim the spaces before inserting them if that is possible.

0
 
LVL 50

Expert Comment

by:Lowfatspread
Comment Utility
have you considered just bcp ing the table in and out
to reduce the column lengths

use a select query to rtrim the column on the output statement
then just reload the shrunk results...

should be much much faster...


0
 
LVL 22

Accepted Solution

by:
_TAD_ earned 500 total points
Comment Utility

I personally would elect to perform the execution like frodoman suggested.... simply remove the where clause.   RTrim() won't hurt any fields, it will only drop blank spaces.



However, it is good to note for future processes that it is usually best to update against a primary key field.

I am not certain about the efficiency, but I was always under the impression that updating by use of a primary key is always best.


UPDATE F42119
SET SDVR01 = RTRIM(SDVR01)
WHERE   f42119.PrimeKeyCol in (select PrimaryKeyCol from F42119 where  RIGHT(SDVR01, 1) = ' ' )


This way the SQL Server gets a subset of data to update instead of scanning the entire table.
0
 
LVL 50

Expert Comment

by:Lowfatspread
Comment Utility
don't see your point
unless you have determined the primary key set somehow
(via a relationship) its going to be a full table scan....

which is why BCP should be considered, you'll end up with much a neater/optimized   table...

you could specify the primary key and split the process up into a series of ranges which maybe useful if your trying to
do this whilst allowing normal processes to continue...
 
0
 
LVL 4

Expert Comment

by:xassets
Comment Utility
Go the temp table route

select primarykeyid, rtrim(sdvr01) as sdvr01
into #temp
from f42119 (index=indsdvr01)
where rtrim(sdvr01) <> sdvr01
go

update a
set sdvr01 = b.sdvr01
from f42119 as a inner join #temp as b on a.primarykeyid = b.primarykeyid

drop table #temp
go

This has the benefits of (a) the insert into #temp is not logged, and so will be fast (b) sdvr01 is already in memory so the query processor doesnt have to do any additional lookups or that right() calculation (c) the update operation is optimised against the primary key and  (d) the update will be "in place" because you can guarantee that sdvr01 will never get bigger

Leave off the index stmt if there not one in existence (inequality comparisons may force the qp to not use the index anyway).

For best results run each query separately, so the optimiser doesn't compile the first before the temp table exists. Definitely don't call them all from one sp.

0
 
LVL 15

Expert Comment

by:namasi_navaretnam
Comment Utility
What is the datatype for column SDVR01? Char or Varchar.

1) If this column is defined as char this update will not work to eliminate trailing spaces, bacause char datatype will always have trailing spcaes apended..

2) If your column defined as varchar and ANSI_PADDING option is set to ON then you can do what you are trying to do.

3) If your column is defined as varchar and ANSI_PADDING option is set to OFF then trailing spaces will always be trimmed.

Reading your question carefully, I see that you are trying to change the rowsize. But the rowsize depends on the datatype not on the data itself. I think even after you perform this update rowsize will not change. You can try this yourself to verify.

HTH

Namasi Navaretnam
0
What is SQL Server and how does it work?

The purpose of this paper is to provide you background on SQL Server. It’s your self-study guide for learning fundamentals. It includes both the history of SQL and its technical basics. Concepts and definitions will form the solid foundation of your future DBA expertise.

 
LVL 69

Expert Comment

by:ScottPletcher
Comment Utility
I disagree with others about dropping the WHERE clause.  That should make things *worse*, because updates will be done for all rows vs. just needed rows.  A little CPU overhead to do the RIGHT() check is a lot less total overhead that an actual update of a row with the resulting I/O.

This sounds as if it may be a log size issue, or at least I've had similar results in that situation.  If the log is initially small (say 1-2M) and it is set to grow at 10% (the default), the log will likely need dozens of dynamic extensions to complete the UPDATE, which can be very slow.  Prior to the command, change the log size to a significantly larger amount (say 200M) and the growth to a fixed amount, say 40M, using, for example, EM; NOTE: this will take some time, maybe a couple to a few minutes.  However, preformatted log space should now be available and the UPDATE should run much faster.  I trimmed a 1M row update from > 10 mins. (I cancelled after 10 mins.) to less than 1 min. after changing the log as above.  No guarantee it will work for you, of course, but I think it may be worth a shot.
0
 
LVL 18

Expert Comment

by:ShogunWade
Comment Utility
On  daily basis I receive in excess of 15 mil records which and perform numerous bits and peices on the data for clensing.  I do not use UPDATE because it is slow.

from my experience I order the potential ways like this (in terms of fastest first).

1) BCP out from view, TRUNCATE then BCP in
2) SELECT INTO NewTable, drop old table, rename new toable to old table.
3) UPDATE with no where clause (assuming there is not index on the field in question).  (scott, I have benchmarked this on 40% updates over 10 mill records and it was faster in my case, since there was no index on the field i was cleaning and this a tablescan was required).
4) UPDATE with where clause

I would also use locking hints.  using WITH(TABLOCK) can save about 20% of time due to the number of record locked before escalation.

If you are doing the update, as scott suggests you ort to do your best to minimise expansion.
0
 

Author Comment

by:timokeeffe
Comment Utility
ShogunWade
could you go into BCP option in more detial? What is BCP and how do implement it?

sorry if this is a truly dumb question...
0
 
LVL 18

Expert Comment

by:ShogunWade
Comment Utility
BCP is the original method of getting data in and out of SQL server.   It is a command line utility which can export or import from text files (or native SQL files)

If you create a view on your table using the RTRIM(....) function eg:

CREATE VIEW vw AS
    SELECT ID,RTRIM(MyText) MyNewText
FORM MyTable


then at the command line, or using xp_cmdshell execute:

BCP "MyDb.dbo.vw" OUT "c:\mytempfile.txt" -SMyServer -N

it will create a file in native SQL format

then TRUNCATE TABLE MyTable

Then BCP your data back in


BCP "MyDb.dbo.vw" IN "c:\mytempfile.txt" -SMyServer -N


There is more on BCP command  in BOL  search for BCP Utility

0
 
LVL 69

Expert Comment

by:ScottPletcher
Comment Utility
Personally, I have found that BULK INSERT is a much faster method of loading data.  Since there is no BULK OUTPUT command, I too recommend BCP to get the data out of the db, but BULK INSERT rather than BCP to load it back in.
0
 
LVL 18

Expert Comment

by:ShogunWade
Comment Utility
Must admit i havent ever benchmarked BCP as opposed to BULK insert.
0
 
LVL 18

Expert Comment

by:ShogunWade
Comment Utility
Has your question been answered?   If so could you please close it.
0

Featured Post

How to run any project with ease

Manage projects of all sizes how you want. Great for personal to-do lists, project milestones, team priorities and launch plans.
- Combine task lists, docs, spreadsheets, and chat in one
- View and edit from mobile/offline
- Cut down on emails

Join & Write a Comment

Introduced in Microsoft SQL Server 2005, the Copy Database Wizard (http://msdn.microsoft.com/en-us/library/ms188664.aspx) is useful in copying databases and associated objects between SQL instances; therefore, it is a good migration and upgrade tool…
I wrote this interesting script that really help me find jobs or procedures when working in a huge environment. I could I have written it as a Procedure but then I would have to have it on each machine or have a link to a server-related search that …
This video shows how to set up a shell script to accept a positional parameter when called, pass that to a SQL script, accept the output from the statement back and then manipulate it in the Shell.
Via a live example, show how to set up a backup for SQL Server using a Maintenance Plan and how to schedule the job into SQL Server Agent.

763 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

8 Experts available now in Live!

Get 1:1 Help Now