Solved

improving with a simple query which handles lot of data

Posted on 2011-09-28
12
243 Views
Last Modified: 2012-05-12

The base query is: Update A Set A.col1 = B.col2 from A join B on A.col1 = B.col1

Ultimately what I will doing it putting this in a loop, because there are too many records, and I would like to do it in a batch, so it is committed quicker

DECLARE      @Increment tinyint, @Iterate bigint
SELECT      @Iterate = CEILING(COUNT(*) / 5000.0) FROM A JOIN B ON A.Col1 = B.Col1
SET @Increment = 1
WHILE @Increment <= @Iterate
      BEGIN
UPDATE A SET A.col1 = B.col2 FROM A join B ON A.col1 = B.col1
JOIN (SELECT TOP 5000 A.Col1 FROM A INNER JOIN B ON A.Col1 = B.Col1 WHERE A.Col1 <> B.Col2) C
ON A.Col1 = C.Col1
SET @Increment = @Increment + 1
      END

But to test the efficieny of the index on the join, I am testing on a SELECT and What is being run is:

SELECT A.Col1 , B.Col2
FROM A
INNER JOIN B
ON A.Col1 = B.Col1
JOIN
(SELECT TOP 1 A.Col1
FROM A
INNER JOIN B
ON A.Col1 = B.Col1
WHERE A.Col1 <> B.Col2
) C
ON A.Col1 = C.Col1

IO Statistics for the above shows
Table 'A'. Scan count 789628, logical reads 3562058, physical reads 491, read-ahead reads 128015.
Table 'B'. Scan count 1, logical reads 2540, physical reads 0, read-ahead reads 32.

A has 20M records
B has 500k records.

Both A.col1 & B.col1 are indexed. (Non clustered index with only one column in it)

Am I doing the best with the index or is there something more than be done to improve the query?


0
Comment
Question by:25112
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 7
  • 4
12 Comments
 
LVL 25

Accepted Solution

by:
TempDBA earned 100 total points
ID: 36813071
You need to have index on (B.col1,B.col2)
0
 
LVL 50

Assisted Solution

by:Lowfatspread
Lowfatspread earned 400 total points
ID: 36813465
yes have the index on col1,col2 for table b...

you may find that

doing the update in 2 phases is better

e,g.

1. where a.col1 < b.col2
2. where a.col1 > b.col2

rather than a.col1 <> b.col2

after you change a,col1 to b.col2  is it possible that that introduces further matches  again in col1?
not sure why you bother with the initial count query ... wouldn't it be safer to loop around until your update
statement don't update anything? (by doing a select @@rowcount check after the update...)
0
 
LVL 50

Assisted Solution

by:Lowfatspread
Lowfatspread earned 400 total points
ID: 36813498
A has 20M records
B has 500k records.

so how many a rows are you going to have updated in the end?
is the distribution of b to a fairly even or can it vary widely?
what datatype is col1 (if its variable length will the update significantly increase/decrease the data stored?) ?

does col1 appear in any other table A indexes?
is the col1 index just for this processing...
what is the clustering index on table A?

if a large proportion of the 20million rows are being affected , and there aren't too many constraints defined on the table
then bcp unloading the table via a select to substitute the new col2 value , then drop/recreate table, bcp load new data, recreate any additional indexes maybe a better strategy...

0
NFR key for Veeam Backup for Microsoft Office 365

Veeam is happy to provide a free NFR license (for 1 year, up to 10 users). This license allows for the non‑production use of Veeam Backup for Microsoft Office 365 in your home lab without any feature limitations.

 
LVL 5

Author Comment

by:25112
ID: 36814543
>>You need to have index on (B.col1,B.col2)

do you mean B.COl2 as included column?
0
 
LVL 5

Author Comment

by:25112
ID: 36814546
1. where a.col1 < b.col2
2. where a.col1 > b.col2

rather than a.col1 <> b.col2

good idea- thanks.
0
 
LVL 5

Author Comment

by:25112
ID: 36814549

>>after you change a,col1 to b.col2  is it possible that that introduces further matches  again in col1?

perhaps.. (possible).. but that satisfied business need, right? any alternatives?
0
 
LVL 5

Author Comment

by:25112
ID: 36814552

>>not sure why you bother with the initial count query

which column do you think we can get away with? @Iterate tells us how many times we need to update, if we update 5000 at a time.. @Increment is compared against @Iterate .. can you illustrate how you would use @@rowcount here..
0
 
LVL 5

Author Comment

by:25112
ID: 36814555

>>so how many a rows are you going to have updated in the end?
18.5 M is what I see.. will be updated...

>>what datatype is col1?
all columns involved are 'int'

>>does col1 appear in any other table A indexes?
one more index.. where it is the first column, and then there are few more columns after that. The index exists regardless of this operation. I am willing to make more index if needed. The PK is non clustered unique index. there is not Clustered index at this time on this table.

>>there aren't too many constraints defined on the table
do you FKs?

>>then bcp unloading the table via a select to substitute the new col2 value , then drop/recreate table, bcp load new data bcp loads data into
i know bcp loads data in bulk to textfiles.. how can it do compare and replace.. do we do copy/paste in notepad?
0
 
LVL 50

Assisted Solution

by:Lowfatspread
Lowfatspread earned 400 total points
ID: 36814991
given this is possible
>>after you change a,col1 to b.col2  is it possible that that introduces further matches  again in col1?
perhaps.. (possible).. but that satisfied business need, right? any alternatives?



THEN
>>not sure why you bother with the initial count query
which column do you think we can get away with?[ @Iterate tells us how many times we need to update, if we update 5000 at a time.. @Increment is compared against @Iterate .. can you illustrate how you would use @@rowcount here..


the @iterate calculation is no longer valid...

DECLARE      @rows int,@modified bigint

SElect @rows=-1,@modified=0
wHILE @rows<>0
      BEGIN
UPDATE A SET A.col1 = B.col2
  FROM A join B 
   ON A.col1 = B.col1 
JOIN (SELECT TOP 5000 A.Col1 
            FROM A INNER JOIN B ON A.Col1 = B.Col1 WHERE A.Col1 <> B.Col2) C
ON A.Col1 = C.Col1
SElect @rows=@@rowcount,@modified=@modified+@rowcount
print getdate()
print @modified
end

Open in new window

   

>>there aren't too many constraints defined on the table
do you FKs?

Yes foreign keys

Bcp...

Not you output the table via a query which substitutes the desired value of the column... basically your select statement
then you drop the table
then use bcp to reload the data...

since 90%+ of the table is being altered this is the method i would go with... it will avoid a lot of logging...
however can you construct the correct "output" query... goinf back to your first response about it being a business requirement to suport multiple transfers that implies you need additional information on table b to indicate the order in which the transfers should be applied (or else you potentially have a never ending loop a-b,b-c,c-a...)

even if multiple transfers are not intended ... you probably still need some ordering information on table B if it is valid
for transfers into and out of the same values...
you need to do some validation / analysis of the data and discuss with the business what the intended result of the
"transfer process is to be"

hth

ps these considerations should be basic practice for a DBA so please involve your know before you progress further with the design/implementation
0
 
LVL 5

Author Comment

by:25112
ID: 36893870
i am following you.

could you also tel the rationale for the index (B.col1,B.col2)
0
 
LVL 50

Assisted Solution

by:Lowfatspread
Lowfatspread earned 400 total points
ID: 36893979
because then the index is a covering index and sql does not have to access the row in table b if it decides to use the index....
0
 
LVL 5

Author Comment

by:25112
ID: 36894436
thanks Sir
0

Featured Post

Get HTML5 Certified

Want to be a web developer? You'll need to know HTML. Prepare for HTML5 certification by enrolling in July's Course of the Month! It's free for Premium Members, Team Accounts, and Qualified Experts.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Ever wondered why sometimes your SQL Server is slow or unresponsive with connections spiking up but by the time you go in, all is well? The following article will show you how to install and configure a SQL job that will send you email alerts includ…
Recently we ran in to an issue while running some SQL jobs where we were trying to process the cubes.  We got an error saying failure stating 'NT SERVICE\SQLSERVERAGENT does not have access to Analysis Services. So this is a way to automate that wit…
Familiarize people with the process of retrieving data from SQL Server using an Access pass-thru query. Microsoft Access is a very powerful client/server development tool. One of the ways that you can retrieve data from a SQL Server is by using a pa…
Via a live example, show how to setup several different housekeeping processes for a SQL Server.

623 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question