25112
asked on
improving with a simple query which handles lot of data
The base query is: Update A Set A.col1 = B.col2 from A join B on A.col1 = B.col1
Ultimately what I will doing it putting this in a loop, because there are too many records, and I would like to do it in a batch, so it is committed quicker
DECLARE @Increment tinyint, @Iterate bigint
SELECT @Iterate = CEILING(COUNT(*) / 5000.0) FROM A JOIN B ON A.Col1 = B.Col1
SET @Increment = 1
WHILE @Increment <= @Iterate
BEGIN
UPDATE A SET A.col1 = B.col2 FROM A join B ON A.col1 = B.col1
JOIN (SELECT TOP 5000 A.Col1 FROM A INNER JOIN B ON A.Col1 = B.Col1 WHERE A.Col1 <> B.Col2) C
ON A.Col1 = C.Col1
SET @Increment = @Increment + 1
END
But to test the efficieny of the index on the join, I am testing on a SELECT and What is being run is:
SELECT A.Col1 , B.Col2
FROM A
INNER JOIN B
ON A.Col1 = B.Col1
JOIN
(SELECT TOP 1 A.Col1
FROM A
INNER JOIN B
ON A.Col1 = B.Col1
WHERE A.Col1 <> B.Col2
) C
ON A.Col1 = C.Col1
IO Statistics for the above shows
Table 'A'. Scan count 789628, logical reads 3562058, physical reads 491, read-ahead reads 128015.
Table 'B'. Scan count 1, logical reads 2540, physical reads 0, read-ahead reads 32.
A has 20M records
B has 500k records.
Both A.col1 & B.col1 are indexed. (Non clustered index with only one column in it)
Am I doing the best with the index or is there something more than be done to improve the query?
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER
1. where a.col1 < b.col2
2. where a.col1 > b.col2
rather than a.col1 <> b.col2
good idea- thanks.
2. where a.col1 > b.col2
rather than a.col1 <> b.col2
good idea- thanks.
ASKER
>>after you change a,col1 to b.col2 is it possible that that introduces further matches again in col1?
perhaps.. (possible).. but that satisfied business need, right? any alternatives?
ASKER
>>not sure why you bother with the initial count query
which column do you think we can get away with? @Iterate tells us how many times we need to update, if we update 5000 at a time.. @Increment is compared against @Iterate .. can you illustrate how you would use @@rowcount here..
ASKER
>>so how many a rows are you going to have updated in the end?
18.5 M is what I see.. will be updated...
>>what datatype is col1?
all columns involved are 'int'
>>does col1 appear in any other table A indexes?
one more index.. where it is the first column, and then there are few more columns after that. The index exists regardless of this operation. I am willing to make more index if needed. The PK is non clustered unique index. there is not Clustered index at this time on this table.
>>there aren't too many constraints defined on the table
do you FKs?
>>then bcp unloading the table via a select to substitute the new col2 value , then drop/recreate table, bcp load new data bcp loads data into
i know bcp loads data in bulk to textfiles.. how can it do compare and replace.. do we do copy/paste in notepad?
SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER
i am following you.
could you also tel the rationale for the index (B.col1,B.col2)
could you also tel the rationale for the index (B.col1,B.col2)
SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER
thanks Sir
ASKER
do you mean B.COl2 as included column?