Solved

Update 12 million rows, setting one column from another column in the same table, based on a where condition

Posted on 2009-05-06
923 Views
Last Modified: 2013-12-18
Hi,

EXPIRE and DUE are date fields in table PARTS.
All the values in the DUE column are null (it's a newly added column).
For status = 'Active', DUE should be populated with the corresponding EXPIRE column values.

Doing this by:
1. creating a temporary table
2. dropping the indexes, triggers, etc. on the old table and recreating them on the temp table
3. dropping the old table and renaming the temp table to the actual table name

The above method is not acceptable to them.

I tried a simple update statement -- please see the code snippet -- but I may hit a rollback segment issue when committing all the updated rows at once.

the rows matching the where condition are 12 million and the total rows in the table are 70 million


Please help me out with this.
update parts
set due = expire
where status = 'Active';

commit;

Question by: loginboy
40 Comments
 
LVL 73

Accepted Solution

by:
sdstuber earned 250 total points
ID: 24321929
if you don't have sufficient rollback/undo,  can you increase it?  temporarily create a new rollback segment and then drop it.
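For example, under manual undo management that temporary segment could look something like this (a sketch; the segment, tablespace names, and sizes are hypothetical, and SET TRANSACTION must be the first statement of the transaction):

```sql
-- Hypothetical names; assumes manual (non-automatic) undo management.
CREATE ROLLBACK SEGMENT rbs_big
  TABLESPACE rbs_ts
  STORAGE (INITIAL 100M NEXT 100M MAXEXTENTS UNLIMITED);

ALTER ROLLBACK SEGMENT rbs_big ONLINE;

-- Bind this transaction to the big segment, then run the update in one shot.
SET TRANSACTION USE ROLLBACK SEGMENT rbs_big;

UPDATE parts SET due = expire WHERE status = 'Active';
COMMIT;

-- Clean up when done.
ALTER ROLLBACK SEGMENT rbs_big OFFLINE;
DROP ROLLBACK SEGMENT rbs_big;
```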

if that's not feasible,  then determine how many rows you can update at once and try something like this...

change the 100000 limit to whatever your rollback can handle
BEGIN
    LOOP
        UPDATE parts
           SET due = expire
         WHERE status = 'Active' AND due IS NULL AND ROWNUM < 100000;

        EXIT WHEN sql%ROWCOUNT = 0;

        COMMIT;
    END LOOP;
END;


Author Comment

by:loginboy
ID: 24322022
sdstuber,
firstly, thanks.

It's not feasible to increase it, or to create and drop a temporary rollback segment.

Could you please tell me how to determine the number of rows that can be updated at once?

Also, dba_segments says block = 615 (sorry for being vague here).

Expert Comment

by:sdstuber
ID: 24322169
I'd try it as is to begin with.  At 100K per loop that's only 120 transactions to cover 12 million rows.  If 100K is too many, you'll get an error on the first cycle of the loop and can reduce it.  If it does work then it'll continue until done.

Author Comment

by:loginboy
ID: 24322284
Okay, many thanks.

I shall run it and let you know the progress, or questions (if any).

Also, in the above example, when would sql%rowcount = 0?
And is every batch of 100K records committed? So roughly 120 commits for 12 million rows?


Expert Comment

by:sdstuber
ID: 24322334
rowcount will be 0 when there is nothing left to update.

Note, this assumes expire is not null. If any row has a NULL expire, the loop will run forever.

Author Comment

by:loginboy
ID: 24322437
Okay. But when are we committing? After every 100K records are updated, or after all 12M rows are updated?

And after the first 100K rows, how does it find the remaining rows that satisfy the where condition?

Thanks.

Assisted Solution

by:sdstuber
sdstuber earned 250 total points
ID: 24322512
you commit at the end of each cycle of the loop (the COMMIT just before END LOOP in the snippet).
the commit is inside the loop, so it commits after each batch of 100K records (actually 99,999, since it's < not <=).

there is no check for > 100K, because it doesn't have to check.

Look at the update again.

UPDATE parts
           SET due = expire
         WHERE status = 'Active' AND due IS NULL AND ROWNUM < 100000;


that will update 99,999 rows that are Active and have null due dates.
once you perform that update, those rows no longer satisfy the where clause, so the next time through the loop it will find a different set of rows.

this is why it's important that expire not be null. If it is, then the loop will run forever, because after the update, due will still be null, so those rows will remain and be found on every cycle through.



Expert Comment

by:sdstuber
ID: 24322525
in fact, to be safe...

change the update to exclude rows that have a null expire, unless there is a constraint that forces that

UPDATE parts
           SET due = expire
         WHERE status = 'Active' AND due IS NULL AND expire is not null and ROWNUM < 100000;


Author Comment

by:loginboy
ID: 24322670
sdstuber,

Yes, got it with full clarity. By the way, how much time do you think this update will take, given that the table has 70M rows? And will it slow down if I use upper(status) = 'ACTIVE'?

Thanks a lot.

Author Comment

by:loginboy
ID: 24326613
It's still running even after 1.5 hrs. How do I determine it's not in an infinite loop? And how much time might it take to complete?
Please advise, it's urgent.

Thanks

Assisted Solution

by:sdstuber
sdstuber earned 250 total points
ID: 24326692
I have no idea how long it will take,
and yes, using upper(status) = 'ACTIVE' will likely make it slower.

check for null expires before running the script.

if that's too late then do

select count(*) from parts
where due is null

periodically and see if the number goes down
if it doesn't, then you're not progressing.

Note, your updates might be waiting on other sessions to release locks on one or more of the rows you are trying to update.
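One way to check for that (a sketch; assumes Oracle 10g or later, where v$session exposes the BLOCKING_SESSION column):

```sql
-- Sessions currently blocked by someone else's lock (10g+).
SELECT sid, serial#, blocking_session, event, seconds_in_wait
  FROM v$session
 WHERE blocking_session IS NOT NULL;
```

If your updating session's SID shows up here, the BLOCKING_SESSION column tells you who holds the lock it is waiting on.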



Author Comment

by:loginboy
ID: 24326847
May I use select count(1) from parts where due is null; ?

And before I posted the question (yesterday), I ran this query:

update parts
set due = expire
where part_status = 'Active';

It updated in 25 min, and then:

commit;

So overall it was done in 25 min, but that was in the dev db. I would like to know how it might behave in the prod db, as I didn't face any rollback segment issues in the test db.

Also, there are two records with a null expire, but that is in the test db,
and we included expire IS NOT NULL in the where condition,
so that should be fine.

Assisted Solution

by:sdstuber
sdstuber earned 250 total points
ID: 24326933
compare your dev to production total data volume to estimate total time.

if you can do it in one step, do so.
the loop above is because, based on your posts, I thought you had already tried and failed because of rollback/undo limits.

if you are simply guessing that the update won't work, don't. 12 million date updates is not very much.
if you are excluding rows where expire is null, then you will be fine.

if you are still waiting for the loops to finish, then your updates are probably blocked by other sessions.
Have your dba check your session to see if it's blocked.

Author Comment

by:loginboy
ID: 24326956
select expire,due
from parts
WHERE status= 'Active'
AND Rownum < 10000
Order by part_status

I ran this query and yes, it is populating -- but how do I know which records exactly it's updating?

Also, please answer my previous question regarding test vs. prod db behaviour.

Thanks a lot

Expert Comment

by:sdstuber
ID: 24327057
what previous question?
    how long prod will take? I have no idea. Too many factors influence it, and if you are waiting on other sessions, then it's dependent on that and has nothing to do with performance.

    will the null expire be okay? Yes, if you are using the later version of the update with "and expire is not null", it will be fine.

    can you use count(1) instead of count(*)? Yes, but there's no reason to. Functionally it works, but semantically it doesn't make sense.

you can't know which records are being updated by the loop.


this query doesn't make sense...

select expire,due
from parts
WHERE status= 'Active'
AND Rownum < 10000
Order by part_status


it says pick 9999 "random" rows where status is Active and then sort them by a column you aren't returning.

Author Comment

by:loginboy
ID: 24327126
okay,

I'm checking the volume of the prod db, and will have the dba check whether my session is blocked.

How do I determine the rollback/undo limit? How can I know its value?

Also, sorry, my last post and your answer crossed. Thanks for that clarification
(I was underestimating the standalone UPDATE statement -- wasn't sure how it would behave in different environments).

Again ,thanks

Author Comment

by:loginboy
ID: 24327293

Sorry, it's order by status:

select expire, due
from parts
WHERE status = 'Active'
AND Rownum < 10000
Order by status

When I ran this query I could see DUE has the same values as EXPIRE (before, due was null for all active rows), so I could confirm it's being populated.

> how long prod will take? I have no idea. too many factors to influence, and if you are waiting on other sessions, then it's dependent on that and has nothing to do with performance.

Then your approach would be best, I suppose, as we are committing periodically; my standalone update query might have a problem committing all 12 million at once.

Expert Comment

by:sdstuber
ID: 24332080
why would you order by status? All of the rows selected are Active, so that order by doesn't actually do anything.

anyway, the main point of my previous comment was the order by is applied AFTER the rownum filter.
in this case it's moot because the order by doesn't really order anything

Author Comment

by:loginboy
ID: 24334516
yes, that order by doesn't make sense.

Also, the loop query is taking around 10 hrs for 9M rows, while the standalone update statement (with a single commit at the end) takes 27 min on average on the test db. I hope there won't be any problem with the standalone query in the prod db.

sean, what is the difference between these two queries?

UPDATE parts
set due = expire
where status = 'Active';

UPDATE parts
set due = expire
where status = 'Active'
AND id = id;

id is a unique key on parts. Also, status has many values, but we are populating the due column with the corresponding expire column only if status is active.

Please advise.

Author Comment

by:loginboy
ID: 24336876
Can I put the update query between BEGIN ... END; with the commit outside? As I understand it, the commit in the block will commit each record once updated, and will take time...

Please reply...

Assisted Solution

by:sdstuber
sdstuber earned 250 total points
ID: 24337431
the loop will commit once per 100,000 rows.

wrapping the update inside begin/end will only add extra steps; it won't change the commit frequency at all.

yes, the loop will take longer than a standalone query.

that's why I suggested making sufficient rollback at the beginning to handle the full statement.

if updating 100K rows at a time is too slow, make the batch bigger,
just as I said at the beginning of this whole thread: make it as big as your rollback will handle.
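To get a rough idea of how much rollback there is to work with, something like the following could help (a sketch; under automatic undo management you would look at the undo tablespace size and UNDO_RETENTION instead):

```sql
-- Size of the rollback segments (manual undo management).
SELECT segment_name, tablespace_name, ROUND(bytes / 1024 / 1024) AS mb
  FROM dba_segments
 WHERE segment_type = 'ROLLBACK';
```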


it's a guess, not an exact number.

just try it. If it's too big, you'll fail and roll back, and the loop will end. Adjust and run again; there's no downside here.

even if the loop took 36 hours to run, it still would have been done long before now.




Author Comment

by:loginboy
ID: 24337626
Okay. The loop query is taking around 10 hrs for 9.5M records...

Query 1 -- 25-27 min for 9.5M:

Update parts
set due = expire
where status = 'Active';
/
commit;
/
------------------------------------
Query 2 -- 35 min for 9.5M:

Begin

UPDATE table_site_part
SET x_actual_expire_dt = x_expire_dt
WHERE part_status = 'Active';

dbms_output.put_line('Number of rows : ' || sql%rowcount);
END;
/
COMMIT;
/

With either query 1 or 2 I'm not facing any rollback segment issues in the test db,
and as the prod db is more robust, I'm hoping I won't face any problem there.

Please advise...

Expert Comment

by:greyhuman
ID: 24337864
Using a partial commit is the best solution.
Use a counter to manage it.
================================
counter := 0;
-- inside a loop over the rows:
Update Part ...
counter := counter + 1;
if counter = 1000 then
  commit;
  counter := 0;
end if;
================================

Try using the hint /*+ APPEND */, NOLOGGING, etc., and do this when the database is least in use.

Expert Comment

by:sdstuber
ID: 24340704
number 2 is not helpful; all you're doing is wrapping begin/end around the update, and that doesn't change the transaction at all.

and as I suggested multiple times above, if you can do it in one step then do so. The ONLY reason I suggested the looping method above is because you said the prod rollback was insufficient.

greyhuman's suggestion is the same as mine, except it will run VERY slowly, since it updates one row at a time. I suggest NOT following it, as it will be the most resource-intensive version of all suggested above.

APPEND won't help either, since this is not an insert.

Expert Comment

by:sdstuber
ID: 24353502
need anything else?

Author Comment

by:loginboy
ID: 24353771
Hi sdstuber,

thanks for the logic...
it's working fine.
I increased the rownum limit to 3M -- it's taking 1 hr 20 min for 9M rows. I tried with 6M and it takes about the same time. Also, I would like to know: will the query scan the whole table (80M rows) and only update the rows matching the where clause?

Expert Comment

by:sdstuber
ID: 24354282
it might do a table scan, it might use an index. It depends on your table, your statistics, and the clustering factor of the index.

Remember, if you use the looping method, include the "due IS NULL" and "expire IS NOT NULL" clauses or the loop will never end.


Assisted Solution

by:sdstuber
sdstuber earned 250 total points
ID: 24354297
easiest way to tell, do an explain plan for the query on production before running it.  Don't do an explain plan on the test system and expect it to be the same on production.
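For example (a sketch; DBMS_XPLAN.DISPLAY is available from 9iR2 onward):

```sql
EXPLAIN PLAN FOR
UPDATE parts
   SET due = expire
 WHERE status = 'Active'
   AND due IS NULL
   AND expire IS NOT NULL;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```

The plan output will show whether Oracle chose a full table scan or an index access path for the status filter.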

Author Comment

by:loginboy
ID: 24356354
Okay. also,

UPDATE parts
set due= expire
where status='Active';

this query does the job in less than 30 min on the test db for 9M rows (66M total).
But they say on the prod db it's never going to finish. It has 12M matching rows, 80M total.
What could be the problem?
Why would it never finish on prod?
The loop query takes 1 hr 20 min on the test db:
BEGIN
  LOOP
    UPDATE parts
       SET due = expire
     WHERE status = 'Active'
       AND due IS NULL
       AND expire IS NOT NULL
       AND ROWNUM < 3000000;

    EXIT WHEN sql%ROWCOUNT = 0;

    dbms_output.put_line('No of rows updated = ' || sql%rowcount);
    COMMIT;
  END LOOP;
END;

How will the loop query behave on prod?
What's the difference between the loop query and the standalone one?

Begin
UPDATE parts
set due = expire
where status = 'Active';
Commit;
End;

Will the above query help?

Please help...

Assisted Solution

by:sdstuber
sdstuber earned 250 total points
ID: 24356594
have your dba's explain why it will never finish. What is their reasoning?

The only reason the standalone query won't work is if you run out of rollback; if that happens, switch to the loop version.

The only reason the loop won't work is if you run out of rollback (then lower the rownum limit until it works).

If this table is used a lot, then you may have problems with contention and waiting on row locks, either your process waiting on others or others waiting on you. If the former, you may wait a long, long time.
If the latter, your users may wait too long, and you'll cause problems for them, forcing you to abort.

the difference between standalone and loop is that the loop works in smaller chunks; the standalone works on the whole table in one transaction.

wrapping the query in begin/end does nothing to help you at all. In fact, it causes a tiny bit of extra overhead for a PL/SQL-to-SQL context switch, so it actually hurts you, though not appreciably for a task this big.

If you will have contention problems with other users, then your best bet is to use the looping method with small rownum limits.  That way you never lock very many rows and are never waiting on very many rows to be unlocked by other users.  This will make the process run longer though.

It's a balancing game.  If you're looking for a magic answer that will make everyone happy, you won't get one.  You want this process to run as "big" as possible but with the least contention with others which means make it "small".

Since it can't be both "big" and "small" at the same time, you, your dba's and your users will have to work out a schedule and resource consumption that works for everybody.

Since this thread has already taken 5 days, it's obvious that speed is not an issue.  So, I recommend using the loop method and set the rownum limit to something sort of small, like maybe 10000  and let it run for a few hours.  Check periodically to see what kind of progress it's making.  If it's taking too long and not having much contention then kill it and restart with a larger number.  If it is creating contention even with a small number, then either make it the limit smaller, or see if you can arrange a temporary outage (over lunch or off-hours) when nobody will be locking the table and update it then.


Author Comment

by:loginboy
ID: 24356920
Okay, thanks. The standalone query worked well in the test db, so there's no problem with rollback there. Likewise, with the loop query I didn't face any problem when testing with rownum < 1M, 3M, or 6M, so there's no way it's a rollback issue, as the prod db is more robust than the test db.

If the standalone query is run during downtime, will I still face contention and row-lock waits?
And will the standalone query lock all the matching rows in the table?

What if I use the loop query during downtime, balancing big and small rownum limits?

The loop query with rownum < 100K took more than 12 hrs...
The loop query took 3.2 hrs with rownum < 1M for 9M rows -- that is reasonable, but downtime is the thing, right?

Author Comment

by:loginboy
ID: 24357023
also, could you please explain the contention problem?

Expert Comment

by:sdstuber
ID: 24357253
contention= one session waiting on another session.

I really don't know what else you're looking for in this thread.

Sounds like you've done lots and lots of testing which should give you confirmation that whatever method you choose will do what you want.

Since your tests are on a different machine with a different configuration and different data volumes you are only getting vague guidelines as to how the process will run in production.

There is no formula for saying it took X minutes in test, therefore it will take Y minutes in production.

You have more data in production so it will take longer, but you probably have a more powerful machine so it will run faster. But you probably have more users in production, so the machine is busy doing other work and it will run slower, and some of those users may block your session if they lock one or more rows, so it won't just be slow, it will be stopped completely until they release their locks and it continues.

Add up all that and there is no way for me to tell you how long it will take in production.   I've explained the risks in several different ways above and explained alternatives for addressing them,  but eventually you will simply have to try it.

There is no way that I or anybody else at EE can make the risk assessment for you. All I can do is give you the explanations above. Now you must use YOUR understanding of your server, YOUR understanding of your users, YOUR understanding of your application and YOUR understanding of your requirements to make the best decision. Use your dba's, application support team and users to assist in the decision process.

I will try to help you more if I can, but the last few dozen posts are essentially just rehashing the same information and questions again.

Now you simply have to pick one of the many variations above you're most comfortable with and run it. My recommendation is: IF (and only IF) you have concerns about contention or resource consumption, then start small and increase to as large as your system can handle; if that IF is not true, then do the reverse, try to run as big as possible (the standalone query) first and then reduce in size as needed.

Safer (least impact to others while running) = start small, increase as applicable.
Faster (but possibly impacting others) = start big, shrink as needed

Again, you must pick the balance between them.  I can't decide that for you nor can I tell you exactly what value might work best.  You'll have try something and see.

Author Comment

by:loginboy
ID: 24357829
Yes, sdstuber, I'm getting the point. I completely understand that no one can suggest a magic number or the exact behaviour on prod. It's all trial and error to resolve this problem. I'm working on it. Also, may I keep this question open for some more time... just in case...

thanks..

Author Comment

by:loginboy
ID: 24366315
Hi sdstuber...

just a quick thing: could we do this with something like a parallel query? Please share your opinion on it.

Assisted Solution

by:sdstuber
sdstuber earned 250 total points
ID: 24366460
if the table is defined with parallel, and you have parallel_threads_per_cpu greater than 1 and/or more than one cpu, then your update can be parallelized.

If you do have contention, then parallel vs. serial won't matter.
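A sketch of what that could look like (the degree of 8 is an arbitrary example; note that parallel DML must be enabled at the session level for the UPDATE itself, not just the scan, to run in parallel):

```sql
ALTER SESSION ENABLE PARALLEL DML;

UPDATE /*+ PARALLEL(parts, 8) */ parts
   SET due = expire
 WHERE status = 'Active'
   AND expire IS NOT NULL;

-- A parallel DML transaction must be committed (or rolled back)
-- before the table can be read again in the same session.
COMMIT;
```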

Author Comment

by:loginboy
ID: 24381699
Hi sdstuber,

I have done it with the standalone query and a parallel hint. I have more than 1 cpu and parallel_threads_per_cpu is 2; there are 30 distinct processes with 128 execution servers. It's taking less than 15 min.

Thanks a lot for all your valuable input...

update /*+ parallel(parts, 30) */ parts
set due = expire
where status = 'Active'
and expire is not null;


Expert Comment

by:sdstuber
ID: 24381711
glad I could help

Expert Comment

by:sdstuber
ID: 24381719
I see this is your first question on EE,  welcome!!

If you are done with this question, please close it, accepting whichever post above (or more than one) you found helpful and assign a grade.

Author Closing Comment

by:loginboy
ID: 31578839
Thank you for being a part of every step of the solution.