Solved

Oracle 10g Turn off Table Logging During ETL

Posted on 2012-03-15
21
1,036 Views
Last Modified: 2012-03-22
Dear Experts,

I have an Oracle 10g database (we'll call PITB) that is a reporting database. A change data capture solution keeps the data in PITB up-to-date.  Data is passed to PITB from an Informix database running on AIX RS6000 64-bit .  

On occasion, we need to run an ETL from the Informix database to our PITB which takes 22 hours.  This ETL time will increase over time as data continues to grow.

I've been trying to think of a way to speed up the ETL and have decided that it may help to turn off table logging for the specific tables being ETL'd.  The thought is that if there is no redo log switching happening and no archiving occurring during the ETL, then it will finish sooner.  Then table logging would be immediately turned on again after completion of the ETL.

The commands I was thinking to do are:

Turn off logging:    alter table xxxxx nologging;
(run the ETL)
Turn on logging:  alter table xxxxx logging;

The PITB database also captures data changes and streams them to another Oracle database.  Additionally, the PITB database uses dataguard to populate a DR copy of itself.  

I realize that by turning off table logging, that I would need to instantiate the downstream database as well as the DR database after the ETL completes and table logging is turned back on.  Although not appealing to me, this is acceptable if I'm able to make significant time gains in the ETL process.


I'm concerned that I may not be considering all pros & cons to turning off table logging for the ETL'd tables.

Can anyone shoot some holes in this idea and save me from a foolish act?
Can anyone offer alternatives to speeding up the ETL process?

The host server for the PITB database is an AIX RS6000 64-bit.
The source database/server is an Informix db on an AIX RS6000 64-bit.

The ETL process is being handled by iWay Data Migrator via a Windows 2003 32-bit server.

Thanks!
0
Comment
Question by:vocogov
21 Comments
 
LVL 5

Expert Comment

by:Bajwa
Comment Utility
Here are the questions I would be asking.

1.  Is informix database running the ETL or oracle database there might be implicit data conversion going on.

2.  What is the ETL doing for 22 hours?  figure out the time spent in each area (network, disk I/O, CPU) and if you get the Disk I/O as the biggest culprit then turn off the logging for the identified tables.  Make sure to have a backup after ETL is finished.

3.  If ETL is running during low system usage then Check if DML parallelism is being utilized for faster processing.

4.  see if you can drop indexes on the oracle side and rebuild them when done. (it is a better option than turning off logging)

what I do with my ETL is about 100GB of data processed in 2 hours across database link (oracle-oracle)

1.  Drop all indexes before I begin
2.  Run the ETL (inserts, updates, delete, transformations, etc).
3.  Validate the data( build constraints)
4.  Build the indexes.
0
 
LVL 1

Author Comment

by:vocogov
Comment Utility
Thanks Bajawa.

The ETl is being executed  on a Windows 2003 32-bit server through a tool called iWay Data Migrator.  In the Data Migrator we've set up things called data flows that have linked to both the Informix and oracle databases to create a data mapping.  That's where the conversion takes place and would certainly add to run time.

The ETL is being run for 231 tables at this time.  Each table has it's own Data Flow process that is executed.  We could have several data flows running at the same time (parallelism?) or only one.

The largest table of 41 million records is that one that takes 22 hours regardless of whether other dataflows are running or not .  

I will definitely look into dropping the indexes before the next ETL we perform (which may be tomorrow) and rebuild them afterward.  

Very excited to try this approach.
0
 
LVL 16

Accepted Solution

by:
Wasim Akram Shaik earned 450 total points
Comment Utility
if you go with this approach what expert bajwa has suggested, want to add few points which you can consider..


in your ETL operations, specify insert /*+ append*/ into instead of normal insert statements, by doing this

Using this, Oracle bypasses the buffer cache and writes data directly to datafiles. which will save some time for data reading and writing to buffer cache by background processes.

instead of dropping the indexes, you can make the non-unique indexes as unusable, this will also save some time as you don't have to create index again..

alter index index_name unusable

alter session set skip_unusable_indexes=true

however , you can make the unique indexes as unusable, you have to drop them..!!

not to mention, as there may be some staging tables(i believe those tables are truncated before and after use)..
0
 
LVL 8

Assisted Solution

by:Christoffer Swanström
Christoffer Swanström earned 50 total points
Comment Utility
If you do ALTER TABLE ... NOLOGGING, it will only affect direct-path INSERTs, i.e. with the /*+ APPEND */ hint. UPDATEs, DELETEs and regular INSERTs will not be affected. For direct-path INSERTs the redo log will be minimized, but there will always be some.

However, I think you will not gain a lot by minimizing redo logging, your problem probably lies elsewhere. Could you give some more details on the table structure and what exactly the ETL process is doing?
0
 
LVL 1

Author Comment

by:vocogov
Comment Utility
hmmm you are giving me lots to think about.  

For the ETL process, we truncate the target tables so we are loading to a fresh slate.  I am assuming that the ETL process reads from the first record of each source table and writes it to the target table.  The tool we use, Data Migrator, is a gui based interface and I don't know the underlying command/process it uses to perform the ETL.  

The Load Options available to us in Data Migrator are:

Load Type Options:  
     Key Matching Logic (we've selected this one)
     Insert records from memory (tried this one and doesn't run faster)
     Bulk load utility via a disk file (we're reading from a database not a flat file)
     Enable Change Data Capture
     Slowly Changing Dimensions

If the record exists options:
    Include the record (we've selected this one)
    Reject the record
    Update the existing record
    Delete the existing record

If the record does not exist: Include the record (this is the only option)

Commit every 1000 rows.

There are no constraints on the target tables other than the primary key which exactly matches the source tables.  There are indexes on many of the tables.

I'm thinking that I may leave the primary key indexes enabled and make all other indexes unusable instead of dropping them. Is that a viable approach?
0
 
LVL 1

Author Comment

by:vocogov
Comment Utility
BTW a comment for Tosse,
I know that during our ETL process, Oracle generates hundreds of archivelogs.  so I'm not sure how to understand your comment.
0
 
LVL 1

Author Comment

by:vocogov
Comment Utility
oh...one other thing I just realized.  Data Migrator requires the source and target tables have a primary key.  So it must be reading in primary key order and doing a sort of append to the target table.  Sorry I don't have a way to see the code being used.
0
 
LVL 16

Expert Comment

by:Wasim Akram Shaik
Comment Utility
yes, thats what i said, instead of dropping the non-unique indexes, you can make them unusable, that would save some of ur time in rebuilding them again..
0
 
LVL 1

Author Comment

by:vocogov
Comment Utility
Thanks everyone.
I've gotten the greenlight to run the ETL at the end of the day.  Planning to set all indexes to 'unusable' before the run.
Will update on Monday of results.
0
 
LVL 16

Expert Comment

by:Wasim Akram Shaik
Comment Utility
More points to be considered before you go with this activity....
---Planning to set all indexes to 'unusable' before the run.

As, I had mentioned in my earlier comment, this should be applied to only non-unique indexes,

As you cannot load into a segment with a UNIQUE index defined that is in UNUSABLE state, you'll get an error "ORA-26026: unique index ... initially in unusable state"

You have to drop and recreate the unique indexes.

If you want to use an UNIQUE index that is not maintained during the data load you need to drop it and re-create after the load.

There is a however a way to circumvent this: You can support a UNIQUE or PRIMARY KEY constraint by means of a non-unique index. This way you can either set the constraint deferrable, or disable the constraint before the load while keeping the index in unusable state and rebuilding the index and re-enabling the constraint afterwards.

Also note this point,

A truncate automatically makes the indexes usable

So while data loading if you do this,

 ALTER INDEX index_name UNUSABLE
 TRUNCATE TABLE <table_name>

The indexes will get automatically usable and you will not find any difference of first alter statement being fired..

So you should remember to modify your DDL statments as:

 TRUNCATE TABLE
ALTER INDEX ... UNUSABLE

Hope it helps..!!!
0
PRTG Network Monitor: Intuitive Network Monitoring

Network Monitoring is essential to ensure that computer systems and network devices are running. Use PRTG to monitor LANs, servers, websites, applications and devices, bandwidth, virtual environments, remote systems, IoT, and many more. PRTG is easy to set up & use.

 
LVL 1

Author Comment

by:vocogov
Comment Utility
With setting the indexes to unusable, the ETL time went from 22 hours to 20.5 hours.

I wish very much that it could go faster like Bajawa's ETL (100g in 2 hours?  brag brag brag.. :))
But its seems that going from Informix to Oracle and requiring the Windows iWay Data Migrator as the middle-man may be the bottle-neck.  

Any other thoughts from anyone?

Maybe Tosse could expand more on the comment about archiving and direct-path inserts?
0
 
LVL 16

Expert Comment

by:Wasim Akram Shaik
Comment Utility
have you tried the suggestion from my comment http:#a37728781, i already have told you about the direct path inserts...
0
 
LVL 1

Author Comment

by:vocogov
Comment Utility
Thanks wasimibm,  
In my comment above #A37729107  I tried to explain that I don't exactly have a SQL I can modify to do direct path inserts.   I'm using a tool called iWay Data Migrator that uses odbc to connect to both the Informix and Oracle databases and allow me to map two tables together for an ETL.  My ONLY experience with ETL'ing is through this tool.

The only thing close to a script that I can grab is the attached script obtained from inside Data Migrator.  The attached script is supposed to be a text version of what the interface built for me based on what I mapped through the GUI options for one of the 231 tables that is being ETL'd.
data-flow-script.txt
0
 
LVL 16

Expert Comment

by:Wasim Akram Shaik
Comment Utility
understood author, ok then you can't control the sql statements which are being fired as they are done via the tool..!!!

ok.. can't you even control the truncation of tables before the activity??
 i think tool should do  this task, as a part of its operation, if not then add this activiy before starting over and check out if this helps..
0
 
LVL 1

Author Comment

by:vocogov
Comment Utility
I did truncate the tables before running ETL.  I have always done that.  Sorry if I wasn't clear on that.
The only thing new I've done is make the indexes unusable which gained me a valuable 1.5 hours.  I'm very happy about that.  
Wondering if there is anything else I could try.  Obviously SQL code is not available to me.

Perhaps something with the ODBC configuration?
0
 
LVL 1

Author Comment

by:vocogov
Comment Utility
wasimibm,
I've been digging around in the manuals for Data Migrator and there is indeed an option to use a direct-load approach when ETL the data.
I've set up a test to run tonight after hours on one of the larger tables (not the 20 hour one but a 4 hour one).  Hopefully I will have something positive to report in the morning.

Thanks for your help.
0
 
LVL 16

Expert Comment

by:Wasim Akram Shaik
Comment Utility
oh.. that was nice to hear. just to mention something, i hope you are doing it..

the order of the truncate and alter index unusable should be like this

first you have to truncate the table and after that only set the alter indexes unusable..

lets see how much time you are gonna save this time..!!!!
0
 
LVL 1

Author Comment

by:vocogov
Comment Utility
Thanks.  Yup..I was doing the truncate first and then the unusable thing.  worked very well.

here's the report on the direct load approach:
I tested it on a table that took 4 hours last week.  This time it took 43 minutes.  Wow!!

One thing that wasn't good is that right after I started the direct-load script, I couldn't do anything in the server like open other windows and such.  Couldn't even open the task bar. The server response was severely degraded.  

I will look further into other settings to see if I can limit how much memory gets sucked out by the direct-load approach here and try another test.  

This is very good progress.  :)
0
 
LVL 16

Expert Comment

by:Wasim Akram Shaik
Comment Utility
wow.. thats a good news vocogov, but I am afraid, I couldn't help you much in the regard of "The server response " because its totally pertaining to the hardware configuration of the system, how much memory available in the system, what is the paging and performance and load on CPU etc., things,

well, try out the things you had mentioned over, if you don't get much.. you can always open a related question, may be some capacity gurus will come over and help you with configuration setttings..!!!
0
 
LVL 1

Author Comment

by:vocogov
Comment Utility
I tried another test and this time set the commit rate to 1000 records instead of the default (I'm not sure what the default is).  It still ran in 43 minutes and the server showed no type of strain or poor response.  Everything worked beautifully.  I'm floored and can't wait to show the boss!
Thanks for you help!
0
 
LVL 16

Expert Comment

by:Wasim Akram Shaik
Comment Utility
Cool... I am glad that it worked...!!!

Also close this question, if you think you got the answer which you are looking for.!!!
0

Featured Post

PRTG Network Monitor: Intuitive Network Monitoring

Network Monitoring is essential to ensure that computer systems and network devices are running. Use PRTG to monitor LANs, servers, websites, applications and devices, bandwidth, virtual environments, remote systems, IoT, and many more. PRTG is easy to set up & use.

Join & Write a Comment

Working with Network Access Control Lists in Oracle 11g (part 1) Part 2: http://www.e-e.com/A_9074.html So, you upgraded to a shiny new 11g database and all of a sudden every program that used UTL_MAIL, UTL_SMTP, UTL_TCP, UTL_HTTP or any oth…
Configuring and using Oracle Database Gateway for ODBC Introduction First, a brief summary of what a Database Gateway is.  A Gateway is a set of driver agents and configurations that allow an Oracle database to communicate with other platforms…
This video shows how to copy a database user from one database to another user DBMS_METADATA.  It also shows how to copy a user's permissions and discusses password hash differences between Oracle 10g and 11g.
This video explains at a high level about the four available data types in Oracle and how dates can be manipulated by the user to get data into and out of the database.

743 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

19 Experts available now in Live!

Get 1:1 Help Now