Julie Kurpa asked:

Oracle 10g Turn off Table Logging During ETL

Dear Experts,

I have an Oracle 10g database (we'll call it PITB) that is a reporting database. A change data capture solution keeps the data in PITB up to date. Data is passed to PITB from an Informix database running on an AIX RS6000 64-bit server.

On occasion, we need to run an ETL from the Informix database to PITB, which takes 22 hours. This ETL time will only increase as the data continues to grow.

I've been trying to think of a way to speed up the ETL and have decided that it may help to turn off table logging for the specific tables being ETL'd.  The thought is that if there is no redo log switching happening and no archiving occurring during the ETL, then it will finish sooner.  Then table logging would be immediately turned on again after completion of the ETL.

The commands I was thinking of using are:

Turn off logging:    alter table xxxxx nologging;
(run the ETL)
Turn on logging:  alter table xxxxx logging;
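
Sketched for a single table, the sequence would be roughly this (the table name is a placeholder, and I'm not certain whether nologging helps conventional-path inserts or only direct-path ones, which is partly why I'm asking):

   alter table my_big_table nologging;

   -- run the ETL load into my_big_table here

   alter table my_big_table logging;

   -- take a fresh backup afterwards, since changes made while the table was
   -- in nologging mode cannot be recovered from archived redo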

The PITB database also captures data changes and streams them to another Oracle database. Additionally, PITB uses Data Guard to maintain a DR copy of itself.

I realize that by turning off table logging, I would need to re-instantiate the downstream database as well as the DR database after the ETL completes and table logging is turned back on. Although not appealing to me, this is acceptable if I can make significant time gains in the ETL process.

I'm concerned that I may not be considering all the pros & cons of turning off table logging for the ETL'd tables.

Can anyone shoot some holes in this idea and save me from a foolish act?
Can anyone offer alternatives to speeding up the ETL process?

The host server for the PITB database is an AIX RS6000 64-bit.
The source database/server is an Informix db on an AIX RS6000 64-bit.

The ETL process is being handled by iWay Data Migrator via a Windows 2003 32-bit server.

Thanks!
Bajwa replied:

Here are the questions I would be asking.

1. Is the Informix database running the ETL, or the Oracle database? There might be implicit data conversion going on.

2. What is the ETL doing for 22 hours? Figure out the time spent in each area (network, disk I/O, CPU). If disk I/O turns out to be the biggest culprit, then turn off logging for the identified tables, and make sure to take a backup after the ETL is finished.

3. If the ETL runs during a low-usage window, check whether DML parallelism is being used to speed up processing.

4. See if you can drop the indexes on the Oracle side and rebuild them when done (a better option than turning off logging).

For comparison, my own ETL processes about 100 GB of data in 2 hours across a database link (Oracle to Oracle); a rough sketch of the Oracle-side commands follows the steps below:

1. Drop all indexes before I begin.
2. Run the ETL (inserts, updates, deletes, transformations, etc.).
3. Validate the data (build constraints).
4. Build the indexes.
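
Roughly, the Oracle-side commands look like this (the object names below are placeholders for illustration only):

   -- 1. capture the index DDL, then drop the indexes before the load
   select dbms_metadata.get_ddl('INDEX', index_name)
     from user_indexes
    where table_name = 'TARGET_TAB';
   drop index target_tab_ix1;

   -- 2. run the ETL (inserts, updates, deletes, transformations)

   -- 3. validate the data / build constraints
   alter table target_tab add constraint target_tab_pk primary key (id);

   -- 4. rebuild the indexes from the DDL saved in step 1
   create index target_tab_ix1 on target_tab (some_col);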
Julie Kurpa (ASKER):

Thanks Bajwa.

The ETL is being executed on a Windows 2003 32-bit server through a tool called iWay Data Migrator. In Data Migrator we've set up things called data flows that link to both the Informix and Oracle databases to create a data mapping. That's where the conversion takes place, and it would certainly add to the run time.

The ETL is being run for 231 tables at this time. Each table has its own data flow process that is executed. We could have several data flows running at the same time (parallelism?) or only one.

The largest table, at 41 million records, is the one that takes 22 hours regardless of whether other data flows are running or not.

I will definitely look into dropping the indexes before the next ETL we perform (which may be tomorrow) and rebuilding them afterward.

Very excited to try this approach.
ASKER CERTIFIED SOLUTION from Wasim Akram Shaik
[solution text available to Experts Exchange members only]

SOLUTION
[solution text available to Experts Exchange members only]
Hmmm, you are giving me lots to think about.

For the ETL process, we truncate the target tables so we are loading to a fresh slate. I am assuming that the ETL process reads from the first record of each source table and writes it to the target table. The tool we use, Data Migrator, is a GUI-based interface, and I don't know the underlying commands/processes it uses to perform the ETL.

The Load Options available to us in Data Migrator are:

Load Type Options:  
     Key Matching Logic (we've selected this one)
     Insert records from memory (tried this one and doesn't run faster)
     Bulk load utility via a disk file (we're reading from a database not a flat file)
     Enable Change Data Capture
     Slowly Changing Dimensions

If the record exists options:
    Include the record (we've selected this one)
    Reject the record
    Update the existing record
    Delete the existing record

If the record does not exist: Include the record (this is the only option)

Commit every 1000 rows.

There are no constraints on the target tables other than the primary key, which exactly matches the source tables. There are indexes on many of the tables.

I'm thinking that I may leave the primary key indexes enabled and make all other indexes unusable instead of dropping them. Is that a viable approach?
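
Something like this is what I have in mind before each load (just a sketch; 'MY_TARGET_TABLE' and the index name are placeholders, and I believe skip_unusable_indexes defaults to TRUE in 10g, so conventional inserts should skip the unusable non-unique indexes):

   -- generate the statements for the non-unique indexes on one target table
   select 'alter index ' || index_name || ' unusable;'
     from user_indexes
    where table_name = 'MY_TARGET_TABLE'
      and uniqueness = 'NONUNIQUE';

   -- ...truncate the table and run the load...

   -- afterwards, rebuild each index that was marked unusable
   alter index my_target_ix1 rebuild;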
BTW, a comment for Tosse:
I know that during our ETL process, Oracle generates hundreds of archive logs, so I'm not sure how to interpret your comment.
Oh... one other thing I just realized: Data Migrator requires that the source and target tables have a primary key. So it must be reading in primary key order and doing a sort of append to the target table. Sorry, I don't have a way to see the code being used.
Yes, that's what I said: instead of dropping the non-unique indexes, you can make them unusable; that would save you some of the time spent rebuilding them again..
Thanks everyone.
I've gotten the green light to run the ETL at the end of the day. Planning to set all indexes to 'unusable' before the run.
Will update on Monday with the results.
A few more points to consider before you go ahead with this activity....
---Planning to set all indexes to 'unusable' before the run.

As I mentioned in my earlier comment, this should be applied only to non-unique indexes.

You cannot load into a segment that has a UNIQUE index in an UNUSABLE state; you'll get the error "ORA-26026: unique index ... initially in unusable state".

If you want a UNIQUE index that is not maintained during the data load, you have to drop it and re-create it after the load.

There is, however, a way to circumvent this: you can support a UNIQUE or PRIMARY KEY constraint by means of a non-unique index. That way you can either make the constraint deferrable, or disable the constraint before the load while the index stays in an unusable state, then rebuild the index and re-enable the constraint afterwards.
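
Sketched with made-up object names (emp, emp_pk, emp_pk_ix), just to illustrate the idea:

   -- back the primary key with a non-unique index
   CREATE INDEX emp_pk_ix ON emp (empno);
   ALTER TABLE emp ADD CONSTRAINT emp_pk PRIMARY KEY (empno) USING INDEX emp_pk_ix;

   -- before the load
   ALTER TABLE emp DISABLE CONSTRAINT emp_pk;
   ALTER INDEX emp_pk_ix UNUSABLE;

   -- ...load the data...

   -- after the load
   ALTER INDEX emp_pk_ix REBUILD;
   ALTER TABLE emp ENABLE CONSTRAINT emp_pk;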

Also note this point,

A TRUNCATE automatically makes unusable indexes usable again.

So if, while loading the data, you do this:

 ALTER INDEX <index_name> UNUSABLE;
 TRUNCATE TABLE <table_name>;

the truncate will automatically make the indexes usable again, and the first ALTER statement will effectively have done nothing.

So you should remember to order your DDL statements as:

 TRUNCATE TABLE <table_name>;
 ALTER INDEX <index_name> UNUSABLE;

Hope it helps..!!!
With the indexes set to unusable, the ETL time went from 22 hours to 20.5 hours.

I wish very much that it could go as fast as Bajwa's ETL (100 GB in 2 hours? brag, brag, brag.. :))
But it seems that going from Informix to Oracle, with the Windows iWay Data Migrator server as the middleman, may be the bottleneck.

Any other thoughts from anyone?

Maybe Tosse could expand more on the comment about archiving and direct-path inserts?
Have you tried the suggestion from my comment http:#a37728781? I have already told you about direct-path inserts...
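
For reference, a direct-path insert on the Oracle side looks something like this (the table names are placeholders; it only applies if the tool, or a manual script, can issue such a statement):

   ALTER TABLE target_tab NOLOGGING;

   INSERT /*+ APPEND */ INTO target_tab
   SELECT * FROM source_stage_tab;   -- whatever holds the transformed source data
   COMMIT;                           -- appended rows are not visible until commit

   ALTER TABLE target_tab LOGGING;   -- and take a backup afterwards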
Thanks wasimibm,  
In my comment above (#A37729107), I tried to explain that I don't have any SQL I can modify to do direct-path inserts. I'm using a tool called iWay Data Migrator that uses ODBC to connect to both the Informix and Oracle databases and allows me to map two tables together for an ETL. My ONLY experience with ETL'ing is through this tool.

The closest thing to a script that I can grab is the attached file, obtained from inside Data Migrator. It is supposed to be a text version of what the interface built for me, based on what I mapped through the GUI options, for one of the 231 tables being ETL'd.
data-flow-script.txt
Understood, author. OK, then you can't control the SQL statements being fired, since they are generated by the tool..!!!

OK.. can't you at least control the truncation of the tables before the activity?
I think the tool should do this task as part of its operation; if not, then add this activity before starting and check whether it helps..
I did truncate the tables before running the ETL. I have always done that. Sorry if I wasn't clear on that.
The only thing new I've done is make the indexes unusable, which gained me a valuable 1.5 hours. I'm very happy about that.
Wondering if there is anything else I could try. Obviously the SQL code is not available to me.

Perhaps something with the ODBC configuration?
wasimibm,
I've been digging around in the manuals for Data Migrator, and there is indeed an option to use a direct-load approach when ETL'ing the data.
I've set up a test to run tonight after hours on one of the larger tables (not the 20-hour one, but a 4-hour one). Hopefully I will have something positive to report in the morning.

Thanks for your help.
Oh.. that is nice to hear. Just to mention something, I hope you are already doing it..

The order of the TRUNCATE and ALTER INDEX ... UNUSABLE should be like this:

first truncate the table, and only after that set the indexes unusable..

Let's see how much time you are gonna save this time..!!!!
Thanks. Yup.. I was doing the truncate first and then the unusable thing. It worked very well.

Here's the report on the direct-load approach:
I tested it on a table that took 4 hours last week. This time it took 43 minutes. Wow!!

One thing that wasn't good: right after I started the direct-load script, I couldn't do anything on the server, like open other windows and such. I couldn't even open the task bar. The server response was severely degraded.

I will look further into other settings to see if I can limit how much memory gets consumed by the direct-load approach and then try another test.

This is very good progress.  :)
Wow.. that's good news, vocogov, but I'm afraid I can't help you much with regard to "the server response", because that is entirely down to the hardware configuration of the system: how much memory is available, what the paging and CPU load look like, and so on.

Well, try out the things you mentioned above; if you don't get far.. you can always open a related question, and maybe some capacity gurus will come over and help you with the configuration settings..!!!
I tried another test and this time set the commit rate to 1000 records instead of the default (I'm not sure what the default is). It still ran in 43 minutes, and the server showed no sign of strain or poor response. Everything worked beautifully. I'm floored and can't wait to show the boss!
Thanks for your help!
Cool... I am glad that it worked...!!!

Also close this question, if you think you got the answer which you are looking for.!!!