Solved

Loading huge data into a fact table from a huge source table

Posted on 2011-03-09
244 Views
Last Modified: 2012-08-14
I have a source table xyz which contains around 23 million records with a lot of attributes. I need to load a fact table xyz_f, which is essentially a SELECT from xyz with some records filtered out by certain id_cds, inserting around 23 million records. The fact table xyz_f is truncated and its indexes are dropped before the load.

The load is now taking close to an hour. The task is to decrease the load time.

Please make any recommendations..

For now, BULK COLLECT is being used to load the data, with a commit frequency of 10,000 rows.

The records that go into the target fact table are:

select * from xyz
where
id not in(124,234,2345,2345.....)

which returns close to 23 million records every day.
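For reference, the bulk-collect load described here presumably looks something like the sketch below; the cursor and loop details are assumptions, not the actual procedure:

```sql
DECLARE
  CURSOR c_src IS
    SELECT * FROM xyz
     WHERE id NOT IN (124, 234, 2345 /* ... */);
  TYPE t_rows IS TABLE OF xyz%ROWTYPE;
  l_rows t_rows;
BEGIN
  OPEN c_src;
  LOOP
    FETCH c_src BULK COLLECT INTO l_rows LIMIT 10000;
    EXIT WHEN l_rows.COUNT = 0;
    FORALL i IN 1 .. l_rows.COUNT
      INSERT INTO xyz_f VALUES l_rows(i);
    COMMIT;  -- commit every 10,000 rows, as described above
  END LOOP;
  CLOSE c_src;
END;
/
```

Even with BULK COLLECT and FORALL, this row-batch approach pays PL/SQL context-switch and conventional-path insert costs that a single set-based SQL statement avoids.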
Question by:gs79
12 Comments
 
Expert Comment by: slightwv (Netminder)
Can you not set up xyz_f as a materialized view and never 'reload' it again?
 
Expert Comment by: Akenathon
Can't you just append the "new" data to the fact table? Of course it's not that easy if you have updates and deletes on the source table, but you may find a way around it.

In general, ALTER TABLE EXCHANGE PARTITION is what you want to use for these jobs. The fact table ought to be partitioned by date, with local indexes, so that new partitions can be added in a split second, with all their indexes pre-created and analyzed. If you can also get the source table partitioned by list (to isolate your IDs of interest), and then subpartition by date (or the other way round), even better since you just scan the interesting data.
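A minimal sketch of the exchange approach described above, with hypothetical partition, column, and staging-table names:

```sql
-- Stage the day's rows in a plain table (direct-path, parallel CTAS)
CREATE TABLE xyz_stage NOLOGGING PARALLEL 4 AS
  SELECT /*+ PARALLEL(xyz 4) */ * FROM xyz
   WHERE id NOT IN (124, 234, 2345 /* ... */);

-- Pre-create indexes matching the fact table's local indexes
CREATE INDEX xyz_stage_ix1 ON xyz_stage (some_col) NOLOGGING;

-- Swap the staging table in as the daily partition: a metadata-only operation
ALTER TABLE xyz_f
  EXCHANGE PARTITION p_today WITH TABLE xyz_stage
  INCLUDING INDEXES WITHOUT VALIDATION;
```

The exchange itself moves no data, so the fact table's visible downtime is near zero regardless of row volume; the staging indexes must match the partition's local indexes for INCLUDING INDEXES to work.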
 

Author Comment by: gs79
I just need to truncate the table xyz_f and reload it daily with the records returned by the query below:

select * from xyz
where
id not in(124,234,2345,2345.....)

Right now BULK COLLECT is being used with a commit frequency of 10,000, and the load takes 30 to 40 minutes.

Can parallel pipelined functions be used to insert records into the table via parallel slaves, after enabling the table for parallel DML?

Please advise if there are any other methods.

Thanks
 
Expert Comment by: slightwv (Netminder)
>>Please advice if there are any other methods..

First post:  Materialized views?  http:#a35087080

With those, there is no need to truncate/reinsert; the view is kept up to date. Even if you cannot do a fast refresh during the day, have it refreshed during your current nightly maintenance window.

>>Can parallel pipelined functions be used

What makes you think of pipelined functions? Parallel is parallel. I'm not sure how a pipelined function would help speed things up, but you never know. I suggest you set up some tests and try out everything you can think of.
 
Expert Comment by: Akenathon
You need to describe the whole procedure to tell whether pipelined functions could play any role in this.

One way or the other: if you can say it in plain SQL (i.e. NOT PL/SQL) it will be faster than manually looping through each record, even with bulk collect. Think of something like:

drop table target;
create table target parallel 4 as select /*+parallel(xyz 4)*/ * from xyz where id not in(124,234,2345,2345.....);

create index target_ix1 on target (...) compute statistics;
create index target_ix2 on target (...) compute statistics;
...

execute dbms_stats.gather_table_stats(user, 'TARGET', cascade => false)



Dropping the table gets rid of the indexes, which is VERY advisable when you are INSERTing so many rows into an empty table, especially if you have many indexes.

For the SELECT part, you may want to see the reference for the PARALLEL hint. You can also do it without hinting if you set up your source table this way: ALTER TABLE XYZ PARALLEL 4;

Note you can also create the target table with 4 partitions, and optionally make its indexes local as well. Try and see what's faster in your specific scenario. Of course, the sample degree of 4 must also be adjusted by trial and error.
 

Author Comment by: gs79
Dropping the table is not an option, as it would break dependencies. So I am using a procedure where I perform the following operations sequentially:

1. Drop the indexes
2. Truncate the table
3. ALTER SESSION ENABLE PARALLEL DML
4. Insert into the table (the table has parallel degree 8), also using a hint in the SELECT from the source table
5. ALTER SESSION DISABLE PARALLEL DML
6. Recreate the indexes

Still no performance improvement.
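The truncate-then-insert sequence described above is normally written as a single direct-path statement, roughly like this (degree 8 as stated; names illustrative):

```sql
ALTER SESSION ENABLE PARALLEL DML;

-- Direct-path, parallel insert into the truncated, index-free fact table
INSERT /*+ APPEND PARALLEL(xyz_f 8) */ INTO xyz_f
SELECT /*+ PARALLEL(xyz 8) */ *
  FROM xyz
 WHERE id NOT IN (124, 234, 2345 /* ... */);

COMMIT;  -- a direct-path load must be committed before the table is queried

ALTER SESSION DISABLE PARALLEL DML;
```

Without the APPEND hint (or with parallel DML silently disabled, e.g. by a trigger or foreign key on the target), the insert falls back to a slow conventional-path load; checking V$PQ_SESSTAT or the execution plan confirms whether slaves were actually used.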

Another thing I tried is creating a parallel pipelined function to process the SQL cursor in parallel slaves and do parallel DML on the table, but I could not see any parallelism in action. I will provide the queries in a while; meanwhile, any thoughts on this?
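One common reason a pipelined function shows no parallelism is that it must be declared PARALLEL_ENABLE, take its input as a REF CURSOR parameter, and be fed a cursor that is itself parallel. A minimal sketch, with illustrative types and names rather than the actual code:

```sql
-- Object and collection types describing the piped rows (illustrative)
CREATE OR REPLACE TYPE xyz_row_t AS OBJECT (id NUMBER /* ... other attributes */);
/
CREATE OR REPLACE TYPE xyz_tab_t AS TABLE OF xyz_row_t;
/

CREATE OR REPLACE FUNCTION filter_xyz (p_src SYS_REFCURSOR)
  RETURN xyz_tab_t
  PIPELINED
  PARALLEL_ENABLE (PARTITION p_src BY ANY)  -- lets slaves split the input cursor
IS
  l_id NUMBER;
BEGIN
  LOOP
    FETCH p_src INTO l_id;  -- fetch the real column list in practice
    EXIT WHEN p_src%NOTFOUND;
    PIPE ROW (xyz_row_t(l_id));
  END LOOP;
  CLOSE p_src;
  RETURN;
END;
/

-- The cursor fed to the function must itself be parallel, or no slaves appear
INSERT /*+ APPEND */ INTO xyz_f
SELECT *
  FROM TABLE(filter_xyz(CURSOR(
         SELECT /*+ PARALLEL(xyz 8) */ id FROM xyz
       )));
```

If the inner SELECT is serial, the whole pipeline runs in a single process no matter how the function is declared.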

thanks
 
Expert Comment by: slightwv (Netminder)
Why haven't you commented on the materialized view comments?
 

Author Comment by: gs79
I am new here. I did suggest the materialized view approach and it was discarded; I may have to try it as well, and I will give it a try. Meanwhile, can you please help me with the script to create the MV for the table I have described above, with the correct refresh option?

I still wonder why I am not able to see parallelism in action when I used parallel pipelined functions. A lot of white papers have been written on this technique touting its power in ETL, but the way it works feels capricious to me.

Any thoughts?

Thanks.
 
Expert Comment by: slightwv (Netminder)
What are your refresh requirements and impact requirements? I would probably go with REFRESH ON COMMIT, but that depends on whether you can withstand the performance impact.

I'm on mobile so can't come up with any working script right now.  All the options are in the docs along with the references for the refresh job.

If you can't get it by tomorrow, I'll try to create a working example.
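For illustration, the refresh-on-commit variant mentioned above needs a materialized view log on the source table. This is only a sketch (names taken from the question, otherwise hypothetical), and it is subject to the fast-refresh restrictions documented in the Data Warehousing Guide:

```sql
-- Fast refresh requires a materialized view log on the source table
CREATE MATERIALIZED VIEW LOG ON xyz WITH ROWID;

CREATE MATERIALIZED VIEW xyz_f_mv
  REFRESH FAST ON COMMIT
  AS SELECT * FROM xyz
      WHERE id NOT IN (124, 234, 2345 /* ... */);
```

Every commit against xyz then also maintains xyz_f_mv, which is why the performance impact on the source workload has to be weighed first.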
 

Author Comment by: gs79
I want this to be refreshed only once a day, preferably at 1 AM. Please provide a working sample if you have one.

thanks
 
Accepted Solution by: slightwv (Netminder), earned 500 total points
Below is a simple example of a materialized view that refreshes every day at 1 AM.
drop table tab1 purge;
create table tab1 (col1 char(1) primary key);
insert into tab1 values('a');
commit;


CREATE MATERIALIZED VIEW tab1_mv
   REFRESH START WITH TRUNC(SYSDATE) + 1 + 1/24  -- tomorrow at 1 AM
   NEXT TRUNC(SYSDATE) + 1 + 1/24
   AS SELECT * FROM tab1
/

 

Author Comment by: gs79
Thanks everyone, the MVs work. I was hoping to solve this using pipelined functions, though.
