
Solved

loading huge data into a fact table from a huge source table

Posted on 2011-03-09
Medium Priority
255 Views
Last Modified: 2012-08-14
I have a source table xyz that contains around 23 million records with a lot of attributes. I need to load a fact table xyz_f, which is basically a SELECT from xyz with some records filtered out by certain id_cds; it inserts around 23 million records. The table xyz_f is truncated and the fact table's indexes are dropped before the load.

It is currently taking close to an hour to load the data. The task is to decrease the load time.

Please make any recommendations.

For now, BULK COLLECT is being used to load the data with a commit frequency of 10,000 rows.

The records that go into the target fact table are:

select * from xyz
where id not in (124, 234, 2345, ...)

which results in close to 23 million records every day.
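
For reference, a minimal sketch of that bulk-collect load; the cursor and row type are hypothetical reconstructions, and only the overall pattern and the 10,000-row commit frequency come from the description above:

DECLARE
  CURSOR c_src IS
    SELECT * FROM xyz
    WHERE id NOT IN (124, 234, 2345 /* ... remaining ids elided */);
  TYPE t_rows IS TABLE OF xyz%ROWTYPE;
  l_rows t_rows;
BEGIN
  OPEN c_src;
  LOOP
    FETCH c_src BULK COLLECT INTO l_rows LIMIT 10000;  -- 10,000-row batches
    EXIT WHEN l_rows.COUNT = 0;
    FORALL i IN 1 .. l_rows.COUNT
      INSERT INTO xyz_f VALUES l_rows(i);
    COMMIT;  -- commit per batch, as described
  END LOOP;
  CLOSE c_src;
END;
/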
Question by:gs79
12 Comments
 
LVL 77

Expert Comment

by: slightwv (Netminder)
ID: 35087080
Can you not set up xyz_f as a materialized view and never 'reload' it again?
 
LVL 11

Expert Comment

by:Akenathon
ID: 35087106
Can't you just append the "new" data to the fact table? Of course it's not that easy if you have updates and deletes on the source table, but you may find a way around that.

In general, ALTER TABLE EXCHANGE PARTITION is what you want to use for these jobs. The fact table ought to be partitioned by date, with local indexes, so that new partitions can be added in a split second, with all their indexes pre-created and analyzed. If you can also get the source table partitioned by list (to isolate your IDs of interest) and then subpartitioned by date (or the other way round), even better, since you then scan only the interesting data.
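
A minimal sketch of that technique, assuming a date-partitioned xyz_f; the partition, staging-table, and filter names are hypothetical:

-- 1. Build a staging table with the same shape as the fact table
CREATE TABLE xyz_f_stage AS
  SELECT * FROM xyz_f WHERE 1 = 0;

-- 2. Load and index it offline (direct-path load)
INSERT /*+ APPEND */ INTO xyz_f_stage
  SELECT * FROM xyz WHERE load_date = TRUNC(SYSDATE);  -- hypothetical filter
COMMIT;
CREATE INDEX xyz_f_stage_ix1 ON xyz_f_stage (id);  -- must match the local index

-- 3. Swap it in: a data-dictionary operation, near-instant, no data movement
ALTER TABLE xyz_f
  EXCHANGE PARTITION p_today WITH TABLE xyz_f_stage
  INCLUDING INDEXES WITHOUT VALIDATION;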
 

Author Comment

by:gs79
ID: 35125512
I just need to truncate the table xyz_f and reload it daily with the records returned by the query below:

select * from xyz
where id not in (124, 234, 2345, ...)

Right now BULK COLLECT is being used with a commit frequency of 10,000, and the load is taking 30-40 minutes.

Can parallel pipelined functions be used to insert records into the table via parallel slaves after enabling the table for parallel DML?

Please advise if there are any other methods.

Thanks

As of now it's taking 30-40 minutes to insert.
 
LVL 77

Expert Comment

by: slightwv (Netminder)
ID: 35127935
>>Please advise if there are any other methods

First post:  Materialized views?  http:#a35087080

With those, there is no need to truncate/reinsert; the view is kept up to date. Even if you cannot do a fast refresh during the day, have it refreshed during your current nightly maintenance window.
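
For an on-demand refresh inside that window, something like the following would do; the MV name is hypothetical, and 'C' requests a complete refresh ('F' asks for a fast refresh where supported):

BEGIN
  -- Refresh the (hypothetical) MV once, e.g. from a nightly job:
  DBMS_MVIEW.REFRESH('XYZ_F_MV', method => 'C');
END;
/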

>>Can parallel pipelined functions be used

What are you thinking with pipelined functions? Parallel is parallel. I'm not sure how a pipelined function would help speed things up, but you never know. I suggest you set up some tests and try out everything you can think of.
 
LVL 11

Expert Comment

by:Akenathon
ID: 35130964
You need to describe the whole procedure to tell whether pipelined functions could play any role in this.

One way or the other: if you can say it in plain SQL (i.e., NOT PL/SQL), it will be faster than manually looping through each record, even with BULK COLLECT. Think of something like:

-- Recreate the target in one parallel direct-path operation:
drop table target;
create table target parallel 4 as
  select /*+ parallel(xyz 4) */ * from xyz
  where id not in (124, 234, 2345, ...);

-- Rebuild the indexes, computing statistics as they are created:
create index target_ix1 on target (...) compute statistics;
create index target_ix2 on target (...) compute statistics;
...

-- Gather table-level stats (the indexes were already analyzed above):
execute dbms_stats.gather_table_stats(user, 'TARGET', cascade => false)



Dropping the table gets rid of the indexes; that is VERY advisable when you are INSERTing so many rows into an empty table, especially if you have many indexes.

For the SELECT part, you may want to see the reference for the PARALLEL hint. You can also do it without hinting if you set up your source table this way: ALTER TABLE XYZ PARALLEL 4;

Note you can also create the target table with 4 partitions, and optionally make its indexes local as well. Try and see what's faster in your specific scenario. Of course, the example degree of 4 must also be adjusted by trial and error.
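
A hypothetical variant of the CTAS above with a hash-partitioned target and a local index might look like this:

-- Same parallel CTAS, but spreading the target over 4 hash partitions:
CREATE TABLE target
  PARALLEL 4
  PARTITION BY HASH (id) PARTITIONS 4
  AS SELECT /*+ parallel(xyz 4) */ *
     FROM xyz
     WHERE id NOT IN (124, 234, 2345 /* ... remaining ids elided */);

-- LOCAL indexes are equipartitioned with the table and can be built in parallel:
CREATE INDEX target_ix1 ON target (id) LOCAL PARALLEL 4;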
 

Author Comment

by:gs79
ID: 35193722
Dropping the table is not an option as it will affect the dependencies. So I am using a procedure that performs the following operations sequentially (a sketch of this sequence follows the list):

1. Drop the indexes
2. Truncate the table
3. Alter session to enable parallel DML
4. Insert into the table (the table has a parallel degree of 8), also using a hint in the SELECT from the source table
5. Alter session to disable parallel DML
6. Recreate the indexes
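
A minimal sketch of that sequence; the index name, column, and ID list are hypothetical or elided:

DROP INDEX xyz_f_ix1;                              -- 1. drop the index(es)
TRUNCATE TABLE xyz_f;                              -- 2. truncate
ALTER SESSION ENABLE PARALLEL DML;                 -- 3. enable parallel DML

-- 4. Direct-path parallel insert; APPEND loads above the high-water
--    mark, bypassing the buffer cache and minimizing undo:
INSERT /*+ APPEND PARALLEL(xyz_f 8) */ INTO xyz_f
  SELECT /*+ PARALLEL(xyz 8) */ * FROM xyz
  WHERE id NOT IN (124, 234, 2345 /* ... remaining ids elided */);
COMMIT;

ALTER SESSION DISABLE PARALLEL DML;                -- 5. disable parallel DML
CREATE INDEX xyz_f_ix1 ON xyz_f (id) PARALLEL 8;   -- 6. recreate the index(es)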

Still no performance improvement.

Another thing I tried is creating a parallel pipelined function to process the SQL cursor in parallel slaves and do parallel DML on the table, but I could not see any parallelism in action. I will provide the queries in a while, but any thoughts on this?
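
For reference, a parallel-enabled pipelined function generally takes this shape; the object/table types and two-column row below are hypothetical simplifications. One common reason for seeing no parallelism is that the driving cursor itself is not executed in parallel:

-- Hypothetical row/collection types for the pipelined function:
CREATE OR REPLACE TYPE xyz_row_t AS OBJECT (id NUMBER, val VARCHAR2(100));
/
CREATE OR REPLACE TYPE xyz_tab_t AS TABLE OF xyz_row_t;
/

-- PARALLEL_ENABLE with PARTITION ... BY ANY lets Oracle spread the
-- input cursor's rows across parallel slaves:
CREATE OR REPLACE FUNCTION etl_xyz (p_src SYS_REFCURSOR)
  RETURN xyz_tab_t
  PARALLEL_ENABLE (PARTITION p_src BY ANY)
  PIPELINED
AS
  l_id  NUMBER;
  l_val VARCHAR2(100);
BEGIN
  LOOP
    FETCH p_src INTO l_id, l_val;
    EXIT WHEN p_src%NOTFOUND;
    PIPE ROW (xyz_row_t(l_id, l_val));  -- any row-level transformation goes here
  END LOOP;
  RETURN;
END;
/

-- Parallelism only appears when BOTH the feeding cursor and the DML
-- run in parallel (the column list is hypothetical):
ALTER SESSION ENABLE PARALLEL DML;
INSERT /*+ APPEND PARALLEL(xyz_f 8) */ INTO xyz_f (id, val)
  SELECT * FROM TABLE(etl_xyz(CURSOR(
    SELECT /*+ PARALLEL(xyz 8) */ id, val FROM xyz)));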

thanks
 
LVL 77

Expert Comment

by: slightwv (Netminder)
ID: 35193982
Why haven't you commented on the materialized view comments?
 

Author Comment

by:gs79
ID: 35194361
I am new here. I did suggest the materialized view approach earlier and it was discarded, but I will give it a try as well. Meanwhile, can you please help me with the script to create an MV for the table I described above, with the correct refresh option?

I still wonder why I am not able to see parallelism in action when I use parallel pipelined functions. There have been a lot of white papers written on this technique boasting its power in ETL, but I feel it's quite whimsical in the way it works.

any thoughts..

Thanks..
 
LVL 77

Expert Comment

by: slightwv (Netminder)
ID: 35194384
What are your refresh requirements and impact requirements? I would probably go with REFRESH ON COMMIT, but that depends on whether you can withstand the performance impact.
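
A hedged sketch of the ON COMMIT variant; it requires a materialized view log on the source table, and fast-refresh restrictions should be verified against the real query (DBMS_MVIEW.EXPLAIN_MVIEW reports what a given MV supports):

-- MV log capturing rowids and the filter column (hypothetical column list):
CREATE MATERIALIZED VIEW LOG ON xyz WITH ROWID (id);

-- Refreshes automatically as part of each committing transaction on xyz:
CREATE MATERIALIZED VIEW xyz_f_mv
  REFRESH FAST ON COMMIT
  AS SELECT * FROM xyz
     WHERE id NOT IN (124, 234, 2345 /* ... remaining ids elided */);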

I'm on mobile, so I can't come up with a working script right now. All the options are in the docs, along with the references for the refresh job.

If you can't get it by tomorrow, I'll try to create a working example.
 

Author Comment

by:gs79
ID: 35194557
I want this to be refreshed only once a day, preferably at, say, 1 AM. Please provide a working sample if you have one.

thanks
 
LVL 77

Accepted Solution

by:
slightwv (Netminder) earned 2000 total points
ID: 35200255
Below is a simple example of a materialized view that refreshes every day at 1 AM:
drop table tab1 purge;
create table tab1 (col1 char(1) primary key);
insert into tab1 values ('a');
commit;

-- ROUND(SYSDATE + 1) resolves to the nearest midnight and + 1/24 adds
-- one hour, i.e. 1 AM; TRUNC(SYSDATE + 1) + 1/24 would pin the first
-- run to tomorrow 1 AM regardless of when the MV is created.
CREATE MATERIALIZED VIEW tab1_mv
   REFRESH START WITH ROUND(SYSDATE + 1) + 1/24
   NEXT ROUND(SYSDATE + 1) + 1/24
   AS SELECT * FROM tab1
/
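
To confirm the schedule took effect, the data dictionary can be queried afterwards (this check is not part of the original answer):

-- Last refresh time and current staleness of the MV:
SELECT mview_name, last_refresh_date, staleness
FROM   user_mviews;

-- The START WITH/NEXT refresh runs as a legacy job; check its next run:
SELECT job, what, next_date
FROM   user_jobs;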


 

Author Comment

by:gs79
ID: 35384588
Thanks everyone, the MVs work. I was hoping to solve this using pipelined functions.
