Solved

loading huge data into a fact table from a huge source table

Posted on 2011-03-09
Medium Priority
254 Views
Last Modified: 2012-08-14
I have a source table xyz which contains around 23 million records with a lot of attributes. I need to load a fact table xyz_f, which is basically a SELECT from xyz with some records filtered out by certain id_cds; the load inserts around 23 million records. The fact table xyz_f is truncated and its indexes are dropped before the load.

It is now taking close to an hour to load the data. The task is to decrease the load time.

Please make any recommendations.

For now, BULK COLLECT is being used to load the data with a commit frequency of 10,000.

The records that go into the target fact table are:

select * from xyz
where
id not in(124,234,2345,2345.....)

which results in close to 23 million records every day.
Question by:gs79
12 Comments
 
LVL 77

Expert Comment

by:slightwv (Netminder)
ID: 35087080
Can you not set up xyz_f as a materialized view and never 'reload' it again?
 
LVL 11

Expert Comment

by:Akenathon
ID: 35087106
Can't you just append the "new" data to the fact table? Of course it's not that easy if you have updates and deletes on the source table, but you may find a way around it.

In general, ALTER TABLE EXCHANGE PARTITION is what you want to use for these jobs. The fact table ought to be partitioned by date, with local indexes, so that new partitions can be added in a split second, with all their indexes pre-created and analyzed. If you can also get the source table partitioned by list (to isolate your IDs of interest), and then subpartition by date (or the other way round), even better since you just scan the interesting data.
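
For illustration, a rough sketch of the exchange-partition pattern described above; the staging table, the partition name, and the load_date column are hypothetical and not taken from this thread:

-- Hypothetical sketch: load one day's rows into a staging table that mirrors
-- the fact table, then swap it into the date-partitioned fact table.
CREATE TABLE xyz_stage AS
  SELECT * FROM xyz_f WHERE 1 = 0;                -- empty copy of the fact table's shape

INSERT /*+ APPEND */ INTO xyz_stage
  SELECT * FROM xyz
   WHERE load_date = TRUNC(SYSDATE);              -- only the new day's rows (load_date is assumed)

CREATE INDEX xyz_stage_ix1 ON xyz_stage (id);     -- mirror the fact table's local indexes
EXEC DBMS_STATS.GATHER_TABLE_STATS(USER, 'XYZ_STAGE');

ALTER TABLE xyz_f
  EXCHANGE PARTITION p_20110309 WITH TABLE xyz_stage
  INCLUDING INDEXES WITHOUT VALIDATION;           -- a dictionary-only swap, effectively instant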
 

Author Comment

by:gs79
ID: 35125512
I just need to truncate the table xyz_f and load it with the new records returned by the query below, on a daily basis:

select * from xyz
where
id not in(124,234,2345,2345.....)

Right now BULK COLLECT is being used with a commit frequency of 10,000, which is taking 30-40 minutes to load.

Can parallel pipelined functions be used to insert records into the table through parallel slaves after enabling parallel DML on the table?

Please advise if there are any other methods.

Thanks

As of now it's taking 30-40 minutes to insert.
 
LVL 77

Expert Comment

by:slightwv (Netminder)
ID: 35127935
>>Please advise if there are any other methods.

First post:  Materialized views?  http:#a35087080

With those, there is no need to truncate/reinsert; the view is kept up to date. Even if you cannot do a fast refresh during the day, have them updated during your current nightly maintenance window.

>>Can parallel pipelined functions be used

Why are you thinking of pipelined functions? Parallel is parallel. I'm not sure how a pipelined function would help speed things up, but you never know. I suggest you set up some tests and try out everything you can think of.
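
As a generic way to check whether a test statement actually ran in parallel (not something posted in the thread), the session's parallel-query statistics can be inspected right after the statement:

-- Run the candidate INSERT or CTAS, then in the same session:
SELECT statistic, last_query, session_total
  FROM v$pq_sesstat
 WHERE statistic IN ('Queries Parallelized', 'DML Parallelized');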
 
LVL 11

Expert Comment

by:Akenathon
ID: 35130964
You need to describe the whole procedure to tell whether pipelined functions could play any role in this.

One way or the other: if you can say it in plain SQL (i.e. NOT PL/SQL) it will be faster than manually looping through each record, even with bulk collect. Think of something like:

drop table target;

-- rebuild the table in parallel, scanning the source in parallel as well
create table target parallel 4 as
  select /*+ parallel(xyz 4) */ * from xyz where id not in(124,234,2345,2345.....);

-- recreate the indexes (COMPUTE STATISTICS analyzes them as they are built)
create index target_ix1 on target (...) compute statistics;
create index target_ix2 on target (...) compute statistics;
...

-- gather table statistics; cascade => false since the indexes are already analyzed
execute dbms_stats.gather_table_stats(user, 'TARGET', cascade => false)



Dropping the table gets rid of the indexes, which is VERY advisable when you are INSERTing so many rows into an empty table, especially if you have many indexes.

For the SELECT part, you may want to see the reference for the PARALLEL hint. You can also do it without hinting if you set up your source table like this: ALTER TABLE XYZ PARALLEL 4;

Note you can also create the target table with 4 partitions, and optionally make its indexes local as well. Try and see what's faster in your specific scenario. Of course, the sample degree of 4 must also be adjusted by trial and error.
 

Author Comment

by:gs79
ID: 35193722
Dropping the table is not an option as it will affect the dependencies, so I am using a procedure where I perform the following operations sequentially:

1. Drop the indexes
2. Truncate the table
3. ALTER SESSION ENABLE PARALLEL DML
4. Insert into the table (the table has a degree of 8 parallelism), also using a hint in the "select from source table" statement (a single-statement sketch of this follows below)
5. ALTER SESSION DISABLE PARALLEL DML
6. Recreate the indexes

Still no performance improvement.
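
A minimal sketch of steps 3-5 as a single direct-path statement (the APPEND hint and the degree of 8 are assumptions, and the filter list is abbreviated as in the question):

ALTER SESSION ENABLE PARALLEL DML;

INSERT /*+ APPEND PARALLEL(f 8) */ INTO xyz_f f
SELECT /*+ PARALLEL(s 8) */ *
  FROM xyz s
 WHERE s.id NOT IN (124, 234, 2345 /* ... */);

COMMIT;                                   -- a direct-path loaded table cannot be queried until commit

ALTER SESSION DISABLE PARALLEL DML;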

Another thing I tried was creating a parallel pipelined function to process the SQL cursor in parallel slaves and do parallel DML on the table, but I could not see any parallelism in action. I will provide the queries in a while, but any thoughts on this?

thanks
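
For reference, a sketch of what a parallel-enabled pipelined function has to look like for the slaves to kick in; the object type, column list, and degree are assumptions, not the poster's actual code. Parallelism only appears when the function is declared PARALLEL_ENABLE over a partitioned cursor argument and the driving cursor itself runs in parallel:

CREATE OR REPLACE TYPE xyz_row_t AS OBJECT (id NUMBER, attr1 VARCHAR2(100));
/
CREATE OR REPLACE TYPE xyz_tab_t AS TABLE OF xyz_row_t;
/
CREATE OR REPLACE FUNCTION transform_xyz (p_cur SYS_REFCURSOR)
  RETURN xyz_tab_t PIPELINED
  PARALLEL_ENABLE (PARTITION p_cur BY ANY)        -- without this, the function runs serially
IS
  v_id    NUMBER;
  v_attr1 VARCHAR2(100);
BEGIN
  LOOP
    FETCH p_cur INTO v_id, v_attr1;
    EXIT WHEN p_cur%NOTFOUND;
    PIPE ROW (xyz_row_t(v_id, v_attr1));          -- any row-by-row transformation goes here
  END LOOP;
  CLOSE p_cur;
  RETURN;
END;
/
ALTER SESSION ENABLE PARALLEL DML;

-- The driving cursor must itself be parallel, or the slaves never start
INSERT /*+ APPEND PARALLEL(xyz_f 8) */ INTO xyz_f (id, attr1)
SELECT id, attr1
  FROM TABLE(transform_xyz(
         CURSOR(SELECT /*+ PARALLEL(xyz 8) */ id, attr1 FROM xyz
                 WHERE id NOT IN (124, 234, 2345 /* ... */))));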
 
LVL 77

Expert Comment

by:slightwv (Netminder)
ID: 35193982
Why haven't you commented on the materialized view suggestion?
 

Author Comment

by:gs79
ID: 35194361
I am new here; I did suggest the materialized view approach, and the solution was discarded. I may have to try this as well, and I will give it a try. Meanwhile, can you please help me with the script to create an MV for the table I have described above, with the correct refresh option?

I still wonder why I am not able to see parallelism in action when I used parallel pipelined functions. There have been a lot of white papers written on this technique boasting its power in ETL, but I feel it's whimsical in the way it works.

any thoughts..

Thanks..
 
LVL 77

Expert Comment

by:slightwv (Netminder)
ID: 35194384
What are your refresh requirements and impact requirements? I would probably go with refresh on commit, but that depends on whether you can withstand the performance impact.

I'm on mobile so can't come up with any working script right now.  All the options are in the docs along with the references for the refresh job.

If you can't get it by tomorrow, I'll try to create a working example.
 

Author Comment

by:gs79
ID: 35194557
I want this to be refreshed only once a day, preferably at, let's say, 1 AM. Please provide a working sample if you have one.

thanks
 
LVL 77

Accepted Solution

by:
slightwv (Netminder) earned 2000 total points
ID: 35200255
Below is a simple example of a materialized view that refreshes every day at 1 AM.
-- set up a small demo source table
drop table tab1 purge;
create table tab1 (col1 char(1) primary key);
insert into tab1 values('a');
commit;

-- refresh starts around 1 AM the next day and repeats daily
-- (ROUND rounds SYSDATE + 1 to the nearest midnight)
CREATE MATERIALIZED VIEW tab1_mv
   REFRESH START WITH ROUND(SYSDATE + 1) + 1/24
   NEXT ROUND(SYSDATE + 1) + 1/24
   AS SELECT * FROM tab1
/
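
Applied to the actual tables in this thread, the same pattern might look roughly like the following; the COMPLETE refresh method and the use of TRUNC (to pin the start time at exactly 1 AM) are assumptions, and the filter list is abbreviated as in the question:

CREATE MATERIALIZED VIEW xyz_f_mv
   BUILD IMMEDIATE
   REFRESH COMPLETE
   START WITH TRUNC(SYSDATE + 1) + 1/24    -- 1 AM tomorrow
   NEXT  TRUNC(SYSDATE + 1) + 1/24         -- then 1 AM every following day
   AS SELECT * FROM xyz
       WHERE id NOT IN (124, 234, 2345 /* ... */)
/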

 

Author Comment

by:gs79
ID: 35384588
Thanks everyone. The MVs work. I was hoping to solve this using pipelined functions.