Want to protect your cyber security and still get fast solutions? Ask a secure question today.Go Premium

x
  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 2667
  • Last Modified:

Informatica ETL Staging Area Design

Hi there, I am working on a Datawarehouse implementation.

I am currently designing a ETL solution using Informatica and have come up with 3 different designs for my ETL staging area.

Overview: There are 3 company books, held across 20 countries.

And here are my options for the design:-
Option 1: 3 large tables based on different books.
Option 2: 1 table holding all the information from all countries and books.
Option 3: 20 tables, one for each country.

The main driver here is performance. It would be great if you could provide some insight into best practice and designs to improve informatica's performance, and help me decide on which option to take.

Links to white papers and examples would be great.

Much appreciated.
0
chocobogo
Asked:
chocobogo
2 Solutions
 
konektorCommented:
all depends how will you use the tables? users allways wants to see data only from one book, or allways only from one country ... or there will be some queries, where user want to get some data (sums, ...) from all books, all countries ?
if latest mentioned option, one table holding all data will be better solution, but use partitioning.
0
 
chocobogoAuthor Commented:
Thanks for the quick response.

Yes, there will be queries that look at all books across all countries.

Do you happen to know the performance issues around the different options and also the informatica run times so that I can put together a business case for choosing option 2?

Also do you have a link to how to use partitioning effectively?
0
 
konektorCommented:
here is described how to create partitioned table http://download-uk.oracle.com/docs/cd/B19306_01/server.102/b14200/statements_7002.htm#i2129707

is there any "date attribute" in your logical class you want to strore in database ?

imagine you have orders (order_no, order_date, ...)
order_no is primary key, order_date is date, when order was created. in most of cases users are interested in only max 1 month old orders, but sometimes users also searches in history. in that case it's best to partition table based on order_date column. the advantag is that after some period (after some years) data is not needed any more. so if you have single (not partitioned) table, delete statement is wery demanding. if your table is partitioned by date column, you can just delete old partitions and then rebuild non-partitioned indexes if there are some.
0
Technology Partners: We Want Your Opinion!

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

 
chocobogoAuthor Commented:
Thanks

Do you know anywhere where I can get the performance metrics for Informatica?

e.g. how long it takes to run an ETL procedure on
- 1000 rows
- 10k rows
- 100k rows?
0
 
waelelrifaiCommented:
Regarding the best design strategy, we'd really need to know something of the requirements for the use of the data.  Is there a reason it needs to be sitting centrally?  Are there network issues between your installations such as frequent outages?  There's a host of other considerations, but regarding your Informatica query, read on!

The performance metrics will vary widely depending on your hardware and Informatica setup as well as source/target details.  In my experience working for Informatica as a consultant, these are my estimates...

Assuming that you have a dedicated Informatica server running 1-2 CPUs on an *IX box, the database is unhindered, and the queries are simple, you can expect to see DB2/Oracle SELECT/INSERT speeds run around 3,000 rows/sec, and flat files run around 30,000 rows/sec.  Of course this all changes radically if you throw in Updates or have lookups, say in the instance of a Type II Slowly Changing Dimension.  You can also get significantly faster speeds if you have more processors and have purchased the partitioning feature of Informatica PowerCenter.

Wael
================================
wael.elrifai@peakconsulting.eu
0
 
Computer101Commented:
Forced accept.

Computer101
EE Admin
0

Featured Post

Industry Leaders: We Want Your Opinion!

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

Tackle projects and never again get stuck behind a technical roadblock.
Join Now