Informatica ETL Staging Area Design

Posted on 2007-10-18
Last Modified: 2008-01-09
Hi there, I am working on a Datawarehouse implementation.

I am currently designing a ETL solution using Informatica and have come up with 3 different designs for my ETL staging area.

Overview: There are 3 company books, held across 20 countries.

And here are my options for the design:-
Option 1: 3 large tables based on different books.
Option 2: 1 table holding all the information from all countries and books.
Option 3: 20 tables, one for each country.

The main driver here is performance. It would be great if you could provide some insight into best practice and designs to improve informatica's performance, and help me decide on which option to take.

Links to white papers and examples would be great.

Much appreciated.
Question by:chocobogo
    LVL 9

    Expert Comment

    all depends how will you use the tables? users allways wants to see data only from one book, or allways only from one country ... or there will be some queries, where user want to get some data (sums, ...) from all books, all countries ?
    if latest mentioned option, one table holding all data will be better solution, but use partitioning.

    Author Comment

    Thanks for the quick response.

    Yes, there will be queries that look at all books across all countries.

    Do you happen to know the performance issues around the different options and also the informatica run times so that I can put together a business case for choosing option 2?

    Also do you have a link to how to use partitioning effectively?
    LVL 9

    Accepted Solution

    here is described how to create partitioned table

    is there any "date attribute" in your logical class you want to strore in database ?

    imagine you have orders (order_no, order_date, ...)
    order_no is primary key, order_date is date, when order was created. in most of cases users are interested in only max 1 month old orders, but sometimes users also searches in history. in that case it's best to partition table based on order_date column. the advantag is that after some period (after some years) data is not needed any more. so if you have single (not partitioned) table, delete statement is wery demanding. if your table is partitioned by date column, you can just delete old partitions and then rebuild non-partitioned indexes if there are some.

    Author Comment


    Do you know anywhere where I can get the performance metrics for Informatica?

    e.g. how long it takes to run an ETL procedure on
    - 1000 rows
    - 10k rows
    - 100k rows?
    LVL 1

    Assisted Solution

    Regarding the best design strategy, we'd really need to know something of the requirements for the use of the data.  Is there a reason it needs to be sitting centrally?  Are there network issues between your installations such as frequent outages?  There's a host of other considerations, but regarding your Informatica query, read on!

    The performance metrics will vary widely depending on your hardware and Informatica setup as well as source/target details.  In my experience working for Informatica as a consultant, these are my estimates...

    Assuming that you have a dedicated Informatica server running 1-2 CPUs on an *IX box, the database is unhindered, and the queries are simple, you can expect to see DB2/Oracle SELECT/INSERT speeds run around 3,000 rows/sec, and flat files run around 30,000 rows/sec.  Of course this all changes radically if you throw in Updates or have lookups, say in the instance of a Type II Slowly Changing Dimension.  You can also get significantly faster speeds if you have more processors and have purchased the partitioning feature of Informatica PowerCenter.

    LVL 1

    Expert Comment

    Forced accept.

    EE Admin

    Write Comment

    Please enter a first name

    Please enter a last name

    We will never share this with anyone.

    Featured Post

    PRTG Network Monitor: Intuitive Network Monitoring

    Network Monitoring is essential to ensure that computer systems and network devices are running. Use PRTG to monitor LANs, servers, websites, applications and devices, bandwidth, virtual environments, remote systems, IoT, and many more. PRTG is easy to set up & use.

    Introduction A previously published article on Experts Exchange ("Joins in Oracle", makes a statement about "Oracle proprietary" joins and mixes the join syntax with gen…
    How to Create User-Defined Aggregates in Oracle Before we begin creating these things, what are user-defined aggregates?  They are a feature introduced in Oracle 9i that allows a developer to create his or her own functions like "SUM", "AVG", and…
    This video shows syntax for various backup options while discussing how the different basic backup types work.  It explains how to take full backups, incremental level 0 backups, incremental level 1 backups in both differential and cumulative mode a…
    This video shows how to configure and send email from and Oracle database using both UTL_SMTP and UTL_MAIL, as well as comparing UTL_SMTP to a manual SMTP conversation with a mail server.

    761 members asked questions and received personalized solutions in the past 7 days.

    Join the community of 500,000 technology professionals and ask your questions.

    Join & Ask a Question

    Need Help in Real-Time?

    Connect with top rated Experts

    7 Experts available now in Live!

    Get 1:1 Help Now