Solved

Metrics to diagnose ETL Jobs load performance in Informatica

Posted on 2014-10-12
891 Views
Last Modified: 2014-10-20
Hi,

I am managing an ETL team now, and I learned that it took Informatica 27 hours to load 85 million rows. In my opinion, this is unacceptable.

What metrics (and from which Informatica tool or repository) can I use to get more diagnostic information about whether:

1. The source database is slow
2. The target database is slow
3. Indexes are missing
4. Transformations are taking too long
5. The staging database is slow
6. The job was queued in the system (so the actual run time could be 5 hours, with the other 22 hours spent waiting in a queue)

Anything else you might think of. My point is simple: I want the team to look for all kinds of optimization opportunities to speed up this job, or to work with the vendor to get this fixed.
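For context, here is a quick back-of-envelope throughput check (a minimal sketch: the 27-hour and 85-million-row figures come from the question above; the 10,000 rows/sec comparison rate is an illustrative assumption, not a benchmark):

```python
# Back-of-envelope throughput check for the load described above.
rows = 85_000_000
hours = 27

rows_per_second = rows / (hours * 3600)
print(f"Observed throughput: {rows_per_second:,.0f} rows/sec")  # ~874 rows/sec

# Illustrative comparison: if the pipeline sustained 10,000 rows/sec
# (an assumed, modest rate for a bulk load), the same volume would take:
target_rate = 10_000  # rows/sec -- assumption, for illustration only
print(f"At {target_rate:,} rows/sec: {rows / target_rate / 3600:.1f} hours")  # ~2.4 hours
```

Under 1,000 rows/sec sustained over 27 hours is the kind of number that usually points at a bottleneck (row-by-row processing, network, or missing bulk-load settings) rather than at raw data volume.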

thanks
-anshu
Question by:anshuma
2 Comments
 

Accepted Solution

by: aikimark (earned 400 total points)
ID: 40376766
Contact Informatica immediately.

In such cases, I usually advise my clients to isolate the problems.  For instance, just for the extract phase, I would ask...
* is the source database busy when you want to run the extract?
* is the extract job (network) 'close' to the source database server?
* what is the speed and capacity of the network path from the source database to the extract job?
* is the extract job running on a busy server?
* is the extract job running on a resource constrained server?
* is the extract job using local storage for its temp/working files?
* what is the fastest possible extract of data from the source, independent of the Informatica software?
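The last question in that list can be answered with a small standalone timing harness. Here is a minimal sketch; it uses an in-memory SQLite table as a stand-in for the real source so it is self-contained, and in practice you would swap in the appropriate DB-API driver and connection string for your actual source database:

```python
import sqlite3
import time

# Stand-in source: replace with a connection to the real source database
# using its DB-API driver (this in-memory SQLite table is for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src (id INTEGER, payload TEXT)")
conn.executemany("INSERT INTO src VALUES (?, ?)",
                 ((i, "x" * 100) for i in range(100_000)))
conn.commit()

# Time a raw fetch of every row, independent of any ETL tool.
start = time.perf_counter()
cursor = conn.execute("SELECT id, payload FROM src")
count = 0
while True:
    batch = cursor.fetchmany(10_000)  # fetch in batches, as an ETL reader would
    if not batch:
        break
    count += len(batch)
elapsed = time.perf_counter() - start

print(f"Fetched {count:,} rows in {elapsed:.2f}s "
      f"({count / elapsed:,.0f} rows/sec)")
```

If this raw fetch is already slow against the real source, the bottleneck is upstream of Informatica (database or network); if it is fast, the tool's configuration and transformations deserve the closer look.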

All of these questions are independent of the configuration and properties your team has assigned to the ETL job, which should be double-checked.  I've seen cases, in other system utilities, where an option was selected to minimize system resource use.  The trade-off was slow run times.

In another shop, the vendor's tape management system used the first character of the job name to index the job's data in the utility's internal linked-list data structure.  Since the production job names began with "P", the data was distributed very unevenly and all activities (insert, search, update, delete) took a very long time.  Rather than replacing the tape management system, the jobs were renamed.
 

Assisted Solution

by: sdstuber (earned 100 total points)
ID: 40376892
You should be able to look in the Workflow Monitor logs and see which steps are taking a long time.

Then determine what the corresponding SQL statements are and give those to your DBA team to investigate.
