What is the big deal with ETL tools? Are they even needed?

I've done a lot of custom Oracle ETL, have dabbled in Informatica (and worked beside the Infa team for a long time), and have done a bit of work in SSIS plus worked alongside the team. So I have a good sense of what's going on, I'm just not an expert on those tools.

I also see tons of job postings for Informatica and SSIS.

But are they really needed? Why isn't custom ETL better? They're all going to be processing flat files, no? I'm just not seeing the advantage.

I did enough with Informatica and SSIS to know that there are limitations. With Informatica, the guys just shrugged their shoulders. With SSIS, I found limitations even quicker; plus, others would just write a stored procedure, so the only thing the SSIS package did was call it.

So what are the good points of the ETL tools?

It's a huge learning curve to get really good with these tools, so it requires another skill set, tool base, etc. Plus, I've also found the ETL developers knew a lot about the tools, but less about SQL and even less about whatever business is at hand.

So what IS the big draw for these tools, and what do the old-timers here say? The only thing I can think of is that SSIS is Windows-based, so you don't have to fiddle with Unix, scripts, etc.
Gadsden Consulting, IT Specialist, asked:
As a database guy who had to learn to use Informatica, I can appreciate where you're coming from.
I never became an expert, but I got to work with some people who were; and, having survived, I'm glad for the experience.

First a few things that I turned out to LOVE.

1 - Self-documenting data flow.  Trying to answer the question "Where did you get this piece of information?" can be difficult (and that's being kind) with scripts, programs, stored procedures, etc.  However, tracing through a workflow is easy.  Of course, if the processing is complex with lots of data sources and many transformations, it may still take a while to decipher it all; but the flow itself is still essentially just connecting the dots.

2 - Semi-agnostic processing.  I absolutely detest platform-agnostic development.  BUT... tools like Informatica strike a nice compromise.  If you change the platform of your data sources, you must rewrite your source qualifiers.  But after that, you're essentially done.  So your inputs can use SQL syntax, stored procedures, specialized web services, etc.  Whatever the best mechanism is for your source, you can use it.  Once you have your data, the actual "T" transform process is the same all the way through the workflow until you get to your output targets.  Here again, you'll create them to be specialized for the target platform; but your workflow remains logically identical.  Of course, nice modular design in any programming language can duplicate this; but the point is, it's built in.  You kind of have to go out of your way to not be nicely modular with an ETL tool.
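To make that concrete, the built-in modularity amounts to something like the following Python sketch (an illustrative pattern only, not any particular tool's API): the extract and load ends are swapped per platform while the transform chain stays fixed.

```python
def run_pipeline(extract, transform_steps, load):
    """Semi-agnostic pipeline skeleton: the extract and load steps are
    swapped out per platform, but the chain of transforms in the middle
    stays identical regardless of source or target."""
    rows = extract()
    for step in transform_steps:
        rows = [step(row) for row in rows]
    return load(rows)

# Swap in a different extract (Oracle, DB2, a flat-file reader...) or
# load, and the "T" in the middle doesn't change.
result = run_pipeline(
    extract=lambda: [1, 2, 3],           # stand-in for a source qualifier
    transform_steps=[lambda r: r * 10],  # shared transform logic
    load=lambda rows: rows,              # stand-in for a target writer
)
```

The point of the sketch is that the tool enforces this separation for you; in hand-coded ETL, keeping it this clean takes discipline.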

3 - Diagnostics and logging.  Most applications I've seen aren't instrumented as well as they could be.  The workflow monitor and logs are awesome tools that give a ton of information without the developer needing to do much.  Of course, if your mapping calls a procedure or script and produces additional logs, you can include those for even more contextual information.

You might consider those the "soft" benefits; but to me, they were some of my favorites.
On a more technical side, I was impressed by these features:

1 - High-speed features for some platforms are "options"; you don't have to do a lot of special coding to get significant benefits if you know when to use them.  For example, if you know you can take advantage of SQL*Loader-style direct path inserts, you can simply turn on the "Bulk" option for inserts and you've got them.  If it turns out you need to observe constraints and triggers, then switch to "Normal."  It's literally as easy as toggling an option.
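Hand-coding that same switch is real work. A rough sketch of the idea in Python (using sqlite3 purely for illustration; the table and column names are made up):

```python
import sqlite3

def load_rows(conn, rows, bulk=True):
    """Toggle between a set-based fast path and a row-by-row path,
    analogous to flipping a Bulk/Normal load option in the tool."""
    cur = conn.cursor()
    if bulk:
        # Fast path: one set-based call, minimal per-row overhead
        # (the spirit of a direct path load).
        cur.executemany("INSERT INTO stage(id, val) VALUES (?, ?)", rows)
        inserted = cur.rowcount
    else:
        # "Normal" path: row by row, so constraint violations can be
        # caught and skipped individually instead of failing the batch.
        inserted = 0
        for row in rows:
            try:
                cur.execute("INSERT INTO stage(id, val) VALUES (?, ?)", row)
                inserted += 1
            except sqlite3.IntegrityError:
                pass  # in real code: log the rejected row
    conn.commit()
    return inserted
```

In the tool, this entire branch is a checkbox; in custom code, you maintain both paths yourself.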

2 - Flexible mappings.   Informatica allows you to build transformations within its own framework with common aggregations like SUM, COUNT, MIN/MAX, etc.  It also has more sophisticated features if needed.
Even better though, when the tool comes up short, you can invoke external code: web services, stored procedures, and other tools.  I worked with mappings that would read from DB2 on a mainframe, flow the data through a few internal transformations, call an Oracle stored procedure to process more of it, invoke a Java program to alter other pieces, flow into further internal transformations, and finally invoke a web service to export the data to an external system.
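For a sense of what those built-in aggregations do, here's a minimal hand-rolled equivalent in Python (the field names are made up for illustration; in the tool you configure this declaratively rather than writing it):

```python
from collections import defaultdict

def aggregate(rows, key, value):
    """Group rows by `key` and compute SUM, COUNT, MIN, and MAX of
    `value` -- the common aggregations an ETL tool provides as
    built-in transformations."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[value])
    return {
        k: {"sum": sum(v), "count": len(v), "min": min(v), "max": max(v)}
        for k, v in groups.items()
    }
```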

3 - Simple parallel scaling.  If it's reasonable to split your work into pieces and run multiple threads, scaling the degree of parallelism up or down is quite simple with an integration tool, with little to no special coding.
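In custom code, that scaling knob has to be built by hand. A minimal sketch, assuming the work is trivially partitionable (the transform here is just a stand-in):

```python
from concurrent.futures import ThreadPoolExecutor

def transform(row):
    # Stand-in for whatever per-row work the mapping does.
    return row * 2

def run_partitioned(rows, degree):
    """Split the workload into `degree` partitions and run them in
    parallel -- roughly the knob an ETL tool exposes when you change
    the degree of parallelism."""
    partitions = [rows[i::degree] for i in range(degree)]
    with ThreadPoolExecutor(max_workers=degree) as pool:
        results = pool.map(lambda part: [transform(r) for r in part],
                           partitions)
    # Reassemble; the ordering policy is up to the pipeline design.
    return sorted(x for part in results for x in part)
```

Changing `degree` is the whole tuning exercise here, which is the point: the tool makes that a configuration setting rather than code.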

Having said all that, I'm still not sold on Informatica as a universal tool for all problems.  More often than not, I recommend NOT using it; but that's because most of our integrations are between similar platforms that already have solid data transfer mechanisms.  Something as simple as linked tables in SQL Server or queries across database links in Oracle can solve many (most?) business needs without additional tools.

Also, as your question mentioned, a lot of ETL developers simply want to process files.  That's fine if that's all you have to work with; but far too often I find developers resorting to file processing when the source and/or target system has a native access method with better functionality.  Mentioning this is somewhat unfair, because it's not the fault of the integration tool; it's the fault of the human who chose to use it incorrectly.  But I mention it because it's a real problem with significant impacts on performance and resource consumption.

I don't know if I've convinced you; but my time with them has convinced me they are a viable tool to have in my toolbox.  I use my screwdrivers more often than I use my block plane, and even though one tool is obviously more valuable more often, that doesn't mean my block plane is pointless or broken.  It just means it's waiting for the right job.
Gadsden Consulting, IT Specialist, Author, commented:

I'm sorry, I missed this in my e-mail and am just checking now. I'll review tomorrow and  post my thoughts.

Thanks !
Gadsden Consulting, IT Specialist, Author, commented:

excellent, just what I was looking for. Basically, it's the right way to go if you are in the business of ETL.

I wish I had been able to learn the tool, but I just never had the opportunity, although I did get through the beginner phase.

Really good soft points - self-documenting, semi-agnostic, diagnostics and logging.

Excellent discussion on features: High speed, flexible mappings, and simple parallel scaling.

Also good to hear your cautions, especially not using the right tool for the job.

My experience was in an organization that went whole-hog with Informatica; it became very top-heavy with admin teams, support teams, crazy migration teams, 24/7 support, etc. I can't imagine the cost.

I used MS SSIS with Oracle, and that was fine. I did crack that to the intermediate stage, and I found it easier to learn than Informatica, and it definitely has a lighter footprint.

So I agree that an organization in the business of processing data needs an ETL tool.

From here I would need to know more about the organization as well as alternatives to Informatica, like OWB and ODI.

So this will suffice, thanks !

I'll leave this open until tomorrow, see if anyone else chimes in.

I personally think that was a great answer. I have watched several organizations (some very large, others less so, both public and private) deal with ETL in a variety of ways, so I was watching on with interest.

Thanks Sean for your experienced insight to this interesting question.

(& of course thanks for asking it)
Gadsden Consulting, IT Specialist, Author, commented:

any lessons learned?

Not really; in my role (a vendor architect) I get to observe at arm's length, not hands-on in the ETL area.  When discussing/designing solutions with clients I get to hear about their preferences, etc., which vary greatly.

My interest here was simply to gain some insight into an area I don't know that deeply.
Gadsden Consulting, IT Specialist, Author, commented:

ok, sounds good.
Jim Horn, Microsoft SQL Server Developer, Architect, and Author, commented:
>They're all going to be processing flat files, no?
Without trying to compete with the first answer,  I've had to handle a variety of sources and destinations...  Salesforce.com source to SQL Server destination + Data Warehouse build, SQL to SQL, SQL to Excel, Oracle to SQL, DB2 to SQL, Flat files to flat files, SQL to flat files, Informix to SQL, XML files to SQL, SQL + Web Services Tasks to banking ACH files...

Learning ETL tools means learning all of the connections out there, specifically the data types and how they map to each other.
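An ETL tool's connectors carry a lot of that type-mapping knowledge with them. Hand-rolled, it looks something like this sketch (the pairs shown are illustrative assumptions, not a complete or authoritative Oracle-to-SQL Server mapping):

```python
# A tiny fragment of the metadata bookkeeping an ETL tool handles for
# you: mapping source data types to target data types. These pairs are
# illustrative assumptions, not a definitive mapping table.
ORACLE_TO_SQLSERVER = {
    "NUMBER":   "NUMERIC",
    "VARCHAR2": "VARCHAR",
    "DATE":     "DATETIME2",
    "CLOB":     "VARCHAR(MAX)",
}

def map_column(source_type):
    """Return the target type for a source column, raising on unknown
    types -- the kind of mismatch a tool surfaces at design time
    rather than at 2 a.m. when the load fails."""
    try:
        return ORACLE_TO_SQLSERVER[source_type]
    except KeyError:
        raise ValueError("no mapping defined for " + source_type)
```

Multiply that table by every source/target pair in the shop and the value of having it maintained for you becomes clearer.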
Gadsden Consulting, IT Specialist, Author, commented:

thanks for broadening my horizons - flat file processing has been my organization's mode of operation for many years, and I wasn't thinking beyond that. It's a good point, and one that should be considered when interacting with external parties, if the security can be worked out.
Question has a verified solution.
