What is the big deal with ETL tools ? are they even needed ?

I've done a lot of custom Oracle ETL, have dabbled in Informatica (and worked beside the Infa team for a long time), and have done a bit of work in SSIS plus worked alongside the team. So I have a good sense of what's going on, I'm just not an expert on those tools.

I also see tons of job postings for Informatica and SSIS.

But are they really needed ? Why isn't custom ETL better? They're all going to be processing flat files, no ? I'm just not seeing the advantage.

I did enough with Informatica and SSIS to know that their are limitations. With Informatica, the guys just shrugged their shoulders. With SSIS, I found limitations even quicker, plus others would just write a stored procedure and then the only thing the SSIS program did was call the stored procedure.

So what is the good points for the ETL tools ?

It's a huge learning curve to be really good with these tools, and so it requires another skill set, tool base, etc., etc. Plus I've also found the ETL developers knew a lot about the tools, but less about Sql and even less about whatever business is at hand.

So what IS the big draw for these tools, and what do the old-timers here say ? The only thing I can think of is that is Windows based and so you don't have to fiddle with Unix, scripts, etc.
Gadsden ConsultingIT SpecialistAsked:
Who is Participating?

[Product update] Infrastructure Analysis Tool is now available with Business Accounts.Learn More

I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

As a database guy that had to learn to use Informatica I can appreciate where you're coming from.
I never became an expert but I got to work with some people that were; and, having surived, I'm glad for the experience.

First a few things that I turned out to LOVE.

1 - Self documenting Data Flow.  Trying to answer the question "Where did you get this piece of information?"  can be difficult (and that's being kind) with scripts, programs, stored procedures, etc.   However, tracing through a workflow is easy.  Of course, if the processing is complex with lots of data sources and many transformations it may still take awhile to decipher it all; but flow itself is still essentially just connecting the dots.

2 -  Semi-agnostic processing.  I absolutely detest platform agnostic development.  BUT... tools like Informatica have a nice compromise to them.  If you change platforms of your data sources, you must rewrite your source qualifiers.   But... after that you're essentially done.   So, your inputs can use SQL syntax, stored procedures, specialized web services, etc.  Whatever is the best mechanism for your source you can use it.  Once you have your data, the actual "T" transform process is the same all the way through the workflow until you get to your output targets.  Here again, you'll create them to be specialized for the target platform; but your workflow remains logically identical.  Of course, nice modular design in any programming language can duplicate this; but the point is, it's built in.  You kind of have to go out of your way to not be nicely modular with an ETL tool.

3 - Diagnostics and logging.  Most applications I've seen aren't instrumented as well as they could be.  The workflow monitor and logs are an awesome tool that give a ton of information without the developer needing to do much.  Of course, if your mapping calls a procedure or script and produces additional logs, you can include those for even more contextual information.

You might consider those the "soft" benefits but; to me they were some of my favorites.
On a more technical side, I was impressed by these features:

1 - High speed features for some platforms are "options", you don't have to do a lot of special coding in order to get significant benefits if you know when to use them.   For example - if you know you can take advantage of sql*loader style direct path inserts, you can simply turn on "Bulk" option for inserts and you've got them.  If it turns out you need to observe constraints and triggers then switch to "Normal."  It's literally as easy as toggling an option.

2 - Flexible mappings.   Informatica allows you to build transformations within its own framework with common aggregations like SUM, COUNT, MIN/MAX, etc.  It also has more sophisticated features if needed.
Even better though, when the tool comes up short, you can invoke external code.  Calling web services, stored procedures and other tools.   I worked with some that would read from DB2 on a mainframe, flow data through a few internal transformations, call an Oracle stored procedure to process more data, invoke a java program to alter others before flowing into other internal transformations before finally invoking a web service to export the data to an external system.

3 - simple parallel scaling.  If it's reasonable to split your work into pieces and run multiple threads, scaling the degree of parallelization up or down is quite simple with an integration tool with little to no special coding.

Having said all that,  I'm still not sold on Informatica as a universal tool for all problems.  More often than not, I recommend NOT using it; but that's because most of our integrations are between similar platforms that already have solid data transfer mechanisms.    Something as simple as linked tables in sql server  or queries across database links in Oracle can solve many (most?) business needs without the need for additional tools.

Also, as your question mentioned, a lot of ETL developers simply want to process files.  That's fine if that's all you have to work with; but far too often I find developers resorting to file processing when the source and/or target system had a native access method with better functionality.  Mentioning this is somewhat unfair, because it's not the fault of the integration tool, it's the fault of the human that chose to use it incorrectly; but I mention it because it's a real problem with significant impacts to performance and resource consumption.

I don't know if I've convinced you; but my time with them has convinced me they are a viable tool to have in my toolbox.  I use my screwdrivers more often than I do my block plane;  and even though it's obvious one tool is more valuable more often; that doesn't mean my block plane is pointless or broken.  It just means it's waiting for the right job.

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
Gadsden ConsultingIT SpecialistAuthor Commented:

I'm sorry, I missed this in my e-mail and am just checking now. I'll review tomorrow and  post my thoughts.

Thanks !
Gadsden ConsultingIT SpecialistAuthor Commented:

excellent, just what I was looking for. Basically, it's the right way to go if you are in the business of ETL.

I wish I had be able to learn the tool, but just never had an opportunity, although I did get through the beginner phase.

Really good soft points - self-documenting, semi-agnostic, diagnostics and logging,

Excellent discussion on features: High speed, flexible mappings, and simple parallel scaling.

Also good to hear your cautions, especially not using the right tool for the job.

My experience was in an organization that went whole-hog with Informatica, it became very top heaving with Admin teams, support teams, crazy migration teams, 247  etc. I can't imagine the cost.

I used MS SSIS with Oracle, and that was fine. I did crack that to the intermediate stage, and I found it easier to learn the Informatica, and definitely a lighter footprint.

So I agree for a organization that is processing data, they need an ETL tool.

From here I would need to know more about the organization as well as alternatives to Informatica, like OWB and ODI.

So this will suffice, thanks !

I'll leave this open until tomorrow, see if anyone else chimes in.
10 Tips to Protect Your Business from Ransomware

Did you know that ransomware is the most widespread, destructive malware in the world today? It accounts for 39% of all security breaches, with ransomware gangsters projected to make $11.5B in profits from online extortion by 2019.

PortletPaulEE Topic AdvisorCommented:
I personally think that was a great answer. I have watched several organizations  (some v.large others less so, both public and private) deal with ETL in a variety of ways so I was watching on with interest.

Thanks Sean for your experienced insight to this interesting question.

(& of course thanks for asking it)
Gadsden ConsultingIT SpecialistAuthor Commented:

any lessons learned ?

PortletPaulEE Topic AdvisorCommented:
Not really, in my role (a vendor architect)  I get to observe at arms length, not hands on in the ETL area.  When discussing/designing solutions with clients I get to hear about their preferences etc. which vary greatly.

My interest here was simply to gain some insight in an area I dont know that deeply.
Gadsden ConsultingIT SpecialistAuthor Commented:

ok, sounds good.
Jim HornMicrosoft SQL Server Developer, Architect, and AuthorCommented:
>They're all going to be processing flat files, no ?
Without trying to compete with the first answer,  I've had to handle a variety of sources and destinations...  Salesforce.com source to SQL Server destination + Data Warehouse build, SQL to SQL, SQL to Excel, Oracle to SQL, DB2 to SQL, Flat files to flat files, SQL to flat files, Informix to SQL, XML files to SQL, SQL + Web Services Tasks to banking ACH files...

Learning ETL tools means learning all of the connections out there, specifically the data types and how they map to each other.
Gadsden ConsultingIT SpecialistAuthor Commented:

thanks for broadening my horizons - the flat file processing has been my organization's mode of operations for numerous years and I wasn't thinking beyond that. It's a good point, and one that should be considered when interacting with external parties, if the security can be worked out.
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
Oracle Database

From novice to tech pro — start learning today.