do ETL programming using script/coding VS using ETL tools.

marrowyung
marrowyung used Ask the Experts™
on
hi,

I have a study on ETL tools and recently heard a lot of voice on just need to do ETL programming using script/coding, e.g. R and Qview, so is this means now ETL tools like MS SSIS is useless ?

what is the pros and cons on doing ETL logic by coding and ETL tools?

it seems now doing ETL code in container level with RESTFUL API already make ETL process can do load balancing, parallel execution and scale out (by container), is that correct ? so no need ETL tools any more ???

the new MariaDB X3 platform seems can even ignore ETL process as it can stream data directly form OLTP to OLAP, so not need ETL anymore?
Comment
Watch Question

Do more with

Expert Office
EXPERT OFFICE® is a registered trademark of EXPERTS EXCHANGE®
Top Expert 2014
Commented:
With ETL tools...
* you get a variety of drivers.
* you can transform your data outside of the database
* you will have a persisted data set (tables) that you can use without extracting activities
* you might get a GUI interface (less coding)
Ryan ChongSoftware Team Lead
Commented:
it's always depends on user's decision to determine whether to use ETL that built via scripting/coding or ETL tools that off the shelf.

but few things can take into consideration:

1 ) software suitability, knowledge and support

2 ) cost and time for development

3 ) adaptions and compatibility with corporate culture and policies

ETL tools are robust and stable enough to produce a quality outcome, but at the same time, ETL via scripting in general give you flexibility to the max, you could do whatever you wanted in most of the cases.

but at the end of the story, you may come out with something that is hybrid, and both approaches need to be tested to mitigate the errors.
marrowyungSenior Technical architecture (Data)

Author

Commented:
Ryan Chong,

just talked to some ETL tools provider on this and actually if we do not need:
1) Centralized view of ETL process
2) FASTER execution
3) better communication process (chat with business using using flow diagram and pipeline),

then doing programming is no diff as programming can do ETL scale out, e.g. using python and redshift, and parallel execution .

and now what we can also argue is , latest ETL tools has big data and data goverence capabilty, this is what programming hard to do .. ?

"2 ) cost and time for development"

yeah agree! it is faster !

" ETL via scripting in general give you flexibility to the max, you could do whatever you wanted in most of the cases."

anything you can ONLY done by script but not ETL tools ?

"but at the end of the story, you may come out with something that is hybrid, "

yes ! should be !

"ETL tools are robust and stable enough to produce a quality outcome, "

ETL tools MUCH faster ? how much faster ?

script is not stable.. ? why  ? what ETL generate under the hood also script, right?


aikimark,

" you get a variety of drivers."

can connect the many data source at the SAME time ?

"you can transform your data outside of the database"

which means can transform data even between excel and access ? script can't do ?

"you will have a persisted data set (tables) that you can use without extracting activities"

it just preload to a staging table ? and it make data do not load at run time , so faster?
Ensure you’re charging the right price for your IT

Do you wonder if your IT business is truly profitable or if you should raise your prices? Learn how to calculate your overhead burden using our free interactive tool and use it to determine the right price for your IT services. Start calculating Now!

Ryan ChongSoftware Team Lead

Commented:
and now what we can also argue is , latest ETL tools has big data and data goverence capabilty, this is what programming hard to do .. ?

Some of the concerns of building the ETL from the scratch is that:
* we need to spend time to develop it
* stability
* user friendliness

anything you can ONLY done by script but not ETL tools ?
It depends on what ETL tools are we using, in general, some limitation exist in off the shelf ETL tools. So we need to achieve a balance of whether to build the ETL process from Tools or Programming.

ETL tools MUCH faster ? how much faster ?
it depends on what ETL Tools are we referring to. A good tool doesn't guaranteed the good performance.

script is not stable.. ? why  ?
what I try to emphasize is that by script, we can have a full control on it, and that could be a risk in development as we need to ensure the best practice being implemented, and usually that rely on the quality works of the developers.

what ETL generate under the hood also script, right?
if we write our own ETL scripts, it can be in form of SQL scripts, some VBS, Power Shell scripts, or compiled as executables or whatever can be executed and/or linked to a scheduler.
marrowyungSenior Technical architecture (Data)

Author

Commented:
"* we need to spend time to develop it
* stability
* user friendliness"

so ETL tools is for that ?
but with very good ETL developer, it STILL can ?

so using code to build ETL ourselves is not stable  ?  

user friendliness should be the last sort as business user when execute the ETL task, do not need to LOOK at it, right? it is only for developers ?

"some limitation exist in off the shelf ETL tools"

what it is .. ? any example. it seems this reason is a good reason to kill ETL Tools.

"it depends on what ETL Tools are we referring to. A good tool doesn't guaranteed the good performance. "

for example, Talend for OSDB, and MS SSIS for MS SQL.

"and that could be a risk in development as we need to ensure the best practice being implemented, and usually that rely on the quality works of the developers. "

so this means ETL tools usually has code practice control/anti pattern control  and data goverence so that developer can't do bad code in a hidden way?

"if we write our own ETL scripts, it can be in form of SQL scripts, some VBS, Power Shell scripts, or compiled as executables or whatever can be executed and/or linked to a scheduler."

sorry, this means not answering my question directly .. I mean ETL tools should ALSO generate the code under the hood, then same result as the ETL coding ?
Ryan ChongSoftware Team Lead

Commented:
so ETL tools is for that ?
but with very good ETL developer, it STILL can ?

so using code to build ETL ourselves is not stable  ?  
So to conclude that, just think that ETL Tool is a tool to assist in Data Extraction, Loading and Transformation processes. The tool could be great to us, to safe us from the hassle of debugging and time for development, on a contrary, the tool itself could be limited as it may not cover all your requirements, and some of the ETL tool comes with operational costs. So you need to do a SWOT analysis what approach you should proceed.

user friendliness should be the last sort as business user when execute the ETL task, do not need to LOOK at it, right? it is only for developers ?

Usually ETL process is not run by business users. But do make sure the ETL program does make proper logging.

"some limitation exist in off the shelf ETL tools"

what it is .. ? any example. it seems this reason is a good reason to kill ETL Tools.

Just like Microsoft Office enables us to create various of documents, but the product by itself is improving and include more features (meaning to say, what previously "can't be done" > the limitation, now probably can be done with latest release of product). At last, it may involve some costs to use it.


"it depends on what ETL Tools are we referring to. A good tool doesn't guaranteed the good performance. "

for example, Talend for OSDB, and MS SSIS for MS SQL.

I didn't use Talend for OSDB before so can't comment on it. SSIS would be a good tool for ETL for MS SQL.


"and that could be a risk in development as we need to ensure the best practice being implemented, and usually that rely on the quality works of the developers. "

so this means ETL tools usually has code practice control/anti pattern control  and data goverence so that developer can't do bad code in a hidden way?
It really depends on how developer using that tool. A good tool doesn't mean a good outcome!


"if we write our own ETL scripts, it can be in form of SQL scripts, some VBS, Power Shell scripts, or compiled as executables or whatever can be executed and/or linked to a scheduler."

sorry, this means not answering my question directly .. I mean ETL tools should ALSO generate the code under the hood, then same result as the ETL coding ?
The nature of ETL Tools is not to generate the ETL coding.

You can do a simple comparison.

MS Office > To create a Excel file from wizard   VS   VBA codes > To create a Excel file

so... MS Office does not generate the VBA codes. Both approaches are different but outcome could be the same.
marrowyungSenior Technical architecture (Data)

Author

Commented:
"But do make sure the ETL program does make proper logging."

ok , user friendly you mean is about logging ! so now this is another concern, it seems if ETL program can't do this  but ETL tools ?

you mean centralize logging, right?

"I didn't use Talend for OSDB before so can't comment on it. SSIS would be a good tool for ETL for MS SQL."

what ETL tools you are using ? or you are not using ETL tools but script/coding ?

"It really depends on how developer using that tool. A good tool doesn't mean a good outcome!"

so ETL tools do not help on that ! or not related

I think ETL tools built to control that too.

"so... MS Office does not generate the VBA codes. Both approaches are different but outcome could be the same."

yes.

"The tool could be great to us, to safe us from the hassle of debugging and time for development, on a contrary, the tool itself could be limited as it may not cover all your requirements,"

so ETL is good on shorten ETL devleopement time and easier to debug as it centralize the error log.

without ETL tools, for junior developer or senior developer do not have ETL programming skill, ETL development can be MUCH longer time and hard to debug as you do not know what developer is suing ETL script/code at all?

and finally for any logic can't/hard to implement by tools will have handled by script/coding, right?

so for any this kind of manual coding, can ETL help on monitor it and debug it ?
Ryan ChongSoftware Team Lead

Commented:
you mean centralize logging, right?
Logging helps us in troubleshooting. It's a MUST to have it. You don't want to have a situation the business users claimed that the source data is good while actually it causing the ETL process to fail. So you need to trace and able to prove what causing the errors.

what ETL tools you are using ? or you are not using ETL tools but script/coding ?
In general, SSIS and well, I used to have .sql files to convert data for a CRM system years ago.

so ETL tools do not help on that ! or not related

I think ETL tools built to control that too.

In general, ETL tool helps in converting data.

so ETL is good on shorten ETL devleopement time and easier to debug as it centralize the error log.
That's probably the case.

without ETL tools, for junior developer or senior developer do not have ETL programming skill, ETL development can be MUCH longer time and hard to debug as you do not know what developer is suing ETL script/code at all?
That's why documentation is crucial, for every IT projects : )

and finally for any logic can't/hard to implement by tools will have handled by script/coding, right?
Sometimes, that's probably the case.

so for any this kind of manual coding, can ETL help on monitor it and debug it ?
yes, that's possible. It depends on in which interface that ETL is running. For example, if it's an EXE, you can write the log to a physical file ; if it's running in pure SQL, you can write the log in a Table, etc. By doing manual coding, you got the "full flexibility" in designing your ETL architecture.
marrowyungSenior Technical architecture (Data)

Author

Commented:
"Logging helps us in troubleshooting. It's a MUST to have it. You don't want to have a situation the business users claimed that the source data is good while actually it causing the ETL process to fail"

I knew, I just want to know if ONLY can ETL offer that but when you are doing coding you can't handle it !

"In general, SSIS "

what is the problem/known issue/ and limitation of SSIS ?

"I used to have .sql files to convert data for a CRM system years ago."

convert to a different type ? usually for type conversion what kind of programming language you will use ?

"In general, ETL tool helps in converting data."

usually it is ! but it also can filter on what to control.

"Sometimes, that's probably the case."

you mean most of the time ETL tools can already handle it?

" For example, if it's an EXE, you can write the log to a physical file ; "

then how to monitor this file by SSIS, should be monitor by infrastructure monitoring tools ?

"By doing manual coding, you got the "full flexibility" in designing your ETL architecture."

yeah, but can't be fast and hard to debugs, right? and ETL Tools help dynamically on development time and debugging ?
Ryan ChongSoftware Team Lead

Commented:
I knew, I just want to know if ONLY can ETL offer that but when you are doing coding you can't handle it !
When we manual code it, we need to write the handler as well.

what is the problem/known issue/ and limitation of SSIS ?
To be frank, so far so good when I'm using SSIS. There are different products in the market, but SSIS is the preferred tool to be used if you got MSSQL DB instances.

convert to a different type ? usually for type conversion what kind of programming language you will use ?
your data source files could be in text delimited format, CSV, XML, etc. So some of them could be easily being imported to MSSQL using command such as Bulk Insert, XPath, etc

usually it is ! but it also can filter on what to control.
Yes, so need to know what's the advantages and limitations of the tools we are using.

you mean most of the time ETL tools can already handle it?
Well, yes.

then how to monitor this file by SSIS, should be monitor by infrastructure monitoring tools ?
You can have that and let the monitoring tools to trigger the alerts (if ETL failed, etc) or instantly trigger the alert when the ETL program itself detect an error. So, you can design the whole ETL process (if you got such permission) on how to handle the errors.

yeah, but can't be fast and hard to debugs, right? and ETL Tools help dynamically on development time and debugging ?
if you know that tool well, yea... it will increase your productivity.
marrowyungSenior Technical architecture (Data)

Author

Commented:
"When we manual code it, we need to write the handler as well.
"

am sorry, handler for what ? for SSIS /ETL tools to debug ? so this mean manual script can't do it anyway.? but any program can be debug.. right?


"To be frank, so far so good when I'm using SSIS. There are different products in the market, but SSIS is the preferred tool to be used if you got MSSQL DB instances."


good ! you only use MS SQL ? so far so good mean no manual coding is necessary, right?

someone said SSIS has problem and limitation when other ETL tools vendor talk about it, just know from user how is it!

is it really preferred tools ? I though you can explorer other tools and see diff.

" So, you can design the whole ETL process (if you got such permission) on how to handle the errors."

I mean can SSIS monitor and alert us without OTHER monitoring tools ?
Ryan ChongSoftware Team Lead

Commented:
am sorry, handler for what ? for SSIS /ETL tools to debug ? so this mean manual script can't do it anyway.? but any program can be debug.. right?
A good program by design should capture the exception, which it helps for debugging. What I mean for the Handler is the typical "try ... catch" , "on error goto ErrorHandler" sort of stuffs. Manual script can do that too, it's like what feature you want to include in your program. you can treat the error handling as a feature as well in software development.

good ! you only use MS SQL ? so far so good mean no manual coding is necessary, right?
Yup, so far all my projects, the target database is MSSQL. "so far so good" means SSIS doesn't give me much of the trouble. Manual coding may still be needed even in SSIS in which you can write C# in Script Tasks. By doing that, it gives you more variety in your ETL development.

someone said SSIS has problem and limitation when other ETL tools vendor talk about it, just know from user how is it!
There is no perfect tool, you can't have a single software that can do everything. But i do believe there is best suitable tool for a scenario.

is it really preferred tools ? I though you can explorer other tools and see diff.
Yup you can. so it depends on the factors already highlighted, such as :

1 ) software suitability, knowledge and support
2 ) cost and time for development
3 ) adaptions and compatibility with corporate culture and policies

I mean can SSIS monitor and alert us without OTHER monitoring tools ?
Technically you can. But I would suggest you create some monitoring tasks (like a simple program that running at background, etc) for the monitoring.
Top Expert 2014

Commented:
there are some free/opensource ETL tools that you should explore before you start writing
marrowyungSenior Technical architecture (Data)

Author

Commented:
aikimark,

we have explorer a lot and Talend seems one of the best !

but the point is not which ETL Tools is good , is should we do manual coding or ETL tools
Top Expert 2014

Commented:
Does Talend do everything you need?
marrowyungSenior Technical architecture (Data)

Author

Commented:
hi,

According to Talend, they have doubt on it on why if customer has ETL developer and still need to buy tools like this !

but Talend can do extra like data governance and ETL tools can better help on communication so business user and ETL developer can talk easily !

for scale out and parallel process, R/ Python can do it too!

so except we do not have ETL developer, we do need that.
marrowyungSenior Technical architecture (Data)

Author

Commented:
Ryan,

"Manual coding may still be needed even in SSIS in which you can write C# in Script Tasks. By doing that, it gives you more variety in your ETL development.
"

but you can code directly within SSIS console, right?

" it's like what feature you want to include in your program. you can treat the error handling as a feature as well in software development.

then debug feature is not unique for ETL tools!

"3 ) adaptions and compatibility with corporate culture and policies"

how a tools said to be compatible with corporate culture and policies ?

"But I would suggest you create some monitoring tasks (like a simple program that running at background, etc) for the monitoring."

in your side, what you use to monitor it ?
Software Team Lead
Commented:
but you can code directly within SSIS console, right?
yes, again, you can...

then debug feature is not unique for ETL tools!
yup, error handling is a common feature and it helps us in debugging. And it's not so "unique".

how a tools said to be compatible with corporate culture and policies ?
I mean when management decided to use a software (just like ETL), it must be compatible and not against the existing corporate culture, policies and procedures.

in your side, what you use to monitor it ?
for examples, you could write the exceptions/ errors into a log file, or keep it in a Log Table in database. it depends on how you want to design your ETL process.
marrowyungSenior Technical architecture (Data)

Author

Commented:
And it's not so "unique".

   -> not only by the ETL tools I meant !

" not against the existing corporate culture, policies and procedures."

sorry, how can a tools  against the existing corporate culture, policies and procedures? is the people who use it, right?
marrowyungSenior Technical architecture (Data)

Author

Commented:
aikimark,

any update for me ?
Ryan ChongSoftware Team Lead

Commented:
  -> not only by the ETL tools I meant !
just take a feature as a "technology", every company can try to learn, mimic and put it into its own products.

sorry, how can a tools  against the existing corporate culture, policies and procedures? is the people who use it, right?
to make this simple, just imagine a scenario where someone (usually IT) cannot anyhow introduce freeware/ opensource software because it's against corporate policies. Some corporate need vendor's support for the software they have provided.
marrowyungSenior Technical architecture (Data)

Author

Commented:
"Some corporate need vendor's support for the software they have provided."

make more sense !

and some company might only SUGGEST to use open source.
Ryan ChongSoftware Team Lead

Commented:
make more sense !

and some company might only SUGGEST to use open source.
true, so just follow the guidelines and it won't go wrong!
marrowyungSenior Technical architecture (Data)

Author

Commented:
"and some company might only SUGGEST to use open source."

this is what we are but later on we may end up with situation that we use Oracle !!
 no one will blame as we can say after FS, we found OPEN SOURCE not suitable for us.
marrowyungSenior Technical architecture (Data)

Author

Commented:
tks all.

Do more with

Expert Office
Submit tech questions to Ask the Experts™ at any time to receive solutions, advice, and new ideas from leading industry professionals.

Start 7-Day Free Trial