Solved

Question regarding designing ETL process with DTS

Posted on 2008-10-08
487 Views
Last Modified: 2013-11-30
We have two ideas about how a DTS package should be built for an ETL process. One methodology is to separate it into multiple packages and then run them using jobs. The other methodology is to create the process in one master package and run that single package as a job.
Based on experience, which is the more logical design? If you answer, please do not simply choose A or B; identify which one is the best way and give supporting reasons (pros and cons) why one is more efficient than the other.
Once again, thank you for your ongoing support!
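For concreteness, a minimal sketch of how option A might be wired up, with one SQL Agent CmdExec step per DTS package. The server, package, and job names are placeholders, not from our actual system:

-- Hypothetical job for option A: one DTSRun step per package.
USE msdb
EXEC dbo.sp_add_job @job_name = N'Nightly ETL'

EXEC dbo.sp_add_jobstep
    @job_name = N'Nightly ETL',
    @step_name = N'Load staging',
    @subsystem = N'CMDEXEC',
    @command = N'DTSRun /S MYSERVER /E /N "Load Staging"',
    @on_success_action = 3,   -- go to the next step
    @on_fail_action = 2       -- quit reporting failure

EXEC dbo.sp_add_jobstep
    @job_name = N'Nightly ETL',
    @step_name = N'Validate staging',
    @subsystem = N'CMDEXEC',
    @command = N'DTSRun /S MYSERVER /E /N "Validate Staging"',
    @on_success_action = 1,   -- quit reporting success
    @on_fail_action = 2

EXEC dbo.sp_add_jobserver @job_name = N'Nightly ETL'

Option B would instead be a single job step running DTSRun against the one master package.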
Question by:Kevin_Maillet
7 Comments
 
LVL 4

Accepted Solution

by:randy_knight
randy_knight earned 167 total points
ID: 22675533
With DTS, I always did it like your option A for the following reasons:

1. The DTS ExecutePackage task is a pain to work with.
2. It is much easier to restart a failed job than to go into your master package, disable steps that have already completed, etc.
3. If you build some metadata tables to handle prerequisites, completion of batches, etc., you can go massively parallel this way (rough sketch below).

I could go on and on.  SSIS is a different world, but with DTS this is the only way to do it IMO.
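To illustrate point 3, one way the metadata tables might look (table and step names are invented for this sketch): each package checks its prerequisites in the table and records its own completion, so independent streams can be scheduled in parallel.

-- Hypothetical batch-control table for prerequisites and completion.
CREATE TABLE dbo.EtlBatchStep (
    BatchDate   datetime     NOT NULL,
    StepName    varchar(100) NOT NULL,
    Status      varchar(20)  NOT NULL DEFAULT 'Pending',  -- Pending / Running / Complete / Failed
    CompletedAt datetime     NULL,
    CONSTRAINT PK_EtlBatchStep PRIMARY KEY (BatchDate, StepName)
)

-- A package first checks that its prerequisite step finished...
IF EXISTS (SELECT 1 FROM dbo.EtlBatchStep
           WHERE BatchDate = '20081008'
             AND StepName  = 'Load Staging'
             AND Status    = 'Complete')
BEGIN
    -- ...does its own work, then records its own completion.
    UPDATE dbo.EtlBatchStep
    SET Status = 'Complete', CompletedAt = GETDATE()
    WHERE BatchDate = '20081008' AND StepName = 'Validate Staging'
END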

 
LVL 75

Expert Comment

by:Anthony Perkins
ID: 22675597
It really depends on whether you are passing global variables from the parent to the child and back.  If the answer is yes, then you have no choice but to use the master/child (single package) approach.
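For context, the multi-package/job route can still push a value into a package at launch time via DTSRun's /A switch (sketch below; the server, package, and variable names are made up), but there is no comparable way to get a changed global variable back out of the package, which is why two-way variable passing points at the master/child design.

-- One way only: the value goes in at launch, nothing comes back to the caller.
-- typeid 8 = string.
EXEC master.dbo.xp_cmdshell
    'DTSRun /S MYSERVER /E /N "Load Staging" /A LoadDate:8=2008-10-08'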
 

Author Comment

by:Kevin_Maillet
ID: 22689823
Are there any performance reasons why one is more effective than the other?

 
LVL 75

Assisted Solution

by:Anthony Perkins
Anthony Perkins earned 166 total points
ID: 22691865
You should not be focusing on performance as the key reason.  If there are differences, they would have to be negligible.  Instead, focus on maintenance and on whether the steps are independent of each other.  If they are, then go with using jobs; if not, then use a master/child DTS package approach.  That is the correct litmus test to apply.
 
LVL 51

Assisted Solution

by:Mark Wills
Mark Wills earned 167 total points
ID: 22696759
Definitely look towards operability and scalability. I do like a modular approach, with each module being its own package and also self-contained. For that reason, you often see a few packages being brought into an ETL process, but in such a way that each can largely run independently of the previous package.

For example: load staging, validate staging, report exceptions, put away staging. These can be four distinct packages - each has a specific job to do, and works with whatever data may exist (or does not exist because of a failure). The scheduler is essentially the glue that coordinates the packages - and that is where there is often a mistaken belief that they all need to know about each other. Certainly there is no point validating data that didn't load, but in that case the validation simply has nothing to do.

Some purists will totally disagree, but the basis for my reasoning is this: there is very little performance overhead in running a task that has nothing to do, and there is maximum flexibility within a discrete module. I can change the validation package as much or as little as needed without impact on anything else. More to the point, I can farm out different jobs to different developers based on skills, experience, and knowledge of the data.

So take an object-oriented perspective of the ETL process and manage each object (package). It flows on from acperkins' statement "Instead focus on maintenance and if the steps are independent of each other", except I take it one step further and suggest making the steps independent. It also fits in with randy_knight's suggestions, inasmuch as each individual step can be isolated, worked on, etc.

You do have to design it pretty well, and in the validation process flag errors and/or approved data so that any ensuing steps can take appropriate action. ETL is data driven, so let the data drive the process.
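As a rough sketch of that data-driven idea (the staging and fact table names here are invented for illustration), the validation package only flags rows, and the put-away package only acts on rows flagged as approved, so either package runs harmlessly even when an earlier step loaded nothing.

-- Hypothetical staging table that carries its own validation status.
CREATE TABLE dbo.StagingOrders (
    OrderId    int         NOT NULL,
    CustomerId int         NULL,
    Amount     money       NULL,
    RowStatus  varchar(20) NOT NULL DEFAULT 'Loaded'   -- Loaded / Approved / Error
)

-- Validate staging: flag rows, never delete them.
UPDATE dbo.StagingOrders SET RowStatus = 'Error'
WHERE CustomerId IS NULL OR Amount IS NULL OR Amount < 0

UPDATE dbo.StagingOrders SET RowStatus = 'Approved'
WHERE RowStatus = 'Loaded'

-- Put away staging: only touches approved rows; with nothing loaded, it simply does nothing.
INSERT INTO dbo.FactOrders (OrderId, CustomerId, Amount)
SELECT OrderId, CustomerId, Amount
FROM dbo.StagingOrders
WHERE RowStatus = 'Approved'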
 

Author Comment

by:Kevin_Maillet
ID: 22713570
Thank you for your help, gentlemen.
 
LVL 51

Expert Comment

by:Mark Wills
ID: 22713603
A pleasure. What did you decide to do?