SSIS - Best way to accumulate data from several related tables before insertion into another database

Posted on 2010-11-08
Medium Priority
Last Modified: 2012-05-10
I'm a newbie to SSIS.  I have to move data from a vendor supplied database to one that we will use for reporting.  The vendor database does not store all information at the transaction level that we will need to query.  For example: We have customers assigned to certain sales reps as their default sales rep.  The transaction data contains the actual sales rep that handled the transaction, but not the default sales rep for that customer.  Our sales managers need to run reports that show sales grouped by default sales rep as well as the person who handled the transaction.  They need these reports so that they can reassign the default sales rep periodically to balance work loads.

So, my problem is that I need to offload all transactional data from the vendor's database to a reporting database and capture various other pieces of information that would be a snapshot of the related tables' data at the time of the transaction. There are several pieces of data about both the item being sold and the customer that are not stored by the vendor's database as part of the transactional record that may change over time.  I cannot alter the vendor's database.  I must capture the value of these data elements on the day of the transaction.

I'm working in SSIS, and this is my first attempt at creating an SSIS package.  We are running SQL Server 2008.  Any guidance from the experts is appreciated.

Question by:Lynn Huff
LVL 16

Expert Comment

ID: 34087025
Using staging tables might be a good option:

1. Execute SQL Task -- create the staging tables
2. Data Flow Task -- load data into the staging tables
3. Data Flow Task -- using a SQL Server Destination, write SQL that joins the data from the staging tables and loads it into the destination
4. You can later schedule the package to run daily using a SQL Server Agent job

Check the links below on how to use temp tables/staging tables in SSIS
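As a sketch of steps 1 and 3 (all table and column names here are hypothetical, not from the actual vendor schema):

```sql
-- Step 1 (Execute SQL Task): create the staging table
CREATE TABLE dbo.StgTransaction (
    TransactionID  INT    NOT NULL,
    CustomerID     INT    NOT NULL,
    TranDate       DATE   NOT NULL,
    Amount         MONEY  NOT NULL
);

-- Step 3 (after the Data Flow Task has loaded StgTransaction):
-- join the staged rows to the related tables and load the destination,
-- capturing the related values as they stand today
INSERT INTO dbo.FactSales (TransactionID, CustomerID, TranDate, Amount, CustomerClass)
SELECT s.TransactionID, s.CustomerID, s.TranDate, s.Amount, c.CustomerClass
FROM   dbo.StgTransaction AS s
JOIN   dbo.Customer       AS c ON c.CustomerID = s.CustomerID;
```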


Author Comment

by:Lynn Huff
ID: 34087197
I read through the articles you referenced, but the terminology is throwing me a bit.  Let me try to express what I'm gathering and ask if this is right.

Step 1:  Create a staging table in the destination database with all of the appropriate columns to hold my accumulated data from the related tables in the source database.

Step 2:  Insert records into the staging table from the main source table.

Step 3:  Update the records in the staging table with all of their related table data from the source database.

Step 4:  Insert all of the records from the staging table into the destination table.

Step 5:  Delete the records from the staging table

Is that an accurate description of the process?  I don't really see the advantage of using "temporary" tables as described in the articles, but I don't have any knowledge about temporary tables.  It seems to me that if the staging table is permanent but only has rows during the execution of the package, then this might be an acceptable solution.
LVL 16

Expert Comment

ID: 34087661
This is what I understood from your actual question; correct me if I am wrong.

You basically need to join data from multiple source tables on the fly before inserting it into the destination tables.  Is that right?  If yes, you can do that using temp tables, so that you can use SQL Server to write your join statements and then insert the results into the destination.

On the other hand, if your source is a SQL Server, you will not need staging tables, as you can do the join directly in your source.

Please explain what you are trying to do in simple steps/words, and I will try to help you do that in SSIS.


Author Comment

by:Lynn Huff
ID: 34087779
I tried to explain that in my question.  I'll try again.

There is a database which I cannot alter that contains data that I need to move to another database on a daily basis as new rows appear.  For the most part, the source tables in DB1 and the destination tables in DB2 will have the same columns and data types.  BUT, there are some pieces of data related to each transaction that are only stored in related tables in the source database, DB1.  These data elements in the source database may change over time.  I need to capture their values at the time of the transaction and store those values in another database for reporting.

Example1:  A transaction header table in the source database has the customer number in it.  I also need to capture the customer class from the customer table and store that customer class as part of the transaction record in the destination database.

Example 2:  A transaction detail table in the source database has the item number in it.  I also need to capture the item category from the item table and store that item category as part of the transaction record in the destination database.

Does that make it clearer?
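Both examples boil down to a single denormalizing join in the source query. A sketch with hypothetical table and column names (the actual vendor schema will differ):

```sql
SELECT  h.TransactionID,
        h.CustomerID,
        c.CustomerClass,   -- snapshot of today's customer class
        d.ItemNumber,
        i.ItemCategory,    -- snapshot of today's item category
        d.Quantity,
        d.ExtendedPrice
FROM    dbo.TranHeader AS h
JOIN    dbo.TranDetail AS d ON d.TransactionID = h.TransactionID
JOIN    dbo.Customer   AS c ON c.CustomerID    = h.CustomerID
JOIN    dbo.Item       AS i ON i.ItemNumber    = d.ItemNumber;
```

Because the query runs on the day the rows are extracted, the customer class and item category it returns are the values in effect at that time, which is exactly the snapshot behavior being asked for.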

Author Comment

by:Lynn Huff
ID: 34087980
After re-reading your last post, I see you said "you can do the join directly in your Source".

Are you suggesting creating a view in the source database to use as the source for the rows I need to insert into the destination database?
LVL 16

Accepted Solution

vdr1620 earned 2000 total points
ID: 34088400
You don't necessarily need to create a view; instead, write the SQL directly in the source to get all the necessary data using joins, then load the data into the destination.  Also, I believe you don't want to load all the data daily, just the new rows and changed data, which you can do using Incremental Loads:

source --> Lookup --> Conditional Split (if changed, update; if new, insert) --> destination

Hope that makes sense.
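For reference, the Lookup/Conditional Split pattern above is roughly what a single T-SQL MERGE statement does (a sketch with hypothetical names; inside an SSIS data flow you would normally use the Lookup and Conditional Split components instead):

```sql
MERGE dbo.FactSales AS dst
USING dbo.StgTransaction AS src
      ON dst.TransactionID = src.TransactionID
WHEN MATCHED AND dst.Amount <> src.Amount   -- changed row: update
    THEN UPDATE SET dst.Amount = src.Amount
WHEN NOT MATCHED BY TARGET                  -- new row: insert
    THEN INSERT (TransactionID, CustomerID, TranDate, Amount)
         VALUES (src.TransactionID, src.CustomerID, src.TranDate, src.Amount);
```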

Author Comment

by:Lynn Huff
ID: 34092733
I do understand.  I have one other question if you don't mind.  The lookup transformation will do what I need for identifying new rows.  However, I'm uncertain if the performance will become a problem since it actually compares every row in two tables.  In your experience, how big can a table get before the lookup transformation becomes a performance problem?
LVL 16

Expert Comment

ID: 34093395
Yes, performance will be a problem if it's not done the right way: have proper indexes on the lookup columns and use the cache mode best suited to your data.  It's a vast topic; I would suggest reading articles online on optimizing performance when using lookups.  Below are a few:


There is also a third-party component, I believe called the Checksum Transformation, which calculates and stores a checksum value and uses that value for comparison.
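The same change-detection idea can be sketched in plain T-SQL by comparing a hash of the tracked columns (hypothetical names again; HASHBYTES with 'SHA1' works on SQL Server 2008 and is far less collision-prone than CHECKSUM):

```sql
-- detect changed rows by comparing a hash of the columns we care about
SELECT  s.TransactionID
FROM    dbo.StgTransaction AS s
JOIN    dbo.FactSales      AS d ON d.TransactionID = s.TransactionID
WHERE   HASHBYTES('SHA1',
            CAST(s.CustomerID AS varchar(20)) + '|' + CAST(s.Amount AS varchar(30)))
     <> HASHBYTES('SHA1',
            CAST(d.CustomerID AS varchar(20)) + '|' + CAST(d.Amount AS varchar(30)));
```

Storing the hash alongside each destination row means only one value per row needs comparing on each incremental load, instead of every tracked column.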
