SSIS - Best way to accumulate data from several related tables before insertion into another database

I'm a newbie to SSIS.  I have to move data from a vendor supplied database to one that we will use for reporting.  The vendor database does not store all information at the transaction level that we will need to query.  For example: We have customers assigned to certain sales reps as their default sales rep.  The transaction data contains the actual sales rep that handled the transaction, but not the default sales rep for that customer.  Our sales managers need to run reports that show sales grouped by default sales rep as well as the person who handled the transaction.  They need these reports so that they can reassign the default sales rep periodically to balance work loads.

So, my problem is that I need to offload all transactional data from the vendor's database to a reporting database and capture various other pieces of information that would be a snapshot of the related tables' data at the time of the transaction. There are several pieces of data about both the item being sold and the customer that are not stored by the vendor's database as part of the transactional record that may change over time.  I cannot alter the vendor's database.  I must capture the value of these data elements on the day of the transaction.
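To make this concrete, the kind of snapshot query involved might look like the following; all table and column names here are hypothetical stand-ins for the vendor's schema:

```sql
-- Hypothetical snapshot query: capture the customer's current default
-- sales rep at load time, so later changes to the customer record
-- do not affect historical reporting.
SELECT  t.TransactionID,
        t.TransactionDate,
        t.SalesRepID        AS HandlingRepID,  -- rep who handled the sale
        c.DefaultSalesRepID AS DefaultRepID,   -- snapshot of current default
        t.Amount
FROM    dbo.TransactionHeader AS t
JOIN    dbo.Customer          AS c
        ON c.CustomerID = t.CustomerID;
```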

I'm working in SSIS, and this is my first attempt at creating an SSIS package.  We are running SQL Server 2008.  Any guidance from the experts is appreciated.

Lynn Huff asked:

vdr1620 commented:
You don't necessarily need to create a view; instead, write the SQL directly in the source to get all the necessary data using joins, then load the data into the destination. I assume you don't want to load all the data daily, just the new rows and changed data, which you can do using incremental loads:

source--> lookup --> Conditional Split (if changed --update, if new - Insert) --> destination

hope you understand
Using staging tables might be a good option

1. Execute SQL Task -- create the staging tables
2. Data Flow Task -- load data into the staging tables
3. Data Flow Task -- write SQL to join the data from the staging tables and load the result into the destination (e.g. via a SQL Server Destination)
4. You can later schedule the package to run daily using a SQL Agent job

Check the links below on how to use temp tables/staging tables in SSIS.
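As a rough sketch of step 3, assuming hypothetical staging and destination table names:

```sql
-- Step 3 sketch: join the staged transaction rows to the staged
-- customer data and insert the combined snapshot into the destination.
INSERT INTO dbo.FactSales (TransactionID, CustomerID, DefaultSalesRepID, Amount)
SELECT  s.TransactionID,
        s.CustomerID,
        sc.DefaultSalesRepID,  -- snapshot value as of load time
        s.Amount
FROM    staging.TransactionHeader AS s
JOIN    staging.Customer          AS sc
        ON sc.CustomerID = s.CustomerID;
```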
Lynn Huff (Author) commented:
I read through the articles you referenced, but the terminology is throwing me a bit.  Let me try to express what I'm gathering and ask if this is right.

Step 1:  Create a staging table in the destination database with all of the appropriate columns to hold my accumulated data from the related tables in the source database.

Step 2:  Insert records into the staging table from the main source table.

Step 3:  Update the records in the staging table with all of their related table data from the source database.

Step 4:  Insert all of the records from the staging table into the destination table

Step 5:  Delete the records from the staging table

Is that an accurate description of the process?  I don't really see the advantage of using "temporary" tables as described in the articles, but I don't have any knowledge about temporary tables.  It seems to me that if the staging table is permanent but only has rows during the execution of the package, then this might be an acceptable solution.
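The five steps above could be sketched in T-SQL roughly like this, assuming a permanent staging table that is empty between package runs (all object names hypothetical):

```sql
-- Step 2: copy new source rows into the staging table
INSERT INTO stage.Txn (TransactionID, CustomerID, ItemID, Amount)
SELECT TransactionID, CustomerID, ItemID, Amount
FROM   SourceDB.dbo.TransactionHeader
WHERE  TransactionID > (SELECT ISNULL(MAX(TransactionID), 0) FROM dbo.FactSales);

-- Step 3: fill in the related-table snapshot columns
UPDATE s
SET    s.CustomerClass = c.CustomerClass
FROM   stage.Txn AS s
JOIN   SourceDB.dbo.Customer AS c ON c.CustomerID = s.CustomerID;

-- Step 4: move the completed rows to the destination table
INSERT INTO dbo.FactSales (TransactionID, CustomerID, CustomerClass, ItemID, Amount)
SELECT TransactionID, CustomerID, CustomerClass, ItemID, Amount
FROM   stage.Txn;

-- Step 5: empty the staging table for the next run
TRUNCATE TABLE stage.Txn;
```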

This is what I understood from your actual question; correct me if I am wrong.

You basically need to join data from multiple source tables on the fly before inserting the data into the destination tables. Is that right? If yes, you can do that using staging tables: load the data into them, then use SQL Server to write your join statements and insert the results into the destination.

On the other hand, if your source is a SQL Server then you will not need staging tables, as you can do the join directly in your source query.

Please explain what you are trying to do in simple steps/words, and I will try to help you do it in SSIS.
Lynn Huff (Author) commented:
I tried to explain that in my question.  I'll try again.

There is a database which I cannot alter that contains data that I need to move to another database on a daily basis as new rows appear.  For the most part, the source tables in DB1 and the destination tables in DB2 will have the same columns and data types.  BUT, there are some pieces of data related to each transaction that are only stored in related tables in the source database, DB1.  These data elements in the source database may change over time.  I need to capture their values at the time of the transaction and store those values in another database for reporting.

Example1:  A transaction header table in the source database has the customer number in it.  I also need to capture the customer class from the customer table and store that customer class as part of the transaction record in the destination database.

Example 2:  A transaction detail table in the source database has the item number in it.  I also need to capture the item category from the item table and store that item category as part of the transaction record in the destination database.

Does that make it clearer?
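The two examples above can be expressed as a single source query run against the vendor database at load time (table and column names are hypothetical):

```sql
-- Join the header and detail rows to their related tables so that
-- CustomerClass and ItemCategory are captured as-of the load date.
SELECT  h.TransactionID,
        h.CustomerID,
        c.CustomerClass,   -- Example 1: snapshot from the customer table
        d.ItemNumber,
        i.ItemCategory,    -- Example 2: snapshot from the item table
        d.Quantity,
        d.Amount
FROM    dbo.TransactionHeader AS h
JOIN    dbo.TransactionDetail AS d ON d.TransactionID = h.TransactionID
JOIN    dbo.Customer          AS c ON c.CustomerID    = h.CustomerID
JOIN    dbo.Item              AS i ON i.ItemNumber    = d.ItemNumber;
```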
Lynn Huff (Author) commented:
After re-reading your last post again, you said "you can do the join directly in your Source"

Are you suggesting creating a view in the source database to use as the source for the rows I need to insert into the destination database?
Lynn Huff (Author) commented:
I do understand.  I have one other question if you don't mind.  The lookup transformation will do what I need for identifying new rows.  However, I'm uncertain if the performance will become a problem since it actually compares every row in two tables.  In your experience, how big can a table get before the lookup transformation becomes a performance problem?
No, performance will not be a problem if it's done the right way: have proper indexes on the lookup columns and use the cache mode best suited to your data. It's a vast topic, so I would suggest reading some articles online on optimizing lookup performance; below are a few.

There is also a third-party component, I believe it's called the Checksum Transformation, which calculates and stores a checksum value per row and uses that for comparison.
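The same idea can be approximated in plain T-SQL on SQL Server 2008 with HASHBYTES; the column list and table names below are hypothetical:

```sql
-- Detect changed rows by comparing a stored hash against a freshly
-- computed one, instead of comparing every column individually.
SELECT  s.TransactionID
FROM    stage.Txn     AS s
JOIN    dbo.FactSales AS f ON f.TransactionID = s.TransactionID
WHERE   f.RowHash <> HASHBYTES('SHA1',
            ISNULL(s.CustomerClass, '') + '|'
          + ISNULL(s.ItemCategory, '')  + '|'
          + CAST(s.Amount AS varchar(20)));
```

Note that NULLs must be handled (here via ISNULL) and a delimiter included, otherwise different column values can concatenate to the same string and hash identically.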