Go Premium for a chance to win a PS4. Enter to Win

x
  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 145
  • Last Modified:

Load fact table SSIS

Im trying to load a Fact table from a Stage table in SQL Server. This is an existing package that is using the Slowly Changing Dimension which is incredibly slow. The process starts off with a Query to an ODBC table and then does many Lookups and then finally a SCD to Insert or Update the fact table. Now, I dont see how any of these records will be updated but just in case I want to check. I am not going to use the SCD to check though, it takes hours to complete. I was thinking about using a Conditional Split? My only problem is that there are no Primary keys in the source table so I have to use many fields to see if its an actual duplicate? I do see there is Line numbers that seem to be unique to the document number so maybe I can go off of that. Can someone help me out with the Conditional Split to check for a duplicate and to go to one OleDB destination for Insert and another for Update. This way I can evaluate if I need this. FYI, the Fact table has millions of rows so I dont think I should be doing any kind of Truncate and then reload the table. I just need a way to check the Destination before I do anything. Can anyone help me with this real quick???   Thank you!!!
0
jknj72
Asked:
jknj72
  • 2
  • 2
1 Solution
 
Habib PourfardCommented:
Slowly Changing Dimension transformation should only be used for small dimensions (less than 10,000 rows). comparing all the columns is not optimal. a hash function is usually used to store information about all the needed attributes, so that the process can quickly check whether there has been a change.

To implement a hash function Write a SELECT statement and add a calculated column using the T-SQL HASHBYTES
function when reading the data.

Faster Slowly Changing Dimension with Hash Function

and consider using T-SQL MERGE statement.
0
 
jknj72Author Commented:
So are you saying I should be using the SCD with a hash function to do what I need to do or should I be using a T-SQL Merge statement for what I need to do? I need help either way ;)

Thanks
JK
0
 
Habib PourfardCommented:
Using Hash function is what will improve the performance. Use hash function to find out which records should be updated and which records need to be inserted.

then you can use MERGE statement if you like to insert and update in one statement.
0
 
jknj72Author Commented:
Thanks
0

Featured Post

VIDEO: THE CONCERTO CLOUD FOR HEALTHCARE

Modern healthcare requires a modern cloud. View this brief video to understand how the Concerto Cloud for Healthcare can help your organization.

  • 2
  • 2
Tackle projects and never again get stuck behind a technical roadblock.
Join Now