Points for any constructive advise on an SSIS project I have where the source is a Type 1 SCD
, and desired target is SQL Server that stores the data as Type 2 SCD.
Salesforce.com (SF) data, PK’s are columns id (int) and SystemModstamp (datetime) columns, and the data is stored as a Type 1 SCD, which means If a value changes in a row then it will have the same id, but different SystemModstamp, and whatever other changes. No history is kept.
Client wants see…
• (Requirement #1, aka R1) Only the current row (i.e. same as source)
• (R2) All history for a given row, which means I have to build that up on my own based on changes over time.
My Table / Views Design
• ‘Staging’ table as an initial landing for all source data. No problems on this one.
• A single table for each SF table, with all rows (to include changes as new rows), Also includes a CurrentRow (bit) column with a value of 1 if current row and 0 if not current.
• (R1) will be a view that is a SELECT * FROM above table
• (R2) will be a view that is a SELECT * FROM above table WHERE current_row = 1
• Indexed on id, SystemModstamp, and CurrentRow.
My SSIS Package Design
• Truncate ‘staging’ tables, load with SF source data.
• Then in a single data flow…
o INSERT all rows (with current_rows = 1) without a match in id
o INSERT all rows (with current rows = 0) with a match in id, and no match in SystemModstamp.
o Run a SQL Stored Procedure to look at the rows with multiple SystemModstamp values for a given id, update the max to 1, all else 0. This I can code, so no problem.
Again, points for any constructive feedback.