Daftwillie
asked on
SSIS OuterJoin
Heya,
I'm still a bit new to SSIS and am looking for some advice. I have a datawarehouse that requires data to be moved around. Obviously my first thought is "SSIS is perfect for this" but there are maybe some techniques I need to refine a bit.
What I want to be able to do is load data from one table to the next, but only if that data does not already exist in the target table. I've done something using a lookup transformation and a conditional split, but I wonder if there is a better, more efficient, way of doing it. For my way to work I have to override the errors produced by the lookup when it doesn't find anything.
As an example of what I am doing Imagine the following two tables.
RAW_PEOPLE
FirstName, LastName, Gender, DOB
Bob, Fleming, M, 01/01/1978
Rose, Cutting, F, 31/12/1985
Hugo, First, M, 12/12/2000
STG_PEOPLE
PeopleID, FirstName, LastName, Gender, DOB
1001, Bob, Fleming, M, 01/01/1978
As you can see in the stage table, Bob Fleming is already there, so I don't want to load him, I do want to load the others though. In SQL I would do this like this
INSERT INTO STG_PEOPLE
SELECT rp.FirstName, rp.LastName, rp.Gender, rp.DOB
FROM RAW_PEOPLE rp LEFT OUTER JOIN STG_PEOPLE sp
ON rp.FirstName = sp.FirstName AND rp.LastName = sp.LastName AND rp.DOB = sp.DOB
WHERE sp.PeopleID IS NULL
Is there a nice elegant way of achiving this using SSIS?
I'm still a bit new to SSIS and am looking for some advice. I have a datawarehouse that requires data to be moved around. Obviously my first thought is "SSIS is perfect for this" but there are maybe some techniques I need to refine a bit.
What I want to be able to do is load data from one table to the next, but only if that data does not already exist in the target table. I've done something using a lookup transformation and a conditional split, but I wonder if there is a better, more efficient, way of doing it. For my way to work I have to override the errors produced by the lookup when it doesn't find anything.
As an example of what I am doing Imagine the following two tables.
RAW_PEOPLE
FirstName, LastName, Gender, DOB
Bob, Fleming, M, 01/01/1978
Rose, Cutting, F, 31/12/1985
Hugo, First, M, 12/12/2000
STG_PEOPLE
PeopleID, FirstName, LastName, Gender, DOB
1001, Bob, Fleming, M, 01/01/1978
As you can see in the stage table, Bob Fleming is already there, so I don't want to load him, I do want to load the others though. In SQL I would do this like this
INSERT INTO STG_PEOPLE
SELECT rp.FirstName, rp.LastName, rp.Gender, rp.DOB
FROM RAW_PEOPLE rp LEFT OUTER JOIN STG_PEOPLE sp
ON rp.FirstName = sp.FirstName AND rp.LastName = sp.LastName AND rp.DOB = sp.DOB
WHERE sp.PeopleID IS NULL
Is there a nice elegant way of achiving this using SSIS?
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Using the lookup component and use the red output line to link to the destination component in order to insert only new data.
helped?
regards,
Pedro
helped?
regards,
Pedro
Well, you can always cheat and do:
INSERT INTO STG_PEOPLE
SELECT rp.FirstName, rp.LastName, rp.Gender, rp.DOB
FROM RAW_PEOPLE rp
WHERE rp.FirstName + rp.LastName NOT IN
(
SELECT FirstName + LastName
from STG_PEOPLE
)
But I can't vouch for performance.
However, even in a Data Warehouse, it's acceptable to store unique key values for just this occasion: you want to insert unique rows that do not already exist. It would be even easier if you could match by PeopleID.
INSERT INTO STG_PEOPLE
SELECT rp.FirstName, rp.LastName, rp.Gender, rp.DOB
FROM RAW_PEOPLE rp
WHERE rp.FirstName + rp.LastName NOT IN
(
SELECT FirstName + LastName
from STG_PEOPLE
)
But I can't vouch for performance.
However, even in a Data Warehouse, it's acceptable to store unique key values for just this occasion: you want to insert unique rows that do not already exist. It would be even easier if you could match by PeopleID.
ASKER
I ended up cheating myself.
The PeopleID is assigned to the data when it moves from RAW to STG. I added a "Metadata" column to the raw table of PeopleID. I then try to update this column first and just load in anything with a NULL PeopleID from RAW. Easier, I just wanted to make sure there wasn't an easy snazzy way of doing it in SSIS.
The PeopleID is assigned to the data when it moves from RAW to STG. I added a "Metadata" column to the raw table of PeopleID. I then try to update this column first and just load in anything with a NULL PeopleID from RAW. Easier, I just wanted to make sure there wasn't an easy snazzy way of doing it in SSIS.
ASKER
No solution given but pretty much confirmed my suspisions that SSIS doesn;t really deal with this very well. Ended up giving SSIS the elbow and just wrote a large migration script in SQL, was much easier.
You can improve the performance if you can sub-select only the changed rows and then join.
Thanks,
Sam