SSIS OuterJoin

Heya,

I'm still a bit new to SSIS and am looking for some advice. I have a datawarehouse that requires data to be moved around. Obviously my first thought is "SSIS is perfect for this" but there are maybe some techniques I need to refine a bit.

What I want to be able to do is load data from one table to the next, but only if that data does not already exist in the target table. I've done something using a lookup transformation and a conditional split, but I wonder if there is a better, more efficient, way of doing it. For my way to work I have to override the errors produced by the lookup when it doesn't find anything.

As an example of what I am doing Imagine the following two tables.

RAW_PEOPLE
FirstName, LastName, Gender, DOB
Bob, Fleming, M, 01/01/1978
Rose, Cutting, F, 31/12/1985
Hugo, First, M, 12/12/2000

STG_PEOPLE
PeopleID, FirstName, LastName, Gender, DOB
1001, Bob, Fleming, M, 01/01/1978

As you can see in the stage table, Bob Fleming is already there, so I don't want to load him, I do want to load the others though. In SQL I would do this like this

INSERT INTO STG_PEOPLE
SELECT rp.FirstName, rp.LastName, rp.Gender, rp.DOB
FROM RAW_PEOPLE rp LEFT OUTER JOIN STG_PEOPLE sp
ON rp.FirstName = sp.FirstName AND rp.LastName = sp.LastName AND rp.DOB = sp.DOB
WHERE sp.PeopleID IS NULL

Is there a nice elegant way of achiving this using SSIS?
LVL 1
DaftwillieAsked:
Who is Participating?
 
Guy Hengel [angelIII / a3]Billing EngineerCommented:
no, unless you use exactly that kind of sql query as input ...
0
 
sameer2010Commented:
Hi,

You can improve the performance if you can sub-select only the changed rows and then join.

Thanks,
Sam
0
 
PedroCGDCommented:
Using the lookup component and use the red output line to link to the destination component in order to insert only new data.
helped?
regards,
Pedro
0
Ultimate Tool Kit for Technology Solution Provider

Broken down into practical pointers and step-by-step instructions, the IT Service Excellence Tool Kit delivers expert advice for technology solution providers. Get your free copy now.

 
MDTalleyCommented:
Well, you can always cheat and do:

INSERT INTO STG_PEOPLE
SELECT rp.FirstName, rp.LastName, rp.Gender, rp.DOB
FROM RAW_PEOPLE rp
WHERE rp.FirstName + rp.LastName NOT IN
(
  SELECT FirstName + LastName
  from STG_PEOPLE
)

But I can't vouch for performance.

However, even in a Data Warehouse, it's acceptable to store unique key values for just this occasion: you want to insert unique rows that do not already exist.  It would be even easier if you could match by PeopleID.
0
 
DaftwillieAuthor Commented:
I ended up cheating myself.

The PeopleID is assigned to the data when it moves from RAW to STG. I added a "Metadata" column to the raw table of PeopleID. I then try to update this column first and just load in anything with a NULL PeopleID from RAW. Easier, I just wanted to make sure there wasn't an easy snazzy way of doing it in SSIS.
0
 
nmcdermaidCommented:
If you're looking for snazzy and performance, try this:

http://www.codeplex.com/kimballscd

0
 
DaftwillieAuthor Commented:
No solution given but pretty much confirmed my suspisions that SSIS doesn;t really deal with this very well. Ended up giving SSIS the elbow and just wrote a large migration script in SQL, was much easier.
0
Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.

All Courses

From novice to tech pro — start learning today.