Want to protect your cyber security and still get fast solutions? Ask a secure question today.Go Premium

x
  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 677
  • Last Modified:

SSIS OuterJoin

Heya,

I'm still a bit new to SSIS and am looking for some advice. I have a datawarehouse that requires data to be moved around. Obviously my first thought is "SSIS is perfect for this" but there are maybe some techniques I need to refine a bit.

What I want to be able to do is load data from one table to the next, but only if that data does not already exist in the target table. I've done something using a lookup transformation and a conditional split, but I wonder if there is a better, more efficient, way of doing it. For my way to work I have to override the errors produced by the lookup when it doesn't find anything.

As an example of what I am doing Imagine the following two tables.

RAW_PEOPLE
FirstName, LastName, Gender, DOB
Bob, Fleming, M, 01/01/1978
Rose, Cutting, F, 31/12/1985
Hugo, First, M, 12/12/2000

STG_PEOPLE
PeopleID, FirstName, LastName, Gender, DOB
1001, Bob, Fleming, M, 01/01/1978

As you can see in the stage table, Bob Fleming is already there, so I don't want to load him, I do want to load the others though. In SQL I would do this like this

INSERT INTO STG_PEOPLE
SELECT rp.FirstName, rp.LastName, rp.Gender, rp.DOB
FROM RAW_PEOPLE rp LEFT OUTER JOIN STG_PEOPLE sp
ON rp.FirstName = sp.FirstName AND rp.LastName = sp.LastName AND rp.DOB = sp.DOB
WHERE sp.PeopleID IS NULL

Is there a nice elegant way of achiving this using SSIS?
0
Daftwillie
Asked:
Daftwillie
1 Solution
 
Guy Hengel [angelIII / a3]Billing EngineerCommented:
no, unless you use exactly that kind of sql query as input ...
0
 
sameer2010Commented:
Hi,

You can improve the performance if you can sub-select only the changed rows and then join.

Thanks,
Sam
0
 
PedroCGDCommented:
Using the lookup component and use the red output line to link to the destination component in order to insert only new data.
helped?
regards,
Pedro
0
Microsoft Certification Exam 74-409

Veeam® is happy to provide the Microsoft community with a study guide prepared by MVP and MCT, Orin Thomas. This guide will take you through each of the exam objectives, helping you to prepare for and pass the examination.

 
MDTalleyCommented:
Well, you can always cheat and do:

INSERT INTO STG_PEOPLE
SELECT rp.FirstName, rp.LastName, rp.Gender, rp.DOB
FROM RAW_PEOPLE rp
WHERE rp.FirstName + rp.LastName NOT IN
(
  SELECT FirstName + LastName
  from STG_PEOPLE
)

But I can't vouch for performance.

However, even in a Data Warehouse, it's acceptable to store unique key values for just this occasion: you want to insert unique rows that do not already exist.  It would be even easier if you could match by PeopleID.
0
 
DaftwillieAuthor Commented:
I ended up cheating myself.

The PeopleID is assigned to the data when it moves from RAW to STG. I added a "Metadata" column to the raw table of PeopleID. I then try to update this column first and just load in anything with a NULL PeopleID from RAW. Easier, I just wanted to make sure there wasn't an easy snazzy way of doing it in SSIS.
0
 
nmcdermaidCommented:
If you're looking for snazzy and performance, try this:

http://www.codeplex.com/kimballscd

0
 
DaftwillieAuthor Commented:
No solution given but pretty much confirmed my suspisions that SSIS doesn;t really deal with this very well. Ended up giving SSIS the elbow and just wrote a large migration script in SQL, was much easier.
0

Featured Post

NEW Veeam Backup for Microsoft Office 365 1.5

With Office 365, it’s your data and your responsibility to protect it. NEW Veeam Backup for Microsoft Office 365 eliminates the risk of losing access to your Office 365 data.

Tackle projects and never again get stuck behind a technical roadblock.
Join Now