I have an existing SSIS import package that works just fine. It imports excel source file "A" containing customer information. The package then scrubs the database rows and removed rows that are no longer on the import file.
I am now being asked to modify the package to use excel file "B". File "B" consists of updated customer information. Ideally, any rows on both files should be imported from file A and then certain fields updated from file B. SO I implemented a lookup to update those fields. Seems to work fine...
Here's the catch: There are some rows on File B that are not on file A. (ie: File B is not wholely a subset of file A). They want those customers imported also - as if they were on file A.
If I import both the files using a UNION I will get duplicate rows for some customers.
If I import both the files using a MERGE JOIN (left join) it has she same effect as the lookup. (ie: rows on B but not A don't get imported)
If I import both the files using a MERGE JOIN (full join) it has she same effect as the UNION. (ie: dupes)
Is there an easy way to accomplish what I am trying to do here?
Is there a way to re-run a "select distinct" against data already in the flow?
Or is there an SSIS object structure that can DE-DUPE your data? :)