SSIS input souce delimma

I have an existing SSIS import package that works just fine.  It imports excel source file "A" containing customer information.  The package then scrubs the database rows and removed rows that are no longer on the import file.

I am now being asked to modify the package to use excel file "B".  File "B" consists of updated customer information.  Ideally, any rows on both files should be imported from file A and then certain fields updated from file B.  SO I implemented a lookup to update those fields.  Seems to work fine...

Here's the catch:  There are some rows on File B that are not on file A.  (ie: File B is not wholely a subset of file A).  They want those customers imported also - as if they were on file A.

If I import both the files using a UNION I will get duplicate rows for some customers.
If I import both the files using a MERGE JOIN (left join) it has she same effect as the lookup. (ie:  rows on B but not A don't get imported)
If I import both the files using a MERGE JOIN (full join) it has she same effect as the UNION.  (ie: dupes)

Is there an easy way to accomplish what I am trying to do here?
Is there a way to re-run a "select distinct" against data already in the flow?  
Or is there an SSIS object structure that can DE-DUPE your data?  :)

Thanks.
okacsAsked:
Who is Participating?

[Product update] Infrastructure Analysis Tool is now available with Business Accounts.Learn More

x
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

Reza RadConsultant, TrainerCommented:
could you upload ssis package and excel files here, I will try to find a solution for this issue
0
okacsAuthor Commented:
Sorry, but the files contain confidential customer information.

The problem though is not specific to the data.  I could have an inventory of pineapples and the same issue would occur.  How do I both update from and import from the secondary file without creating duplicates?

I think I may have a possible work around (If I can get it working) where I use file B to update, then delete dupe rows from file B, then import it.  Not sure if that will work out though...
0
nmcdermaidCommented:
Now that I have waited ten minutes for BIDS to start, and set up a sample project, I can see that the SORT data flow transformation has a tick box - 'Remove Rows with Duplicate Values', which will remove duplicates.
0
The 7 Worst Nightmares of a Sysadmin

Fear not! To defend your business’ IT systems we’re going to shine a light on the seven most sinister terrors that haunt sysadmins. That way you can be sure there’s nothing in your stack waiting to go bump in the night.

nmcdermaidCommented:
Note that the merge join implies that the data is already sorted so the sort step shouldn't be too intensive.
0
nmcdermaidCommented:
Assuming that you will soon encounter a third and fourth excel file to be imported you might want to try a generic solution:

1. Import all files into a single staging table. Each file dataset gets a 'priority' indicating which data is most correct
2. Perform self joins against the table, performing updates which cleans the data up in order of priority.

Then you can just keep adding data to the table and run the cleanup script at the end.

0

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
okacsAuthor Commented:
I ended up doing this.  Importing it all into one table and writing an SP to do all the work.  PITA that SISS doesn't have an easy construct for this, but the SP works great, and I have more granular control.

Thanks,
0
okacsAuthor Commented:
I ended up doing this.  Importing it all into one table and writing an SP to do all the work.  PITA that SISS doesn't have an easy construct for this, but the SP works great, and I have more granular control.

Thanks,
0
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
Databases

From novice to tech pro — start learning today.