SSIS input source dilemma

I have an existing SSIS import package that works just fine.  It imports Excel source file "A" containing customer information.  The package then scrubs the database rows and removes rows that are no longer on the import file.

I am now being asked to modify the package to use Excel file "B".  File "B" consists of updated customer information.  Ideally, any rows on both files should be imported from file A and then certain fields updated from file B.  So I implemented a lookup to update those fields.  Seems to work fine...

Here's the catch:  There are some rows on file B that are not on file A (i.e., file B is not wholly a subset of file A).  They want those customers imported also, as if they were on file A.

If I import both files using a UNION, I will get duplicate rows for some customers.
If I import both files using a MERGE JOIN (left join), it has the same effect as the lookup (i.e., rows on B but not A don't get imported).
If I import both files using a MERGE JOIN (full join), it has the same effect as the UNION (i.e., dupes).
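
In set terms, what I want is every row from A (with certain fields refreshed from B), plus the rows that exist only in B, with no customer appearing twice.  If both files were staged to tables, the target would look something like this (table and column names are hypothetical):

    -- Every row from A, with the overlapping fields taken from B where a match exists
    SELECT a.CustomerID,
           COALESCE(b.Phone, a.Phone) AS Phone,
           COALESCE(b.Email, a.Email) AS Email
    FROM   FileA a
    LEFT JOIN FileB b ON b.CustomerID = a.CustomerID
    UNION ALL
    -- Plus the customers that appear only in B
    SELECT b.CustomerID, b.Phone, b.Email
    FROM   FileB b
    WHERE  NOT EXISTS (SELECT 1 FROM FileA a WHERE a.CustomerID = b.CustomerID);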

Is there an easy way to accomplish what I am trying to do here?
Is there a way to re-run a "select distinct" against data already in the flow?  
Or is there an SSIS object structure that can DE-DUPE your data?  :)

Thanks.
Asked by okacs
1 Solution
 
Reza Rad (Consultant, Trainer) commented:
Could you upload the SSIS package and Excel files here? I will try to find a solution for this issue.
 
okacs (Author) commented:
Sorry, but the files contain confidential customer information.

The problem, though, is not specific to the data.  I could have an inventory of pineapples and the same issue would occur.  How do I both update from and import from the secondary file without creating duplicates?

I think I may have a possible workaround (if I can get it working) where I use file B to update, then delete dupe rows from file B, then import it.  Not sure if that will work out, though...
 
nmcdermaid commented:
Now that I have waited ten minutes for BIDS to start, and set up a sample project, I can see that the SORT data flow transformation has a tick box - 'Remove Rows with Duplicate Values', which will remove duplicates.
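
If the rows land in a SQL Server staging table instead, the same de-duplication can be done in T-SQL with ROW_NUMBER(), keeping one row per customer; a minimal sketch (table and column names are hypothetical):

    -- Delete every copy of a customer after the first (the ordering here is
    -- arbitrary; order by a load date or similar if one copy should win)
    ;WITH Ranked AS (
        SELECT *,
               ROW_NUMBER() OVER (PARTITION BY CustomerID
                                  ORDER BY CustomerID) AS rn
        FROM   StagingCustomers
    )
    DELETE FROM Ranked
    WHERE  rn > 1;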
 
nmcdermaid commented:
Note that the merge join implies that the data is already sorted, so the sort step shouldn't be too intensive.
 
nmcdermaid commented:
Assuming that you will soon encounter a third and fourth Excel file to be imported, you might want to try a generic solution:

1. Import all files into a single staging table. Each file's dataset gets a 'priority' indicating which data is most correct.
2. Perform self-joins against the table, running updates that clean the data up in order of priority.

Then you can just keep adding data to the table and run the cleanup script at the end.
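
A minimal T-SQL sketch of that pattern, assuming hypothetical table and column names, with a lower Priority number meaning more authoritative data (1 = file B, 2 = file A):

    -- Both files land here via their own data flows
    CREATE TABLE StagingCustomers (
        CustomerID INT          NOT NULL,
        Phone      VARCHAR(20)  NULL,
        Email      VARCHAR(100) NULL,
        Priority   INT          NOT NULL  -- 1 = file B (most correct), 2 = file A
    );

    -- Cleanup step 1: copy the higher-priority values onto the base rows
    UPDATE a
    SET    a.Phone = b.Phone,
           a.Email = b.Email
    FROM   StagingCustomers a
    JOIN   StagingCustomers b
      ON   b.CustomerID = a.CustomerID
     AND   b.Priority   < a.Priority;

    -- Cleanup step 2: the higher-priority copies are now redundant; drop them
    -- so exactly one row per customer remains
    DELETE b
    FROM   StagingCustomers b
    WHERE  EXISTS (SELECT 1
                   FROM   StagingCustomers a
                   WHERE  a.CustomerID = b.CustomerID
                   AND    a.Priority   > b.Priority);

After the cleanup, the staging table holds exactly one row per customer and can feed the existing scrub/import logic; a third file simply becomes another priority value.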

 
okacs (Author) commented:
I ended up doing this: importing it all into one table and writing an SP to do all the work.  PITA that SSIS doesn't have an easy construct for this, but the SP works great, and I have more granular control.

Thanks,
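
For anyone landing here later, a skeleton of what such a stored procedure might look like, wrapping the priority cleanup sketched above (the procedure, table, and column names are hypothetical; the real merge rules depend on the files):

    CREATE PROCEDURE dbo.usp_MergeCustomerImports
    AS
    BEGIN
        SET NOCOUNT ON;

        -- Apply the priority cleanup from the earlier sketch here:
        -- update overlapping customers from the file B rows, then
        -- delete the redundant copies so one row per customer remains.

        -- Publish the de-duplicated staging rows to the live table
        INSERT INTO dbo.Customers (CustomerID, Phone, Email)
        SELECT CustomerID, Phone, Email
        FROM   StagingCustomers;
    END

The package itself then needs only the two Excel data flows landing in the staging table, followed by an Execute SQL Task that runs EXEC dbo.usp_MergeCustomerImports;.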