How do I insert unique reference values with lookup + ole db command?

Hi,

I am trying to insert data from an Excel file to a SQL Server database, using SSIS packages. (I have simplified the actual package, to narrow in the problem, so it's like a school book example).
The Excel file has three columns: Name, Product, Supplier.
The database has a table TestTable with three columns: Id (identity int), Name, SupplierId
The SupplierId is mapped to a reference table named Supplier, with columns: Id, Name.

Short description of the SSIS package:
Excel Source --> Sort (on Name) --> Derived Column (adding SupplierId with dummy key 1) --> Lookup (trying to match Supplier names with reference table, replacing SupplierId if found)

If the Lookup finds the supplier, move with a SupplierId. Sort on SupplierId.
If not, insert new suppliers in reference table Suppliers.
** THIS IS WHERE THE PROBLEM IS. **

Lookup to get the SupplierId of the newly inserted suppliers.
Sort on SupplierId.
Merge and Lookup to either insert with OLE DB Destination or update with OLE DB Command.


The problem is:
When making the insert of new suppliers, not only unique suppliers are inserted. If, e.g., the Excel file has two records with supplier "FOX LTD", and the supplier does not yet exist, it adds to records to the reference table. I of course want it to add unique suppliers.

I tried to add a Sort before the insert OLE DB Command, removing rows with duplicate sort values, but then I of course will lose the records that I want to move on with...


This must be an easy thing so solve, but I really can't find out how.
Is there anyone in here who can help me out?

If any code or further explanation is needed, I will provide it.
sennenAsked:
Who is Participating?
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

PedroCGDCommented:
well.... give us more details...
... I also supose you understand correctly SSIS for setting 250 points...

regards,
Pedro
www.pedrocgd.blogspot.com
0
sennenAuthor Commented:
Concerning the points being awarded, I don't know how to rate the question, and I couldn't find any guidelines. I probably don't understand SSIS completely as I should, being a total newbie. However, when I couldn't find an answer to the question anywhere, I thought I'd try here. Do you suggest the points reward too high or too low??


More details...
What kind of details would be useful?

The insert of any new suppliers is only one step on the way in a larger process. That means I need the original dataset (with SupplierId added as a column, of course) with me "downstream", for later update of a fact table.

1. Lookup
2. If match - proceed to 4
3. If no match - insert new Suppliers and proceed to 3
4. Join the two datasets (same records as before, but now with SupplierId added, both previously existing and newly inserted)
5. Proceed working with the dataset, e.g. inserting it into a fact table...

0
PedroCGDCommented:
relating the points I normally create an SSIS example for those who set the question with 500 points, and give guiedlines and suggestions to those who give less than 500poinst, because I understand that those persons are familiar with SSIS and I dont need to create the example... only for that reason. You are welcome here!

Ok... now...
You are looking data for SupplierName or supplierID?
Could you attach and example of input data and output with what you want... some details to better explain your issue!

regards!
Pedro
0
10 Tips to Protect Your Business from Ransomware

Did you know that ransomware is the most widespread, destructive malware in the world today? It accounts for 39% of all security breaches, with ransomware gangsters projected to make $11.5B in profits from online extortion by 2019.

sennenAuthor Commented:
OK, thanks for your explanation.

I've attached an image, I hope it helps explaining what I'm trying to do.

From "upstream", a dataset is running, containing some columns and records, a.o. SupplierName.
In "Derived Column" I add a SupplierId column (int) and set it to 1.
The "Lookup" is checking the Supplier table in the database (columns Id, Name), trying to match SupplierName with Name in table, replacing SupplierId if it matches. So far everything is ok.

The error output from "Lookup" contains Suppliers that don't yet exist in the database, and thus need to be added.
This is supposed to be done in "OLE DB Command".
The SqlCommand in "OLE DB Command" is "INSERT INTO [Supplier] (Name) VALUES (?)".
The result of this data flow transformation is that it inserts all the records' SupplierName. I of course want unique Suppliers to be created.

In "Lookup 1" after the insert, I do the came check as in the previous lookup, matching supplier names with supplier id (that now exist in the database).


Let's say that the dataset contains the following data (with header/column name on top)
Name        Supplier
Mr X          Home Depot
Mrs Y        Walmart
Mr Z          Walmart
Mr W         Home Depot
Ms Q         Walgreens

I want the Supplier table to be filled with three suppliers, not five.

I tried to add a Sort component before the insert, checking "Remove rows with duplicate sort values", but as I'm using the pass-through dataset "downstream" (of course I want to insert/update all facts), this doesn't work for me.


Does that explain my problem better?
2009-07-09-130530.jpg
0
PedroCGDCommented:

The lookup 1 found the suppliers names inserted in the previous step?
Try to use aggregate component to have all distinct suplliers names...

or you can also split the dataflow in 2 dataflow... one to insert suppliers and other to do the lookups and insert into destination...
helped?

Pedro
www.pedrocgd.blogspot.com
0
sennenAuthor Commented:
Yes, the Lookup 1 finds all the recently (in the step before) added suppliers.

Splitting the dataflow in 2... Isn't that exactly what I'm doing?
The flow with the not-yet-existing suppliers is taking a branched stream, to insert suppliers... When done, it merges into the "main" flow again, to insert all data into the destination.

I still think the problem is that the OLE DB Command is inserting all suppliers and duplicates, not unique suppliers.

In other words, the stream should look like this:

[Dataset] to process, coming from upstream:
PName      Supplier
Mr X          Home Depot
Mrs Y        Walmart
Mr Z          Walmart
Mr W         Home Depot
Ms Q         Walgreens

1. INSERT INTO Supplier (Name)
    SELECT Supplier FROM [Dataset]

2. Do a Lookup to get SupplierId for suppliers in [Dataset]. The [Dataset] now looks like this:
PName      Supplier          SupplierId
Mr X          Home Depot    1
Mrs Y        Walmart          2
Mr Z          Walmart          2
Mr W         Home Depot    1
Ms Q         Walgreens      3

2. Do a Lookup to see which records in [Dataset] to insert and which to update.

3. Have one OLE DB Command update the destination table, and one OLE DB Destination to insert into the destination table. The destination table would finally look as follows:

PName      SupplierId
Mr X          1
Mrs Y        2
Mr Z          2
Mr W         1
Ms Q         3


0
PedroCGDCommented:
I'm a little confused... I will start from
PName      Supplier
Mr X          Home Depot
Mrs Y        Walmart
Mr Z          Walmart
Mr W         Home Depot
Ms Q         Walgreens

and make an example for you.. I think it's better!
Pedro
0
PedroCGDCommented:
I I wrote split the dataflow is soplit in two different dataflow... one to insert dimensions (Supplier) and other to insert facts... check attached images...

Helped?
Regards,
Pedro
www.pedrocgd.blogspot.com
SSIS-Interface.JPG
SSIS-Dimensions.JPG
SSIS-Facts.JPG
0

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
PedroCGDCommented:
improvements?
0
sennenAuthor Commented:
Hi again, I apologize for my late response, the summer holidays came in between...

OK, performing it as two tasks, one for inserting/updating dimensions and one for inserting/updating the actual facts... That makes sense. It may take some more time, if the data source is very large, but it's a clean way to do it, and as I can't see any other way to solve it (with a data flow transformation control. e.g.), I will try it out! I'll be back with the result shortly.
0
PedroCGDCommented:
ok.. I'll be waiting and this time more fast, ok? :-)
regards,
Pedro
www.pedrocgd.blogspot.com
0
sennenAuthor Commented:
Hi again!

It worked like a charm!
I accepted your solution, and raised the point value, I think you put a lot of time into this, which I'm grateful for. I hope you're ok with that. :o)

Thank you very much for your help!
0
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
Microsoft SQL Server

From novice to tech pro — start learning today.