Solved

How do I insert unique reference values with lookup + ole db command?

Posted on 2009-07-06
12
698 Views
Last Modified: 2013-11-30
Hi,

I am trying to insert data from an Excel file to a SQL Server database, using SSIS packages. (I have simplified the actual package, to narrow in the problem, so it's like a school book example).
The Excel file has three columns: Name, Product, Supplier.
The database has a table TestTable with three columns: Id (identity int), Name, SupplierId
The SupplierId is mapped to a reference table named Supplier, with columns: Id, Name.

Short description of the SSIS package:
Excel Source --> Sort (on Name) --> Derived Column (adding SupplierId with dummy key 1) --> Lookup (trying to match Supplier names with reference table, replacing SupplierId if found)

If the Lookup finds the supplier, move with a SupplierId. Sort on SupplierId.
If not, insert new suppliers in reference table Suppliers.
** THIS IS WHERE THE PROBLEM IS. **

Lookup to get the SupplierId of the newly inserted suppliers.
Sort on SupplierId.
Merge and Lookup to either insert with OLE DB Destination or update with OLE DB Command.


The problem is:
When making the insert of new suppliers, not only unique suppliers are inserted. If, e.g., the Excel file has two records with supplier "FOX LTD", and the supplier does not yet exist, it adds to records to the reference table. I of course want it to add unique suppliers.

I tried to add a Sort before the insert OLE DB Command, removing rows with duplicate sort values, but then I of course will lose the records that I want to move on with...


This must be an easy thing so solve, but I really can't find out how.
Is there anyone in here who can help me out?

If any code or further explanation is needed, I will provide it.
0
Comment
Question by:sennen
  • 7
  • 5
12 Comments
 
LVL 22

Expert Comment

by:PedroCGD
ID: 24785890
well.... give us more details...
... I also supose you understand correctly SSIS for setting 250 points...

regards,
Pedro
www.pedrocgd.blogspot.com
0
 

Author Comment

by:sennen
ID: 24786036
Concerning the points being awarded, I don't know how to rate the question, and I couldn't find any guidelines. I probably don't understand SSIS completely as I should, being a total newbie. However, when I couldn't find an answer to the question anywhere, I thought I'd try here. Do you suggest the points reward too high or too low??


More details...
What kind of details would be useful?

The insert of any new suppliers is only one step on the way in a larger process. That means I need the original dataset (with SupplierId added as a column, of course) with me "downstream", for later update of a fact table.

1. Lookup
2. If match - proceed to 4
3. If no match - insert new Suppliers and proceed to 3
4. Join the two datasets (same records as before, but now with SupplierId added, both previously existing and newly inserted)
5. Proceed working with the dataset, e.g. inserting it into a fact table...

0
 
LVL 22

Expert Comment

by:PedroCGD
ID: 24786127
relating the points I normally create an SSIS example for those who set the question with 500 points, and give guiedlines and suggestions to those who give less than 500poinst, because I understand that those persons are familiar with SSIS and I dont need to create the example... only for that reason. You are welcome here!

Ok... now...
You are looking data for SupplierName or supplierID?
Could you attach and example of input data and output with what you want... some details to better explain your issue!

regards!
Pedro
0
 

Author Comment

by:sennen
ID: 24812467
OK, thanks for your explanation.

I've attached an image, I hope it helps explaining what I'm trying to do.

From "upstream", a dataset is running, containing some columns and records, a.o. SupplierName.
In "Derived Column" I add a SupplierId column (int) and set it to 1.
The "Lookup" is checking the Supplier table in the database (columns Id, Name), trying to match SupplierName with Name in table, replacing SupplierId if it matches. So far everything is ok.

The error output from "Lookup" contains Suppliers that don't yet exist in the database, and thus need to be added.
This is supposed to be done in "OLE DB Command".
The SqlCommand in "OLE DB Command" is "INSERT INTO [Supplier] (Name) VALUES (?)".
The result of this data flow transformation is that it inserts all the records' SupplierName. I of course want unique Suppliers to be created.

In "Lookup 1" after the insert, I do the came check as in the previous lookup, matching supplier names with supplier id (that now exist in the database).


Let's say that the dataset contains the following data (with header/column name on top)
Name        Supplier
Mr X          Home Depot
Mrs Y        Walmart
Mr Z          Walmart
Mr W         Home Depot
Ms Q         Walgreens

I want the Supplier table to be filled with three suppliers, not five.

I tried to add a Sort component before the insert, checking "Remove rows with duplicate sort values", but as I'm using the pass-through dataset "downstream" (of course I want to insert/update all facts), this doesn't work for me.


Does that explain my problem better?
2009-07-09-130530.jpg
0
 
LVL 22

Expert Comment

by:PedroCGD
ID: 24812911

The lookup 1 found the suppliers names inserted in the previous step?
Try to use aggregate component to have all distinct suplliers names...

or you can also split the dataflow in 2 dataflow... one to insert suppliers and other to do the lookups and insert into destination...
helped?

Pedro
www.pedrocgd.blogspot.com
0
 

Author Comment

by:sennen
ID: 24813199
Yes, the Lookup 1 finds all the recently (in the step before) added suppliers.

Splitting the dataflow in 2... Isn't that exactly what I'm doing?
The flow with the not-yet-existing suppliers is taking a branched stream, to insert suppliers... When done, it merges into the "main" flow again, to insert all data into the destination.

I still think the problem is that the OLE DB Command is inserting all suppliers and duplicates, not unique suppliers.

In other words, the stream should look like this:

[Dataset] to process, coming from upstream:
PName      Supplier
Mr X          Home Depot
Mrs Y        Walmart
Mr Z          Walmart
Mr W         Home Depot
Ms Q         Walgreens

1. INSERT INTO Supplier (Name)
    SELECT Supplier FROM [Dataset]

2. Do a Lookup to get SupplierId for suppliers in [Dataset]. The [Dataset] now looks like this:
PName      Supplier          SupplierId
Mr X          Home Depot    1
Mrs Y        Walmart          2
Mr Z          Walmart          2
Mr W         Home Depot    1
Ms Q         Walgreens      3

2. Do a Lookup to see which records in [Dataset] to insert and which to update.

3. Have one OLE DB Command update the destination table, and one OLE DB Destination to insert into the destination table. The destination table would finally look as follows:

PName      SupplierId
Mr X          1
Mrs Y        2
Mr Z          2
Mr W         1
Ms Q         3


0
How to run any project with ease

Manage projects of all sizes how you want. Great for personal to-do lists, project milestones, team priorities and launch plans.
- Combine task lists, docs, spreadsheets, and chat in one
- View and edit from mobile/offline
- Cut down on emails

 
LVL 22

Expert Comment

by:PedroCGD
ID: 24813239
I'm a little confused... I will start from
PName      Supplier
Mr X          Home Depot
Mrs Y        Walmart
Mr Z          Walmart
Mr W         Home Depot
Ms Q         Walgreens

and make an example for you.. I think it's better!
Pedro
0
 
LVL 22

Accepted Solution

by:
PedroCGD earned 500 total points
ID: 24813639
I I wrote split the dataflow is soplit in two different dataflow... one to insert dimensions (Supplier) and other to insert facts... check attached images...

Helped?
Regards,
Pedro
www.pedrocgd.blogspot.com
SSIS-Interface.JPG
SSIS-Dimensions.JPG
SSIS-Facts.JPG
0
 
LVL 22

Expert Comment

by:PedroCGD
ID: 24823408
improvements?
0
 

Author Comment

by:sennen
ID: 25088411
Hi again, I apologize for my late response, the summer holidays came in between...

OK, performing it as two tasks, one for inserting/updating dimensions and one for inserting/updating the actual facts... That makes sense. It may take some more time, if the data source is very large, but it's a clean way to do it, and as I can't see any other way to solve it (with a data flow transformation control. e.g.), I will try it out! I'll be back with the result shortly.
0
 
LVL 22

Expert Comment

by:PedroCGD
ID: 25088657
ok.. I'll be waiting and this time more fast, ok? :-)
regards,
Pedro
www.pedrocgd.blogspot.com
0
 

Author Comment

by:sennen
ID: 25089032
Hi again!

It worked like a charm!
I accepted your solution, and raised the point value, I think you put a lot of time into this, which I'm grateful for. I hope you're ok with that. :o)

Thank you very much for your help!
0

Featured Post

Find Ransomware Secrets With All-Source Analysis

Ransomware has become a major concern for organizations; its prevalence has grown due to past successes achieved by threat actors. While each ransomware variant is different, we’ve seen some common tactics and trends used among the authors of the malware.

Join & Write a Comment

When you hear the word proxy, you may become apprehensive. This article will help you to understand Proxy and when it is useful. Let's talk Proxy for SQL Server. (Not in terms of Internet access.) Typically, you'll run into this type of problem w…
Everyone has problem when going to load data into Data warehouse (EDW). They all need to confirm that data quality is good but they don't no how to proceed. Microsoft has provided new task within SSIS 2008 called "Data Profiler Task". It solve th…
Viewers will learn how the fundamental information of how to create a table.
Viewers will learn how to use the SELECT statement in SQL and will be exposed to the many uses the SELECT statement has.

707 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

15 Experts available now in Live!

Get 1:1 Help Now