Want to win a PS4? Go Premium and enter to win our High-Tech Treats giveaway. Enter to Win

x
?
Solved

Filter duplicate rows in SSIS data flow task

Posted on 2010-09-14
5
Medium Priority
?
1,786 Views
Last Modified: 2012-05-10
Hi Experts,

   I am retrieving data from oracle and inserting into sql server.
   I have 2 tables in oracle called Person and Contacts.
   Person:
      Person_ID, Forename, Surname, Initials, Title, AddressId, TelNo1, TelNo2, Email1, Email2
   Contact:
        Forename, Surname, Initials, Title, AddressId, TelNo1, TelNo2, Email1, Email2
        
      In SSIS, DataFlow, I am retrieving data from 2 tables and UNION ALL and assigning uniqueID for PersonID(sequence number)
      
      If I have a row in Person table and 3 rows in contacts table, after UNION ALL I am getting 4 rows for a person.
      All contacts will be inserted to Telecom table and Person details will be inserted to People table.
      But in People table I want only 1 row to be inserted(no duplicates).
      
      If AddressID is different for the same person, its not called as duplicate. so all rows
      are inserted to People table.
      
      How can I do this in SSIS data flow task after UNION ALL and assigning unique PersonID?
      I mean tracking duplicate records and not inserting into People table.
      
      I can't do UNIQUE INDEX with NO_DUP_VAL as the PersonID is generating after UNION ALL
      and is assigned to each contact i.e added to Telecom table.
      
      So, I want to do some logic after UNION ALL.
      
      Any suggestion?
0
Comment
Question by:rajvja
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 2
  • 2
5 Comments
 
LVL 19

Expert Comment

by:Bhavesh Shah
ID: 33673432
Hi,

Can u use max(addressid)?
0
 
LVL 22

Expert Comment

by:8080_Diver
ID: 33673443
  1. Do you have to do a UNION ALL instead of a UNION?  (DO you know the difference between those two? ;-)
  2. If you have to use the UNION ALL, then you might want to insert that data into a staging table and then select distinct rows from the staging table.
Remember, SSIS lets you have multiple steps if you need them and it looks like you may need them. ;-)
 
0
 
LVL 11

Author Comment

by:rajvja
ID: 33673828
Hi,

 thanks for the replies. Even I used UNION I am getting the same records.
There is a column in Person table called PERSON_ID(it is unique per row) and I need to retrieve that column coz based on that column I am generate surrogate keys.

Any solution? I am really strucked.
0
 
LVL 22

Accepted Solution

by:
8080_Diver earned 2000 total points
ID: 33673911
In that case, you will need to pull the data into a staging table and then determine how to identify the duplicates that you don't want.  Be sure to add an Identity column in the staging table so that you can distinguish between totally duplicate rows.
For instance, you could create a query that selects the MAX(PERSON_ID) based upon the rest of the columns and then use that to control which row you use to either move the data to your final destination table or simply as part of the process of selecting the data and generating the surrogate keys.
Hmmm, does the PERSON_ID already exist or are you generating it as part of this process?
0
 
LVL 11

Author Closing Comment

by:rajvja
ID: 33680771
thanks
0

Featured Post

Nothing ever in the clear!

This technical paper will help you implement VMware’s VM encryption as well as implement Veeam encryption which together will achieve the nothing ever in the clear goal. If a bad guy steals VMs, backups or traffic they get nothing.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

     When we have to pass multiple rows of data to SQL Server, the developers either have to send one row at a time or come up with other workarounds to meet requirements like using XML to pass data, which is complex and tedious to use. There is a …
Naughty Me. While I was changing the database name from DB1 to DB_PROD1 (yep it's not real database name ^v^), I changed the database name and notified my application fellows that I did it. They turn on the application, and everything is working. A …
In this video you will find out how to export Office 365 mailboxes using the built in eDiscovery tool. Bear in mind that although this method might be useful in some cases, using PST files as Office 365 backup is troublesome in a long run (more on t…
Sometimes it takes a new vantage point, apart from our everyday security practices, to truly see our Active Directory (AD) vulnerabilities. We get used to implementing the same techniques and checking the same areas for a breach. This pattern can re…

618 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question