[Last Call] Learn about multicloud storage options and how to improve your company's cloud strategy. Register Now

x
?
Solved

SELECT DISTINCT

Posted on 2011-02-25
6
Medium Priority
?
564 Views
Last Modified: 2013-11-10
I have a dtsx package that I built and within one of the data flows, I'm manipulating the data in a way that makes me end up with a bunch of duplicates. Is there a way to produce the same effect as SELECT DISTINCT within one of the transformations?

Here's a mini version of what I start with:

ContactId   Code   Name
35              NK1     Joe Blow
35              NK3     Joe Blow
35              NP6     Joe Blow
19              SJ1     Jane Blow
19              SJ4     Jane Blow
19              NK1    Jane Blow

In one of my existing transformations I'm taking off the numeric character of the codes and ending up with:

ContactId   Code   Name
35              NK      Joe Blow
35              NK      Joe Blow
35              NP      Joe Blow
19              SJ      Jane Blow
19              SJ      Jane Blow
19              NK     Jane Blow

I want to end up with:

ContactId   Code   Name
35              NK      Joe Blow
35              NP      Joe Blow
19              SJ      Jane Blow
19              NK     Jane Blow

Can I achieve this with a transformation, or will it require a SQL Task?

In a really perfect world I would end up with
ContactId   Code   Name
35              NK,NP      Joe Blow
19              SJ,NK      Jane Blow
0
Comment
Question by:MuffyBunny
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 3
  • 2
6 Comments
 
LVL 57

Expert Comment

by:Raja Jegan R
ID: 34980291
This would do:
SELECT t1.ContactId, t1.NAME, SUBSTRING((SELECT ', ' + t2.Code FROM cte t2 WHERE t1.ContactId = t2.ContactId AND t1.NAME = t2.name FOR XML PATH('')), 3, 1000) Code
FROM cte t1
group BY ContactId, NAME

Open in new window

0
 
LVL 6

Author Comment

by:MuffyBunny
ID: 34980330
I guess I should have specified that the data is originating from an IBM UniVerse ODBC data source. I do not have the ability to take the numbers off at the source. In fact, I don't have the ability to do much of anything with the data from the source other than a standard no-frills SELECT statement. All manipulations have to take place AFTER getting the dataset from the source.
0
 
LVL 57

Expert Comment

by:Raja Jegan R
ID: 34980364
Ok, in that case pull data from your IBM data source and insert records into temp tables..
Once done, you can run the query provided above to get the records transformed like this using either transformation or just the query above..
0
Veeam Disaster Recovery in Microsoft Azure

Veeam PN for Microsoft Azure is a FREE solution designed to simplify and automate the setup of a DR site in Microsoft Azure using lightweight software-defined networking. It reduces the complexity of VPN deployments and is designed for businesses of ALL sizes.

 
LVL 7

Accepted Solution

by:
rmm2001 earned 2000 total points
ID: 34989599
So you can do it using ssis. This way will get

ContactId   Code   Name
35              NK      Joe Blow
35              NP      Joe Blow
19              SJ      Jane Blow
19              NK     Jane Blow

First - add your source and logic to strip off the numeric char at the end of the code. And whatever logic you have. After you've done all of that, add a Fuzzy Grouping component. And tell it to group off of your ContactID, NewCode (the one where you stripped the number) and Name.

This will give you a "typical" set of fuzzy results. What you're wanting though is when the _key_in == _key_out. So if it was in a table - you could put a checksum on the column and determine dupes that way. That's the same thinking as this.

So that will give you the 4 rows of distinct values that you're looking for.

You'd have to concat the rows in sql or by writing the data to a raw file and reading it back in, putting a script component in to do the work. (Or just inserting with an oledb command)

data flow fuzzy lookup transformation conditional split to get distinct rows
0
 
LVL 6

Author Comment

by:MuffyBunny
ID: 34997368
You ROCK
0
 
LVL 6

Author Closing Comment

by:MuffyBunny
ID: 34997370
Thanks a million
0

Featured Post

Get your Conversational Ransomware Defense e‑book

This e-book gives you an insight into the ransomware threat and reviews the fundamentals of top-notch ransomware preparedness and recovery. To help you protect yourself and your organization. The initial infection may be inevitable, so the best protection is to be fully prepared.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

I have a large data set and a SSIS package. How can I load this file in multi threading?
In part one, we reviewed the prerequisites required for installing SQL Server vNext. In this part we will explore how to install Microsoft's SQL Server on Ubuntu 16.04.
Via a live example, show how to extract insert data into a SQL Server database table using the Import/Export option and Bulk Insert.
Viewers will learn how to use the SELECT statement in SQL to return specific rows and columns, with various degrees of sorting and limits in place.

650 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question