Solved

SELECT DISTINCT

Posted on 2011-02-25
6
544 Views
Last Modified: 2013-11-10
I have a dtsx package that I built and within one of the data flows, I'm manipulating the data in a way that makes me end up with a bunch of duplicates. Is there a way to produce the same effect as SELECT DISTINCT within one of the transformations?

Here's a mini version of what I start with:

ContactId   Code   Name
35              NK1     Joe Blow
35              NK3     Joe Blow
35              NP6     Joe Blow
19              SJ1     Jane Blow
19              SJ4     Jane Blow
19              NK1    Jane Blow

In one of my existing transformations I'm taking off the numeric character of the codes and ending up with:

ContactId   Code   Name
35              NK      Joe Blow
35              NK      Joe Blow
35              NP      Joe Blow
19              SJ      Jane Blow
19              SJ      Jane Blow
19              NK     Jane Blow

I want to end up with:

ContactId   Code   Name
35              NK      Joe Blow
35              NP      Joe Blow
19              SJ      Jane Blow
19              NK     Jane Blow

Can I achieve this with a transformation, or will it require a SQL Task?

In a really perfect world I would end up with
ContactId   Code   Name
35              NK,NP      Joe Blow
19              SJ,NK      Jane Blow
0
Comment
Question by:MuffyBunny
  • 3
  • 2
6 Comments
 
LVL 57

Expert Comment

by:Raja Jegan R
ID: 34980291
This would do:
SELECT t1.ContactId, t1.NAME, SUBSTRING((SELECT ', ' + t2.Code FROM cte t2 WHERE t1.ContactId = t2.ContactId AND t1.NAME = t2.name FOR XML PATH('')), 3, 1000) Code
FROM cte t1
group BY ContactId, NAME

Open in new window

0
 
LVL 6

Author Comment

by:MuffyBunny
ID: 34980330
I guess I should have specified that the data is originating from an IBM UniVerse ODBC data source. I do not have the ability to take the numbers off at the source. In fact, I don't have the ability to do much of anything with the data from the source other than a standard no-frills SELECT statement. All manipulations have to take place AFTER getting the dataset from the source.
0
 
LVL 57

Expert Comment

by:Raja Jegan R
ID: 34980364
Ok, in that case pull data from your IBM data source and insert records into temp tables..
Once done, you can run the query provided above to get the records transformed like this using either transformation or just the query above..
0
Ransomware-A Revenue Bonanza for Service Providers

Ransomware – malware that gets on your customers’ computers, encrypts their data, and extorts a hefty ransom for the decryption keys – is a surging new threat.  The purpose of this eBook is to educate the reader about ransomware attacks.

 
LVL 7

Accepted Solution

by:
rmm2001 earned 500 total points
ID: 34989599
So you can do it using ssis. This way will get

ContactId   Code   Name
35              NK      Joe Blow
35              NP      Joe Blow
19              SJ      Jane Blow
19              NK     Jane Blow

First - add your source and logic to strip off the numeric char at the end of the code. And whatever logic you have. After you've done all of that, add a Fuzzy Grouping component. And tell it to group off of your ContactID, NewCode (the one where you stripped the number) and Name.

This will give you a "typical" set of fuzzy results. What you're wanting though is when the _key_in == _key_out. So if it was in a table - you could put a checksum on the column and determine dupes that way. That's the same thinking as this.

So that will give you the 4 rows of distinct values that you're looking for.

You'd have to concat the rows in sql or by writing the data to a raw file and reading it back in, putting a script component in to do the work. (Or just inserting with an oledb command)

data flow fuzzy lookup transformation conditional split to get distinct rows
0
 
LVL 6

Author Comment

by:MuffyBunny
ID: 34997368
You ROCK
0
 
LVL 6

Author Closing Comment

by:MuffyBunny
ID: 34997370
Thanks a million
0

Featured Post

Microsoft Certification Exam 74-409

Veeam® is happy to provide the Microsoft community with a study guide prepared by MVP and MCT, Orin Thomas. This guide will take you through each of the exam objectives, helping you to prepare for and pass the examination.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

Ever needed a SQL 2008 Database replicated/mirrored/log shipped on another server but you can't take the downtime inflicted by initial snapshot or disconnect while T-logs are restored or mirror applied? You can use SQL Server Initialize from Backup…
For both online and offline retail, the cross-channel business is the most recent pattern in the B2C trade space.
This videos aims to give the viewer a basic demonstration of how a user can query current session information by using the SYS_CONTEXT function
Viewers will learn how the fundamental information of how to create a table.

813 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

17 Experts available now in Live!

Get 1:1 Help Now