Solved

SELECT DISTINCT

Posted on 2011-02-25
6
541 Views
Last Modified: 2013-11-10
I have a dtsx package that I built and within one of the data flows, I'm manipulating the data in a way that makes me end up with a bunch of duplicates. Is there a way to produce the same effect as SELECT DISTINCT within one of the transformations?

Here's a mini version of what I start with:

ContactId   Code   Name
35              NK1     Joe Blow
35              NK3     Joe Blow
35              NP6     Joe Blow
19              SJ1     Jane Blow
19              SJ4     Jane Blow
19              NK1    Jane Blow

In one of my existing transformations I'm taking off the numeric character of the codes and ending up with:

ContactId   Code   Name
35              NK      Joe Blow
35              NK      Joe Blow
35              NP      Joe Blow
19              SJ      Jane Blow
19              SJ      Jane Blow
19              NK     Jane Blow

I want to end up with:

ContactId   Code   Name
35              NK      Joe Blow
35              NP      Joe Blow
19              SJ      Jane Blow
19              NK     Jane Blow

Can I achieve this with a transformation, or will it require a SQL Task?

In a really perfect world I would end up with
ContactId   Code   Name
35              NK,NP      Joe Blow
19              SJ,NK      Jane Blow
0
Comment
Question by:MuffyBunny
  • 3
  • 2
6 Comments
 
LVL 57

Expert Comment

by:Raja Jegan R
ID: 34980291
This would do:
SELECT t1.ContactId, t1.NAME, SUBSTRING((SELECT ', ' + t2.Code FROM cte t2 WHERE t1.ContactId = t2.ContactId AND t1.NAME = t2.name FOR XML PATH('')), 3, 1000) Code
FROM cte t1
group BY ContactId, NAME

Open in new window

0
 
LVL 6

Author Comment

by:MuffyBunny
ID: 34980330
I guess I should have specified that the data is originating from an IBM UniVerse ODBC data source. I do not have the ability to take the numbers off at the source. In fact, I don't have the ability to do much of anything with the data from the source other than a standard no-frills SELECT statement. All manipulations have to take place AFTER getting the dataset from the source.
0
 
LVL 57

Expert Comment

by:Raja Jegan R
ID: 34980364
Ok, in that case pull data from your IBM data source and insert records into temp tables..
Once done, you can run the query provided above to get the records transformed like this using either transformation or just the query above..
0
U.S. Department of Agriculture and Acronis Access

With the new era of mobile computing, smartphones and tablets, wireless communications and cloud services, the USDA sought to take advantage of a mobilized workforce and the blurring lines between personal and corporate computing resources.

 
LVL 7

Accepted Solution

by:
rmm2001 earned 500 total points
ID: 34989599
So you can do it using ssis. This way will get

ContactId   Code   Name
35              NK      Joe Blow
35              NP      Joe Blow
19              SJ      Jane Blow
19              NK     Jane Blow

First - add your source and logic to strip off the numeric char at the end of the code. And whatever logic you have. After you've done all of that, add a Fuzzy Grouping component. And tell it to group off of your ContactID, NewCode (the one where you stripped the number) and Name.

This will give you a "typical" set of fuzzy results. What you're wanting though is when the _key_in == _key_out. So if it was in a table - you could put a checksum on the column and determine dupes that way. That's the same thinking as this.

So that will give you the 4 rows of distinct values that you're looking for.

You'd have to concat the rows in sql or by writing the data to a raw file and reading it back in, putting a script component in to do the work. (Or just inserting with an oledb command)

data flow fuzzy lookup transformation conditional split to get distinct rows
0
 
LVL 6

Author Comment

by:MuffyBunny
ID: 34997368
You ROCK
0
 
LVL 6

Author Closing Comment

by:MuffyBunny
ID: 34997370
Thanks a million
0

Featured Post

Control application downtime with dependency maps

Visualize the interdependencies between application components better with Applications Manager's automated application discovery and dependency mapping feature. Resolve performance issues faster by quickly isolating problematic components.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

I wrote this interesting script that really help me find jobs or procedures when working in a huge environment. I could I have written it as a Procedure but then I would have to have it on each machine or have a link to a server-related search that …
Why is this different from all of the other step by step guides?  Because I make a living as a DBA and not as a writer and I lived through this experience. Defining the name: When I talk to people they say different names on this subject stuff l…
Via a live example, show how to shrink a transaction log file down to a reasonable size.
Viewers will learn how to use the INSERT statement to insert data into their tables. It will also introduce the NULL statement, to show them what happens when no value is giving for any given column.

863 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

27 Experts available now in Live!

Get 1:1 Help Now