?
Solved

SELECT DISTINCT

Posted on 2011-02-25
6
Medium Priority
?
561 Views
Last Modified: 2013-11-10
I have a dtsx package that I built and within one of the data flows, I'm manipulating the data in a way that makes me end up with a bunch of duplicates. Is there a way to produce the same effect as SELECT DISTINCT within one of the transformations?

Here's a mini version of what I start with:

ContactId   Code   Name
35              NK1     Joe Blow
35              NK3     Joe Blow
35              NP6     Joe Blow
19              SJ1     Jane Blow
19              SJ4     Jane Blow
19              NK1    Jane Blow

In one of my existing transformations I'm taking off the numeric character of the codes and ending up with:

ContactId   Code   Name
35              NK      Joe Blow
35              NK      Joe Blow
35              NP      Joe Blow
19              SJ      Jane Blow
19              SJ      Jane Blow
19              NK     Jane Blow

I want to end up with:

ContactId   Code   Name
35              NK      Joe Blow
35              NP      Joe Blow
19              SJ      Jane Blow
19              NK     Jane Blow

Can I achieve this with a transformation, or will it require a SQL Task?

In a really perfect world I would end up with
ContactId   Code   Name
35              NK,NP      Joe Blow
19              SJ,NK      Jane Blow
0
Comment
Question by:MuffyBunny
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 3
  • 2
6 Comments
 
LVL 57

Expert Comment

by:Raja Jegan R
ID: 34980291
This would do:
SELECT t1.ContactId, t1.NAME, SUBSTRING((SELECT ', ' + t2.Code FROM cte t2 WHERE t1.ContactId = t2.ContactId AND t1.NAME = t2.name FOR XML PATH('')), 3, 1000) Code
FROM cte t1
group BY ContactId, NAME

Open in new window

0
 
LVL 6

Author Comment

by:MuffyBunny
ID: 34980330
I guess I should have specified that the data is originating from an IBM UniVerse ODBC data source. I do not have the ability to take the numbers off at the source. In fact, I don't have the ability to do much of anything with the data from the source other than a standard no-frills SELECT statement. All manipulations have to take place AFTER getting the dataset from the source.
0
 
LVL 57

Expert Comment

by:Raja Jegan R
ID: 34980364
Ok, in that case pull data from your IBM data source and insert records into temp tables..
Once done, you can run the query provided above to get the records transformed like this using either transformation or just the query above..
0
Get proactive database performance tuning online

At Percona’s web store you can order full Percona Database Performance Audit in minutes. Find out the health of your database, and how to improve it. Pay online with a credit card. Improve your database performance now!

 
LVL 7

Accepted Solution

by:
rmm2001 earned 2000 total points
ID: 34989599
So you can do it using ssis. This way will get

ContactId   Code   Name
35              NK      Joe Blow
35              NP      Joe Blow
19              SJ      Jane Blow
19              NK     Jane Blow

First - add your source and logic to strip off the numeric char at the end of the code. And whatever logic you have. After you've done all of that, add a Fuzzy Grouping component. And tell it to group off of your ContactID, NewCode (the one where you stripped the number) and Name.

This will give you a "typical" set of fuzzy results. What you're wanting though is when the _key_in == _key_out. So if it was in a table - you could put a checksum on the column and determine dupes that way. That's the same thinking as this.

So that will give you the 4 rows of distinct values that you're looking for.

You'd have to concat the rows in sql or by writing the data to a raw file and reading it back in, putting a script component in to do the work. (Or just inserting with an oledb command)

data flow fuzzy lookup transformation conditional split to get distinct rows
0
 
LVL 6

Author Comment

by:MuffyBunny
ID: 34997368
You ROCK
0
 
LVL 6

Author Closing Comment

by:MuffyBunny
ID: 34997370
Thanks a million
0

Featured Post

10 Questions to Ask when Buying Backup Software

Choosing the right backup solution for your organization can be a daunting task. To make the selection process easier, ask solution providers these 10 key questions.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Slowly Changing Dimension Transformation component in data task flow is very useful for us to manage and control how data changes in SSIS.
Recently we ran in to an issue while running some SQL jobs where we were trying to process the cubes.  We got an error saying failure stating 'NT SERVICE\SQLSERVERAGENT does not have access to Analysis Services. So this is a way to automate that wit…
Via a live example, show how to extract insert data into a SQL Server database table using the Import/Export option and Bulk Insert.
This videos aims to give the viewer a basic demonstration of how a user can query current session information by using the SYS_CONTEXT function

770 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question