Solved

Removing duplicates based on 2 columns in SAS

Posted on 2011-02-16
11
546 Views
Last Modified: 2014-05-03
Hi, How do I remove duplicates based on 2 columns in SAS?
0
Comment
Question by:Wonderwall
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 3
  • 2
  • 2
  • +2
11 Comments
 
LVL 9

Assisted Solution

by:bradanelson
bradanelson earned 125 total points
ID: 34911319
Here is an example on how to remove dups by 2 fields.

PROC SORT DATA=YourDataset NODUPKEY;
    BY Field1 Field2;
RUN;
0
 

Assisted Solution

by:autumnwings
autumnwings earned 125 total points
ID: 34911327
You are going to want to first sort your data set by the two variables you'd like to use as your key. Make sure you sort them in ascending/decending order as appropriate because the record that will be kept is the first one after you have sorted.

Then you will run a proc sort again and this time you will add the option 'nodupkey'. This option will only keep the first record in your dataset and eliminiate duplicates.

Example:
proc sort data=test ; by ID AGE ; run ;
proc sort data=test nodupkey ; by ID AGE ; run ;
0
 

Author Comment

by:Wonderwall
ID: 34911439
I did this
proc sort data=test ; by ID AGE ; run ;
proc sort data=test nodupkey ; by ID AGE ; run ;
and it got rid of the duplicates for the first record only, all the rest were kept
0
Webinar: MongoDB® Index Types

Join Percona’s Senior Technical Services Engineer, Adamo Tonete as he presents “MongoDB Index Types, How, When and Where Should They be Used?” on Wednesday, July 12, 2017 at 11:00 am PDT / 2:00 pm EDT (UTC-7).

 

Expert Comment

by:autumnwings
ID: 34911488
It should have deleted any duplicates of the combination of ID and AGE, are you sure it didn't work?
0
 

Author Comment

by:Wonderwall
ID: 34911582
yes, I looked at my output and it only worked on the first record
0
 
LVL 9

Expert Comment

by:bradanelson
ID: 34911588
Can you provide sample data to help troubleshoot your issue.
0
 
LVL 7

Accepted Solution

by:
d507201 earned 125 total points
ID: 34911961
This approach will create a data set with no duplicates--it will contain only unique records.  Where there are duplicates the record with the youngest age will be retained.  If you want the oldest age then change the BY statement to by ID descending Age.  If you want to capture the duplicate records in a data set then use the dupout= option.

  proc sort data=test ; by ID AGE ; run ;
  proc sort data=test nodupkey ; by ID AGE ; run ;

If you want to create a data set that contain only records that are unique to begin with--there is no conflicting age data--then try this approach.   This code will keep only IDs that have a single value for age-- they'll go into noQuestionAboutAge.  All other records will go into the questionableAge data set.  For example, both records id=100 age=18 and id=100 age=19 would be output to the questionableAge data set.

     proc sort data=test ;
        by id;
      run ;

     data noQuestionAboutAge   questionableAge;   set test;
      by id;
      if first.id and last.id then output noQuestionAboutAge;
      else output questionableAge;
     run;

first. and last. are temporary variables created when a DATA step is processed with BY groups.  I use it most often to control merging but it's also very useful for things like this and doing summarization in a DATA step.
0
 
LVL 11

Assisted Solution

by:theartfuldazzler
theartfuldazzler earned 125 total points
ID: 34913881
Hi

An easy code to remove all duplicates is:

PROC SQL;
  create table new_table as
  select * from old_table
 group by field1, field2
 having count(*) = 1;
quit;
0
 

Assisted Solution

by:Wonderwall
Wonderwall earned 0 total points
ID: 34921748
Thank you all for your help, finally found out that SAS stores dates as integers.  hmmm, problem fixed
0

Featured Post

Free Tool: Site Down Detector

Helpful to verify reports of your own downtime, or to double check a downed website you are trying to access.

One of a set of tools we are providing to everyone as a way of saying thank you for being a part of the community.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

This article shows the steps required to install WordPress on Azure. Web Apps, Mobile Apps, API Apps, or Functions, in Azure all these run in an App Service plan. WordPress is no exception and requires an App Service Plan and Database to install
Recently I was talking with Tim Sharp, one of my colleagues from our Technical Account Manager team about MongoDB’s scalability. While doing some quick training with some of the Percona team, Tim brought something to my attention...
Video by: Steve
Using examples as well as descriptions, step through each of the common simple join types, explaining differences in syntax, differences in expected outputs and showing how the queries run along with the actual outputs based upon a simple set of dem…
This is a high-level webinar that covers the history of enterprise open source database use. It addresses both the advantages companies see in using open source database technologies, as well as the fears and reservations they might have. In this…

717 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question