Solved

Removing duplicates based on 2 columns in SAS

Posted on 2011-02-16
Last Modified: 2014-05-03
Hi, how do I remove duplicates based on 2 columns in SAS?
Question by:Wonderwall
LVL 9

Assisted Solution

by:bradanelson
bradanelson earned 500 total points
ID: 34911319
Here is an example of how to remove duplicates based on two fields.

PROC SORT DATA=YourDataset NODUPKEY;
    BY Field1 Field2;
RUN;
 

Assisted Solution

by:autumnwings
autumnwings earned 500 total points
ID: 34911327
You are going to want to first sort your data set by the two variables you'd like to use as your key. Make sure you sort them in ascending or descending order as appropriate, because the record that will be kept is the first one after you have sorted.

Then you will run a proc sort again, and this time you will add the option 'nodupkey'. This option keeps only the first record for each combination of the BY variables and eliminates the other duplicates.

Example:
proc sort data=test ; by ID AGE ; run ;
proc sort data=test nodupkey ; by ID AGE ; run ;
 

Author Comment

by:Wonderwall
ID: 34911439
I did this:
proc sort data=test ; by ID AGE ; run ;
proc sort data=test nodupkey ; by ID AGE ; run ;
and it got rid of the duplicates for the first record only; all the rest were kept.

Expert Comment

by:autumnwings
ID: 34911488
It should have deleted any duplicates of the combination of ID and AGE. Are you sure it didn't work?
 

Author Comment

by:Wonderwall
ID: 34911582
Yes, I looked at my output and it only worked on the first record.
 
LVL 9

Expert Comment

by:bradanelson
ID: 34911588
Can you provide sample data to help troubleshoot your issue?
 
LVL 7

Accepted Solution

by:d507201
d507201 earned 500 total points
ID: 34911961
This approach will create a data set with one record per ID. Where an ID has more than one record, the record with the youngest age will be retained, because the first sort orders each ID by ascending AGE and NODUPKEY then keeps the first record in each BY group. If you want the oldest age then change the BY statement in the first sort to by ID descending Age. If you want to capture the removed records in a data set then use the dupout= option.

  proc sort data=test ; by ID AGE ; run ;
  /* the second sort keys on ID only, so the first (youngest) record per ID is kept */
  proc sort data=test nodupkey ; by ID ; run ;
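
The dupout= option mentioned above could look roughly like this. This is only a sketch keyed on the two columns from the question; the output names test_nodup and test_dups are hypothetical, and out= is used so that the original test data set is not overwritten.

```sas
/* Sketch: keep one record per ID/AGE pair and route the removed
   records to a second data set for inspection. */
proc sort data=test
          out=test_nodup      /* hypothetical name: de-duplicated copy */
          dupout=test_dups    /* hypothetical name: records that were dropped */
          nodupkey;
  by ID AGE;
run;
```

Looking at test_dups afterwards is a quick way to check whether the records being removed are the ones you expected.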

If you want to create a data set that contains only records that were unique to begin with, where there is no conflicting age data, then try this approach. This code keeps only IDs that appear on a single record, and so have a single value for age; they go into noQuestionAboutAge. All other records go into the questionableAge data set. For example, the records id=100 age=18 and id=100 age=19 would both be output to the questionableAge data set.

     proc sort data=test;
       by id;
     run;

     data noQuestionAboutAge questionableAge;
       set test;
       by id;
       /* an ID with exactly one record is both first and last in its BY group */
       if first.id and last.id then output noQuestionAboutAge;
       else output questionableAge;
     run;

first. and last. are temporary variables created when a DATA step is processed with BY groups.  I use them most often to control merging, but they're also very useful for things like this and for doing summarization in a DATA step.
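
As an illustration of the same first. variables applied to the original two-column question, here is a minimal sketch. It assumes the test data set from this thread; test_sorted and test_unique are hypothetical names.

```sas
/* Sketch: de-duplicate on two keys in a DATA step using first. variables. */
proc sort data=test out=test_sorted;   /* test_sorted is a hypothetical name */
  by ID AGE;
run;

data test_unique;                      /* hypothetical output name */
  set test_sorted;
  by ID AGE;
  /* first.AGE is 1 on the first record of each ID/AGE combination,
     so this subsetting IF keeps one record per combination */
  if first.AGE;
run;
```

With two BY variables, first.AGE is set whenever either ID or AGE changes, which is why testing first.AGE alone is enough here.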
 
LVL 11

Assisted Solution

by:theartfuldazzler
theartfuldazzler earned 500 total points
ID: 34913881
Hi

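A simple way to remove every record whose field1/field2 combination occurs more than once is shown here. Note that this drops all copies of a duplicated combination rather than keeping one of them.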

PROC SQL;
  create table new_table as
  select *
  from old_table
  group by field1, field2
  having count(*) = 1;
quit;
 

Assisted Solution

by:Wonderwall
Wonderwall earned 0 total points
ID: 34921748
Thank you all for your help. I finally found out that SAS stores dates as integers. Hmmm, problem fixed.
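
For anyone who hits the same surprise: a SAS date is a number counting days from 1 January 1960, and a SAS datetime is a number counting seconds from that point, so two values that print identically under a format can still differ underneath. The thread does not say exactly what went wrong, so the following is only a guess at one common case, a datetime key that is displayed as a date. The names visits, visit_dt, visit_date and visits_nodup are hypothetical.

```sas
/* Sketch (assumed scenario): the key holds datetimes that are shown as dates. */
data visits;
  input ID visit_dt :datetime18.;
  /* derive a true date value so that the time of day no longer matters */
  visit_date = datepart(visit_dt);
  format visit_dt dtdate9. visit_date date9.;
  datalines;
1 16FEB2011:08:30:00
1 16FEB2011:14:45:00
2 17FEB2011:09:00:00
;
run;

/* de-duplicate on the derived date rather than on the raw datetime */
proc sort data=visits out=visits_nodup nodupkey;
  by ID visit_date;
run;
```

Sorting with nodupkey on ID and visit_dt instead would treat the two records for ID 1 as different, even though both display as 16FEB2011.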