Solved

Removing duplicates based on 2 columns in SAS

Posted on 2011-02-16
11
517 Views
Last Modified: 2014-05-03
Hi, how do I remove duplicates based on 2 columns in SAS?
Question by:Wonderwall
11 Comments
 
LVL 9

Assisted Solution

by:bradanelson
bradanelson earned 125 total points
ID: 34911319
Here is an example of how to remove duplicates by two fields.

PROC SORT DATA=YourDataset NODUPKEY;
    BY Field1 Field2;
RUN;
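
To try this on something small, here is a minimal sketch. The data set names work.sample and work.sample_nodup, the Amount column, and the values are made up for illustration; they are not from the question. OUT= is used so the original data set is left untouched.

```sas
/* Hypothetical sample data: Field1 and Field2 together form the key */
data work.sample;
  input Field1 $ Field2 Amount;
  datalines;
A 1 10
A 1 20
A 2 30
B 1 40
B 1 50
;
run;

/* NODUPKEY keeps one row per Field1/Field2 combination.
   OUT= writes the result to a new data set instead of overwriting work.sample. */
proc sort data=work.sample out=work.sample_nodup nodupkey;
  by Field1 Field2;
run;

proc print data=work.sample_nodup;
run;
```

The output should have one row for each of the three distinct Field1/Field2 combinations. Without OUT=, PROC SORT replaces the input data set, which is what the one-step example above does.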
 

Assisted Solution

by:autumnwings
autumnwings earned 125 total points
ID: 34911327
First sort your data set by the two variables you'd like to use as your key. Make sure you sort them in ascending or descending order as appropriate, because the record that is kept is the first one in each group after sorting.

Then run PROC SORT again, this time adding the NODUPKEY option. This option keeps only the first record for each combination of the BY variables and eliminates the remaining duplicates.

Example:
proc sort data=test ; by ID AGE ; run ;
proc sort data=test nodupkey ; by ID AGE ; run ;
 

Author Comment

by:Wonderwall
ID: 34911439
I did this
proc sort data=test ; by ID AGE ; run ;
proc sort data=test nodupkey ; by ID AGE ; run ;
and it got rid of the duplicates for the first record only; all the rest were kept.
 

Expert Comment

by:autumnwings
ID: 34911488
It should have deleted any duplicates of the combination of ID and AGE. Are you sure it didn't work?
 

Author Comment

by:Wonderwall
ID: 34911582
Yes, I looked at my output and it only worked on the first record.
 
LVL 9

Expert Comment

by:bradanelson
ID: 34911588
Can you provide sample data to help troubleshoot your issue?
 
LVL 7

Accepted Solution

by:d507201
d507201 earned 125 total points
ID: 34911961
This approach creates a data set with no duplicate ID/AGE combinations; one record is kept for each combination. Because AGE is part of the BY statement, records that share an ID but have different ages are all kept. If you want a single record per ID, with the youngest age retained, use by ID alone in the second (nodupkey) sort; if you want the oldest age instead, change the first BY statement to by ID descending Age. If you want to capture the duplicate records in a data set, use the dupout= option.

  proc sort data=test ; by ID AGE ; run ;
  proc sort data=test nodupkey ; by ID AGE ; run ;
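
A minimal sketch of the dupout= option mentioned above. The output names test_unique and test_dups are placeholders chosen for this example, not names from the thread.

```sas
/* Keep one record per ID/AGE combination in test_unique and
   write every record that NODUPKEY removes to test_dups */
proc sort data=test out=test_unique dupout=test_dups nodupkey;
  by ID AGE;
run;
```

Looking at test_dups is a quick way to check which records SAS actually treated as duplicates, which can help when the result is not what you expected.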

If you want a data set that contains only records that were unambiguous to begin with, meaning there is no conflicting age data, try this approach. Run on data where exact ID/AGE duplicates have already been removed, this code keeps only IDs that have a single value for age; they go into noQuestionAboutAge. All other records go into the questionableAge data set. For example, the records id=100 age=18 and id=100 age=19 would both be output to the questionableAge data set.

  proc sort data=test ;
    by id ;
  run ;

  data noQuestionAboutAge questionableAge ;
    set test ;
    by id ;
    if first.id and last.id then output noQuestionAboutAge ;
    else output questionableAge ;
  run ;

first. and last. are temporary variables that SAS creates when a DATA step is processed with a BY statement. I use them most often to control merging, but they're also very useful for things like this and for doing summarization in a DATA step.
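
As a rough sketch of both uses with a two-variable key: the first DATA step keeps the first record of each ID/AGE combination, and the second counts the records in each combination. The data set names test_sorted, test_first and group_counts and the variable n_records are invented for this example.

```sas
proc sort data=test out=test_sorted;
  by ID AGE;
run;

/* De-duplicate: first.AGE is 1 on the first record of each ID/AGE group */
data test_first;
  set test_sorted;
  by ID AGE;
  if first.AGE;
run;

/* Summarize: count the records in each ID/AGE group */
data group_counts;
  set test_sorted;
  by ID AGE;
  if first.AGE then n_records = 0;  /* reset at the start of each group */
  n_records + 1;                    /* sum statement: value is retained across rows */
  if last.AGE then output;          /* write one row per group */
  keep ID AGE n_records;
run;
```

With BY ID AGE, first.AGE is also set whenever ID changes, so testing first.AGE alone is enough to detect the start of a new ID/AGE combination.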
 
LVL 11

Assisted Solution

by:theartfuldazzler
theartfuldazzler earned 125 total points
ID: 34913881
Hi

A simple way to remove every record that has a duplicate, keeping only the field1/field2 combinations that occur exactly once, is:

PROC SQL;
  create table new_table as
  select * from old_table
  group by field1, field2
  having count(*) = 1;
quit;
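
Before dropping anything, it can help to see which key combinations are duplicated. A small sketch using the same table and column names as above; dup_keys and n_rows are names made up for this example.

```sas
proc sql;
  /* One row per field1/field2 combination that occurs more than once,
     with the number of rows that share it */
  create table dup_keys as
  select field1, field2, count(*) as n_rows
  from old_table
  group by field1, field2
  having count(*) > 1;
quit;
```

If dup_keys comes back empty when you expected duplicates, the key values probably differ in a way that is not visible in the printed output.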
 

Assisted Solution

by:Wonderwall
Wonderwall earned 0 total points
ID: 34921748
Thank you all for your help. I finally found out that SAS stores dates as integers. Hmmm, problem fixed.
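
The thread does not say exactly what went wrong, so the following is only one plausible scenario, not the asker's actual data. SAS stores dates and datetimes as numbers, and a format controls what is displayed. If a key column holds datetime values shown with a date-only format, rows can look identical but still have different stored values, so NODUPKEY keeps them all. The data set and variable names (work.events, dt, event_date) are invented for this sketch.

```sas
/* Two rows that display the same date but hold different datetime values */
data work.events;
  input ID dt :datetime18.;
  format dt dtdate9.;            /* shows only the date part of the datetime */
  datalines;
1 16FEB2011:08:30:00
1 16FEB2011:17:45:00
;
run;

/* Derive a true date value so rows from the same day compare equal */
data work.events_day;
  set work.events;
  event_date = datepart(dt);     /* datetime (seconds) to date (days) */
  format event_date date9.;
run;

/* De-duplicate on ID and the derived date rather than the raw datetime */
proc sort data=work.events_day out=work.events_nodup nodupkey;
  by ID event_date;
run;
```

Sorting with by ID dt would keep both rows because the stored values differ; sorting with by ID event_date keeps one. Printing the key column without a format is a quick way to see the underlying numbers.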
Question has a verified solution.