Solved

Removing duplicates based on 2 columns in SAS

Posted on 2011-02-16
Last Modified: 2014-05-03
Hi, how do I remove duplicates based on 2 columns in SAS?
Question by:Wonderwall
 
LVL 9

Assisted Solution

by:bradanelson
bradanelson earned 500 total points
ID: 34911319
Here is an example of how to remove duplicates by two fields.

PROC SORT DATA=YourDataset NODUPKEY;
    BY Field1 Field2;
RUN;
 

Assisted Solution

by:autumnwings
autumnwings earned 500 total points
ID: 34911327
You are going to want to first sort your data set by the two variables you'd like to use as your key. Make sure you sort them in ascending/descending order as appropriate, because the record that will be kept is the first one after you have sorted.

Then you will run a proc sort again, and this time you will add the option 'nodupkey'. This option keeps only the first record for each combination of the BY variables and eliminates the duplicates.

Example:
proc sort data=test ; by ID AGE ; run ;
proc sort data=test nodupkey ; by ID AGE ; run ;
 

Author Comment

by:Wonderwall
ID: 34911439
I did this
proc sort data=test ; by ID AGE ; run ;
proc sort data=test nodupkey ; by ID AGE ; run ;
and it got rid of the duplicates for the first record only, all the rest were kept

 

Expert Comment

by:autumnwings
ID: 34911488
It should have deleted any duplicates of the combination of ID and AGE, are you sure it didn't work?
 

Author Comment

by:Wonderwall
ID: 34911582
yes, I looked at my output and it only worked on the first record
 
LVL 9

Expert Comment

by:bradanelson
ID: 34911588
Can you provide sample data to help troubleshoot your issue?
 
LVL 7

Accepted Solution

by:d507201
d507201 earned 500 total points
ID: 34911961
This approach will create a data set with no duplicate keys: it will contain one record for each combination of ID and AGE. If you instead want a single record per ID, run the second sort with BY ID alone; the record with the youngest age will then be retained. If you want the oldest age, change the first BY statement to by ID descending Age. If you want to capture the discarded duplicate records in a data set, use the dupout= option.

  proc sort data=test ; by ID AGE ; run ;
  proc sort data=test nodupkey ; by ID AGE ; run ;
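
The options mentioned above could be combined roughly as follows. This is only a sketch: the sample rows and the output data set names (unique_pairs, dropped_pairs, ordered, one_per_id) are made up for illustration. The last step relies on PROC SORT keeping the original relative order of rows with equal keys, which is the default (EQUALS) behaviour.

```sas
/* Hypothetical sample data: ID 100 has two ages, ID 200 has an exact repeat */
data test;
  input id age;
  datalines;
100 19
100 18
200 30
200 30
;
run;

/* One row per ID/AGE pair. OUT= leaves the input data set untouched,
   and DUPOUT= collects the rows that NODUPKEY discards. */
proc sort data=test out=unique_pairs dupout=dropped_pairs nodupkey;
  by id age;
run;

/* One row per ID, keeping the oldest age:
   order the rows first, then de-duplicate on ID alone. */
proc sort data=test out=ordered;
  by id descending age;
run;

proc sort data=ordered out=one_per_id nodupkey;
  by id;
run;
```

Without OUT=, PROC SORT replaces the input data set, so writing to a separate data set makes it easier to compare the result with the original rows.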

If you want to create a data set that contains only records that were unique to begin with (IDs with no conflicting age data), then try this approach. This code keeps only IDs that appear in a single record; those go into noQuestionAboutAge. All other records go into the questionableAge data set. For example, both records id=100 age=18 and id=100 age=19 would be output to the questionableAge data set.

     proc sort data=test;
       by id;
     run;

     data noQuestionAboutAge questionableAge;
       set test;
       by id;
       if first.id and last.id then output noQuestionAboutAge;
       else output questionableAge;
     run;

first. and last. are temporary variables that SAS creates when a DATA step is processed with BY groups. I use them most often to control merging, but they are also very useful for things like this and for doing summarization in a DATA step.
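
As a rough illustration of the summarization use mentioned above, a DATA step along these lines would collapse each ID to one summary row. The sample rows and the names sorted, age_summary, n_records, min_age and max_age are assumptions made for this sketch, not something from the thread.

```sas
/* Hypothetical sample data */
data test;
  input id age;
  datalines;
100 19
100 18
200 30
200 30
;
run;

/* BY-group processing requires the data to be sorted by the BY variable */
proc sort data=test out=sorted;
  by id;
run;

data age_summary;
  set sorted;
  by id;
  retain n_records min_age max_age;
  /* first.id is 1 on the first row of each ID: reset the accumulators */
  if first.id then do;
    n_records = 0;
    min_age = age;
    max_age = age;
  end;
  n_records = n_records + 1;
  min_age = min(min_age, age);
  max_age = max(max_age, age);
  /* last.id is 1 on the final row of each ID: write one summary row */
  if last.id then output;
  keep id n_records min_age max_age;
run;
```

An ID with n_records = 1 corresponds to the "first.id and last.id" case in the code above.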
 
LVL 11

Assisted Solution

by:theartfuldazzler
theartfuldazzler earned 500 total points
ID: 34913881
Hi

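An easy way to remove every record whose combination of fields appears more than once (so that only combinations occurring exactly once are kept) is: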

PROC SQL;
  create table new_table as
  select * from old_table
  group by field1, field2
  having count(*) = 1;
quit;
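
Note that the HAVING filter drops both copies of a repeated key rather than keeping one of them. If the goal is one row per key, or just a look at which keys repeat, something like the following might help. It is a sketch only; the sample rows, the value column, and the table names distinct_keys and key_counts are invented for the example.

```sas
/* Hypothetical sample data: the pair (1, 10) occurs twice */
data old_table;
  input field1 field2 value;
  datalines;
1 10 5
1 10 7
2 20 9
;
run;

proc sql;
  /* One row per distinct field1/field2 pair (key columns only) */
  create table distinct_keys as
  select distinct field1, field2
  from old_table;

  /* How often each pair occurs, to see what the HAVING filter would drop */
  create table key_counts as
  select field1, field2, count(*) as n_rows
  from old_table
  group by field1, field2;
quit;
```

SELECT DISTINCT compares only the columns listed, so the other columns have to be left out or summarized if one row per key is wanted.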
 

Assisted Solution

by:Wonderwall
Wonderwall earned 0 total points
ID: 34921748
Thank you all for your help. I finally found out that SAS stores dates as integers. Hmmm, problem fixed.
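
The thread does not show the data, so the following is only a guess at the kind of situation described. SAS stores a date as a number of days since 1 January 1960 and a datetime as a number of seconds since then, so two values can print identically under a format while the stored numbers differ, and NODUPKEY compares the stored numbers. The sample rows and the names dt, day, test_keyed and test_unique are hypothetical.

```sas
/* Hypothetical data: both rows print as 16FEB2011 under the DTDATE9. format,
   but the stored datetime values differ */
data test;
  input id dt :datetime20.;
  format dt dtdate9.;
  datalines;
1 16FEB2011:08:00:00
1 16FEB2011:17:30:00
;
run;

/* Derive a pure date key so rows from the same day compare equal */
data test_keyed;
  set test;
  day = datepart(dt);
  format day date9.;
run;

/* De-duplicate on the derived date rather than the raw datetime */
proc sort data=test_keyed out=test_unique nodupkey;
  by id day;
run;
```

Sorting with BY id dt would keep both rows here, because the underlying values are different even though the printed values match.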