Solved

Removing duplicates based on 2 columns in SAS

Posted on 2011-02-16
Last Modified: 2014-05-03
Hi, how do I remove duplicates based on 2 columns in SAS?
Question by:Wonderwall
11 Comments
 
LVL 9

Assisted Solution

by:bradanelson
bradanelson earned 125 total points
ID: 34911319
Here is an example of how to remove duplicates based on two fields.

PROC SORT DATA=YourDataset NODUPKEY;
    BY Field1 Field2;
RUN;
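
A slightly fuller sketch of the same step, using made-up sample data. The data set name test, its values, and the output name test_unique are assumptions for illustration only. Writing the result with OUT= leaves the original data set untouched, which makes it easier to compare before and after.

```sas
/* Hypothetical sample data: the ID 1 / AGE 30 combination appears twice */
data test;
  input ID AGE;
  datalines;
1 30
1 30
1 31
2 40
;
run;

/* Keep one record per ID/AGE combination; the input data set is not overwritten */
proc sort data=test out=test_unique nodupkey;
  by ID AGE;
run;
```

With this sample, test_unique should hold three records: one for each distinct ID/AGE pair.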
 

Assisted Solution

by:autumnwings
autumnwings earned 125 total points
ID: 34911327
You are going to want to first sort your data set by the two variables you'd like to use as your key. Make sure you sort them in ascending/descending order as appropriate, because the record that is kept is the first one in each group after sorting.

Then you will run a proc sort again, and this time you will add the option 'nodupkey'. This option keeps only the first record for each combination of the BY variables and eliminates the other duplicates.

Example:
proc sort data=test ; by ID AGE ; run ;
proc sort data=test nodupkey ; by ID AGE ; run ;
 

Author Comment

by:Wonderwall
ID: 34911439
I ran this:
proc sort data=test ; by ID AGE ; run ;
proc sort data=test nodupkey ; by ID AGE ; run ;
and it got rid of the duplicates for the first record only; all the rest were kept.
 

Expert Comment

by:autumnwings
ID: 34911488
It should have deleted any duplicates of the combination of ID and AGE. Are you sure it didn't work?
 

Author Comment

by:Wonderwall
ID: 34911582
Yes, I looked at my output, and it only worked on the first record.
 
LVL 9

Expert Comment

by:bradanelson
ID: 34911588
Can you provide sample data to help troubleshoot your issue?
 
LVL 7

Accepted Solution

by:d507201
d507201 earned 125 total points
ID: 34911961
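This approach will create a data set with no duplicates--it will contain one record for each unique combination of ID and AGE.  Because AGE is part of the key, the records that are dropped have the same ID and AGE as the one that is kept.  If what you want is one record per ID with the youngest age retained, keep the first sort as it is and use BY ID alone in the NODUPKEY step; if you want the oldest age, change the first BY statement to by ID descending Age.  If you want to capture the duplicate records in a data set then use the dupout= option.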

  proc sort data=test ; by ID AGE ; run ;
  proc sort data=test nodupkey ; by ID AGE ; run ;
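
The dupout= option mentioned above could be used along these lines. This is only a sketch; the output names test_unique and test_dups are made up for illustration.

```sas
/* Records removed by NODUPKEY are written to test_dups instead of being discarded */
proc sort data=test out=test_unique nodupkey dupout=test_dups;
  by ID AGE;
run;
```

Looking at test_dups afterwards is a quick way to check which records SAS actually treated as duplicates.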

If you want to create a data set that contains only records that were unique to begin with--there is no conflicting age data--then try this approach.   This code will keep only IDs that appear on a single record-- they'll go into noQuestionAboutAge.  All other records will go into the questionableAge data set.  For example, both records id=100 age=18 and id=100 age=19 would be output to the questionableAge data set.

     proc sort data=test ;
        by id;
     run ;

     /* first.id and last.id are both true only when an ID has exactly one record */
     data noQuestionAboutAge questionableAge;
        set test;
        by id;
        if first.id and last.id then output noQuestionAboutAge;
        else output questionableAge;
     run;

first. and last. are temporary variables created when a DATA step is processed with BY groups.  I use them most often to control merging, but they're also very useful for things like this and for doing summarization in a DATA step.
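
As a rough illustration of the summarization use, here is a minimal sketch that counts the records for each ID. The names id_counts and n_records are hypothetical, and test is assumed to have an id variable as in the code above.

```sas
proc sort data=test;
  by id;
run;

data id_counts;
  set test;
  by id;
  if first.id then n_records = 0;  /* reset the counter at the start of each ID */
  n_records + 1;                   /* sum statement: value is retained across records */
  if last.id then output;          /* write one summary record per ID */
  keep id n_records;
run;
```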
 
LVL 11

Assisted Solution

by:theartfuldazzler
theartfuldazzler earned 125 total points
ID: 34913881
Hi

An easy way to drop every record whose field1/field2 combination occurs more than once (none of the copies is kept, not even the first) is:

PROC SQL;
  create table new_table as
  select * from old_table
  group by field1, field2
  having count(*) = 1;
quit;
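
A related sketch that may help with troubleshooting: instead of removing anything, list the key combinations that occur more than once, along with how often. The names dup_keys and n_records are made up; old_table, field1 and field2 are the placeholders from the query above.

```sas
proc sql;
  create table dup_keys as
  select field1, field2, count(*) as n_records
  from old_table
  group by field1, field2
  having count(*) > 1;   /* only keys that appear on two or more records */
quit;
```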
 

Assisted Solution

by:Wonderwall
Wonderwall earned 0 total points
ID: 34921748
Thank you all for your help. I finally found out that SAS stores dates as integers. Hmm, problem fixed.
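
For readers who hit something similar: SAS keeps a date as a number (days since January 1, 1960) and a datetime as seconds since that date, and NODUPKEY compares the stored numbers, not the formatted text. The thread does not show the actual data, so the following is only one hypothetical way this can go wrong, assuming the key is a datetime displayed with a date-only format. All data set and variable names here are invented.

```sas
/* Two records that print as the same day but hold different datetime values */
data visits;
  input ID visit :datetime18.;
  format visit dtdate9.;
  datalines;
1 16FEB2011:08:00:00
1 16FEB2011:17:30:00
;
run;

/* Both records survive here, because the stored values differ */
proc sort data=visits out=by_datetime nodupkey;
  by ID visit;
run;

/* Derive a date-only key first, then de-duplicate on that */
data visits_d;
  set visits;
  visit_date = datepart(visit);
  format visit_date date9.;
run;

proc sort data=visits_d out=by_date nodupkey;
  by ID visit_date;
run;
```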