Using SQLSAS or SAS to count duplicates in the same dataset

Hi,
I have an output dataset that I now need to count and identify duplicates in the same dataset. The dataset only has two variables .. ID_1 and ID_2. I have already run a proc summary on the dataset and classed it by IC_1 ID_2. So there are distinct ID_1 / ID_2 matches. But there are multiple ID_2's that can be associated with an ID_1. So for example, if ID_1 = 001 there could be ID_1 = 001, ID_2= AA; ID_1=001, ID_2=BB; ID_1=001, ID_2=CC etc etc..So some ID_1's have only one ID_2 associated with it and some have multiple ID_2's.
Ok so here is what I need to do. I need to be able to take a ID_1 / ID_2 match and search the rest of the ID_2s in the dataset to see if it is associated with any other ID_1s...excluding the initial ID_1 / ID_2 match. For example, if ID_1=001 and ID_2 = AA, then I want to find all the other ID_1s that have an ID_2=AA associated with it excluding ID_1=001.

Can I use proc sql to easily go through the dataset and count the number of ID_1's that each ID_2 is associated with. And also output each ID_1 that is associated with the ID_2 being looked at.

Appeciate any help!!
perrysf25Asked:
Who is Participating?

[Webinar] Streamline your web hosting managementRegister Today

x
 
tobey1Connect With a Mentor Commented:
From reading the text above, it sounds like you could use a PROC FREQ to get your answer .

PROC FREQ DATA={your dataset};
  TABLES ID_2*ID_1 / MISSING NOROW NOCOL NOPERCENT;
RUN;

This will give you a cross-tab of all the ID_2 occurrences by ID_1.

If that is not what you are after, then I have miss understood the question.
0
 
wigmeisterCommented:
This first example will give you how many times id_1 occurs in your dataset, regardless of what is in id_2.  I wasn't sure by your example if that is what you wanted.  
proc sort data=test;
by id_1 id_2;

/*for the first one set counter to 0, increase by one for each additional record*/
data testout;
 set test;
by id_1 id_2;
if first.id_1 then dupcount = 0
else  dupcount + 1;
if last.id_1 then output;

If you want to see how many different id_2's there are for each id_1 then it will be something like this:

data testout;
 set test;
by id_1 id_2;
if first.id_2 then dupcount = 0
else  dupcount + 1;
if last.id_2 then output;

So some variation/combination of these two should get you what you want.  
Here is a brief reference:
http://www.pauldickman.com/teaching/sas/set_by.php
0
All Courses

From novice to tech pro — start learning today.