Using SQLSAS or SAS to count duplicates in the same dataset

Hi,
I have an output dataset that I now need to count and identify duplicates in the same dataset. The dataset only has two variables .. ID_1 and ID_2. I have already run a proc summary on the dataset and classed it by IC_1 ID_2. So there are distinct ID_1 / ID_2 matches. But there are multiple ID_2's that can be associated with an ID_1. So for example, if ID_1 = 001 there could be ID_1 = 001, ID_2= AA; ID_1=001, ID_2=BB; ID_1=001, ID_2=CC etc etc..So some ID_1's have only one ID_2 associated with it and some have multiple ID_2's.
Ok so here is what I need to do. I need to be able to take a ID_1 / ID_2 match and search the rest of the ID_2s in the dataset to see if it is associated with any other ID_1s...excluding the initial ID_1 / ID_2 match. For example, if ID_1=001 and ID_2 = AA, then I want to find all the other ID_1s that have an ID_2=AA associated with it excluding ID_1=001.

Can I use proc sql to easily go through the dataset and count the number of ID_1's that each ID_2 is associated with. And also output each ID_1 that is associated with the ID_2 being looked at.

Appeciate any help!!
perrysf25Asked:
Who is Participating?
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

tobey1Commented:
From reading the text above, it sounds like you could use a PROC FREQ to get your answer .

PROC FREQ DATA={your dataset};
  TABLES ID_2*ID_1 / MISSING NOROW NOCOL NOPERCENT;
RUN;

This will give you a cross-tab of all the ID_2 occurrences by ID_1.

If that is not what you are after, then I have miss understood the question.
0

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
wigmeisterCommented:
This first example will give you how many times id_1 occurs in your dataset, regardless of what is in id_2.  I wasn't sure by your example if that is what you wanted.  
proc sort data=test;
by id_1 id_2;

/*for the first one set counter to 0, increase by one for each additional record*/
data testout;
 set test;
by id_1 id_2;
if first.id_1 then dupcount = 0
else  dupcount + 1;
if last.id_1 then output;

If you want to see how many different id_2's there are for each id_1 then it will be something like this:

data testout;
 set test;
by id_1 id_2;
if first.id_2 then dupcount = 0
else  dupcount + 1;
if last.id_2 then output;

So some variation/combination of these two should get you what you want.  
Here is a brief reference:
http://www.pauldickman.com/teaching/sas/set_by.php
0
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
Databases

From novice to tech pro — start learning today.