SASnewbie
asked on
How do I find the distribution of common values between 2 datasets?
Hi,
I have 2 datasets (1 & 2) that have 4 variables in common (A, B. C, D). How do I find the frequency (or count) of the common values found in both datasets?
Variable A is the merge by variable.
For example:
Dataset1:
Variable B has 50 records containing one of these values (10, 15, 25, 45...)
Dataset2:
Variable B has 150 records containing one of these values (10, 15, 25, 45, 65 80...)
How can I produce an output like this?
Frequency of common values between dataset1 and dataset2:
VariableB
Value Dataset1 Freq Dataset2 Freq Common Freq % of Total Total (combined)
10 75 80 50 25.0% 200
15 25 40 25 12.5% 200
25 50 80 10 5.0% 200
And so on for Variables C and D...
Thanks in advance
I have 2 datasets (1 & 2) that have 4 variables in common (A, B. C, D). How do I find the frequency (or count) of the common values found in both datasets?
Variable A is the merge by variable.
For example:
Dataset1:
Variable B has 50 records containing one of these values (10, 15, 25, 45...)
Dataset2:
Variable B has 150 records containing one of these values (10, 15, 25, 45, 65 80...)
How can I produce an output like this?
Frequency of common values between dataset1 and dataset2:
VariableB
Value Dataset1 Freq Dataset2 Freq Common Freq % of Total Total (combined)
10 75 80 50 25.0% 200
15 25 40 25 12.5% 200
25 50 80 10 5.0% 200
And so on for Variables C and D...
Thanks in advance
what do you mean by common frequency? i'm not understanding how (using your example above) common frequency is 50 for value 10 when dataset1 frequency is 75 and dataset2 frequency is 80.
ASKER
hi lowaloysius,
The frequency is 75 when looking for the frequency of variableB value 10 in dataset1 alone.
The frequency is 80 when looking for the frequency of variableB value 10 in dataset2 alone.
The frequency is 50 when looking for the frequency of variableB value 10 where it is found in BOTH dataset1 AND dataset2.
Something like this:
rsubmit;proc freq data = dataset1;tables variableB / out = freq_results_dataset1;run; endrsubmit ;
rsubmit;proc freq data = dataset2;tables variableB / out = freq_results_dataset2 ;run;endrsubmit;
rsubmit;proc freq data = datasets_combined;tables variableB / out = freq_results_combined2;run ;endrsubmi t;
Thanks,
The frequency is 75 when looking for the frequency of variableB value 10 in dataset1 alone.
The frequency is 80 when looking for the frequency of variableB value 10 in dataset2 alone.
The frequency is 50 when looking for the frequency of variableB value 10 where it is found in BOTH dataset1 AND dataset2.
Something like this:
rsubmit;proc freq data = dataset1;tables variableB / out = freq_results_dataset1;run;
rsubmit;proc freq data = dataset2;tables variableB / out = freq_results_dataset2 ;run;endrsubmit;
rsubmit;proc freq data = datasets_combined;tables variableB / out = freq_results_combined2;run
Thanks,
ASKER
I want to change the combined proc freq to the below. It is created by merging dataset1 and dataset2 by VariableA:
rsubmit;proc freq data =datasets_merged;tables VariableB / out = freq_results_combined;run; endrsubmit ;
rsubmit;proc freq data =datasets_merged;tables VariableB / out = freq_results_combined;run;
and if i understand you correctly, you want 1 proc freq step to accomplish this? are you able to accept other solutions like calculating separately and creating a summary table before using proc print?
ASKER
hi lowaloysius,
I am open to any method.
I am open to any method.
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER
Hi loyaloysius,
I have worked 1-3. Could you provide examples of how to do 4 and 5?
I have worked 1-3. Could you provide examples of how to do 4 and 5?
for #4 (i would assuming that every record has a value in variable B, and even if there's no value, it should still be counted):
proc sql;
select count(1) into :total from merged dataset name
quit;
for #5 (must run #4 to initialize the macro variable total first):
proc sql;
create table final_frequency as
select coalesce(a.variableB, b.variableB, c.variableB) as variableB,
a.frequency column name as dataset1_freq,
b.frequency column name as dataset2_freq,
c.frequency column name as common_freq,
c.frequency column name / &total as percent_of_total,
&total as total_combined
from summarized step 1 dataset name a
full join summarized step 2 dataset name b
on a.variableB = b.variableB
full join summarized step 3 dataset name c
on a.variableB = c.variableB;
quit;
then do proc print:
proc print data = final_frequency noobs; run;
note that the text in italics are tables names which you'll need to replace accordingly with what you have currently
note also i wrote this code blindly i.e. no testing is done as i don't have an environment to test to ensure the syntax etc is correct. however, if there are any errors, it shouldn't be too far off and if you need help, post the log for the section of code in question for troubleshooting :)
proc sql;
select count(1) into :total from merged dataset name
quit;
for #5 (must run #4 to initialize the macro variable total first):
proc sql;
create table final_frequency as
select coalesce(a.variableB, b.variableB, c.variableB) as variableB,
a.frequency column name as dataset1_freq,
b.frequency column name as dataset2_freq,
c.frequency column name as common_freq,
c.frequency column name / &total as percent_of_total,
&total as total_combined
from summarized step 1 dataset name a
full join summarized step 2 dataset name b
on a.variableB = b.variableB
full join summarized step 3 dataset name c
on a.variableB = c.variableB;
quit;
then do proc print:
proc print data = final_frequency noobs; run;
note that the text in italics are tables names which you'll need to replace accordingly with what you have currently
note also i wrote this code blindly i.e. no testing is done as i don't have an environment to test to ensure the syntax etc is correct. however, if there are any errors, it shouldn't be too far off and if you need help, post the log for the section of code in question for troubleshooting :)
ASKER
thank you. I'll give this a run!
From looking in the docs it looks like COMPARE can get you most of what you want if not all of it.
http://support.sas.com/documentation/cdl/en/proc/61895/HTML/default/viewer.htm#compare-overview.htm