Link to home
Start Free TrialLog in
Avatar of labradorchik
labradorchikFlag for United States of America

asked on

Creating Labels and Frequency list style reports in SAS code

Hello, I am trying to create labels for a few variables and then create frequency list style reports (not tables).

In the below  SAS code I am attempting to use var6* and var10 variable and apply labels to all frequencies output as applicable.
Note: * = data1, data2, or data3.  

title 'Match Data1, Data2, Data3 Comparison' ;

/* Creating Labels */

data dir1.output2;
  set dir1.output;
  LABEL  if var6data1 = D then 'Duplicate'
         if var6data1 = B then 'Bad Match' 
         if var6data1 = N then 'New Match'
         if var6data2 = D then 'Duplicate'
         if var6data2 = B then 'Bad Match' 
         if var6data2 = N then 'New Match'
         if var6data3 = D then 'Duplicate'
         if var6data3 = B then 'Bad Match' 
         if var6data3 = N then 'New Match';
run;

data dir1.output3;
  set dir1.output2;
  LABEL if var10 = 1 then 'All'
        if var10 = 2 then 'Data1-Data2'
        if var10 = 3 then 'Data1';
run;


/* Frequency of var10 and var6data1 by var6data2 by var6data3 variables in the code below */

proc freq data=dir1.output3;
  tables var10 / missing out=frequency.lis;
    title 'Frequency of Var10';
run;
           
proc freq data=dir1.output3;
  tables var6data1*var6data2*var6data3 / missing out=frequency.lis;
    title 'var6 Comparison';
run;    

Open in new window


Questions:
Is my code correct? Is there a better way of creating  labels for these variables and then creating/outputting frequencies?  How may I assign each label to a correct frequency output?
Since my input datasets contain 100 thousands of records I am looking for the most sufficient way possible.  

Thank you in advance!
Avatar of d507201
d507201
Flag of United States of America image

You can't assign labels using a LABEL statement and if-then statements.  You would have to create a new variable for each unique value of the source variable.  But you can do it with a format.  A format substitutes values for a variable when the variable is displayed, such as in a proc print.

Here's one way to do it.  As with SAS, there are multiple ways.  Google SAS FORMAT for ocumentation.

** Use a proc format to put the desired values in conceptual lookup table. ;
** The VALUE statement specifies the name of the format (or name of the look-up table).  Notice it starts with a dollar sign.  The $ indicates that the source values are character.  If the source values were numeric then we would not start the format name with a $. ;
** I think that the formatted value does not have a length limit (probably 32,000 chars), but realistically use no more than 30 or so chars for readability. ;
proc format;
            value $var6data
                  'D' = 'Duplicate'
                  'B' = 'Bad Match'
                  'N' = 'New Match'
                  other = 'Other';
run;

** Use a FORMAT statement to link the variable with its format. ;                        
** After doing this, whenever the variable is displayed you will see the formatted value, not the underlying value. ;
data dir1.output2;
  set dir1.output;  format var6data $var6data.;
  label var6data='Variable for Format Test';
run;

** When this prints you will see the formatted value. ;
proc print data=dir1.output2;
            var var6data;
run;
Avatar of labradorchik

ASKER

Thank you very much, d501201!

That's explain why I was not getting anything in my labels.
So, in my case, since I have three different variables for var6* (var6data1, var6data2, and var6data3) based on your example I can do the following, correct?  
I also corrected var10 based on your example following by the PROC FREQ steps.

/* Formatting var6* variables */
proc format;
     value $var6data1
     'D' = 'Duplicate'
     'B' = 'Bad Match'
     'N' = 'New Match';
run;

proc format;
     value $var6data2
     'D' = 'Duplicate'
     'B' = 'Bad Match'
     'N' = 'New Match';
run;

proc format;
     value $var6data3
     'D' = 'Duplicate'
     'B' = 'Bad Match'
     'N' = 'New Match';
run;

data dir1.output2;
  set dir1.output;
     format var6data1 $var6data1.;
  label var6data1='Variable from data1 for Format Test';
  
     format var6data2 $var6data2.;
  label var6data2='Variable from data2 for Format Test';
  
     format var6data3 $var6data3.;
  label var6data3='Variable from data3 for Format Test';
run;

proc print data=dir1.output2;
            var var6data1 var6data2 var6data3;
run; 


/* Formatting var10 variable */
proc format;
     value $var10
     '1' = 'All'
     '2' = 'Data1-Data2'
     '3' = 'Data1';
run;

data dir1.output3;
  set dir1.outpu2;
     format var10 $var10.;
  label var10='Variable from output2 for Format Test';
run;

proc print data=dir1.output3;
            var var10;
run; 


/* Creating Frequencies for var10 and var6* variables*/ 
proc freq data=dir1.output3;
  tables var10 / missing out=frequency.lis;
    title 'Frequency of Var10';
run;
           
proc freq data=dir1.output3;
  tables var6data1*var6data2*var6data3 / missing out=frequency.lis;
    title 'var6 Comparison';
run;   

Open in new window


May I somehow combine those 4 PROC FORMAT steps into one step?  Also, may I combine 2 PROC FREQ steps into one step?
You're on the right track.  Having a format for each variable will certainly work, but a format does not have to be variable-specific.  That is, multiple variables can share the same format.  Var6data1, 2, and 3 can all reference the same format, perhaps renamed $var6data, in a single FORMAT statement.  

And multiple label assignments can be specified in one LABEL statement.

If you ever get an error saying something like 'format not found', check to see that the format specified in the FORMAT statement ends in a dot.  Otherwise SAS will not recognize it as a format.  This is a pretty common error.  

data dir1.output2;
  set dir1.output;
     format var6data1 var6data2 var6data3 $var6data1.;
     label var6data1='Variable from data1 for Format Test'
              var6data2='Variable from data2 for Format Test'
             var6data3='Variable from data3 for Format Test';
run;
Great! That's good new that I can put together formats. In the value I just called it now as $vardatas.
Can I also somehow combine my PROC FREQ steps in just one step at the bottom of the code?

Here is my rewritten code:
* Formatting var6 variable */
proc format;
     value $var6datas
     'D' = 'Duplicate'
     'B' = 'Bad Match'
     'N' = 'New Match';
run;

data dir1.output2;
  set dir1.output;
     format var6data1 var6data2 var6data3 $var6datas.;
  label var6data1='Variable from data1 for Format Test';
  label var6data2='Variable from data2 for Format Test';
  label var6data3='Variable from data3 for Format Test';
run;

proc print data=dir1.output2;
            var var6data1 var6data2 var6data3;
run; 


/* Formatting var10 variable */
proc format;
     value $var10
     '1' = 'All'
     '2' = 'Data1-Data2'
     '3' = 'Data1';
run;

data dir1.output3;
  set dir1.outpu2;
     format var10 $var10.;
  label var10='Var10 variable from output2 for Format Test';
run;

proc print data=dir1.output3;
            var var10;
run; 


/* Creating Frequencies */ 
         
proc freq data=dir1.output3;
  tables var10 / missing out=frequency.lis;
    title 'Frequency of Var10';
run;
           
proc freq data=dir1.output3;
  tables var6data1*var6data2*var6data3 / missing out=frequency.lis;
    title 'var6 Comparison';
run;

Open in new window

ASKER CERTIFIED SOLUTION
Avatar of d507201
d507201
Flag of United States of America image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Thank you for your suggestions, d507201!
I would have to test this with my testing data and I will let you know it went, but I think I just want all frequencies go to the report file frequiencies.lis
I am not sure if I am on a correct path here, since I have only one input dataset, which is called "outdata" and in this code I am just trying to produce frequencies/counts, which would appear in one file, - called "frequencies.lis".

I could't test the code yet but I am still not sure how PROC FORMAT statement in the beginning of the program will assign all labels for "var6*" (Note: * = data1, data2, or data3) and "var10" variable in PROC FREQ frequencies report at the bottom of the program? I just don't get the part how format assigns for example "Duplicate" label when variable "var6data1" has "D" in some of records in "outdata" dataset? And how this for example "Duplicate" label then assigned to variables in PROC FREQ or PROC PRINT? What should go instead of '...' in the code below?

data dir1.output2; 
 set dir1.outdata; 
  format var6data1 var6data2 var6data3 $var6datas.; 
    label var6data1='...'; 
    label var6data2='...'; 
    label var6data3='...'; 
run;

Open in new window

I have tested the code below and the code works and the report is produced with correct variables/counts/formats, just the report goes automatically straight to the "sas-program-name.lst" file and not to the "frequency.lst" file report that I wanted, so how would I direct "freq1" and "freq2" datasets to the "frequency.lst"?

Also, do you know how may I create a list style report instead of table style?

/* Formatting var6 and var10 variable */ 
proc format; 
value $var6datas 
  'D' = 'Duplicate'
  'B' = 'Bad Match'
  'N' = 'New Match';

value var10 
  1 = 'All'
  2 = 'Data1-Data2'
  3 = 'Data1';
quit; 

/* Creating Frequencies */ 
proc freq data=dir1.outdata noprint;
tables var10 / missing out=dir1.freq1;
tables var6data1*var6data2*var6data3 / missing out=dir1.freq2;
run;

/* Proc PRINTs */ 
proc print data=dir1.freq1;
format var6data1 var6data2 var6data3 $var6datas.;
Title 'Frequencies for var6'; 
run; 

proc print data=dir1.freq2;
format var10 var10.;
Title 'Frequencies for var10'; 
run; 

Open in new window

Thank you very much for your suggestions and help! The code worked as you mentioned in your comments.