Creating Labels and Frequency list style reports in SAS code

Hello, I am trying to create labels for a few variables and then create frequency list style reports (not tables).

In the below  SAS code I am attempting to use var6* and var10 variable and apply labels to all frequencies output as applicable.
Note: * = data1, data2, or data3.  

title 'Match Data1, Data2, Data3 Comparison' ;

/* Creating Labels */

data dir1.output2;
  set dir1.output;
  LABEL  if var6data1 = D then 'Duplicate'
         if var6data1 = B then 'Bad Match' 
         if var6data1 = N then 'New Match'
         if var6data2 = D then 'Duplicate'
         if var6data2 = B then 'Bad Match' 
         if var6data2 = N then 'New Match'
         if var6data3 = D then 'Duplicate'
         if var6data3 = B then 'Bad Match' 
         if var6data3 = N then 'New Match';
run;

data dir1.output3;
  set dir1.output2;
  LABEL if var10 = 1 then 'All'
        if var10 = 2 then 'Data1-Data2'
        if var10 = 3 then 'Data1';
run;


/* Frequency of var10 and var6data1 by var6data2 by var6data3 variables in the code below */

proc freq data=dir1.output3;
  tables var10 / missing out=frequency.lis;
    title 'Frequency of Var10';
run;
           
proc freq data=dir1.output3;
  tables var6data1*var6data2*var6data3 / missing out=frequency.lis;
    title 'var6 Comparison';
run;    

Open in new window


Questions:
Is my code correct? Is there a better way of creating  labels for these variables and then creating/outputting frequencies?  How may I assign each label to a correct frequency output?
Since my input datasets contain 100 thousands of records I am looking for the most sufficient way possible.  

Thank you in advance!
labradorchikAsked:
Who is Participating?
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

d507201Database Marketing ConsultantCommented:
You can't assign labels using a LABEL statement and if-then statements.  You would have to create a new variable for each unique value of the source variable.  But you can do it with a format.  A format substitutes values for a variable when the variable is displayed, such as in a proc print.

Here's one way to do it.  As with SAS, there are multiple ways.  Google SAS FORMAT for ocumentation.

** Use a proc format to put the desired values in conceptual lookup table. ;
** The VALUE statement specifies the name of the format (or name of the look-up table).  Notice it starts with a dollar sign.  The $ indicates that the source values are character.  If the source values were numeric then we would not start the format name with a $. ;
** I think that the formatted value does not have a length limit (probably 32,000 chars), but realistically use no more than 30 or so chars for readability. ;
proc format;
            value $var6data
                  'D' = 'Duplicate'
                  'B' = 'Bad Match'
                  'N' = 'New Match'
                  other = 'Other';
run;

** Use a FORMAT statement to link the variable with its format. ;                        
** After doing this, whenever the variable is displayed you will see the formatted value, not the underlying value. ;
data dir1.output2;
  set dir1.output;  format var6data $var6data.;
  label var6data='Variable for Format Test';
run;

** When this prints you will see the formatted value. ;
proc print data=dir1.output2;
            var var6data;
run;
labradorchikAuthor Commented:
Thank you very much, d501201!

That's explain why I was not getting anything in my labels.
So, in my case, since I have three different variables for var6* (var6data1, var6data2, and var6data3) based on your example I can do the following, correct?  
I also corrected var10 based on your example following by the PROC FREQ steps.

/* Formatting var6* variables */
proc format;
     value $var6data1
     'D' = 'Duplicate'
     'B' = 'Bad Match'
     'N' = 'New Match';
run;

proc format;
     value $var6data2
     'D' = 'Duplicate'
     'B' = 'Bad Match'
     'N' = 'New Match';
run;

proc format;
     value $var6data3
     'D' = 'Duplicate'
     'B' = 'Bad Match'
     'N' = 'New Match';
run;

data dir1.output2;
  set dir1.output;
     format var6data1 $var6data1.;
  label var6data1='Variable from data1 for Format Test';
  
     format var6data2 $var6data2.;
  label var6data2='Variable from data2 for Format Test';
  
     format var6data3 $var6data3.;
  label var6data3='Variable from data3 for Format Test';
run;

proc print data=dir1.output2;
            var var6data1 var6data2 var6data3;
run; 


/* Formatting var10 variable */
proc format;
     value $var10
     '1' = 'All'
     '2' = 'Data1-Data2'
     '3' = 'Data1';
run;

data dir1.output3;
  set dir1.outpu2;
     format var10 $var10.;
  label var10='Variable from output2 for Format Test';
run;

proc print data=dir1.output3;
            var var10;
run; 


/* Creating Frequencies for var10 and var6* variables*/ 
proc freq data=dir1.output3;
  tables var10 / missing out=frequency.lis;
    title 'Frequency of Var10';
run;
           
proc freq data=dir1.output3;
  tables var6data1*var6data2*var6data3 / missing out=frequency.lis;
    title 'var6 Comparison';
run;   

Open in new window


May I somehow combine those 4 PROC FORMAT steps into one step?  Also, may I combine 2 PROC FREQ steps into one step?
d507201Database Marketing ConsultantCommented:
You're on the right track.  Having a format for each variable will certainly work, but a format does not have to be variable-specific.  That is, multiple variables can share the same format.  Var6data1, 2, and 3 can all reference the same format, perhaps renamed $var6data, in a single FORMAT statement.  

And multiple label assignments can be specified in one LABEL statement.

If you ever get an error saying something like 'format not found', check to see that the format specified in the FORMAT statement ends in a dot.  Otherwise SAS will not recognize it as a format.  This is a pretty common error.  

data dir1.output2;
  set dir1.output;
     format var6data1 var6data2 var6data3 $var6data1.;
     label var6data1='Variable from data1 for Format Test'
              var6data2='Variable from data2 for Format Test'
             var6data3='Variable from data3 for Format Test';
run;
Your Guide to Achieving IT Business Success

The IT Service Excellence Tool Kit has best practices to keep your clients happy and business booming. Inside, you’ll find everything you need to increase client satisfaction and retention, become more competitive, and increase your overall success.

labradorchikAuthor Commented:
Great! That's good new that I can put together formats. In the value I just called it now as $vardatas.
Can I also somehow combine my PROC FREQ steps in just one step at the bottom of the code?

Here is my rewritten code:
* Formatting var6 variable */
proc format;
     value $var6datas
     'D' = 'Duplicate'
     'B' = 'Bad Match'
     'N' = 'New Match';
run;

data dir1.output2;
  set dir1.output;
     format var6data1 var6data2 var6data3 $var6datas.;
  label var6data1='Variable from data1 for Format Test';
  label var6data2='Variable from data2 for Format Test';
  label var6data3='Variable from data3 for Format Test';
run;

proc print data=dir1.output2;
            var var6data1 var6data2 var6data3;
run; 


/* Formatting var10 variable */
proc format;
     value $var10
     '1' = 'All'
     '2' = 'Data1-Data2'
     '3' = 'Data1';
run;

data dir1.output3;
  set dir1.outpu2;
     format var10 $var10.;
  label var10='Var10 variable from output2 for Format Test';
run;

proc print data=dir1.output3;
            var var10;
run; 


/* Creating Frequencies */ 
         
proc freq data=dir1.output3;
  tables var10 / missing out=frequency.lis;
    title 'Frequency of Var10';
run;
           
proc freq data=dir1.output3;
  tables var6data1*var6data2*var6data3 / missing out=frequency.lis;
    title 'var6 Comparison';
run;

Open in new window

d507201Database Marketing ConsultantCommented:
You should read up on Proc Freq because it can give you unexpected results.  The way you have it, you'll get multiple contingency tables...  the * says to cross the values specified.  I don't think that's what you want but maybe it is.  It'll take only a few moments to run.  I'm guessing what you want is a one-way listing of the counts for each variable--
                            tables var6data1 var6data2 var6data3 /missing;

I left out the out= because it won't do what I think you want.  The out= you have will create a SAS data set of the results but with the name you gave it will look for a libref named frequency and name the file lis.sas7bdat.  You want an output file of the results?  Use Proc Printto to capture the listing output to a file.  It's an alternative to cut-and-paste from the listing.

** Redirect the listing output to a file.
proc printto print='<path>/<filename>.<extension>';
run;
** Everything after the proc printto will go to the file. ;
proc freq ....;
run;
** Turn off the proc printoo redirection.
proc printto;
run;

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
labradorchikAuthor Commented:
Thank you for your suggestions, d507201!
I would have to test this with my testing data and I will let you know it went, but I think I just want all frequencies go to the report file frequiencies.lis
labradorchikAuthor Commented:
I am not sure if I am on a correct path here, since I have only one input dataset, which is called "outdata" and in this code I am just trying to produce frequencies/counts, which would appear in one file, - called "frequencies.lis".

I could't test the code yet but I am still not sure how PROC FORMAT statement in the beginning of the program will assign all labels for "var6*" (Note: * = data1, data2, or data3) and "var10" variable in PROC FREQ frequencies report at the bottom of the program? I just don't get the part how format assigns for example "Duplicate" label when variable "var6data1" has "D" in some of records in "outdata" dataset? And how this for example "Duplicate" label then assigned to variables in PROC FREQ or PROC PRINT? What should go instead of '...' in the code below?

data dir1.output2; 
 set dir1.outdata; 
  format var6data1 var6data2 var6data3 $var6datas.; 
    label var6data1='...'; 
    label var6data2='...'; 
    label var6data3='...'; 
run;

Open in new window

labradorchikAuthor Commented:
I have tested the code below and the code works and the report is produced with correct variables/counts/formats, just the report goes automatically straight to the "sas-program-name.lst" file and not to the "frequency.lst" file report that I wanted, so how would I direct "freq1" and "freq2" datasets to the "frequency.lst"?

Also, do you know how may I create a list style report instead of table style?

/* Formatting var6 and var10 variable */ 
proc format; 
value $var6datas 
  'D' = 'Duplicate'
  'B' = 'Bad Match'
  'N' = 'New Match';

value var10 
  1 = 'All'
  2 = 'Data1-Data2'
  3 = 'Data1';
quit; 

/* Creating Frequencies */ 
proc freq data=dir1.outdata noprint;
tables var10 / missing out=dir1.freq1;
tables var6data1*var6data2*var6data3 / missing out=dir1.freq2;
run;

/* Proc PRINTs */ 
proc print data=dir1.freq1;
format var6data1 var6data2 var6data3 $var6datas.;
Title 'Frequencies for var6'; 
run; 

proc print data=dir1.freq2;
format var10 var10.;
Title 'Frequencies for var10'; 
run; 

Open in new window

labradorchikAuthor Commented:
Thank you very much for your suggestions and help! The code worked as you mentioned in your comments.
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
Statistical Analysis System (SAS)

From novice to tech pro — start learning today.