SAS Data Sets Manipulations

Please see if my coding is correct or may be it needs some adjustments or additions? Am I missing anything from the requirements?

Note: Testing SAS datasets are not available at this time, so I can’t test my code through compiler yet.

Requirements:
1.      Output SAS dataset “Data7” and ASCII file “Data7.dat” from “Data5”.
   1.1.      If variable ‘Var10’=G then set variable ‘Var11’=Z
   1.2.      Otherwise leave variable ‘Var11’ as is
   1.3      Output SAS dataset “Data7” and ASCII file “Data7.dat”
2.       Write records to “FileLog.log”
   2.1        Count of records written to “Data7” with ‘Var11’=Z
   2.2        Count of records written to “Data7” with ‘Var11’=X
   2.3        Total number of records written to “Data7”

Coding:
/* 1*/  data dir.Data7;
               set dir.Data5;
              if Var10=’G’ then Var11=’Z’;
            run;
/*2*/  data _null_;
               set dir.Data7 end=last;
               length Var11_Z Var11_X 3.;
               retain  Var11_Z Var11_X 0;

              if Var11=’Z’ then Var11_Z +1;
              if Var11=’X’ then Var11_X +1;
           file “Data7.dat”;
	put @1    Var1     $char5.
	        @6   Var3     $char19.
                      @21 Var11  $char3.;
             If last the do;
               file dir.FileLog.log mod;
                      put ‘****** Total Records Written to Data7 with Var11=Z is ‘  Var11_Z;
                      put ‘****** Total Records Written to Data7 with Var11=X is ‘ Var11_X;
                      put ‘****** Total Records Written to Data7 is ‘ _n_;
             end;
         run; 

Open in new window

Any comments are welcome!
Thank you in advance!!
labradorchikAsked:
Who is Participating?
 
d507201Database Marketing ConsultantCommented:
data dir.Data7;              
              set dir.Data5 end=last;
               length Var11_Z Var11_X 3.;
               retain  Var11_Z Var11_X 0;

              if Var10=’G’ then Var11=’Z’;              
              if Var11=’Z’ then Var11_Z=Var11_Z +1;          
              ELSE if Var11=’X’ then Var11_X=Var11_X +1;   /* changed this to an if-then-else construct */

               <everything else is ok>
run;

Extra tip -- the END= option can cause unexpected behavior, so when you use it be careful to insure that it will always execute.  Subsetting a dataset can cause an IF LAST statement to not execute.  If, for example, you used a subsetting IF (if var10='A' then delete;  if var10 ne 'A';) and your _last_ record was an 'A' then the IF LAST condition would never be true-- the last record was deleted before it got to that IF statement.

A work-around for this problem sometimes is to use the SET statement's WHERE= option to subset the data before it comes into the program data vector.  This way there will always be a last record to check, but it could be that you need those records in the DATA step for something else.  In that case you'd have to use two DATA steps.

I've attached some code that illustrated the issue.

SAS is great for handling complcated scenarios... there is always more than one solution.  Some are simpler, more elegant or robust than others but there is always more than one.  
data test;
do n=1 to 5;
  output;
end;
run;

data _null_; set test end=eof;

  ** Dropping the last record, so the EOF condition is never true. ;
  if n=5 then delete;

  retain count 0;
  if n > 0 then count=count+1;

  put n=;

  if eof then do;
    put 'total n recs is ' count;
  end;
run;


data _null_; set test end=eof;

  ** Here we move the counter.  The number of records will be wrong but will not print to the log anyway. ;
  retain count 0;
  if n > 0 then count=count+1;

  if n=5 then delete;

  put n=;

  if eof then do;
    put 'total n recs is ' count;
  end;
run;

** This gives what we want. ;
data _null_; set test(where=(n ne 5)) end=eof;

  retain count 0;
  if n > 0 then count=count+1;

  put n;

  if eof then do;
    put 'total n recs is ' count;
  end;
run;

Open in new window

0
 
aikimarkCommented:
@labradorchik,

Have you run your code?
Since this is coursework, your questions should be a bit more specific.
0
 
d507201Database Marketing ConsultantCommented:
It takes only a couple minutes to create a test dataset  of a dozen obs using a do loop, random numbers, and an output statement.

I'm an old-school SAS programmer and I've never liked this syntax used in the assignment statements.  I presume it works but have never tried it.       if Var11=’Z’ then Var11_Z +1;  

It's more SASish if it's written as   if Var11=’Z’ then Var11_Z=Var11_Z+1;  

Yes, I know that some people are big on saving keystrokes, but one of the great things about SAS is its English-like syntax that is readable and intuitive.

Regarding the code, you don't need to do this in two steps.  You can create external files at the same time as the dataset.  Importantly, doing it in two steps causes a performance hit because the data must be read twice... if you're working with millions of records with hundreds of variables and doing complex derivations and transformations you certainly want to avoid that.  Use as few DATA steps as possible.

Your first FILE statement will work but it will put the output file into I think the directory that SAS initializes from.  You may or may not know where that is so you should explicitly control the destination by using a fully-defined filename, such as
     file 'c:\temp\data7.dat';

Your second FILE statement won't work because it's not pointing to an external location.  SAS will see it as catalog.library.member I think.  You need to reference a filename in quotes as you did with the first FILE statement.  Also, the IF LAST statement has a misspelled 'then'.  SAS might know what you want to do and fix it but if it does it will write a warning.

That's all I saw from glancing at the code.
0
 
labradorchikAuthor Commented:
Hello! Thank you very much for your valuable comments!! I changed those things in the code. I understand that my code is not efficient as it might be, but how do I create external files at the same time as the dataset? I thought I had to use two data steps for this process. What's the trick? :)

data dir.Data7;
               set dir.Data5;
              if Var10=’G’ then Var11=’Z’;
            run;

        data _null_;
               set dir.Data7 end=last;
               length Var11_Z Var11_X 3.;
               retain  Var11_Z Var11_X 0;

              if Var11=’Z’ then Var11_Z=Var11_Z +1;
              if Var11=’X’ then Var11_X=Var11_X +1;
           file “dir.Data7.dat”;
      put @1    Var1     $char5.
            @6    Var3     $char19.
              @21  Var11   $char3.;
             If last then do;
               file “dir.FileLog.log” mod;
                      put ‘****** Total Records Written to Data7 with Var11=Z is ‘  Var11_Z;
                      put ‘****** Total Records Written to Data7 with Var11=X is ‘ Var11_X;
                      put ‘****** Total Records Written to Data7 is ‘ _n_;
             end;
         run;
0
 
labradorchikAuthor Commented:
Wow, it actually makes sense to me now!! I usually use WHERE option, but I thought it's not necessary in this coding. Thank you again for your valuable comments and examples!  

Could you please look at my another question that I had yesterday. I am not even sure if I write my code correctly to the requirements given?!?!      

http://www.experts-exchange.com/Database/SAS/Q_27395835.html

====edit: added link to question
aikimark -- ZA
0
Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.

All Courses

From novice to tech pro — start learning today.