Solved

# SAS Data Sets Manipulations

Posted on 2011-10-13
543 Views
Please see if my coding is correct or may be it needs some adjustments or additions? Am I missing anything from the requirements?

Note: Testing SAS datasets are not available at this time, so I can’t test my code through compiler yet.

Requirements:
1.      Output SAS dataset “Data7” and ASCII file “Data7.dat” from “Data5”.
1.1.      If variable ‘Var10’=G then set variable ‘Var11’=Z
1.2.      Otherwise leave variable ‘Var11’ as is
1.3      Output SAS dataset “Data7” and ASCII file “Data7.dat”
2.       Write records to “FileLog.log”
2.1        Count of records written to “Data7” with ‘Var11’=Z
2.2        Count of records written to “Data7” with ‘Var11’=X
2.3        Total number of records written to “Data7”

Coding:
``````/* 1*/  data dir.Data7;
set dir.Data5;
if Var10=’G’ then Var11=’Z’;
run;
/*2*/  data _null_;
set dir.Data7 end=last;
length Var11_Z Var11_X 3.;
retain  Var11_Z Var11_X 0;

if Var11=’Z’ then Var11_Z +1;
if Var11=’X’ then Var11_X +1;
file “Data7.dat”;
put @1    Var1     \$char5.
@6   Var3     \$char19.
@21 Var11  \$char3.;
If last the do;
file dir.FileLog.log mod;
put ‘****** Total Records Written to Data7 with Var11=Z is ‘  Var11_Z;
put ‘****** Total Records Written to Data7 with Var11=X is ‘ Var11_X;
put ‘****** Total Records Written to Data7 is ‘ _n_;
end;
run;
``````
0

LVL 44

Expert Comment

Since this is coursework, your questions should be a bit more specific.
0

LVL 7

Expert Comment

It takes only a couple minutes to create a test dataset  of a dozen obs using a do loop, random numbers, and an output statement.

I'm an old-school SAS programmer and I've never liked this syntax used in the assignment statements.  I presume it works but have never tried it.       if Var11=’Z’ then Var11_Z +1;

It's more SASish if it's written as   if Var11=’Z’ then Var11_Z=Var11_Z+1;

Yes, I know that some people are big on saving keystrokes, but one of the great things about SAS is its English-like syntax that is readable and intuitive.

Regarding the code, you don't need to do this in two steps.  You can create external files at the same time as the dataset.  Importantly, doing it in two steps causes a performance hit because the data must be read twice... if you're working with millions of records with hundreds of variables and doing complex derivations and transformations you certainly want to avoid that.  Use as few DATA steps as possible.

Your first FILE statement will work but it will put the output file into I think the directory that SAS initializes from.  You may or may not know where that is so you should explicitly control the destination by using a fully-defined filename, such as
file 'c:\temp\data7.dat';

Your second FILE statement won't work because it's not pointing to an external location.  SAS will see it as catalog.library.member I think.  You need to reference a filename in quotes as you did with the first FILE statement.  Also, the IF LAST statement has a misspelled 'then'.  SAS might know what you want to do and fix it but if it does it will write a warning.

That's all I saw from glancing at the code.
0

Author Comment

Hello! Thank you very much for your valuable comments!! I changed those things in the code. I understand that my code is not efficient as it might be, but how do I create external files at the same time as the dataset? I thought I had to use two data steps for this process. What's the trick? :)

data dir.Data7;
set dir.Data5;
if Var10=’G’ then Var11=’Z’;
run;

data _null_;
set dir.Data7 end=last;
length Var11_Z Var11_X 3.;
retain  Var11_Z Var11_X 0;

if Var11=’Z’ then Var11_Z=Var11_Z +1;
if Var11=’X’ then Var11_X=Var11_X +1;
file “dir.Data7.dat”;
put @1    Var1     \$char5.
@6    Var3     \$char19.
@21  Var11   \$char3.;
If last then do;
file “dir.FileLog.log” mod;
put ‘****** Total Records Written to Data7 with Var11=Z is ‘  Var11_Z;
put ‘****** Total Records Written to Data7 with Var11=X is ‘ Var11_X;
put ‘****** Total Records Written to Data7 is ‘ _n_;
end;
run;
0

LVL 7

Accepted Solution

data dir.Data7;
set dir.Data5 end=last;
length Var11_Z Var11_X 3.;
retain  Var11_Z Var11_X 0;

if Var10=’G’ then Var11=’Z’;
if Var11=’Z’ then Var11_Z=Var11_Z +1;
ELSE if Var11=’X’ then Var11_X=Var11_X +1;   /* changed this to an if-then-else construct */

<everything else is ok>
run;

Extra tip -- the END= option can cause unexpected behavior, so when you use it be careful to insure that it will always execute.  Subsetting a dataset can cause an IF LAST statement to not execute.  If, for example, you used a subsetting IF (if var10='A' then delete;  if var10 ne 'A';) and your _last_ record was an 'A' then the IF LAST condition would never be true-- the last record was deleted before it got to that IF statement.

A work-around for this problem sometimes is to use the SET statement's WHERE= option to subset the data before it comes into the program data vector.  This way there will always be a last record to check, but it could be that you need those records in the DATA step for something else.  In that case you'd have to use two DATA steps.

I've attached some code that illustrated the issue.

SAS is great for handling complcated scenarios... there is always more than one solution.  Some are simpler, more elegant or robust than others but there is always more than one.
``````data test;
do n=1 to 5;
output;
end;
run;

data _null_; set test end=eof;

** Dropping the last record, so the EOF condition is never true. ;
if n=5 then delete;

retain count 0;
if n > 0 then count=count+1;

put n=;

if eof then do;
put 'total n recs is ' count;
end;
run;

data _null_; set test end=eof;

** Here we move the counter.  The number of records will be wrong but will not print to the log anyway. ;
retain count 0;
if n > 0 then count=count+1;

if n=5 then delete;

put n=;

if eof then do;
put 'total n recs is ' count;
end;
run;

** This gives what we want. ;
data _null_; set test(where=(n ne 5)) end=eof;

retain count 0;
if n > 0 then count=count+1;

put n;

if eof then do;
put 'total n recs is ' count;
end;
run;
``````
0

Author Comment

Wow, it actually makes sense to me now!! I usually use WHERE option, but I thought it's not necessary in this coding. Thank you again for your valuable comments and examples!

Could you please look at my another question that I had yesterday. I am not even sure if I write my code correctly to the requirements given?!?!

http://www.experts-exchange.com/Database/SAS/Q_27395835.html

aikimark -- ZA
0

## Featured Post

### Suggested Solutions

Entering a date in Microsoft Access can be tricky. A typo can cause month and day to be shuffled, entering the day only causes an error, as does entering, say, day 31 in June. This article shows how an inputmask supported by code can help the user a…
Many companies are looking to get out of the datacenter business and to services like Microsoft Azure to provide Infrastructure as a Service (IaaS) solutions for legacy client server workloads, rather than continuing to make capital investments in h…
Using examples as well as descriptions, step through each of the common simple join types, explaining differences in syntax, differences in expected outputs and showing how the queries run along with the actual outputs based upon a simple set of dem…
Polish reports in Access so they look terrific. Take yourself to another level. Equations, Back Color, Alternate Back Color. Write easy VBA Code. Tighten space to use less pages. Launch report from a menu, considering criteria only when it is filled…