Link to home
Start Free TrialLog in
Avatar of labradorchik
labradorchikFlag for United States of America

asked on

SAS Data Sets Manipulations

Hello,
I am trying to create multiple datasets named "Data_'Var1'_" from one dataset "OneData".
Dataset "OneData" contains variable "Var1" in its records, so based on that variable I would like to output multiple datasets "Data_'Var1'_" with "Var1" included in those datasets names and only with that particular variable "Var1" .

For example, if dataset "OneData" has only 5 records with "Var1" equal to
03
11
99
75
44

 then
I would have 5 datasets created from "OneData" dataset with the following names:
 
"Data_03_"
"Data_11_"
"Data_99_"
"Data_75_"
"Data_44_"

each of these 5 datasets should only have one record (with that particular "Var1") written to them.
 
Will I be able to create these datasets with the following code?  If not, what should I change or add anything in my codding?

%macro multiple;
   data lib.Data_&i_;
     set lib.OneData
      %do i=03 %to 99;
       %let Var1=%sysfunc(putn(&i,z2.));
     %end;
     ;
   run;
 %mend;

 options mprint;
 %multiple

Thanks!
Avatar of Aloysius Low
Aloysius Low
Flag of Singapore image

you won't be able to create the tables that you wanted... the quickest and cleanest way is to use the call execute method:

%macro execute;
      data _null_;
            set lib.OneData;
            call execute("data lib.Data_"||trim(left(var1))||"_;");
            call execute("var1 = "||trim(left(var1))||";");
            call execute("run;");
      run;
%mend;
%execute;

alternatively, you could read all the variable values into a global variable, separated by a space, then loop through the global variable... for each value in the global variable (separated by a space), create the dataset that you want and set the value into the dataset
Avatar of labradorchik

ASKER

Thank you lowaloysius!
I have been testing your macro example, it works, but not in a way I was imaging my output datasets to be.
I guess what I really would like to get as my output: datasets with a name "Data_'Var1'" but all records with "Var1" would have to stay inside of of this dataset.  For example if "OneData" has three records with "Var1"=11 then I would get all those three records in the "Data_11", which will appear only one time.  Right now as my output I am getting three datasets with  the same name "Data_ 11".

Note: For my output none of my datastes should have the same name. So, there will be only one "Data_ 11" dataset even if "OneData" dataset has 3 three records of "Var"=11. All 3 records should be inside of  "Data_ 11" dataset.

What would be a way to read records from "OneData" and output to datasets "Data_'Var1'" with all records included in them for that exact "Var1" value?
Hello,
I am still trying to get this one solved. Any help or comments will be greatly appreciated!
Thanks!
I'm kinda lost here about the records you are expecting in the output datasets... are you able to give a more concrete example using the original one in your question?
could you also post the log which generates 3 of the same data?

I am trying to write all three records with var1=28 to "Data_28" dataset, - there should be only one "Data_28" dataset at the end, and not three as I have now. I am just testing this code with only var1=28, but at the end there will be var1= from 01 through 99. I hope this clears up a little bit my previous post. Let me know if you have any other questions. Thank you in advance!!

Here is the log of that code with three records where var1=28:

%macro execute;
369              data _null_;
370                    set lib.OneData;
371                    call execute("data lib.Data_"||trim(left(var1))||";");
372                    call execute("var1 = "||trim(left(var1))||";");
373                    call execute("run;");
374              run;
375        %mend;
376        %execute;
MPRINT(EXECUTE):   data _null_;
MPRINT(EXECUTE):   set lib.OneData;
MPRINT(EXECUTE):   call execute("data lib.Data_"||trim(left(var1))||";");
MPRINT(EXECUTE):   call execute("var1 = "||trim(left(var1))||";");
MPRINT(EXECUTE):   call execute("run;");
MPRINT(EXECUTE):   run;

MPRINT(EXECUTE):   data lib.Data_28;
MPRINT(EXECUTE):   var1 = 28;
MPRINT(EXECUTE):   run;
MPRINT(EXECUTE):   data lib.Data_28;
MPRINT(EXECUTE):   var1 = 28;
MPRINT(EXECUTE):   run;
MPRINT(EXECUTE):   data lib.Data_28;
MPRINT(EXECUTE):   var1 = 28;
MPRINT(EXECUTE):   run;
NOTE: There were 3 observations read from the data set lib.OneData.

NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.01 seconds
     
NOTE: CALL EXECUTE generated line.
1         + data lib.Data_28;
2         + var1 = 28;
3         + run;

NOTE: Compression was disabled for data set lib.Data_28 because compression overhead wou
ld increase the size of the data set.
NOTE: The data set lib.Data_28 has 1 observations and 1 variables.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.01 seconds
     
4         + data lib.Data_28;
5         + var1 = 28;
6         + run;

NOTE: Compression was disabled for data set lib.Data_28 because compression overhead wou
ld increase the size of the data set.
NOTE: The data set lib.Data_28 has 1 observations and 1 variables.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.02 seconds
     
7         + data lib.Data_28;
8         + var1 = 28;
9         + run;

NOTE: Compression was disabled for data set lib.Data_28 because compression overhead wou
ld increase the size of the data set.
NOTE: The data set lib.Data_28 has 1 observations and 1 variables.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.02 seconds
does this mean that there's 3 records of Var1=28 inside DataOne? and if this happens, what you want is 1 dataset named Data_28 with 3 records of Var1=28?

i'll need to test my code back in office before I can post something (to ensure that it works!), but what i have in mind is to sort the data by Var1 and then have a by statement in the call execute data null step which will check if first.Var1 then execute the data step line, and if last.Var1 then execute the run statement, and if anything in between just output...

(again, this idea needs to be tested when i'm back in office in about 10hrs time :))
Sounds good lowaloysius! Thank you in advance!!
And yes, you are absolutely correct: "DataOne" has currently three records with Var=1 and I would like as my output just only 1 dataset named Data_28 with 3 records of Var1=28.
Note: later on "DataOne" will have hundreds of records with Var1= from 01 to 99, so I should only have 99 datasets with multiple records at the end.
just tested the code to confirm it's working as expected...

so first step is to sort:
proc sort data=lib.OneData out=work.OneDataSorted;
by Var1;
quit;

then perform the call execute steps by including the by statement in the NULL data step and to check if first.Var1 and last.Var1 to decide what to do:
%macro execute;
  data _null_;
    set work.OneDataSorted;
    by var1;
    if first.var1 then do; /*if current record is first occurrence of the value in Var1, execute the data step to create the dataset*/
      call execute("data lib.Data_"||trim(left(var1))||"_;");
    end;

    call execute("var1 = "||trim(left(var1))||";output;"); /*regardless whether it is the first, last or anywhere in between, output the current record into the output dataset*/

    if last.var1 then do; /*if current record is the last occurrence of the value in Var1, execute the run statement*/
      call execute("run;");
    end;
  run; /*close off the data NULL step*/
%mend;
%execute;
Thank you very much for quick response back!

I just run this code and it looks like I have a problem with "output" statement. Do I need to put anything before "output" statement?
I notice if I take "output" statement out, then I only get one record written into Data_28 and not two records as I have right now in OneData dataset. Also, I noticed that only Var1 is written into Data_28 but what about other variables ( I have 10 other variables in the OneData dataset that need to be outputted as well)?

Note: I currently have two records with Var1= 28 in the OneData dataset.
Please see my log:

386  %macro execute;
387    data _null_;
388     set lib.OneData;
389      by Var1;
390       if first.Var1 then do;
391         call execute("data lib.Data_"||trim(left(Var1))||";");
392       end;
393         call execute("Var1 = "||trim(left(Var1))||"output;");
394       if last.Var1 then do;
395         call execute("run;");
396       end;
397    run;
398  %mend;
399  %execute;
MPRINT(EXECUTE):   data _null_;
MPRINT(EXECUTE):   set lib.OneData;
MPRINT(EXECUTE):   by Var1;
MPRINT(EXECUTE):   if first.Var1 then do;
MPRINT(EXECUTE):   call execute("data lib.Data_"||trim(left(Var1))||";");
MPRINT(EXECUTE):   end;
MPRINT(EXECUTE):   call execute("Var1 = "||trim(left(Var1))||"output;");
MPRINT(EXECUTE):   if last.Var1 then do;
MPRINT(EXECUTE):   call execute("run;");
MPRINT(EXECUTE):   end;
MPRINT(EXECUTE):   run;

MPRINT(EXECUTE):   data lib.Data_28;
MPRINT(EXECUTE):   Vat1 = 28output;
MPRINT(EXECUTE):   Var1 = 28output;
MPRINT(EXECUTE):   run;
NOTE: There were 2 observations read from the data set lib.OneData.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds

NOTE: CALL EXECUTE generated line.
1   + data lib.Data_28;
NOTE: Line generated by the CALL EXECUTE routine.
2   + Var1 = 28output;
                   ------
                   22
NOTE: Line generated by the CALL EXECUTE routine.
3   + Var1 = 28output;
                   ------
                   22
ERROR 22-322: Syntax error, expecting one of the following: !, !!, &, *, **, +, -, /, ;, <, <=, <>, =, >, ><,
              >=, AND, EQ, GE, GT, IN, LE, LT, MAX, MIN, NE, NG, NL, NOTIN, OR, ^=, |, ||, ~=.

4   + run;

NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set lib.Data_28 may be incomplete.  When this step was stopped there were 0 observations and
         2 variables.
WARNING: Data set lib.Data_28 was not replaced because this step was stopped.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.02 seconds
OK, I just got my second record with adding an extra ; before the output statement. So it worked! :)
Just one last thing: How can I make sure I write all the variable from OneData into Data_28?
At this moment I only see two records with Var1 in Data_28, but no other variables are shown.
ASKER CERTIFIED SOLUTION
Avatar of Aloysius Low
Aloysius Low
Flag of Singapore image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Wow... what a beauty!! :) Worked perfectly! Thank you!!!!