?
Solved

SAS Data Sets Manipulations

Posted on 2011-02-23
14
Medium Priority
?
882 Views
Last Modified: 2013-11-16

Hello, I am trying to append or put together numerous SAS datasets (note: every dataset has its own number) into one dataset "FinalData". Here is example of how my SAS datasets look like:
DataSet_<653>
DataSet_<352>
DataSet_<193>
DataSet_<number> and so on...

The problem that I am experiencing is that my program has to somehow look for all those SAS datasets (with different numbers) in my directory folder first before appending them together into one dataset "FinalData". First, my program has to search all datasets look like "DataSet_<number>" available in the directory and than append them into "FinalData". There can be 5 or 500 datasets with the name like "DataSet_<number>" in my directory.
Note: all these datasets have the same variables.

I realized that I can not just use "PROC APPEND" function since I want my program to search for those datasets (with those individual numbers) automatically and than append them into "FinalData" dataset.

Should I use "%sysfunc" within my macro statement? How do I search for an existence of datasets in my directory?

Any examples will really help!
Thank you!


0
Comment
Question by:labradorchik
  • 6
  • 6
  • 2
14 Comments
 
LVL 14

Expert Comment

by:Aloysius Low
ID: 34966425
to check for existence of datasets in a library, you could use the following statement (which will also output into a sas dataset (WORK.TEMP) which you'll need to use it later:

proc contents data = [source library]._ALL_ memtype=data out=work.temp;
run;

now that you have output the data into a temporary dataset, you'll need the following code to concatenate the datasets together. basically what this does is to read WORK.TEMP (the dataset produced earlier) for columns LIBNAME (the library which you have specified ealier) and MEMNAME (the dataset names in this example, because of the MEMTYPE option specified earlier) and use the call execute function to produce the code required:

%macro concatall;
      data _null_;      
            set work.temp end=last;
            length alldata $2000;
            retain alldata;
            if _N_ eq 1 then do;
                  call execute("data finalout;");
                  alldata = "";
            end;
            
            alldata = trim(left(alldata)) || " " || trim(left(libname))||"."||trim(left(memname));

            if last then do;
                  call execute("set " || trim(left(alldata)) || ";");
                  call execute("run;");
            end;
      run;
%mend;

%concatall;
0
 
LVL 14

Expert Comment

by:Aloysius Low
ID: 34966460
to elaborate on what the macro is doing:

the first data step (data _null_) and the set statement (set work.temp) is to initiate the reading of the dataset WORK.TEMP which contains your dataset listing which you would like to process.

the definition of alldata is just to have a super long character string to contain the list of all datasets read out from WORK.TEMP. this will be used in the data step (data finalout) within the call execute statement.

the call execute statement is a very powerful statement, which often i use it to optimize processing as it can help to process data in 1 pass rather than multiple passes. however, it is also harder to debug. read up on it :)

essentially, it will store the commands to be executed into memory, and only executing it when it encounters the run statement (the one matching data _null_ in this case). hence, the final code generated by the call execute statement would be something you would write (as if it's hard coding), if you see exactly what is in your library.
0
 
LVL 11

Expert Comment

by:theartfuldazzler
ID: 34967679
Hi


There might be two easier methods:

1.  Using DICTIONARY tables


PROC SQL noprint;
    select memname into: datasets separated by " "
  from dictionary.tables where LIBNAME = 'WORK' and memname like 'DATASET_%';

DATA New;
  SET &datasets;
RUN;

/*Note - the libname in the Dictionary tables and the memname variable values must be in CAPS.*/

Method 2:  Using colon modifier;

In SAS 9.2, you can use the colon modifier as a type of "wildcard" for datasets:


OPTIONS NODSNFERR;
DATA New;
   SET Work,DataSets_:;
RUN;
0
Get your Conversational Ransomware Defense e‑book

This e-book gives you an insight into the ransomware threat and reviews the fundamentals of top-notch ransomware preparedness and recovery. To help you protect yourself and your organization. The initial infection may be inevitable, so the best protection is to be fully prepared.

 

Author Comment

by:labradorchik
ID: 34971226
lowaloysius, thank you very much for your comments!

I am not sure if I can use your example since my directory also contains other datasets (maybe a few thousands of other datastes)  that I do not need to use in my program. I only need to use those datastes with the following structure "DataSet_<number>" (<number> represents a specific number for that particular dataset).
I am still learning macros, so I am not sure I can figure out if your code only searches for particular datasets (with structure of "DataSet_<number>")  in my directory.

This is what I am trying to do:
  Search in my directory "dir2011" for datasets "DataSet_<number>" (<number> represents a specific number for that particular dataset) and than read all records from all "DataSet_<number>" datasets into one dataset "FinalData".

Note: I already know what numbers will appear in the names of those datasets (with names "DataSet_<number>") and what numbers will not appear. To my knowledge, datasets with a name "DataSet_<number>" should be not more than 10 or 20.

I was thinking may I can use "DO WHILE LOOP" for this?

0
 
LVL 14

Expert Comment

by:Aloysius Low
ID: 34971336
my macro as it is above will process all datasets residing in the directory. if there may be other datasets which may reside in the same directory, then you could take an additional step to filter the output of proc contents (work.temp in my example above) to do something like
proc sql;
  create table filtered_temp as
    select * from work.temp
    where memname like "DataSet_%";
quit;

this will not work though, if there may be datasets which contains names like dataset_[alphabets] in the same directory...is this a possibility?

what you are thinking is definitely not wrong. yes you may use a do while loop, but again enclose in a macro. however, you will be restricting yourself to that many iterations.
0
 

Author Comment

by:labradorchik
ID: 34972810
I do not think there are names like dataset_[alphabets] in the same directory. I think I want to restrict myself to a specific pool of numbers.

OK, this is what I came up with for my "do loop", does this look correct or am I missing something?
Note: "dir2011" is directory where all those datasets are located. Also, I know for sure that those 10 or 20 datasets will have a number between 190 and 700 in their dataset name.

%macro combine;
  data FinalData ;
    set
    %do i = 190 %to 700;
      dir2011.DataSet&i
    %end;
  run;
%mend;

Will this pull all needed datasets from "dir2011" directory and than append them into one "FinalData" dataset?
0
 

Author Comment

by:labradorchik
ID: 34972903
Ooops! I forgot to enclose the following at the end of my code:

options mprint;
%combine
0
 
LVL 14

Expert Comment

by:Aloysius Low
ID: 34975995
yes and no

it will only work if the dataset you are looking for exists. since you said it's going to be 10-20 out of a range of 711 datasets here, it's not going to work. it'll be better if you could check which are the datasets that are currently present, which is what i have posted earlier that you do a filter to pick up only those that you really needed before calling the macro to concatenate them

just a short note on your code - you said the dataset names will be like DataSet_<number>, however, your code is DataSet&i. The underscore is missing.

also, you are missing a final semi-colon for your set statement, which you'll need after the %end statement
0
 

Author Comment

by:labradorchik
ID: 35009849
I think I would like to try code this without involving PROC SQL, if this is possible of course.
For testing purposes I placed only three datasets (DataSet_03, DataSet_08, and DataSet_10) in my directory "dir2011".

After I run my code I get the following errors:

ERROR: File dir2011.DataSet_1.DATA does not exist.
ERROR: File dir2011.DataSet_2.DATA does not exist.
ERROR: File dir2011.DataSet_3.DATA does not exist.
ERROR: File dir2011.DataSet_4.DATA does not exist.
ERROR: File dir2011.DataSet_5.DATA does not exist.
ERROR: File dir2011.DataSet_6.DATA does not exist.
ERROR: File dir2011.DataSet_7.DATA does not exist.
ERROR: File dir2011.DataSet_8.DATA does not exist.
ERROR: File dir2011.DataSet_9.DATA does not exist.
ERROR: File dir2011.DataSet_11.DATA does not exist.

So, my program for some reason only recognizes DataSet_10.DATA, but does not find the other two datasets (DataSet_03.DATA and DataSet_08.DATA). Looking at the ERROR messages I can see that my code does not recognize zero in front of a number.  Also, program just stops because of those errors.
How can I code my program so that when a dataset is not found in my directory, my program just skips that particular dataset and just keep searching for a next available dataset?
Also, how can I search and append to "FinalData" only those records where "Var5" is equal to 2?  Where should I place Var5='2' in my code?

%macro combine;
  data FinalData ;
    set
    %do i = 01 %to 11;  
      dir2011.DataSet_&i
    %end;
   ;
  run;
%mend;
options mprint;
%combine
0
 
LVL 14

Expert Comment

by:Aloysius Low
ID: 35013574
this is exactly why i said that your method will not work - even if there's 1 missing dataset out of 1000, or 1 million, it will still not work because the dataset don't exist.

the best and probably the only way is to check for the dataset existence, like the first example i (or theartfuldazzler) have given...
0
 
LVL 11

Assisted Solution

by:theartfuldazzler
theartfuldazzler earned 800 total points
ID: 35014424
Hi

If you are using SAS 9.2 - the missing dataset error you mentioned can be "turned off" with:

OPTIONS NODSNFERR;
0
 

Author Comment

by:labradorchik
ID: 35018074
Thank you, theartfuldazzler!
"NODSNFERR" seems to work fine for me!  :)

Any recommendations on how can include Var5=2 in my code?
Note:  I just need to have this, so only records with condition "Var5=2" will be written to "FinalData" dataset.  Where exactly should I place condition Var5='2' in my macro code?

%macro combine;
  data FinalData ;
    set
    %do i = 01 %to 11;  
      dir2011.DataSet_&i
       while Var5='2';                   /*  <= is this correct?  */
    %end;
   ;
  run;
%mend;
options mprint nodsnferr;
%combine
0
 
LVL 14

Accepted Solution

by:
Aloysius Low earned 1200 total points
ID: 35018122
glad to know that the option works for you, though i don't like it for it not being a proper and graceful solution...

as for your var5, you can put it before the run statement after the semi-colon before it... and use 'if' instead of 'while'

0
 

Author Closing Comment

by:labradorchik
ID: 35018216
Everything works fine! :)
Thank you both for all your help!!
0

Featured Post

Fill in the form and get your FREE NFR key NOW!

Veeam is happy to provide a FREE NFR server license to certified engineers, trainers, and bloggers.  It allows for the non‑production use of Veeam Agent for Microsoft Windows. This license is valid for five workstations and two servers.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Backups and Disaster RecoveryIn this post, we’ll look at strategies for backups and disaster recovery.
One of the most important things in an application is the query performance. This article intends to give you good tips to improve the performance of your queries.
In this video, Percona Solution Engineer Dimitri Vanoverbeke discusses why you want to use at least three nodes in a database cluster. To discuss how Percona Consulting can help with your design and architecture needs for your database and infras…
Despite its rising prevalence in the business world, "the cloud" is still misunderstood. Some companies still believe common misconceptions about lack of security in cloud solutions and many misuses of cloud storage options still occur every day. …
Suggested Courses

864 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question