Solved

SAS Data Sets Manipulations

Posted on 2011-02-23
14
756 Views
Last Modified: 2013-11-16

Hello, I am trying to append or put together numerous SAS datasets (note: every dataset has its own number) into one dataset "FinalData". Here is example of how my SAS datasets look like:
DataSet_<653>
DataSet_<352>
DataSet_<193>
DataSet_<number> and so on...

The problem that I am experiencing is that my program has to somehow look for all those SAS datasets (with different numbers) in my directory folder first before appending them together into one dataset "FinalData". First, my program has to search all datasets look like "DataSet_<number>" available in the directory and than append them into "FinalData". There can be 5 or 500 datasets with the name like "DataSet_<number>" in my directory.
Note: all these datasets have the same variables.

I realized that I can not just use "PROC APPEND" function since I want my program to search for those datasets (with those individual numbers) automatically and than append them into "FinalData" dataset.

Should I use "%sysfunc" within my macro statement? How do I search for an existence of datasets in my directory?

Any examples will really help!
Thank you!


0
Comment
Question by:labradorchik
  • 6
  • 6
  • 2
14 Comments
 
LVL 14

Expert Comment

by:Aloysius Low
Comment Utility
to check for existence of datasets in a library, you could use the following statement (which will also output into a sas dataset (WORK.TEMP) which you'll need to use it later:

proc contents data = [source library]._ALL_ memtype=data out=work.temp;
run;

now that you have output the data into a temporary dataset, you'll need the following code to concatenate the datasets together. basically what this does is to read WORK.TEMP (the dataset produced earlier) for columns LIBNAME (the library which you have specified ealier) and MEMNAME (the dataset names in this example, because of the MEMTYPE option specified earlier) and use the call execute function to produce the code required:

%macro concatall;
      data _null_;      
            set work.temp end=last;
            length alldata $2000;
            retain alldata;
            if _N_ eq 1 then do;
                  call execute("data finalout;");
                  alldata = "";
            end;
            
            alldata = trim(left(alldata)) || " " || trim(left(libname))||"."||trim(left(memname));

            if last then do;
                  call execute("set " || trim(left(alldata)) || ";");
                  call execute("run;");
            end;
      run;
%mend;

%concatall;
0
 
LVL 14

Expert Comment

by:Aloysius Low
Comment Utility
to elaborate on what the macro is doing:

the first data step (data _null_) and the set statement (set work.temp) is to initiate the reading of the dataset WORK.TEMP which contains your dataset listing which you would like to process.

the definition of alldata is just to have a super long character string to contain the list of all datasets read out from WORK.TEMP. this will be used in the data step (data finalout) within the call execute statement.

the call execute statement is a very powerful statement, which often i use it to optimize processing as it can help to process data in 1 pass rather than multiple passes. however, it is also harder to debug. read up on it :)

essentially, it will store the commands to be executed into memory, and only executing it when it encounters the run statement (the one matching data _null_ in this case). hence, the final code generated by the call execute statement would be something you would write (as if it's hard coding), if you see exactly what is in your library.
0
 
LVL 11

Expert Comment

by:theartfuldazzler
Comment Utility
Hi


There might be two easier methods:

1.  Using DICTIONARY tables


PROC SQL noprint;
    select memname into: datasets separated by " "
  from dictionary.tables where LIBNAME = 'WORK' and memname like 'DATASET_%';

DATA New;
  SET &datasets;
RUN;

/*Note - the libname in the Dictionary tables and the memname variable values must be in CAPS.*/

Method 2:  Using colon modifier;

In SAS 9.2, you can use the colon modifier as a type of "wildcard" for datasets:


OPTIONS NODSNFERR;
DATA New;
   SET Work,DataSets_:;
RUN;
0
 

Author Comment

by:labradorchik
Comment Utility
lowaloysius, thank you very much for your comments!

I am not sure if I can use your example since my directory also contains other datasets (maybe a few thousands of other datastes)  that I do not need to use in my program. I only need to use those datastes with the following structure "DataSet_<number>" (<number> represents a specific number for that particular dataset).
I am still learning macros, so I am not sure I can figure out if your code only searches for particular datasets (with structure of "DataSet_<number>")  in my directory.

This is what I am trying to do:
  Search in my directory "dir2011" for datasets "DataSet_<number>" (<number> represents a specific number for that particular dataset) and than read all records from all "DataSet_<number>" datasets into one dataset "FinalData".

Note: I already know what numbers will appear in the names of those datasets (with names "DataSet_<number>") and what numbers will not appear. To my knowledge, datasets with a name "DataSet_<number>" should be not more than 10 or 20.

I was thinking may I can use "DO WHILE LOOP" for this?

0
 
LVL 14

Expert Comment

by:Aloysius Low
Comment Utility
my macro as it is above will process all datasets residing in the directory. if there may be other datasets which may reside in the same directory, then you could take an additional step to filter the output of proc contents (work.temp in my example above) to do something like
proc sql;
  create table filtered_temp as
    select * from work.temp
    where memname like "DataSet_%";
quit;

this will not work though, if there may be datasets which contains names like dataset_[alphabets] in the same directory...is this a possibility?

what you are thinking is definitely not wrong. yes you may use a do while loop, but again enclose in a macro. however, you will be restricting yourself to that many iterations.
0
 

Author Comment

by:labradorchik
Comment Utility
I do not think there are names like dataset_[alphabets] in the same directory. I think I want to restrict myself to a specific pool of numbers.

OK, this is what I came up with for my "do loop", does this look correct or am I missing something?
Note: "dir2011" is directory where all those datasets are located. Also, I know for sure that those 10 or 20 datasets will have a number between 190 and 700 in their dataset name.

%macro combine;
  data FinalData ;
    set
    %do i = 190 %to 700;
      dir2011.DataSet&i
    %end;
  run;
%mend;

Will this pull all needed datasets from "dir2011" directory and than append them into one "FinalData" dataset?
0
 

Author Comment

by:labradorchik
Comment Utility
Ooops! I forgot to enclose the following at the end of my code:

options mprint;
%combine
0
Top 6 Sources for Identifying Threat Actor TTPs

Understanding your enemy is essential. These six sources will help you identify the most popular threat actor tactics, techniques, and procedures (TTPs).

 
LVL 14

Expert Comment

by:Aloysius Low
Comment Utility
yes and no

it will only work if the dataset you are looking for exists. since you said it's going to be 10-20 out of a range of 711 datasets here, it's not going to work. it'll be better if you could check which are the datasets that are currently present, which is what i have posted earlier that you do a filter to pick up only those that you really needed before calling the macro to concatenate them

just a short note on your code - you said the dataset names will be like DataSet_<number>, however, your code is DataSet&i. The underscore is missing.

also, you are missing a final semi-colon for your set statement, which you'll need after the %end statement
0
 

Author Comment

by:labradorchik
Comment Utility
I think I would like to try code this without involving PROC SQL, if this is possible of course.
For testing purposes I placed only three datasets (DataSet_03, DataSet_08, and DataSet_10) in my directory "dir2011".

After I run my code I get the following errors:

ERROR: File dir2011.DataSet_1.DATA does not exist.
ERROR: File dir2011.DataSet_2.DATA does not exist.
ERROR: File dir2011.DataSet_3.DATA does not exist.
ERROR: File dir2011.DataSet_4.DATA does not exist.
ERROR: File dir2011.DataSet_5.DATA does not exist.
ERROR: File dir2011.DataSet_6.DATA does not exist.
ERROR: File dir2011.DataSet_7.DATA does not exist.
ERROR: File dir2011.DataSet_8.DATA does not exist.
ERROR: File dir2011.DataSet_9.DATA does not exist.
ERROR: File dir2011.DataSet_11.DATA does not exist.

So, my program for some reason only recognizes DataSet_10.DATA, but does not find the other two datasets (DataSet_03.DATA and DataSet_08.DATA). Looking at the ERROR messages I can see that my code does not recognize zero in front of a number.  Also, program just stops because of those errors.
How can I code my program so that when a dataset is not found in my directory, my program just skips that particular dataset and just keep searching for a next available dataset?
Also, how can I search and append to "FinalData" only those records where "Var5" is equal to 2?  Where should I place Var5='2' in my code?

%macro combine;
  data FinalData ;
    set
    %do i = 01 %to 11;  
      dir2011.DataSet_&i
    %end;
   ;
  run;
%mend;
options mprint;
%combine
0
 
LVL 14

Expert Comment

by:Aloysius Low
Comment Utility
this is exactly why i said that your method will not work - even if there's 1 missing dataset out of 1000, or 1 million, it will still not work because the dataset don't exist.

the best and probably the only way is to check for the dataset existence, like the first example i (or theartfuldazzler) have given...
0
 
LVL 11

Assisted Solution

by:theartfuldazzler
theartfuldazzler earned 200 total points
Comment Utility
Hi

If you are using SAS 9.2 - the missing dataset error you mentioned can be "turned off" with:

OPTIONS NODSNFERR;
0
 

Author Comment

by:labradorchik
Comment Utility
Thank you, theartfuldazzler!
"NODSNFERR" seems to work fine for me!  :)

Any recommendations on how can include Var5=2 in my code?
Note:  I just need to have this, so only records with condition "Var5=2" will be written to "FinalData" dataset.  Where exactly should I place condition Var5='2' in my macro code?

%macro combine;
  data FinalData ;
    set
    %do i = 01 %to 11;  
      dir2011.DataSet_&i
       while Var5='2';                   /*  <= is this correct?  */
    %end;
   ;
  run;
%mend;
options mprint nodsnferr;
%combine
0
 
LVL 14

Accepted Solution

by:
Aloysius Low earned 300 total points
Comment Utility
glad to know that the option works for you, though i don't like it for it not being a proper and graceful solution...

as for your var5, you can put it before the run statement after the semi-colon before it... and use 'if' instead of 'while'

0
 

Author Closing Comment

by:labradorchik
Comment Utility
Everything works fine! :)
Thank you both for all your help!!
0

Featured Post

How your wiki can always stay up-to-date

Quip doubles as a “living” wiki and a project management tool that evolves with your organization. As you finish projects in Quip, the work remains, easily accessible to all team members, new and old.
- Increase transparency
- Onboard new hires faster
- Access from mobile/offline

Join & Write a Comment

This article describes some very basic things about SQL Server filegroups.
I annotated my article on ransomware somewhat extensively, but I keep adding new references and wanted to put a link to the reference library.  Despite all the reference tools I have on hand, it was not easy to find a way to do this easily. I finall…
The goal of this video is to provide viewers with basic examples to understand how to use strings and some functions related to them in the C programming language.
The goal of this video is to provide viewers with basic examples to understand how to create, access, and change arrays in the C programming language.

771 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

11 Experts available now in Live!

Get 1:1 Help Now