?
Solved

SAS Data Sets Manipulations

Posted on 2011-10-17
9
Medium Priority
?
514 Views
Last Modified: 2013-11-16
Hello, I am trying to do the following:
   Sort "Data5" dataset no duplicate keys on ‘Var3’, output to "TempData" temporary dataset (keep only ‘Var3’).
   For each variable ‘Number’ (this variable has numbers from 2 to 18, also number 23 and 34) in "File1.dat" (except number 34), merge "TempData" with "Data3<number>" using variable ‘VarID’ in "Data3<number>" = ‘Var3’(11:19) in "TempData". One "Data3<number>" to 0, 1 "TempData" (one to one match or one to none). Keep only matching "TempData" records.
   Output only a single “Data6” SAS dataset and ASCII file “Data6.dat” containing the "TempData" records for all numbers.

proc sort data=dir.Data5 out=TempData (keep=Var3) nodupkey;
             by Var3;
run;

data DataFile;
      infile “File1.dat”;
      input @1 Number $char2.;
run;

data dir.Data6 (keep=VarID);
          set DataFile;
             ptds=’dir: Data3’||number||’.sas7bdat’;
             if fileexist(ptds) the do;
                  call symput(‘nbr’, number);
                  call execute(‘&nbr’);
             end;
                if (&nbr ne ‘34’) then do;  
                    VarID=substr(Var3,  11,19);
                  merge  TempData(in=a) dir.Data3&nbr (in=b);
                    by VarID;
                   if a then output;
run;

Code has not been tested or ran through the compiler yet. Based on the requrements, please comment if the above code is not correct or/and should be written in some other way.


 
0
Comment
Question by:labradorchik
  • 6
  • 3
9 Comments
 

Author Comment

by:labradorchik
ID: 36982219
Anyone would like to comments on the code? Any comments are welcome!  
Thank you!
0
 
LVL 14

Expert Comment

by:Aloysius Low
ID: 36983792
the code after reading in your file will definitely not work

why are you checking for file existence? is the table "Data3<number>" expected to be present, or are you supposed to handle it if it's missing? i'm saying this is because it's not in your requirements.

also, you will not be able to access a macro variable value within the same data step as the call symput statement.

lastly, you'll need to write a macro to achieve the last requirement, and do consider filtering out the value '34' from DataFile so that you will only process what is required.
0
 

Author Comment

by:labradorchik
ID: 36989470
Thank you lowaloysius:!

I am still trying to figure out how to code this. I think I need to check for "Data3<number>" existence, because there might be more than one of these files in the directory with a different number within the name of the "Data3<number>" dataset.

As far as macros goes I feel totally clueless.  How can I make "&nbr" work? I just run my code and I am constantly getting the following error under "&nbr"

WARNING: Apparent symbolic reference NBR not resolved.
ERROR 22-322: Syntax error, expecting one of the following: a name,
              a quoted string, a numeric constant, a datetime constant,
              a missing value, INPUT, PUT.


Where do you think the problem might be?
Thank you!
0
Independent Software Vendors: We Want Your Opinion

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

 
LVL 14

Expert Comment

by:Aloysius Low
ID: 36990077
there might be a number of data3<number> existing but is it given that the data3<number> that you are going to need will be present? if yes, and since there's no requirement to check for existence for graceful handling, then there might be no need to do so...

however, if you really need to do so, one way is to use call execute which will generate the required code on the fly...

you are getting the error, because like i already said, you can't have a macro variable assignment and the usage of the macro variable within 1 data step... i would say you'll need a macro and a few steps to achieve your last requirement:
1. assign a row number for each record in DataFile
macro start
2. use proc sql to read in Number into your macro variable NBR
3. use an if-else condition to determine which of the following to execute
  a. if Data3<number> exists, merge the 2 datasets
  b. else don't do anything
macro end
0
 

Author Comment

by:labradorchik
ID: 36993758

Thank you very much for your comments! I placed the mojority of my code in macro, but Data3<number> still does not resolve. Error says that that  " Data3<number>" data can't be found (I have 2 of those datasets in the "dir" directory, so datasets are there).  
I made new changes in bold, please see below:

proc sort data=dir.Data5 out=TempData (keep=Var3) nodupkey;
             by Var3;
run;

data DataFile;
      infile “File1.dat”;  
      input @1 Number $char2.;
run;
/*Note:   "File1.dat" already has a few row numbers for variable 'Number' */

%macro first (nbr);
data dir.Data6 (keep=VarID);
          set DataFile end=last;
             ptds=’dir: Data3’||&nbr||’.sas7bdat’;
             if fileexist(ptds) the do;
                  call symput(‘nbr’, number);
                  call execute(‘&nbr’);
             end;
                if (&nbr ne ‘34’) then do;  
                    VarID=substr(Var3,  11,19);
                  merge  TempData(in=a) dir.Data3&nbr (in=b);
                    by VarID;
                   if a then output;
               end;
             
                if last then do;
                  file dir.Data6.dat;
                     put @1  Var1 $char2.
                           @3 Var2 $char2.
                           @5 V....               ;
                end;  

run;

%mend first;
%first (number)

0
 
LVL 14

Accepted Solution

by:
Aloysius Low earned 2000 total points
ID: 36997575
you will need to split the creation of data6 dataset into at least 4 steps... the merge and output to file cannot be on the same data step as data6.

also, are you sure ptds value is correct? try checking the value to make sure that the path you are checking is correct.
0
 

Author Comment

by:labradorchik
ID: 36997790
Great! Thanks!! I think I got it this time... will post code within the next 10 hours.
0
 

Author Comment

by:labradorchik
ID: 37000020
proc sort data=dir.Data5 out=TempData (keep=Var3) nodupkey;
             by Var3;
run;

%macro first (nbr);
data dir.Data6;
             ds=’dir:Data3&nbr..sas7bdat’;
             if fileexist(ds) the do;
            end;
                if (&nbr ^ne 34) then do;  
                    VarID=substr(Var3,  11,19);
                  merge  TempData(in=a) dir.Data3&nbr (in=b);
                    by VarID;
                   if a then output;
               end;
             
                data _null_;
                set dir.Data6;
                  file dir.Data6.dat;
                     put @1  Var1 $char2.
                           @3 Var2 $char2.
                           @5 V....               ;
run;
%mend first;

data _null_;
     infile “File1.dat”;  
      input @1 Number $char2.;
       call symput ('nbr', number);
       call execute ('%first'(&nbr)');
run;
/*Note:   "File1.dat" already has a few row numbers for variable 'Number' */

Errors that stops program process is: ERROR: File dir.Data32.DATA does not exist.
Then I get all the same errors for other (3 to 18, also number 23 and 34) datasets in the directory.  I guess because of this error, my merges are not happening at all.  
Note: all datasets with numbers (2 to 18, also number 23 and 34) are present in the "dir" directory.
I am not sure if my "ds=dir.file" structure is correct.  


Any comments?
0
 

Author Closing Comment

by:labradorchik
ID: 37004433
Thank you for your comments! I just posted another similar question, please see if you can comment on that one. Thank you in advance!!
0

Featured Post

Veeam Disaster Recovery in Microsoft Azure

Veeam PN for Microsoft Azure is a FREE solution designed to simplify and automate the setup of a DR site in Microsoft Azure using lightweight software-defined networking. It reduces the complexity of VPN deployments and is designed for businesses of ALL sizes.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

We live in a world of interfaces like the one in the title picture. VBA also allows to use interfaces which offers a lot of possibilities. This article describes how to use interfaces in VBA and how to work around their bugs.
MSSQL DB-maintenance also needs implementation of multiple activities. However, unprecedented errors can hamper the database management. In that case, deploying Stellar SQL Database Toolkit ensures fast and accurate database and backup repair as wel…
In this seventh video of the Xpdf series, we discuss and demonstrate the PDFfonts utility, which lists all the fonts used in a PDF file. It does this via a command line interface, making it suitable for use in programs, scripts, batch files — any pl…
In this video, Percona Solution Engineer Dimitri Vanoverbeke discusses why you want to use at least three nodes in a database cluster. To discuss how Percona Consulting can help with your design and architecture needs for your database and infras…

839 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question