Solved

SAS Data Sets Manipulations

Posted on 2011-02-10
4
607 Views
Last Modified: 2013-11-16
Hello, I am trying to do the following:

Create "NewData" SAS dataset.

Steps:
   1. Read in all records from already existed "OldData1" SAS dataset and sort them by 'Var1', 'Var2', and 'Var3'.
   2. Read in all records from already existed "OldData2" SAS dataset and sort them by 'Var11', 'Var12', and 'Var13'.
         Note: 'Var1', 'Var2', and 'Var3' in "OldData1" have the same info as 'Var11', 'Var12', and 'Var13 in "OldData2".
   3. Merge "OldData1" and "OldData2" by 'Var1', 'Var2', and 'Var3'.
   4. Extract all variables in "OldData2" for matched records in "OldData1".
   5. Keep all records in "OldData1" and count the number of records in "NewData" and send counts to "Data.Log" file.

*************************
The following code is what I have wrote so far. Please let me know if I need to adjust, change or delete anything. I can not run my code through compiler yet because "OldData1" and "OldData2" are not finished yet.
Thank you!


data temp1;
  set lib.OldData1;
 run;

 proc sort data=temp1;
  by Var1 Var2 Var3;
 run;

 data temp2 (rename=(Var1=Var11 Var2=Var12 Var3=Var13));
  set lib.OldData2;
 run;

 proc sort data=temp2;
  by Var1 Var2 Var3;
 run;

 data lib.NewData (drop=count);
   length count 8;
   retain count 0;
  merge temp1 (in=a) temp2 (in=b) end=lastrec;
   by Var1 Var2 Var3;
    if b;
   count + 1;

 if lastrec then do;
  file Data.Log;
    put "Total Records Read-in from all NewData dataset  " count;
 end;
run;


0
Comment
Question by:labradorchik
  • 2
  • 2
4 Comments
 
LVL 14

Expert Comment

by:Aloysius Low
Comment Utility
your renaming of variable in data step for temp2 is not correct. it should be var11=var1 and so on

you are supposed to keep all records from olddata1 so the join is left outer join instead of right outer join in your merge step. in this case, you should have done an if a instead of if b.

to keep all variables from olddata2, you could specify in your merge statement that temp1 (keep = var1 var2 var3), or alternatively, when you extract data from olddata1 into temp1, just keep those 3 variables.

you have missed out defining a file statement for your file output.
0
 

Author Comment

by:labradorchik
Comment Utility
Thank you lowaloysius for your comments!!
My mistake, I meant to code "Var11=Var1", - not "Var1=Var11".
I defined my file statement in my original coding, just did not include it here in my coding question as it seems obvious.
I also changed "if a" instead of "if b".

I have a question about your third comment: What if my "Olddata2" data set has many variables (like 30 or 40), would it be more efficient to keep all variables from "Olddata2" in some other coding way? I just usually use "keep" statement when I have to keep up to 10 variables but not more then that.

Also, when I test my code I do not currently get any errors and I see that I am creating "Data.Log" but nothing is written to it (Data.Log just stays empty) for some reason.  What can be a problem?
0
 
LVL 14

Accepted Solution

by:
Aloysius Low earned 500 total points
Comment Utility
if you don't specify a keep statement and if there's no drop statement, all variables from that dataset is kept by default. however, the problem here is that you didn't specify any keep or drop variables in your merge statement, hence, all variables from olddata1 and olddata2 would be kept. this is why i asked you to do a keep on olddata1, keeping var1, var2 and var3.

i believe that you would have an error running the code actually, and that the log file isn't the one really generated by your code. as i mentioned, you have not defined a filename statement for your file statement. otherwise, you could also enclose the the filename in quotes i.e. file "D:\logfile.log". if it's still not working, then attaching the output log could help :)
0
 

Author Closing Comment

by:labradorchik
Comment Utility
Thank you, [b]lowaloysius[/b]!
After a few fixed errors everything worked out fine.
0

Featured Post

IT, Stop Being Called Into Every Meeting

Highfive is so simple that setting up every meeting room takes just minutes and every employee will be able to start or join a call from any room with ease. Never be called into a meeting just to get it started again. This is how video conferencing should work!

Join & Write a Comment

Entering a date in Microsoft Access can be tricky. A typo can cause month and day to be shuffled, entering the day only causes an error, as does entering, say, day 31 in June. This article shows how an inputmask supported by code can help the user a…
Many companies are looking to get out of the datacenter business and to services like Microsoft Azure to provide Infrastructure as a Service (IaaS) solutions for legacy client server workloads, rather than continuing to make capital investments in h…
Video by: Steve
Using examples as well as descriptions, step through each of the common simple join types, explaining differences in syntax, differences in expected outputs and showing how the queries run along with the actual outputs based upon a simple set of dem…
Polish reports in Access so they look terrific. Take yourself to another level. Equations, Back Color, Alternate Back Color. Write easy VBA Code. Tighten space to use less pages. Launch report from a menu, considering criteria only when it is filled…

772 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

11 Experts available now in Live!

Get 1:1 Help Now