[Last Call] Learn how to a build a cloud-first strategyRegister Now


SAS Data Sets Manipulations

Posted on 2011-10-13
Medium Priority
Last Modified: 2013-11-16

Trying to create a new SAS datasets & ASCII file from already given datasets/ASCII.

Given SAS datasets:
 "Data3<number>" (Data3<number> dataset may have different numbers (from 2 to 18, also number 23 and 34) inside its dataset name).
 "Data4<number>" (Data4<number> dataset may have different numbers (from 2 to 18, also number 23 and 34) inside its dataset name).
Given ASCII file:

Creating SAS datasets:
 "TempData" (temporary dataset)
 "DataFile"  (temporary dataset)
“ Data6”

Creating ASCII file:

Requirements for coding:
 1. Combine datasets "Data1" and "Data2" to "Data5" datatset
 2. Create “Data6”.
       2.1. Sort "Data5" dataset no duplicate keys on ‘Var3’, output to "TempData" temporary dataset (keep only ‘Var3’).
       2.2. For each variable ‘Number’ (this variable has numbers from 2 to 18, also number 23 and 34) in "File1.dat" (except number 34), merge "TempData" with "Data3<number>" using ‘VarID’ in "Data3<number>" = ‘Var3’(11:19) in "TempData".
            2.2.1. One "Data3<number>" to 0, 1 "TempData" (one to one match or one to none).
            2.2.2. Keep only matching "TempData" records.
2.3. Output only a single “Data6” SAS dataset and ASCII file “Data6.dat”containing the "TempData" records for all numbers.

This is what I have coded so far, but I am not sure how valid my code is. Also, testing SAS datasets are not available yet, so I can’t run my code to find out what errors I have.

/*1*/	data dir.Data5;
  	    set dir.Data1 dir.Data2;

/*2*/	2.1.  proc sort data=dir.Data5 out=TempData (keep=Var3) nodup;
                  by Var3;

        2.2.  data DataFile;
                  infile “File1.dat”;
                  input @1 Number $char2.;

        2.3.  data dir.Data6 (keep=VarID); 
                  set DataFile;
                   ptds=’dir: Data3’||number||’.sas7bdat’;
                    if fileexist(ptds) the do;
                     call symput(‘nbr’, number);
                     call execute(‘&nbr’);
                     if (&nbr ne ‘34’) then do;   
                      VarID=substr(Var3,  11,19);
                   merge  TempData(in=a) dir.Data3&nbr (in=b);
                     by VarID;
                  if a then output;

Open in new window

I know that my code looks confusing, but requirements are correct. What am I missing in my code? Also, how would I as well create and output “Data6.dat”?

Any comments are welcome!
Thank you in advance!!

Question by:labradorchik
  • 3

Author Comment

ID: 36969451
Anyone would like to make a comment on my code? What is the best technique to code for these requirements?  Should I have used MACRO instead?
Please, any comments are welcome!! :)

Accepted Solution

d507201 earned 2000 total points
ID: 36971273
Item 2.1  When unduplicating with SORT, the NODUP and NODUPKEY options do different things.  Your requirements are to unduplicate based on the value of a Key-- that's what NODUPKEY does.   You're using NODUP and that removes records that are exact duplicates.  

Tips and Clues -- When unduplicating, SAS keeps the first unique record so sometimes you need to first sort the data to get it in a specific order--oldest record first, maybe, based on a date--before unduplicating.

Other items not looked at yet.

Author Comment

ID: 36971329
Thank you!
So, I guess I should have used NODUPKEY instead, like this:

proc sort data=dir.Data5 out=TempData (keep=Var3) nodupkey;
             by Var3;

Or, do I need to use NODBKEY on the TempData dataset in the next step (different proc sort step)?

Author Closing Comment

ID: 37004441
Thank you for your comments!! I just posted another similar question, please see if you can comment on that one. Thank you in advance!!

Featured Post

Concerto's Cloud Advisory Services

Want to avoid the missteps to gaining all the benefits of the cloud? Learn more about the different assessment options from our Cloud Advisory team.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

In this article, I’ll look at how you can use a backup to start a secondary instance for MongoDB.
MSSQL DB-maintenance also needs implementation of multiple activities. However, unprecedented errors can hamper the database management. In that case, deploying Stellar SQL Database Toolkit ensures fast and accurate database and backup repair as wel…
This is a high-level webinar that covers the history of enterprise open source database use. It addresses both the advantages companies see in using open source database technologies, as well as the fears and reservations they might have. In this…
In this video, Percona Solution Engineer Rick Golba discuss how (and why) you implement high availability in a database environment. To discuss how Percona Consulting can help with your design and architecture needs for your database and infrastr…
Suggested Courses

830 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question