Solved

SAS Data Sets Manipulations

Posted on 2011-03-16
16
1,133 Views
Last Modified: 2013-11-16
Hello,
I would like to very that one SAS dataset "Data1" has the same amount of records/counts that my second SAS dataset "Data2". For testing purposes I placed exactly 20 records in "Data1" dataset and 20 records in "Data2" dataset, so they should match.
This is what I have so far:

data temp1;
 length count1 10;
 retain count1 0;
  set mylab.Data1;
count1 + 1;
run;

data temp2;
 length count2 10;
 retain count2 0;
  set mylab.Data2;
count2 + 1;
run;


How would I compare both datasets' records/counts (count1 and count2) automatically in my program?  I know how to compare records/counts within one dataset but what if I have two different datasets?? Should I use macro, if so, how should I use it?
Note: records/counts in my both datasets should equal, if not I would have write a message to an another file saying "There is discrepancy in records count! "

Thanks in advance!
0
Comment
Question by:labradorchik
  • 7
  • 5
  • 4
16 Comments
 
LVL 14

Expert Comment

by:Aloysius Low
ID: 35148716
i would use proc compare... however i'm at home now, so i don't have the full syntax (sorry it isn't something i use frequently, so i'm not able to recall the syntax offhand)...

you could see http://support.sas.com/techsup/technote/ts440.pdf for hints on how to do it...

before you use proc compare, you should sort both datasets according to the variables you would like to compare.. typically when i do a 'exact match' comparison, i would sort them by all the variables and compare by observation ID (aka row number)... alternatively, you could also compare by key fields...

to implement the last step, the idea would be to check the output dataset... however, i'll need to be in office to write some code and test so that i can advise further on this so that's going to be another 9 hours or so (it's night time here now...)
0
 

Author Comment

by:labradorchik
ID: 35148829
lowaloysius: I am also looking into this problem myself, but please do post an example of this verification whenever you do have a chance!
Note: I can probably do this on just one key variable.
0
 

Author Comment

by:labradorchik
ID: 35150613
lowaloysius: From your link above I see PROC COMPARE is used to compare variables but not records. In my case I just need to compare two datasets and make sure they have the same amount of records written.
I was thinking to use MACROs to store 'count1' and 'count2' and then compare two MACRO counts together, but I do not know how to do that yet.
0
 
LVL 7

Expert Comment

by:d507201
ID: 35150967
I use Proc Compare quite a bit and it can be used to compare records.  Here's some log and output content generated comparing a test read of an external file to the SAS data set used to create it.  The proc compares all observations and all variables.  Record comparison is done on a 1:1 basis so the datasets must have the same number of obs and they must be in the same sort order.

1741  proc compare
1742      base=trigger.demand_call_file_&_fileDate
1743      compare=testRead;
1744  run;

<some output not shown>
Number of Observations in Common: 293305.
Total Number of Observations Read from TRIGGER.DEMAND_CALL_FILE_20110315: 293305.
Total Number of Observations Read from WORK.TESTREAD: 293305.

Number of Observations with Some Compared Variables Unequal: 0.
Number of Observations with All Compared Variables Equal: 293305.

If any records are not exactly the same then the Number of Obs with Some Compared Vars Unequal will not be zero.  Proc Compare has lots of options for identifying differences so you'll want to read up on them in the on-line documentation.  The code examples are very good.

As might be expected, the larger the data sets being compared the slower Proc Compare runs.  This example has 293,305 records each with 7 fields and it takes about 6 minutes to run.  2 fields are numeric and 6 are character but I don't know if the mix makes a difference in performance.
0
 
LVL 7

Accepted Solution

by:
d507201 earned 350 total points
ID: 35151288
Rereading your original message, proc compare is a overkill for comparing record counts.  Here's how to do it using the data sets' internal metadata--it returns the answer immediately.  For each data set, open it, put the number of obs into a macro variable, close the data set.  Then use data _null_ to compare the values.

%let _dsid=%sysfunc(open(sashelp.class));
%let _numRecs1=%sysfunc(attrn(%eval(&_dsid-0),nobs));
%let _rc=%sysfunc(close(%eval(&_dsid-0)));

%let _dsid=%sysfunc(open(sashelp.company));
%let _numRecs2=%sysfunc(attrn(%eval(&_dsid-0),nobs));
%let _rc=%sysfunc(close(%eval(&_dsid-0)));

data _null_;
      if "&_numRecs1" ne "&_numRecs2" then put "Number of records is not equal";
      else put "Number of records is equal";
run;
0
 

Author Comment

by:labradorchik
ID: 35151938
d507201 Thanks!!  This this exactly what I was looking for!
Questions: Are "_dsid"  and  "_rc" stand for counter names in your code?  Did you place "_" before these two names for a reason or its just part of the name of those counter names?

Also, If I need to output message to file "File1" after verifying my counts should my code look like below?

  data _null_;
      if "&_numRecs1" ne "&_numRecs2" then
       file File1 mod;
              put "Number of records is not equal";
      else put "Number of records is equal";
run;

Of course, I would have to define "File1" location directory prior to this data step.
0
 
LVL 14

Expert Comment

by:Aloysius Low
ID: 35153390
are you trying to compare records and the count or just the count? the way i read it is compare records... if all you need is the count comparison, you can refer to SASHELP.VTABLE. This is a view containing metadata information about tables currently registered with the SAS session.

if you need to compare records (contents of a table), then you'll need PROC COMPARE.

proc sql;
    select nobs into :count1 from SASHELP.VTABLE where LIBNAME = "[library name for table 1]" and MEMNAME = "[table name 1]";
    select nobs into :count2 from SASHELP.VTABLE where LIBNAME = "[library name for table 2]" and MEMNAME = "[table name 2]";
quit;


for your code to output to file, you have placed the file statement in the wrong place - need to swap lines with the if statement:
data _null_;
  file File1 mod;
    if &count1 ne &count2 then
      put "Number of records is not equal";
    else put "Number of records is equal";
run;

also, i would not recommend putting the variables within quotation marks. it's better to compare the numbers as numbers instead of text.
0
 

Author Comment

by:labradorchik
ID: 35157276
I am sorry for the confusion! I just need to verify counts of records in both datastes by comparing the counts.

Note: there also more things to consider in my code that I forgot to mention before:
1. "Data1" has many records with 'Var5' equal from 01 to 99 in those records. So, for example 'Var5' may be found with the same value of 01 or other values in two or three records within "Data1".    
2. "Data2" only has two variables:
   a. 'Var5' which has values from 01 to 99.
   b. 'Var4' which has a value or a number of counts for each value in 'Var5'.  


So will this code below work or should I better use PROQ SQL with SASHELP.VTABLE?  I think I can leave first three lines of code and the last five, but in the middle section of code (with "Data2") I would probably have to change a few things. Since 'Var4' in "Data2" has already number of counts for each value in 'Var5' I can probably just use that number to compare my counts with "Data1".  
Any help would be greatly appreciated!

%let _dsid=%sysfunc(open(mylab.Data1));
%let _numRecs1=%sysfunc(attrn(%eval(&_dsid-0),nobs));
%let _rc=%sysfunc(close(%eval(&_dsid-0)));

%let _dsid=%sysfunc(open(mylab.Data2));
%let _numRecs2=%sysfunc(attrn(%eval(&_dsid-0),nobs));    
%let _rc=%sysfunc(close(%eval(&_dsid-0)));


data _null_;
   file File1 mod;
      if &_numRecs1 ne &_numRecs2 then put "Number of records is not equal";
      else put "Number of records is equal";
run;

Thanks!
0
Microsoft Certification Exam 74-409

Veeam® is happy to provide the Microsoft community with a study guide prepared by MVP and MCT, Orin Thomas. This guide will take you through each of the exam objectives, helping you to prepare for and pass the examination.

 
LVL 14

Expert Comment

by:Aloysius Low
ID: 35157402
from your description, it appears that Var4 in Data2 is not useful and meant to add confusion...

also, it's not clear from your post whether duplicate records in Data1 should be counted as 1 or the actual number of times the records appear

finally, it's also not clear if you are supposed to match the record count by the value in Var5. what i mean is if in Data1, you have 4 records of Var5 = "01", and in Data2, you have 1 record where Var5 = "01" and Var4 = 3, are you supposed to prompt that there's a mismatch, even though the final record count (count of rows in Data1 and Data2) tallies? if not, it adds further to my first statement that Var4 serves to confuse only. this means that either way, the original suggestion and the alternative i have provided will work.

whether or not to use the PROC SQL provided by me, or the system functions is up to you - both serve the same function. the SASHELP.VTABLE presents a much easier way to understand the metadata of the table.
0
 

Author Comment

by:labradorchik
ID: 35158968
lowaloysius:
You question was:
"also, it's not clear from your post whether duplicate records in Data1 should be counted as 1 or the actual number of times the records appear"

I just doubled check this, - there will be no duplicate records in Data1, so it is actual number of times records appear in the Data1 dataset.
Both datastes designed that way that they should both match.  There should be no mismatches.

Thank you in advance!
0
 
LVL 7

Expert Comment

by:d507201
ID: 35159604
For the code I sent, "_dsid"  holds the data set identifier that SAS assigns when it opens the data set and  "_rc" is a sort of dummy variable--return code--that's needed for the close function to work.  I usually start my macro variable names with "_" as another visual cue.  

If you're comparing record counts regardless of duplicatest or the record contents then either will work.  If you need to exclude dups or apply conditions then use a select count () statement in proc sql so that you can use the unique constraint or apply where criteria.
0
 
LVL 14

Expert Comment

by:Aloysius Low
ID: 35161900
if that's the case, Var4 in Data2 serves no purpose but to add confusion, unless there's specific requirement to use it... given this, it does not matter which method you use - SASHELP.VTABLES, or to run the system functions to read the number of observations
0
 

Author Comment

by:labradorchik
ID: 35199783
Hello,
After running below code I received the following messages:

%let _dsid=%sysfunc(open(mylab.Data1));
%let _numRecs1=%sysfunc(attrn(%eval(&_dsid-0),nobs));
%let _rc=%sysfunc(close(%eval(&_dsid-0)));

%let _dsid=%sysfunc(open(mylab.Data2));
%let _numRecs2=%sysfunc(attrn(%eval(&_dsid-0),nobs));    
%let _rc=%sysfunc(close(%eval(&_dsid-0)));

data _null_;
   file File1 mod;
      if &_numRecs1 ne &_numRecs2 then
         put  "Current Data1 count is                     "  &_numRecs1;
         put "Expected records count in Data2 is  "  &_numRecs2;
 else put "Number of records is equal";
run;

**************************************************************************************
546           put "Current Data1 count is           " &_numRecs1;
NOTE: Line generated by the macro variable "_NUMRECS1".
546   3
         -
        22
ERROR 22-322: Syntax error, expecting one of the following: a name, #, +, @.

547           put "Expected records count in Data2 is " &_numRecs2;
NOTE: Line generated by the macro variable "_NUMRECS2".
547   11
         --
        22
ERROR 22-322: Syntax error, expecting one of the following: a name, #, +, @.

548
549            else put "Number of Records is Equal";
                  ----
                  160
ERROR 160-185: No matching IF-THEN clause.
***********************************************************************************************************

Any suggestions?  
0
 
LVL 14

Assisted Solution

by:Aloysius Low
Aloysius Low earned 150 total points
ID: 35203402
could you please post the entire log? what you have posted is not enough to determine the cause of the first 2 errors...

for the 3rd error, if you have more than 1 line in an if-else statement, as in your case, you have to use the do statement to enclose them properly:

if (condition) then do;
  statement1;
  statement2;
end;
else do;
  statement3;
  statement4;
end;
0
 
LVL 7

Expert Comment

by:d507201
ID: 35207104
In this case the macro variables won't resolve proberly unless they're in double quotes.  SAS is looking for a variables named 3 and 11 and obviously those are invalid names.

put  "Current Data1 count is                       &_numRecs1";
put "Expected records count in Data2 is    &_numRecs2";
0
 

Author Closing Comment

by:labradorchik
ID: 35208703
Thanks to both of you!!  Both of your comments helped me a lot!!
0

Featured Post

What Security Threats Are You Missing?

Enhance your security with threat intelligence from the web. Get trending threat insights on hackers, exploits, and suspicious IP addresses delivered to your inbox with our free Cyber Daily.

Join & Write a Comment

Suggested Solutions

Using SQL Scripts we can save all the SQL queries as files that we use very frequently on our database later point of time. This is one of the feature present under SQL Workshop in Oracle Application Express.
Many companies are looking to get out of the datacenter business and to services like Microsoft Azure to provide Infrastructure as a Service (IaaS) solutions for legacy client server workloads, rather than continuing to make capital investments in h…
Via a live example, show how to take different types of Oracle backups using RMAN.
This video shows how to Export data from an Oracle database using the Datapump Export Utility.  The corresponding Datapump Import utility is also discussed and demonstrated.

747 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

11 Experts available now in Live!

Get 1:1 Help Now