Want to win a PS4? Go Premium and enter to win our High-Tech Treats giveaway. Enter to Win

x
?
Solved

SAS Data Sets Manipulations

Posted on 2011-03-16
16
Medium Priority
?
1,163 Views
Last Modified: 2013-11-16
Hello,
I would like to very that one SAS dataset "Data1" has the same amount of records/counts that my second SAS dataset "Data2". For testing purposes I placed exactly 20 records in "Data1" dataset and 20 records in "Data2" dataset, so they should match.
This is what I have so far:

data temp1;
 length count1 10;
 retain count1 0;
  set mylab.Data1;
count1 + 1;
run;

data temp2;
 length count2 10;
 retain count2 0;
  set mylab.Data2;
count2 + 1;
run;


How would I compare both datasets' records/counts (count1 and count2) automatically in my program?  I know how to compare records/counts within one dataset but what if I have two different datasets?? Should I use macro, if so, how should I use it?
Note: records/counts in my both datasets should equal, if not I would have write a message to an another file saying "There is discrepancy in records count! " 

Thanks in advance!
0
Comment
Question by:labradorchik
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 7
  • 5
  • 4
16 Comments
 
LVL 14

Expert Comment

by:Aloysius Low
ID: 35148716
i would use proc compare... however i'm at home now, so i don't have the full syntax (sorry it isn't something i use frequently, so i'm not able to recall the syntax offhand)...

you could see http://support.sas.com/techsup/technote/ts440.pdf for hints on how to do it...

before you use proc compare, you should sort both datasets according to the variables you would like to compare.. typically when i do a 'exact match' comparison, i would sort them by all the variables and compare by observation ID (aka row number)... alternatively, you could also compare by key fields...

to implement the last step, the idea would be to check the output dataset... however, i'll need to be in office to write some code and test so that i can advise further on this so that's going to be another 9 hours or so (it's night time here now...)
0
 

Author Comment

by:labradorchik
ID: 35148829
lowaloysius: I am also looking into this problem myself, but please do post an example of this verification whenever you do have a chance!
Note: I can probably do this on just one key variable.
0
 

Author Comment

by:labradorchik
ID: 35150613
lowaloysius: From your link above I see PROC COMPARE is used to compare variables but not records. In my case I just need to compare two datasets and make sure they have the same amount of records written.
I was thinking to use MACROs to store 'count1' and 'count2' and then compare two MACRO counts together, but I do not know how to do that yet.
0
Veeam Task Manager for Hyper-V

Task Manager for Hyper-V provides critical information that allows you to monitor Hyper-V performance by displaying real-time views of CPU and memory at the individual VM-level, so you can quickly identify which VMs are using host resources.

 
LVL 7

Expert Comment

by:d507201
ID: 35150967
I use Proc Compare quite a bit and it can be used to compare records.  Here's some log and output content generated comparing a test read of an external file to the SAS data set used to create it.  The proc compares all observations and all variables.  Record comparison is done on a 1:1 basis so the datasets must have the same number of obs and they must be in the same sort order.

1741  proc compare
1742      base=trigger.demand_call_file_&_fileDate
1743      compare=testRead;
1744  run;

<some output not shown>
Number of Observations in Common: 293305.
Total Number of Observations Read from TRIGGER.DEMAND_CALL_FILE_20110315: 293305.
Total Number of Observations Read from WORK.TESTREAD: 293305.

Number of Observations with Some Compared Variables Unequal: 0.
Number of Observations with All Compared Variables Equal: 293305.

If any records are not exactly the same then the Number of Obs with Some Compared Vars Unequal will not be zero.  Proc Compare has lots of options for identifying differences so you'll want to read up on them in the on-line documentation.  The code examples are very good.

As might be expected, the larger the data sets being compared the slower Proc Compare runs.  This example has 293,305 records each with 7 fields and it takes about 6 minutes to run.  2 fields are numeric and 6 are character but I don't know if the mix makes a difference in performance.
0
 
LVL 7

Accepted Solution

by:
d507201 earned 1400 total points
ID: 35151288
Rereading your original message, proc compare is a overkill for comparing record counts.  Here's how to do it using the data sets' internal metadata--it returns the answer immediately.  For each data set, open it, put the number of obs into a macro variable, close the data set.  Then use data _null_ to compare the values.

%let _dsid=%sysfunc(open(sashelp.class));
%let _numRecs1=%sysfunc(attrn(%eval(&_dsid-0),nobs));
%let _rc=%sysfunc(close(%eval(&_dsid-0)));

%let _dsid=%sysfunc(open(sashelp.company));
%let _numRecs2=%sysfunc(attrn(%eval(&_dsid-0),nobs));
%let _rc=%sysfunc(close(%eval(&_dsid-0)));

data _null_;
      if "&_numRecs1" ne "&_numRecs2" then put "Number of records is not equal";
      else put "Number of records is equal";
run;
0
 

Author Comment

by:labradorchik
ID: 35151938
d507201 Thanks!!  This this exactly what I was looking for!
Questions: Are "_dsid"  and  "_rc" stand for counter names in your code?  Did you place "_" before these two names for a reason or its just part of the name of those counter names?

Also, If I need to output message to file "File1" after verifying my counts should my code look like below?

  data _null_;
      if "&_numRecs1" ne "&_numRecs2" then
       file File1 mod;
              put "Number of records is not equal";
      else put "Number of records is equal";
run;

Of course, I would have to define "File1" location directory prior to this data step.
0
 
LVL 14

Expert Comment

by:Aloysius Low
ID: 35153390
are you trying to compare records and the count or just the count? the way i read it is compare records... if all you need is the count comparison, you can refer to SASHELP.VTABLE. This is a view containing metadata information about tables currently registered with the SAS session.

if you need to compare records (contents of a table), then you'll need PROC COMPARE.

proc sql;
    select nobs into :count1 from SASHELP.VTABLE where LIBNAME = "[library name for table 1]" and MEMNAME = "[table name 1]";
    select nobs into :count2 from SASHELP.VTABLE where LIBNAME = "[library name for table 2]" and MEMNAME = "[table name 2]";
quit;


for your code to output to file, you have placed the file statement in the wrong place - need to swap lines with the if statement:
data _null_;
  file File1 mod;
    if &count1 ne &count2 then
      put "Number of records is not equal";
    else put "Number of records is equal";
run;

also, i would not recommend putting the variables within quotation marks. it's better to compare the numbers as numbers instead of text.
0
 

Author Comment

by:labradorchik
ID: 35157276
I am sorry for the confusion! I just need to verify counts of records in both datastes by comparing the counts.

Note: there also more things to consider in my code that I forgot to mention before:
1. "Data1" has many records with 'Var5' equal from 01 to 99 in those records. So, for example 'Var5' may be found with the same value of 01 or other values in two or three records within "Data1".    
2. "Data2" only has two variables:
   a. 'Var5' which has values from 01 to 99.
   b. 'Var4' which has a value or a number of counts for each value in 'Var5'.  


So will this code below work or should I better use PROQ SQL with SASHELP.VTABLE?  I think I can leave first three lines of code and the last five, but in the middle section of code (with "Data2") I would probably have to change a few things. Since 'Var4' in "Data2" has already number of counts for each value in 'Var5' I can probably just use that number to compare my counts with "Data1".  
Any help would be greatly appreciated!

%let _dsid=%sysfunc(open(mylab.Data1));
%let _numRecs1=%sysfunc(attrn(%eval(&_dsid-0),nobs));
%let _rc=%sysfunc(close(%eval(&_dsid-0)));

%let _dsid=%sysfunc(open(mylab.Data2));
%let _numRecs2=%sysfunc(attrn(%eval(&_dsid-0),nobs));    
%let _rc=%sysfunc(close(%eval(&_dsid-0)));


data _null_;
   file File1 mod;
      if &_numRecs1 ne &_numRecs2 then put "Number of records is not equal";
      else put "Number of records is equal";
run;

Thanks!
0
 
LVL 14

Expert Comment

by:Aloysius Low
ID: 35157402
from your description, it appears that Var4 in Data2 is not useful and meant to add confusion...

also, it's not clear from your post whether duplicate records in Data1 should be counted as 1 or the actual number of times the records appear

finally, it's also not clear if you are supposed to match the record count by the value in Var5. what i mean is if in Data1, you have 4 records of Var5 = "01", and in Data2, you have 1 record where Var5 = "01" and Var4 = 3, are you supposed to prompt that there's a mismatch, even though the final record count (count of rows in Data1 and Data2) tallies? if not, it adds further to my first statement that Var4 serves to confuse only. this means that either way, the original suggestion and the alternative i have provided will work.

whether or not to use the PROC SQL provided by me, or the system functions is up to you - both serve the same function. the SASHELP.VTABLE presents a much easier way to understand the metadata of the table.
0
 

Author Comment

by:labradorchik
ID: 35158968
lowaloysius:
You question was:
"also, it's not clear from your post whether duplicate records in Data1 should be counted as 1 or the actual number of times the records appear"

I just doubled check this, - there will be no duplicate records in Data1, so it is actual number of times records appear in the Data1 dataset.
Both datastes designed that way that they should both match.  There should be no mismatches.

Thank you in advance!
0
 
LVL 7

Expert Comment

by:d507201
ID: 35159604
For the code I sent, "_dsid"  holds the data set identifier that SAS assigns when it opens the data set and  "_rc" is a sort of dummy variable--return code--that's needed for the close function to work.  I usually start my macro variable names with "_" as another visual cue.  

If you're comparing record counts regardless of duplicatest or the record contents then either will work.  If you need to exclude dups or apply conditions then use a select count () statement in proc sql so that you can use the unique constraint or apply where criteria.
0
 
LVL 14

Expert Comment

by:Aloysius Low
ID: 35161900
if that's the case, Var4 in Data2 serves no purpose but to add confusion, unless there's specific requirement to use it... given this, it does not matter which method you use - SASHELP.VTABLES, or to run the system functions to read the number of observations
0
 

Author Comment

by:labradorchik
ID: 35199783
Hello,
After running below code I received the following messages:

%let _dsid=%sysfunc(open(mylab.Data1));
%let _numRecs1=%sysfunc(attrn(%eval(&_dsid-0),nobs));
%let _rc=%sysfunc(close(%eval(&_dsid-0)));

%let _dsid=%sysfunc(open(mylab.Data2));
%let _numRecs2=%sysfunc(attrn(%eval(&_dsid-0),nobs));    
%let _rc=%sysfunc(close(%eval(&_dsid-0)));

data _null_;
   file File1 mod;
      if &_numRecs1 ne &_numRecs2 then
         put  "Current Data1 count is                     "  &_numRecs1;
         put "Expected records count in Data2 is  "  &_numRecs2;
 else put "Number of records is equal";
run;

**************************************************************************************
546           put "Current Data1 count is           " &_numRecs1;
NOTE: Line generated by the macro variable "_NUMRECS1".
546   3
         -
        22
ERROR 22-322: Syntax error, expecting one of the following: a name, #, +, @.

547           put "Expected records count in Data2 is " &_numRecs2;
NOTE: Line generated by the macro variable "_NUMRECS2".
547   11
         --
        22
ERROR 22-322: Syntax error, expecting one of the following: a name, #, +, @.

548
549            else put "Number of Records is Equal";
                  ----
                  160
ERROR 160-185: No matching IF-THEN clause.
***********************************************************************************************************

Any suggestions?  
0
 
LVL 14

Assisted Solution

by:Aloysius Low
Aloysius Low earned 600 total points
ID: 35203402
could you please post the entire log? what you have posted is not enough to determine the cause of the first 2 errors...

for the 3rd error, if you have more than 1 line in an if-else statement, as in your case, you have to use the do statement to enclose them properly:

if (condition) then do;
  statement1;
  statement2;
end;
else do;
  statement3;
  statement4;
end;
0
 
LVL 7

Expert Comment

by:d507201
ID: 35207104
In this case the macro variables won't resolve proberly unless they're in double quotes.  SAS is looking for a variables named 3 and 11 and obviously those are invalid names.

put  "Current Data1 count is                       &_numRecs1";
put "Expected records count in Data2 is    &_numRecs2";
0
 

Author Closing Comment

by:labradorchik
ID: 35208703
Thanks to both of you!!  Both of your comments helped me a lot!!
0

Featured Post

Free Tool: ZipGrep

ZipGrep is a utility that can list and search zip (.war, .ear, .jar, etc) archives for text patterns, without the need to extract the archive's contents.

One of a set of tools we're offering as a way to say thank you for being a part of the community.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Microsoft Access is a place to store data within tables and represent this stored data using multiple database objects such as in form of macros, forms, reports, etc. After a MS Access database is created there is need to improve the performance and…
Recently I was talking with Tim Sharp, one of my colleagues from our Technical Account Manager team about MongoDB’s scalability. While doing some quick training with some of the Percona team, Tim brought something to my attention...
In this video, Percona Solution Engineer Dimitri Vanoverbeke discusses why you want to use at least three nodes in a database cluster. To discuss how Percona Consulting can help with your design and architecture needs for your database and infras…
In this video, Percona Director of Solution Engineering Jon Tobin discusses the function and features of Percona Server for MongoDB. How Percona can help Percona can help you determine if Percona Server for MongoDB is the right solution for …

610 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question