Avatar of labradorchik
labradorchik
Flag for United States of America asked on

How to output SAS listing of variables with ASCII special characters?

I have three SAS datasets (data1, data2, data3) with variables Var1, Var2, Var3, and Var4 in each dataset. I would like to output a list file (print listing) of the unique Var1 and a list file for unique Var3 from all datasets (data1, data2, data3), that contain a special character (standard ASCII special characters, examples: ? ( ) / ; etc…).    Then, just to add Var5 to each list to indicate the source of each record from those datasets. Duplicate records, that is records on more than one list, will be included.

Just a Note:
if data1 then Var5 is A
if data2 then Var5 is B
if data3 then Var5 is C

How can this be done?
DatabasesProgrammingDB Reporting Tools

Avatar of undefined
Last Comment
labradorchik

8/22/2022 - Mon
Ian

Hi there labradorchik,

The code below is along the lines of something that should work for you. var5 has a bit more than what you asked for. If the A, B, C is really important that it shouldn't be hard to change the code to get that.  Either change the names of the dataset data1, data2, date3 to be A, B, C or change the logic to detect the end of each dataset as read in.

It may be useful to

1.

add in an observation counter (position within source data set); and

2.

sort the output listing.

3.

use a different function call if the "punctuation" character set is not what you want. Check the sas documentation for function  ANYPUNCT
data combined;

    set data1 
          data2
          data3
          INDSNAME=var5;

***********************************************
* Note: this sets var5 to be the name of the  *
* dataset where the observation came from     *
* rather than A, B or C                       *
***********************************************; 

run;
***********************************************************
* Dataset combined has all 3 datasets combined in one     *
* long new dataset.  Variable var5 gives name of original *
* data set.                                               *
*                                                         *
* Next select unique entries where the column contains a  *
* punctuation character.                                  *
**********************************************************;


proc sql;

title "List of records with special characters in VAR1";
    select unique
          var1,
          var5
    from combined
   where (anyPunct(var1) > 0);
 
title "List of records with special characters in VAR3";
    select unique
          var3,
          var5
    from combined
   where (anyPfunct(var3) > 0);

quit;
title;

Open in new window


Regards,

Ian
labradorchik

ASKER
Hi Ian,
Thank you very much for your explanations and examples!!
I am having a hard time grasping "INDSNAME=Var5"
I read on line what INDSNAME code does and I am not sure if this is what I am looking for.
You also mentioned that there is a way to "change the logic to detect the end of each dataset as read in". May be this would work better?
Because I know for sure that I can not change the names of my datasets to A, B, and C unless it can be done just as a temporary. Datasets' names should have the same names as before.

At this moment I am getting the following error in the proc SQL portion of the code:
data combined;
    set data1 
          data2
          data3
          INDSNAME=var5;
run;

proc sql;
title "List of records with special characters in VAR1";
    select unique
          var1,
          var5
    from combined
   where (anyPunct(var1) > 0);
ERROR: The following columns were not found in the contributing tables: var5.
NOTE: PROC SQL set option NOEXEC and will continue to check the syntax of statements.
 
title "List of records with special characters in VAR3";
    select unique
          var3,
          var5
    from combined
   where (anyPfunct(var3) > 0);
ERROR: The following columns were not found in the contributing tables: var5.
NOTE: PROC SQL set option NOEXEC and will continue to check the syntax of statements.
NOTE: The SAS System stopped processing this step because of errors.
quit;
title;

Open in new window

labradorchik

ASKER
Is this a proper form of getting name values from each dataset?

data combined;

    set data1 (in=a)

        data2 (in=b)

        data3 (in=c)

if a then from=A;

if b then from=B;

if c then from=C;

run; 

Open in new window

I started with Experts Exchange in 2004 and it's been a mainstay of my professional computing life since. It helped me launch a career as a programmer / Oracle data analyst
William Peck
Ian

Hi there labradorchik,

That is a good way of doing it.  Slight corrections below.

data combined (keep = var1 var3 var5);

    length Var5 $ 1;    ** our new variable that shows where obs came from **;

   ** Read in the 3 datasets one after the other **;
    set data1 (in=a)

        data2 (in=b)

        data3 (in=c)  ;

** Set Var5 to indicate where observation came from **;
Var5 = "*";                   ** Assume an error somewhere - just in case **;
if a then Var5 = "A";

if b then Var5 = "B";

if c then Var5 = "C";

run;

Open in new window




BTW: The INDSNAME=Var5" will create a variable Var5 and set it to the name of the dataset used for each observation.  

With the variables that SAS creates (like Var5 in first example and a, b, c in your code),  those variables are not automatically put into the "output vector". If we want them we must say so!  My earlier example should have had the line

data combined (keep=Var1 var3 var5);

Open in new window


to ensure that the Var5 was carried over into the output dataset.  The PROC SQL could not find the Var5 because the data step calculated it but omitted it from the output record. - Sorry for my mistake.


Ian
labradorchik

ASKER
Thank you very much Ian!!
Below code worked as it should.

data combined (keep = Var1 Var3 Var5);
    length Var5 $ 1;    

    set data1 (in=a)
        data2 (in=b)
        data3 (in=c);

Var5 = "*";                  
if a then Var5 = "A";
if b then Var5 = "B";
if c then Var5 = "C";
run;

proc sql;
title "List of records with special characters in VAR1";
    select unique
          Var1,
          Var5
    from combined
   where (anyPunct(Var1) > 0);
 
title "List of records with special characters in VAR3";
    select unique
          Var3,
          Var5
    from combined
where (anyPunct(Var3) > 0);
quit;
title;

Open in new window


Just two questions:
1) Currently I can see print listing which is output from the Proc SQL but there is no permanent file created for the listing. How can I send my print listing for example to finaloutput.lis file?  

2) What option may I use to do the same code procedure for unique Va1 and unique Var3 but just if these variables contain a number?
Note: originally Var1 and Var3 supposed to contain just letters (names) but to catch errors I would like to check all names for special characters and numbers as well.
labradorchik

ASKER
Update:
I just figured out the answer to my question #2 with anyDigit function.
I am still trying to create a permanet finaloutput.lis file for question #1.
Get an unlimited membership to EE for less than $4 a week.
Unlimited question asking, solutions, articles and more.
ASKER CERTIFIED SOLUTION
Ian

Log in or sign up to see answer
Become an EE member today7-DAY FREE TRIAL
Members can start a 7-Day Free trial then enjoy unlimited access to the platform
Sign up - Free for 7 days
or
Learn why we charge membership fees
We get it - no one likes a content blocker. Take one extra minute and find out why we block content.
Not exactly the question you had in mind?
Sign up for an EE membership and get your own personalized solution. With an EE membership, you can ask unlimited troubleshooting, research, or opinion questions.
ask a question
labradorchik

ASKER
Ian, thank you very much for all your comments and suggestions!!!
Everything worked perfectly! :)