labradorchik
asked on
How to output SAS listing of variables with ASCII special characters?
I have three SAS datasets (data1, data2, data3) with variables Var1, Var2, Var3, and Var4 in each dataset. I would like to output a list file (print listing) of the unique Var1 and a list file for unique Var3 from all datasets (data1, data2, data3), that contain a special character (standard ASCII special characters, examples: ? ( ) / ; etc…). Then, just to add Var5 to each list to indicate the source of each record from those datasets. Duplicate records, that is records on more than one list, will be included.
Just a Note:
if data1 then Var5 is A
if data2 then Var5 is B
if data3 then Var5 is C
How can this be done?
Just a Note:
if data1 then Var5 is A
if data2 then Var5 is B
if data3 then Var5 is C
How can this be done?
ASKER
Hi Ian,
Thank you very much for your explanations and examples!!
I am having a hard time grasping "INDSNAME=Var5"
I read on line what INDSNAME code does and I am not sure if this is what I am looking for.
You also mentioned that there is a way to "change the logic to detect the end of each dataset as read in". May be this would work better?
Because I know for sure that I can not change the names of my datasets to A, B, and C unless it can be done just as a temporary. Datasets' names should have the same names as before.
At this moment I am getting the following error in the proc SQL portion of the code:
Thank you very much for your explanations and examples!!
I am having a hard time grasping "INDSNAME=Var5"
I read on line what INDSNAME code does and I am not sure if this is what I am looking for.
You also mentioned that there is a way to "change the logic to detect the end of each dataset as read in". May be this would work better?
Because I know for sure that I can not change the names of my datasets to A, B, and C unless it can be done just as a temporary. Datasets' names should have the same names as before.
At this moment I am getting the following error in the proc SQL portion of the code:
data combined;
set data1
data2
data3
INDSNAME=var5;
run;
proc sql;
title "List of records with special characters in VAR1";
select unique
var1,
var5
from combined
where (anyPunct(var1) > 0);
ERROR: The following columns were not found in the contributing tables: var5.
NOTE: PROC SQL set option NOEXEC and will continue to check the syntax of statements.
title "List of records with special characters in VAR3";
select unique
var3,
var5
from combined
where (anyPfunct(var3) > 0);
ERROR: The following columns were not found in the contributing tables: var5.
NOTE: PROC SQL set option NOEXEC and will continue to check the syntax of statements.
NOTE: The SAS System stopped processing this step because of errors.
quit;
title;
ASKER
Is this a proper form of getting name values from each dataset?
data combined;
set data1 (in=a)
data2 (in=b)
data3 (in=c)
if a then from=A;
if b then from=B;
if c then from=C;
run;
Hi there labradorchik,
That is a good way of doing it. Slight corrections below.
BTW: The INDSNAME=Var5" will create a variable Var5 and set it to the name of the dataset used for each observation.
With the variables that SAS creates (like Var5 in first example and a, b, c in your code), those variables are not automatically put into the "output vector". If we want them we must say so! My earlier example should have had the line
to ensure that the Var5 was carried over into the output dataset. The PROC SQL could not find the Var5 because the data step calculated it but omitted it from the output record. - Sorry for my mistake.
Ian
That is a good way of doing it. Slight corrections below.
data combined (keep = var1 var3 var5);
length Var5 $ 1; ** our new variable that shows where obs came from **;
** Read in the 3 datasets one after the other **;
set data1 (in=a)
data2 (in=b)
data3 (in=c) ;
** Set Var5 to indicate where observation came from **;
Var5 = "*"; ** Assume an error somewhere - just in case **;
if a then Var5 = "A";
if b then Var5 = "B";
if c then Var5 = "C";
run;
BTW: The INDSNAME=Var5" will create a variable Var5 and set it to the name of the dataset used for each observation.
With the variables that SAS creates (like Var5 in first example and a, b, c in your code), those variables are not automatically put into the "output vector". If we want them we must say so! My earlier example should have had the line
data combined (keep=Var1 var3 var5);
to ensure that the Var5 was carried over into the output dataset. The PROC SQL could not find the Var5 because the data step calculated it but omitted it from the output record. - Sorry for my mistake.
Ian
ASKER
Thank you very much Ian!!
Below code worked as it should.
Just two questions:
1) Currently I can see print listing which is output from the Proc SQL but there is no permanent file created for the listing. How can I send my print listing for example to finaloutput.lis file?
2) What option may I use to do the same code procedure for unique Va1 and unique Var3 but just if these variables contain a number?
Note: originally Var1 and Var3 supposed to contain just letters (names) but to catch errors I would like to check all names for special characters and numbers as well.
Below code worked as it should.
data combined (keep = Var1 Var3 Var5);
length Var5 $ 1;
set data1 (in=a)
data2 (in=b)
data3 (in=c);
Var5 = "*";
if a then Var5 = "A";
if b then Var5 = "B";
if c then Var5 = "C";
run;
proc sql;
title "List of records with special characters in VAR1";
select unique
Var1,
Var5
from combined
where (anyPunct(Var1) > 0);
title "List of records with special characters in VAR3";
select unique
Var3,
Var5
from combined
where (anyPunct(Var3) > 0);
quit;
title;
Just two questions:
1) Currently I can see print listing which is output from the Proc SQL but there is no permanent file created for the listing. How can I send my print listing for example to finaloutput.lis file?
2) What option may I use to do the same code procedure for unique Va1 and unique Var3 but just if these variables contain a number?
Note: originally Var1 and Var3 supposed to contain just letters (names) but to catch errors I would like to check all names for special characters and numbers as well.
ASKER
Update:
I just figured out the answer to my question #2 with anyDigit function.
I am still trying to create a permanet finaloutput.lis file for question #1.
I just figured out the answer to my question #2 with anyDigit function.
I am still trying to create a permanet finaloutput.lis file for question #1.
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER
Ian, thank you very much for all your comments and suggestions!!!
Everything worked perfectly! :)
Everything worked perfectly! :)
The code below is along the lines of something that should work for you. var5 has a bit more than what you asked for. If the A, B, C is really important that it shouldn't be hard to change the code to get that. Either change the names of the dataset data1, data2, date3 to be A, B, C or change the logic to detect the end of each dataset as read in.
It may be useful to
1.
add in an observation counter (position within source data set); and2.
sort the output listing.3.
use a different function call if the "punctuation" character set is not what you want. Check the sas documentation for function ANYPUNCTOpen in new window
Regards,
Ian