Link to home
Start Free TrialLog in
Avatar of labradorchik
labradorchikFlag for United States of America

asked on

How to maintain the format and contents of variables from ASCII file to SAS dataset?

Hi guys,
I am trying to create an output record in data1 for every input record in file1.  Each output record would contain all input record variables and I would have to maintain the format and contents of all variables (i.e. right/left justified, embedded spaces etc.).  Is RIGHT option is correct to use in this case or it has to be incorporated with a PUT statement somehow?  I guess my question is what would I do of those variables that are currently left justified to make them right justified and also for those variable with embedded spaces?

Input: file1.dat  
Note: layout is given for this file. Some variables are right and some are left justified. Some also have embedded spaces.

Output: data1.sas7bdat
Note: This output SAS dataset file contains the same data as the input file1. All fields are character.

data lib1.[b]data1[/b];
 	infile lib1.[b]file1[/b];    
 	input var1 $ 1-19 
 		  var2 $ 20-34 
 		  var3 $ 35-53 
 		  var4 $ 54-68 
 		  var5 $ 69-93
 		  var6 $ 94-98 
 		  var7 $ 99-103 
 		  var8 $ 104-105;

   var5=right(var5);  /* justifying var5 variable to the right */ 
   var8=right(var8);  /* justifying var8 variable to the right */ 
run;

Open in new window

Avatar of ozo
ozo
Flag of United States of America image

cp file1.dat data1.sas7bdat
since the file is a fixed width file, if you specify the start and end positions correctly, and everything being imported as text, what is left-aligned will stay left-aligned, and what is right-aligned will stay right-aligned.

however, the data may not appear as right-aligned due to the character width (space character taking up less space then say the letter 'a')
Avatar of labradorchik

ASKER

Ozo, thank you for your comment but your way would not work. It has to be done in the SAS data step.

Lowaloysius, as always, thank you for your comments!
I just checked and I am specifying my start and end positions correctly based on the layout of the file, so what am I doing wrong?  

I am still getting var5 and var8 left justified on my output data1.sas7bdat data set but on the original file1.dat input file's layout these are right justified.  Is below format correct when justifying by force variables to the right or it can be done some other way?
var5=right(var5);

Open in new window

yes, the RIGHT function would work also..

if it is not appearing correctly, maybe you could check the length of the fields - are they as wide as you expect? e.g. var5 is defined as position 69-93, so the length of the field should be 25. otherwise, it could be the cause of your problem right now.

if the field lengths are not defined correctly, you can add the LENGTH statement to set it right.
Thank you for your comments, lowaloysius!
I checked all my positions and seams like all correct.  I think I was not clear in my comments. What I am trying to do is to make all data in var5 and var8 right justified and zero filled, All other variables can stay left justified as they are but va5 and var8 I am trying to make right justified as well as zero filled. Please see below if this makes sense?

data lib1.[b]data1[/b];
 	infile lib1.[b]file1[/b];    
 	input var1 $ 1-19 
 		  var2 $ 20-34 
 		  var3 $ 35-53 
 		  var4 $ 54-68 
 		  var5 $ 69-70
 		  var6 $ 71-98  
  		var7 $ 99-103 
 		  var8 $ 104-105;

  var5=right(put(var5, z2.));
  var8=right(put(var8, z2.));
  
run;

Open in new window


With the code above I am getting ERROR saying that format z2. is not recognized.
that is because var5 and var8 are read in as character fields.. you will need to convert them into numeric first before you can use the zn. format:
var5=right(put(input(var5, best.), z2.));

you would like to have a leading zero in this case if the number is single digit?
Thank you!!
Yes, I think leading zero would be nice, so the data would be in character format, right justified, zero filled and look like this below for var5 and var8:

var5   var8
    56       03
    01       93
    05       08


So, code should like below?

data lib1.data1;
 	infile lib1.file1;    
 	input var1 $ 1-19 
 		  var2 $ 20-34 
 		  var3 $ 35-53 
 		  var4 $ 54-68 
 		  var5 $ 69-70
 		  var6 $ 71-98  
  		  var7 $ 99-103 
 		  var8 $ 104-105;

var5=right(put(input(var5, best.), z2.));
var8=right(put(input(var8, best.), z2.));  
run;

Open in new window

yes, that should work, but as usual, post any error messages you got should it be otherwise :)
No errors this time and both variables right justified and zero filled but now I am getting var5 and var8 as numeric variables on my output dataset instead of character variables.
What if I input var5 and var8 as numeric from the beginning as below.  Would it make things easier to make these two variables right justified, zero filled as well as make them character as the final format on the output dataset?

data lib1.data1;
 	infile lib1.file1;    
 	input var1 $ 1-19 
 		  var2 $ 20-34 
 		  var3 $ 35-53 
 		  var4 $ 54-68 
 		  var5 69-70
 		  var6 $ 71-98  
  		  var7 $ 99-103 
 		  var8 104-105;

var5=right(put(var5, best.), z2.));
var8=right(put(var8, best.), z2.));  
run;

Open in new window

With the code above I am getting the following error:  The z2. option is not recognized and will be ignored.
that's odd and isn't what i'm expecting.. you mean you did a proc contents and the field is now of type numeric, or displays only numeric data (i.e. there used to be character data, but now character data is gone, only numeric data is left?)

if you wish to convert data read in as numeric instead of text (which I don't recommend, as it could lead to loss of data), you need to use a length statement to indicate that it's of type numeric, use the informat statement and leave the input statement as it is. removing the $ doesn't help if i'm not wrong.. however, i'll need a SAS environment to test it out, which i don't have now.. chances are the field var5 and var8 are still of type character (i.e. you didn't actually read in the data as numeric), hence you got that error (recall you saw the exact same error message earlier? ;)
Thank you!
All these variables in the dataset have numeric data but I am talking specifically about the format of var5 and var8 - these are still numeric and not character type.  I really would like to read all variables as character variables. I tried different ways of reading data and for some reason these two variables' fields still appear as numeric type after I apply right justification, zero filled format and switch back from numeric to character type.
you are really reading them as character values (as evidenced by the error message that z2. format is not recognized...

however, what puzzles me, and i'll need an environment to test out is it turning into a numeric variable - it's unfathomable! the put function would have converted any numeric value to string, so i'm wondering what's going on.. i'll need a SAS environment to test that out, but only in about 8 hours time (it's nearly midnight here now).

just to confirm - when you run var5=right(put(input(var5, best.), z2.)); what values do you see on screen? does "5     " appear as "    05"? or "     5"?
what if you run var5=put(input(var5, best.), z2.); what values do you now see? "05"?
For #1 (reading var5 and var8 as character type)

data lib1.data1;
 	infile lib1.file1;    
 	input var1 $ 1-19 
 		  var2 $ 20-34 
 		  var3 $ 35-53 
 		  var4 $ 54-68 
 		  var5 $ 69-70
 		  var6 $ 71-98  
  		  var7 $ 99-103 
 		  var8 $ 104-105;

var5=right(put(input(var5, best.), z2.));
var8=right(put(input(var8, best.), z2.));  
run;

Open in new window


I see var5 and var8 variables as below:
var5        var8
"05    "     "06    "

So, both variables are zero filled and character type but not right justified. Also, no errors or warnings found in the above code.


For #2 (reading var5 and var8 as numeric type)

data lib1.data1;
 	infile lib1.file1;    
 	input var1 $ 1-19 
 		  var2 $ 20-34 
 		  var3 $ 35-53 
 		  var4 $ 54-68 
 		  var5 69-70
 		  var6 $ 71-98  
  		  var7 $ 99-103 
 		  var8 104-105;

var5=right(put(input(var5, best.), z2.));
var8=right(put(input(var8, best.), z2.));  
run;

Open in new window


I see var5 and var8 variables as below:
var5        var8
"     5"     "     6"

So, both variables are right justified but not zero filled and not character.  Both variables have numeric type format on the output dataset.  Also, no errors or warnings found in the above code.

For #3 (reading var5 and var8 as character type and without RIGHT option)

data lib1.data1;
 	infile lib1.file1;    
 	input var1 $ 1-19 
 		  var2 $ 20-34 
 		  var3 $ 35-53 
 		  var4 $ 54-68 
 		  var5 $69-70
 		  var6 $ 71-98  
  		  var7 $ 99-103 
 		  var8 $104-105;

var5=put(input(var5, best.), z2.);
var8=put(input(var8, best.), z2.);  
run;

Open in new window


I see var5 and var8 variables as below:
var5        var8
"05    "     "06    "

So, both variables are zero filled and character type but not right justified. Also, no errors or warnings found in the above code.

I guess results for #1 and #3 are the same.

Note: this is my first program that I am running my SAS code in Windows environment. I used to run my SAS code only in Unix environment.  But it is still SAS 9.4 and there should be no differences, correct?
ASKER CERTIFIED SOLUTION
Avatar of Aloysius Low
Aloysius Low
Flag of Singapore image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Thank you!
I just tried code below with reading var5 and var8 as character type and adding RIGHT option on a different line:

data lib1.data1;
 	infile lib1.file1;    
 	input var1 $ 1-19 
 		  var2 $ 20-34 
 		  var3 $ 35-53 
 		  var4 $ 54-68 
 		  var5 $69-70
 		  var6 $ 71-98  
  		  var7 $ 99-103 
 		  var8 $104-105;

var5=put(input(var5, best.), z2.);
var5=right(var5);

var8=put(input(var8, best.), z2.);  
var8=right(var8);

run;

Open in new window



I see var5 and var8 variables as below:
var5        var8
"05    "     "06    "

So, still both variables are zero filled and character type but still not right justified. Above code also has no errors or warnings.

I am starting to believe that there is a problem with SAS for Windows environment because I think the same concept used to work in the Unix environment SAS.
Is there a way to make these variables right justified if I do right justification in a different/another data step?
I just noticed that the length for var5 and var8 Haas been limited to 2 characters.. in that case there's no space on the right to justify the data to the right.. u need to define a longer length to see that
Based on the layout of the input ASSCI file both variables' format is only 2 characters in length so I am not sure if I can change that because there are set start and end positions for these two variables i.e.
var4 $ 54-68 
var5 $ 69-70
var6 $ 71-98  
var7 $ 99-103 
var8 $ 104-105

Open in new window


How can I define these variables with more than two characters if these are set for 2 characters in length?
you could try the length statement before the input to define var5 and var8 (not too sure if it will work), or otherwise, create var5 and var8 with temporary names, and define the length of the actual var5 and var8 after the input statement and drop the temporary var5 and var8 variables as the final step
Thank you! When I place length statement right before input as below I can see that var5 and var8 are right justified, zero filled, and these fields are character type but in this case both of these variables are appearing as first variables on the output dataset, which is not the order these should be appearing.

length var5 $8 var8 $8;
 input ...

Open in new window


I guess at this time, my question is:
Since I can see already that both variables are actually right justified just it is not visible can I leave the code as below, without the length statement?  It will not be visible when anyone looks at the output dataset that these fields are right justified but we actually know that these are still right justified :) Does that make sense?

data lib1.data1;
 	infile lib1.file1;    
 	input var1 $ 1-19 
 		  var2 $ 20-34 
 		  var3 $ 35-53 
 		  var4 $ 54-68 
 		  var5 $69-70
 		  var6 $ 71-98  
  		  var7 $ 99-103 
 		  var8 $104-105;

var5=put(input(var5, best.), z2.);
var5=right(var5);

var8=put(input(var8, best.), z2.);  
var8=right(var8);

run; 

Open in new window

well that would be your call (but it's quite a long shot isn't it?) ;)

if you wish to have the columns in place, then you'll need to specify the columns one by one in the length statement before the input statement, or you'll need to write an alter table statement to change the length before running the code to right-align - there are endless possibilities :)
Lowaloysius, thank you very much for all your comments and suggestions!! It looks like I will running the last code example.
just note that if you were running that code, the var5 = right(var5); essentially have no effect.. kind of like a redundant code ;)