Link to home
Create AccountLog in
Avatar of labradorchik
labradorchikFlag for United States of America

asked on

Creating SAS dataset from an ASCII file

I am trying to create data1 (SAS dataset) from existing file1.dat (ASCII file).

Note: both (input and out file) are located in the dir1 directory. Var5 on file1.dat has variables Var5A, Var5B, Var5C, Var5D, and Var5E included in Var5.

This is what i am trying to do:

1. only extracting records where first three digits of Var5B is not equal 804
2. creating Var10 variable by extracting from Var5 variable (from file1.dat) in the following order:
   Var5B (positioning 10-16)
   Var5C (positioning 17-20)
   Var5D (positioning 21-22)
   Var5E (positioning 23-26)
3. extracting Var6 (positioning 44-45)
4. extracting the following variables:
   Var15 (positioning 81-95)
   Var16 (positioning 96-111)
   Var17 (positioning 112-115)
   Var18 (positioning 150-153)
5. writing total number of input and output records counts to another file (counts.log)

This is what I have for my coding.  I think I got correct everything but #1 and #5.
What needs to be done to implement #1 and #5?


filename file1 'dir1/file1.dat';

data dir1/data1;
    infile file1;
  input @1   Var5B   $char7.
           @8   Var5C   $char4.
           @12 Var5D   $char2.   
           @14 Var5E   $char4.   
           @19 Var6     $char2.
           @21 Var15   $char15.
           @36 Var16   $char15.
           @41 Var17   $char4.
           @45 Var18   $char4.;
run;

Open in new window

SOLUTION
Avatar of theartfuldazzler
theartfuldazzler
Flag of South Africa image

Link to home
membership
Create a free account to see this answer
Signing up is free and takes 30 seconds. No credit card required.
See answer
Avatar of labradorchik

ASKER

theartfuldazzler, thank you very much!

I am not sure why yet, but I am not getting any records written into data1 dataset. Also, It looks like I did not create Var10 in data1 dataset from Var5 (Var5A, Var5B, Var5C, Var5D, and Var5E). You see, Var5 (in the file1.dat) has these five character variables, but when I transfer/create data for Var10 in the data1 I just need to create these last four character variables (Var5B, Var5C, Var5D, and Var5E with starting position 10 to ending position 26). That's what I was trying to do in my example above but I am not sure that is correct.      

I also tried input variables like this:
input  Var5B Var5C Var5D Var5E Var6 Var15 Var16 Var17 Var18;

Open in new window

but that also did not work. I still don't get anything written into the data1 dataset. I checked file1.dat file to make sure all data in present and it is present.
Am I using a wrong way of creating a dataset file from an ASCII file?
Please disregard my first (initial) requirements. I was mistaken in part #2 and #3! This is what I am trying to accomplish:

I am trying to create data1 (SAS dataset) from existing file1.dat (ASCII file).

Note: both (input and out file) are located in the dir1 directory.

1. only extracting records where first three digits of Var10 is not equal 804
2. creating Var10 variable by extracting from Var5 variable (from file1.dat)
   Note: Var5 positioning from 10 to 26 on file1.dat
3. extracting Var6 (positioning 35-45)
4. extracting the following variables:
   Var15 (positioning 81-95)
   Var16 (positioning 96-111)
   Var17 (positioning 112-115)
   Var18 (positioning 150-153)
5. writing total number of input and output records counts to another file (counts.log)

The code I am currently running:

filename file1 'dir1/file1.dat';

data dir1/data1;
    infile file1 end=eof;
    file file2 'dir1/counts.log';

RETAIN Countin 0;
RETAIN Countout 0;
  input @10   Var5     $char16.
           @35   Var6     $char11.   
           @81   Var15   $char15.
           @96   Var16   $char15.
           @112 Var17   $char4.
           @150 Var18   $char4.;
Var10=Var5;

IF substr(Var10,1,3) = '804' then delete;
 ELSE CountOut = CountOut + 1;
         CountIn = CountIn + 1;

if eof then do;
   PUT "# records read: " Countin;
   PUT "# records output: " Countout;
END;
 DROP Countin Countout Var5;
run;

Open in new window


I noticed that if I omit lines 5-8 and 17-25 of code then I correctly write all records from file1.dat to data1 SAS dataset. I think something is wrong with the count file initiation because the very first error that I receive is on the 5th line of code:

 file file2 'dir1/counts.log';

Open in new window


Any suggestions how to implement input and output counts into this program?
SOLUTION
Avatar of Aloysius Low
Aloysius Low
Flag of Singapore image

Link to home
membership
Create a free account to see this answer
Signing up is free and takes 30 seconds. No credit card required.
ASKER CERTIFIED SOLUTION
Link to home
membership
Create a free account to see this answer
Signing up is free and takes 30 seconds. No credit card required.
Hi lowaloysius,
Thank you for your comments and explanations!
Code run fine with no mistakes and both output files (dir1/data1 and dir1/counts.log) where present.

After running your code I noticed a few things:
1. data1 dataset still has 804 records present for Var10.  (I checked all positioning for all variables and they all correct).  
2. Counts are the same on both input and output file and there should be less on the output file (based on minus 804 records) .  

By just looking at your code, I think everything is logically correct but how come 804 records are still on the output file?
Hi Ian,
WOW, one can actually learn all SAS operations from your explanations!
Thank you very much!

Your code worked perfectly well!!

Data1 dataset has all records except 804 records as it should, and counts.log file contains correct counts of all input and output records.

Thank you again!! :)
labradorchick, is there any warnings output from the code?

i didn't test the code posted, just something i wrote off the top of my head as i didn't have access to the software so there might be something which looks right but isn't right ;)
Hi lowaloysius,
No, there were no warnings or errors in your code. Everything run fine but the ouput data1 still had 804 records on it and counts in the counts.log file were exactly the same on input and output file.  

After I run ShannonEE's code everything appeared to be correct so I already closed this question.
Thank you very much for your continued support!! :)
Note to lowaloysius,

In your code
2. filter record from temporary data set (from 1)
     where substr(Var10,1,3) NE '804'
data dir1.data1;
  set temp_data (where = (substr(Var10,1,3) NE '804'));
run;

Open in new window

you have not defined a library for dir1.  My experience is that this will fail.  Maybe on unix systems it may assume  libname dir1 "dir1"; but I a bit wary of that.

The library being undefined  would go someway to explaining why  labradorchik got the results she reported.  Only some parts of your code will fail because sas dataset  dir1/data1 previously produced by labradorchik may still be hanging around!
well ShannonEE, when i use library names especially those already in the code snippets provided by the poster, the assumption is that it is already defined by the poster but not posted... anyway it should have nothing to do with the way the output has turned out, more like an error would be produced saying that the libname is not defined