Want to protect your cyber security and still get fast solutions? Ask a secure question today.Go Premium

x
  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 313
  • Last Modified:

fixed width file problem

how to do the following in perl or awk, if state is MN then insert value "C" into code field, if state is TX then insert "T" in code field.  if state doesn't match any critiera than put "O" into code filed.  the file is fixed width.
 
fname  lname   st         code
chris    bauer    MN            
john     smith     TX              
jim       brown    AL    
 
thanks,
0
bodi77
Asked:
bodi77
  • 9
  • 8
  • 2
  • +1
1 Solution
 
ozoCommented:
perl -pe '(2..0)&&s/((MN)|TX)?[^\n\w]*$/$1\t${[qw(C T O)]}[!$1+!$2]/' file
0
 
bodi77Author Commented:
it could be, but how does it know what fixed postion to look for the MN and TX?  This is a fixed width file.   it has to be certian postion.  
0
 
ghostdog74Commented:
awk:
awk 'NR>1 && $3 == "MN"{ printf "%s\tC\n",$0;next }
     NR>1 && $3 == "TX"{ printf "%s\tO\n",$0;next }
     $3 !~ /MX|TX/ { print }
' file

Open in new window

0
Technology Partners: We Want Your Opinion!

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

 
ozoCommented:
What position does it have to be in?
0
 
Hanno P.S.IT Consultant and Infrastructure ArchitectCommented:
Even if the file is "fixed width", it's easier just to work on the colum number.
Therefore, the Perl and awk solutions simply take the first three words and
add a fourth (C, T or O).
The (default) field separators are "white spaces", blank(s) and tab(s)
0
 
bodi77Author Commented:
that's a problem because the data was just and exmaple, between some columns there is no space.  
so it would look like this
john            smith    mn  
christopherbauer    tx
bill     swcharinagormn
0
 
ozoCommented:
perl -pe '(2..0)&&s/(?<=^.{20})(((MN)|TX)|..)/$1\t${[qw(C T O)]}[!$2+!$3]/i' file
0
 
bodi77Author Commented:
that works, but have another question about it.  
so I want MN, IA, WI  to = C
and TX, AL, NV to = T
and anything else to = O

I wrote an entire script that has 60 lines to acomplish what you are doing in one line.  can you explain what it is you are wrting?  thanks
0
 
bodi77Author Commented:
ozo,
I guess I only need to know a few things about your one liner.  

perl -pe '(2..0)&&s/(?<=^.{20})(((MN)|TX)|..)/$1\t${[qw(C T O)]}[!$2+!$3]/i' file
the 20 is the state field starts, where do you tell it how long the field is?
where is it that you are defining what postion to put the C, T , O

thanks,
0
 
bodi77Author Commented:
to clerify,  read postion 20, two characters.  If the states are MN, IA or WI  write to postion 30, one character, a "C".   if states are TX, AL, NV write into postion 30 a T
and any other vaule in position 20, two characters,  write and "O" into postion 30.
0
 
ozoCommented:
perl -pe '(2..0)&&s/(?<=^.{20})(((MN|IA|WI)|TX|AL|NV)|..)(.{8}).?/$1$4${[qw(C T O)]}[!$2+!$3]/i' file
Assuming there is already something in positions 22-29
0
 
ozoCommented:
Also assuming that you don't want to add the letter to the first line
fname  lname   st         code
0
 
bodi77Author Commented:
first row was only for an exmaple, so yes I do want to do the first row in production.
and I will be using this for multiple files with different formats, the only thing I will need to change is the postion of the lookup and the postion of the insert.  I think the postion of the lookup is the 20 in your one liner.  but do you define the postion of the insert (postion 30)  there will actully be data after the postion 31.  sorry if I have not been clear.  

01234567890123456789012345678901234567890
christophersbauer          MNxxxxxxxxT9559552087
0
 
ghostdog74Commented:

awk 'BEGIN{OFS=FS=""}
($19=="M" && $20=="N") || ( $19=="W" && $20=="I" ) || ( $19=="I" && $20=="A" ){
    $30="C"
}
($19=="T" && $20=="X") || ( $19=="A" && $20=="L" ) || ( $19=="N" && $20=="V" ){
    $30="T"
}
1' file

Open in new window

0
 
ozoCommented:
then omit the (2..0)&&
perl -pe 'substr($_,30,1)=${[qw(O T C)]}[grep $_,substr($_,20,2)=~/((MN|IA|WI)|TX|AL|NV)/]' file
0
 
bodi77Author Commented:
ozo the logic works great but how can I get it to change the values in the file instead of displaying on the screen.  

thanks,
C
0
 
ozoCommented:
perl -i.bak -pe 'substr($_,30,1)=${[qw(O T C)]}[grep $_,substr($_,20,2)=~/((MN|IA|WI)|TX|AL|NV)/]' file
0
 
bodi77Author Commented:
ozo, that's amazing!!! thanks alot.  
0
 
bodi77Author Commented:
Ozo, the script worked great for a test file, but when I tried on a production file that has 300,000 records.  I recieved a out of memory error

and also will the script work a a ebsidec file?
0
 
ozoCommented:
records are separated by a "\n" in ascii, which is a byte with value 0x0A which is RPT in ebcdic
If that character is not found, it will try to read the entire file as as single record
you could redefine the record separator and change the characters in the qw and // to the appropriate ebcdic characters, or see some of the considerations in
perldoc perlebcdic
if the records are fixed length, perhaps it doesn't have a special character to separate records,
in which case you might set
$/ = \lengthofrecord
0

Featured Post

Free Tool: Subnet Calculator

The subnet calculator helps you design networks by taking an IP address and network mask and returning information such as network, broadcast address, and host range.

One of a set of tools we're offering as a way of saying thank you for being a part of the community.

  • 9
  • 8
  • 2
  • +1
Tackle projects and never again get stuck behind a technical roadblock.
Join Now