Pradeep0308 asked:

Convert OverPunch

Hi All,

I need to convert overpunch characters as follows:

Code        Change To
}                        p
J                        q
K                        r
L                        s
M                        t
N                        u
O                        v
P                        x
Q                        y
R                        z

I want to write a script that replaces each overpunch value with the correct value. I have included a test file. For example:

IMD02VALUTA, BUNKER, DRIVMEDELSTILLÄGG                                          
QTY0147 000000000000000                                                        
ROA0112500000000000000422}EUR                                                  
ROA06X2500000000000000422}SEK                                                  
CUX212  SEK4  0000         0000000001000000                                    
ROA2123 00000000000000528}EUR                                                  
ROA26Z2500000000000004008MSEK                                                  
ROA31Z2300000000000005015LSEK                                                  
ROA411  00000000000000106}EUR                                                  
ROA51Z1 00000000000001006RSEK                                                  
PRI01CAL000000000000000      000000000        

The overpunch value will always come in ROA record position 26. So this should be replaced with the above table and then ^r^n and ^n should be removed from the file and should look like the TestOutput.txt file attached.

I want to do this through either a unix/awk script and want to do it on bulk files in a folder. How to achieve this?

Thanks
Pradeep
Test.txt
TestOutput.txt
tel2

Hi Pradeep,

> "So this should be replaced with the above table and then ^r^n and ^n should be removed from the file and should look like the TestOutput.txt file attached."
When you say "^r^n and ^n should be removed" are you talking about CR-LF and LF?  If so, why are they still in TestOutput.txt?

> "I want to do this through either a unix/awk script and want to do it on bulk files in a folder. How to achieve this?"
Will you accept a Perl solution?  Many UNIX/Linux systems come with Perl.
How about a sed solution?

tel2
ASKER

Yes, that is right: I want to remove CR-LF and LF. It seems I missed that in the TestOutput.txt file.

I am not sure if Perl is supported by our system. A sed solution should be fine, as long as I can run it on bulk files kept in a folder.

Thanks
Pradeep
Hi Pradeep,

Please create and upload a version of TestOutput.txt which is completely correct.  This can then be used to ensure the output from our script is the same, and helps to make sure we have understood your requirements.

Please check whether you have Perl loaded on your system by typing:
   perl -v
and tell me what output that command gives you.  If that command doesn't give you an error message, are you happy to have a Perl solution?

What flavour of UNIX/Linux is it, anyway (e.g. AIX, HPUX, Redhat, CentOS, etc)?

tel2
SOLUTION
Abhimanyu Suri
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Hi tel2,

I ran this:

perl -v

This is perl, v5.8.8 built for aix-thread-multi

Copyright 1987-2006, Larry Wall

Perl may be copied only under the terms of either the Artistic License or the
GNU General Public License, which may be found in the Perl 5 source kit.

Complete documentation for Perl, including FAQ lists, should be found on
this system using "man perl" or "perldoc perl".  If you have access to the
Internet, point your browser at http://www.perl.org/, the Perl Home Page.

We are using AIX. I have tried to upload the file again, named TestOutput.txt.
TestOutput.txt
Thanks Pradeep,

Would you be happy to accept a Perl solution now that we know that your system has Perl?
Yes, I don't see an issue with that.
OK - I'll have a go at it.

Meanwhile, what do you think about Abhimanyu's solution?
I am testing that, but the Unix script does not remove the CRLF or LF.
Yes, I noticed that.
Also, the 2 "grep -i" commands should ideally be just "grep", because case insensitivity isn't required, but that probably wouldn't make any difference with your data.
Hi Pradeep,

OK - back from dinner now.

Looking at the last 4 rows of this:

Code        Change To
O                        v
P                        x
Q                        y
R                        z

Did you intentionally exclude "w" from the "Change To" column?  If so, why?
This is the complete list:

R y
Q x
P w
O v
N u
M t
L s
K r
J q
} p
{ 0
A 1
B 2
C 3
D 4
E 5
F 6
G 7
H 8
I 9
Your original post had "P" being changed to "x".
Your latest post has "P" being changed to "w".
Which is correct?

And do you want me to use this "complete list"?
Hi again Pradeep,

I'm using your "complete list" for the following solutions:

Let's assume you want to process all the .txt files in the current directory.

This command will overwrite all the .txt files with the processed version:
    perl -i -pe 'substr($_,25,1) =~ tr/A-R{}/1-9q-y0p/ if /^ROA.{23}/; s/\r?\n//' *.txt

This command will rename each .txt file from <basename>.txt to <basename>.txt.old and write the output to the original name (i.e. <basename>.txt):
    perl -i.old -pe 'substr($_,25,1) =~ tr/A-R{}/1-9q-y0p/ if /^ROA.{23}/; s/\r?\n//' *.txt

Note: the "r" flag of tr needs Perl 5.14 or later, so for Perl 5.8.8 these commands apply tr in place to the character at position 26 via substr instead.

Now you know why I wanted to use Perl.  Should run much faster than most shell scripts with a "read", etc, too, but that would only be significantly noticeable for large input files.
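
For anyone who cannot use Perl, the same idea can be sketched with awk. This is only an illustration under stated assumptions, not the method posted above: the function names convert_overpunch and convert_folder are made up for this example, the mapping is the "complete list" from this thread, and the overpunch character is assumed to sit at position 26 of every line that starts with ROA.

```shell
# Hypothetical helper: convert the overpunch character at position 26 of
# ROA records and join all lines (CR-LF and LF are removed).
convert_overpunch() {
    awk '
    BEGIN {
        # Same position in both strings = one code/replacement pair.
        from = "{ABCDEFGHI}JKLMNOPQR"
        to   = "0123456789pqrstuvwxy"
    }
    {
        sub(/\r$/, "")                      # drop a CR that ends the line
        if (substr($0, 1, 3) == "ROA" && length($0) >= 26) {
            i = index(from, substr($0, 26, 1))
            if (i > 0)
                $0 = substr($0, 1, 25) substr(to, i, 1) substr($0, 27)
        }
        printf "%s", $0                     # no newline: output is one long line
    }' "$@"
}

# Hypothetical helper: process every .txt file in a folder in place.
convert_folder() {
    dir=${1:-.}
    for f in "$dir"/*.txt; do
        [ -f "$f" ] || continue
        convert_overpunch "$f" > "$f.tmp" && mv "$f.tmp" "$f"
    done
}
```

Calling convert_folder with a folder name (or with no argument for the current directory) rewrites each .txt file there. The CR is stripped inside awk rather than with sed 's/\r//' because, as far as I know, \r in a sed pattern is a GNU extension and may not behave the same on AIX.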

The output will not be quite like the file you provided because of your changes to your list of substitutions.

If neither of the above is adequate, please let me know why not.

tel2
Handling carriage return is pretty simple,

sed 's/\r//g' parse.log

For better readability you can add another function to the script

func_removecarr()
{
sed 's/\r//g' parse.log > parse_wo_carr.log
# once tested and satisfied add below line
# rm -rf parse.log
# Also you may want to append logfiles with some identifier when handling many files together
}

Here is an example

So, I have introduced a couple of CRs and LFs in the example:

sed 's/$/\r\n/g' abc.txt > abc1.txt

cat -v abc1.txt
QTY0147 000000000000000^M

ROA0112500000000000000422pEUR^M

ROA06X2500000000000000422pSEK^M

CUX212^M  SEK4  0000         0000000001000000^M

ROA2123 00000000000000528pEUR^M

ROA411  00000000000000106pEUR^M

PRI01CAL000000000000000      000000000^M

sed 's/\r//g' abc1.txt|awk 'NF'

QTY0147 000000000000000
ROA0112500000000000000422pEUR
ROA06X2500000000000000422pSEK
CUX212  SEK4  0000         0000000001000000
ROA2123 00000000000000528pEUR
ROA411  00000000000000106pEUR
PRI01CAL000000000000000      000000000

Now, coming back to Perl vs shell: obviously Perl has a much richer library.
When it comes to speed, the difference is not noticeable for a couple of hundred thousand records.

I am not a Perl guy, but it is amazing to see that all of it can be done in a one-liner.
I tried sed/awk but still had to put in context and clauses.

Maybe it is time to stop procrastinating about learning Perl and actually learn it :)
ASKER CERTIFIED SOLUTION
Muscle reflex :), my fingers automatically go with "grep -i".

For sed | awk 'NF': I had not understood that everything is supposed to be in one long string.

I will definitely start working on Perl.

Thanks "tel2" for your inputs
No worries, Abhimanyu.

> "For sed |awk 'NF', I misunderstood that everything supposed to be in one long string."
One long line is what you will get when you remove all \r\n and \n.

How would you do it?

tel2
Probably something like this

sed 's/$/\r\n/g' abc.txt > abc1.txt

cat -v abc1.txt

QTY0147 000000000000000                                                         ^M

ROA0112500000000000000422}EUR                                                   ^M

ROA06X2500000000000000422}SEK                                                   ^M

CUX212  SEK4  0000         0000000001000000                                     ^M

ROA2123 00000000000000528}EUR                                                   ^M

ROA411  00000000000000106}EUR                                                   ^M

PRI01CAL000000000000000      000000000  ^M

sed 's/\r//g' abc1.txt |tr '\n' ' ' > abc2.txt

cat abc2.txt


QTY0147 000000000000000                                                           ROA0112500000000000000422}EUR                                                     ROA06X2500000000000000422}SEK                                                     CUX212  SEK4  0000         0000000001000000                                       ROA2123 00000000000000528}EUR                                                     ROA411  00000000000000106}EUR                                                     PRI01CAL000000000000000      000000000
That's close, Abhimanyu, but the problems with:
    sed 's/\r//g' abc1.txt |tr '\n' ' ' > abc2.txt
are:
1. tr will replace '\n' with a space.  Try this instead: tr -d '\n'
2. sed will remove ALL '\r' chars even if they don't have a '\n' after them.  In most situations this won't be a problem.  sed may be able to handle it, but I'm not sure how, so I'd use Perl.
Having thought about it more, Abhimanyu...

A simpler way to do this:
    sed 's/\r//g' abc1.txt |tr '\n' ' '
would be this:
    tr -d '\r\n' <abc1.txt
which deletes any of the listed characters.

But if you want to avoid removing any '\r' which are not followed by '\n', you could do:
    sed 's/\r$//g' abc1.txt |tr '\n' ' '   # Note the added '$'
Thanks both. I have tested both solutions and they are working as expected. Appreciate both of your assistance.
Thank you both. I have made a split of the points accordingly.
Welcome :)

It was a good learning experience for me too, thanks to "tel2".
A pleasure doing business with you all.  It was an interesting task.

Abhimanyu, my last command:
    sed 's/\r$//g' abc1.txt | tr '\n' ' '   # Note the added '$'
should have read:
    sed 's/\r$//g' abc1.txt | tr -d '\n'   # Note the added '$'
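
To make the difference between the two approaches concrete, here is a small sketch. The function names are hypothetical, and awk is used for the second variant only as a portable stand-in for the sed command above (an assumption on my part, since \r in a sed pattern may not be understood by every sed).

```shell
# Delete every CR and every LF, wherever they appear in the input.
strip_all_crlf() {
    tr -d '\r\n'
}

# Delete a CR only when it is the last character of a line,
# then join the lines by printing them without a newline.
strip_line_endings() {
    awk '{ sub(/\r$/, ""); printf "%s", $0 }'
}
```

Both read standard input, e.g. strip_line_endings <abc1.txt. A CR in the middle of a line (one not followed by LF) is removed by the first function but kept by the second.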

Also, regarding doing this kind of task in bash: you shouldn't need to add more code (conditions) for each match (substitution) that you need to cater for.  You should just be able to grep it from your match.txt file, etc.  A faster way could be to store the list of pairs from match.txt in a hash (i.e. an associative array) at the beginning of the script, then just look up the hash after that.  I've never used hashes in shell scripts, but I use them in Perl a lot, as they are a very convenient and powerful data structure.  Have you used them in bash?
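
As a rough illustration of the hash idea: bash 4 and later has associative arrays (declare -A), but a more portable route is awk, whose arrays are associative by nature. The sketch below makes several assumptions: the function name convert_with_table is made up, and the layout of match.txt is not shown in this thread, so it is assumed here to hold one "code replacement" pair per line, separated by whitespace.

```shell
# Hypothetical helper: first argument is the lookup file (assumed format:
# "code replacement" per line), remaining arguments are the data files
# ("-" for standard input).
convert_with_table() {
    map_file=$1
    shift
    awk '
    NR == FNR {                        # first file: load the pairs into a hash
        if (NF >= 2) map[$1] = $2
        next
    }
    {
        sub(/\r$/, "")                 # drop a CR that ends the line
        if (substr($0, 1, 3) == "ROA") {
            c = substr($0, 26, 1)
            if (c in map)              # one lookup, no per-code conditions
                $0 = substr($0, 1, 25) map[c] substr($0, 27)
        }
        print                          # line breaks kept, to show the lookup alone
    }' "$map_file" "$@"
}
```

For example, convert_with_table match.txt Test.txt prints the converted records. Adding a new substitution then only means adding a line to the lookup file, not more code. Note that the NR == FNR test relies on the lookup file not being empty.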