
Convert OverPunch

Hi All,

I need to convert overpunch characters as follows:

Code    Change To
}       p
J       q
K       r
L       s
M       t
N       u
O       v
P       x
Q       y
R       z

I want to write a script that replaces each overpunch value with the correct value. I have included a test file. For example:

IMD02VALUTA, BUNKER, DRIVMEDELSTILLÄGG
QTY0147 000000000000000
ROA0112500000000000000422}EUR
ROA06X2500000000000000422}SEK
CUX212 SEK4 0000 0000000001000000
ROA2123 00000000000000528}EUR
ROA26Z2500000000000004008MSEK
ROA31Z2300000000000005015LSEK
ROA411 00000000000000106}EUR
ROA51Z1 00000000000001006RSEK
PRI01CAL000000000000000 000000000

The overpunch value will always be at position 26 of the ROA record. So this should be replaced with the above table and then ^r^n and ^n should be removed from the file and should look like the TestOutput.txt file attached.

I want to do this through either a unix/awk script and want to do it on bulk files in a folder. How to achieve this?
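A minimal shell/awk sketch of the per-file step, assuming the overpunch character is always the 26th character of lines starting with ROA, and using the first table above exactly as posted (P, Q and R are revised later in the thread). The function name convert_overpunch is hypothetical. It deletes a CR only when it ends a line, translates the character, and prints every line without its newline, so the result is one long line.

```shell
# Hypothetical helper: convert_overpunch FILE
# Prints FILE with the ROA overpunch character (position 26) translated per
# the table above, and with all CR-LF / LF line endings removed.
convert_overpunch() {
  awk '
    BEGIN {
      from = "}JKLMNOPQR"   # overpunch codes, in table order
      to   = "pqrstuvxyz"   # replacement for each code, same order
    }
    {
      sub(/\r$/, "")        # drop the CR of a CR-LF ending
      if ($0 ~ /^ROA/ && length($0) >= 26) {
        i = index(from, substr($0, 26, 1))
        if (i > 0)          # only translate characters found in the table
          $0 = substr($0, 1, 25) substr(to, i, 1) substr($0, 27)
      }
      printf "%s", $0       # no newline, so all lines are joined
    }
  ' "$1"
}
```

For example, convert_overpunch Test.txt > Test.out would write the converted, joined output for one file; running it over a whole folder is a separate loop.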

Thanks
Pradeep
Test.txt
TestOutput.txt

tel2

Hi Pradeep,

> "So this should be replaced with the above table and then ^r^n and ^n should be removed from the file and should look like the TestOutput.txt file attached."
When you say "^r^n and ^n should be removed" are you talking about CR-LF and LF? If so, why are they still in TestOutput.txt?

> "I want to do this through either a unix/awk script and want to do it on bulk files in a folder. How to achieve this?"
Will you accept a Perl solution? Many UNIX/Linux systems come with Perl.
How about a sed solution?

tel2

Pradeep0308

ASKER

Yes, that is right: I want to remove CR-LF and LF. It seems I missed that in the TestOutput.txt file.

I am not sure whether Perl is supported on our system. A sed solution should be fine, as long as I can run it on bulk files kept in a folder.

Thanks
Pradeep

tel2

Hi Pradeep,

Please create and upload a version of TestOutput.txt which is completely correct. This can then be used to ensure the output from our script is the same, and helps to make sure we have understood your requirements.

Please check whether you have Perl loaded on your system by typing:
perl -v
and tell me what output that command gives you. If that command doesn't give you an error message, are you happy to have a Perl solution?

What flavour of UNIX/Linux is it, anyway (e.g. AIX, HPUX, Redhat, CentOS, etc)?

tel2

SOLUTION

Abhimanyu Suri


Pradeep0308

ASKER

Hi tel2,

I ran this:

perl -v

This is perl, v5.8.8 built for aix-thread-multi

Copyright 1987-2006, Larry Wall

Perl may be copied only under the terms of either the Artistic License or the
GNU General Public License, which may be found in the Perl 5 source kit.

Complete documentation for Perl, including FAQ lists, should be found on
this system using "man perl" or "perldoc perl". If you have access to the
Internet, point your browser at http://www.perl.org/, the Perl Home Page.

We are using the AIX flavour of UNIX. I have tried uploading the file again, named TestOutput.txt:
TestOutput.txt

tel2

Thanks Pradeep,

Would you be happy to accept a Perl solution now that we know that your system has Perl?

Pradeep0308

ASKER

Yes, I don't see an issue with that.

tel2

OK - I'll have a go at it.

Meanwhile, what do you think about Abhimanyu's solution?

Pradeep0308

ASKER

I am testing that, but the UNIX script does not remove CR-LF or LF.

tel2

Yes, I noticed that.
Also, the 2 "grep -i" commands should ideally be just "grep", because case insensitivity isn't required, but that probably wouldn't make any difference with your data.

tel2

Hi Pradeep,

OK - back from dinner now.

Looking at the last 4 rows of this:

Code Change To
O v
P x
Q y
R z

Did you intentionally exclude "w" from the "Change To" column? If so, why?

Pradeep0308

ASKER

This is the complete list:

R y
Q x
P w
O v
N u
M t
L s
K r
J q
} p
{ 0
A 1
B 2
C 3
D 4
E 5
F 6
G 7
H 8
I 9
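Because the complete list is a one-to-one character mapping, it can be sketched as a single tr call, with the two 20-character strings lined up column by column. The function name map_overpunch is hypothetical, and the characters are spelled out rather than written as ranges to avoid locale surprises. Note that tr translates every matching character in its input, so it should only be given the single overpunch character, not whole records.

```shell
# Hypothetical helper: map_overpunch reads stdin and translates each code
# in the complete list to its replacement.
# Column by column: { -> 0, A..I -> 1..9, } -> p, J..R -> q..y
map_overpunch() {
  tr '{ABCDEFGHI}JKLMNOPQR' '0123456789pqrstuvwxy'
}
```

For example, printf '%s' P | map_overpunch prints w.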

tel2

Your original post had "P" being changed to "x".
Your latest post has "P" being changed to "w".
Which is correct?

And do you want me to use this "complete list"?

tel2

Hi again Pradeep,

I'm using your "complete list" for the following solutions:

Let's assume you want to process all the .txt files in the current directory.

This command will overwrite all the .txt files with the processed version:
perl -i -pe 's/^(ROA.{22})(.)/$1 . do { (my $c = $2) =~ tr|A-R{}|1-9q-y0p|; $c }/e; s/\r?\n//' *.txt

This command will rename each <basename>.txt file to <basename>.txt.old and write the output to the original name (i.e. <basename>.txt):
perl -i.old -pe 's/^(ROA.{22})(.)/$1 . do { (my $c = $2) =~ tr|A-R{}|1-9q-y0p|; $c }/e; s/\r?\n//' *.txt
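For systems without Perl, a rough shell sketch of the same -i.old behaviour might look like this: each file is renamed to <name>.old and the processed output is written back under the original name. The function name process_all is hypothetical, and the per-file processing is passed in as a command or shell function (FILTER) that reads standard input, so any of the conversions discussed in this thread could be plugged in.

```shell
# Hypothetical helper: process_all FILTER FILE...
# Mimics perl -i.old: for each FILE, keep the original as FILE.old and
# write what FILTER (a command or shell function reading stdin) prints to FILE.
process_all() {
  filter=$1
  shift
  for f in "$@"; do
    [ -f "$f" ] || continue              # skip anything that is not a regular file
    mv "$f" "$f.old" || return 1         # keep the original as <name>.old
    "$filter" < "$f.old" > "$f" || return 1
  done
}
```

For example, after defining strip_newlines() { tr -d '\r\n'; }, running process_all strip_newlines ./*.txt would process every .txt file in the current directory. As with perl -i.old, running it a second time overwrites the first .old copies.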

Now you know why I wanted to use Perl. It should also run much faster than most shell scripts that use "read", etc., but that would only be noticeable for large input files.

The output will not quite match the file you provided, because you have since changed your list of substitutions.

If neither of the above is adequate, please let me know why not.

tel2

Abhimanyu Suri

Handling carriage returns is pretty simple:

sed 's/\r//g' parse.log

For better readability, you can add another function to the script:

func_removecarr()
{
    sed 's/\r//g' parse.log > parse_wo_carr.log
    # Once tested and satisfied, add the line below:
    # rm -f parse.log
    # When handling many files together, you may also want to add an identifier to each log file name
}
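A slightly more general sketch of the same function, taking file names as arguments so it can be reused for many files. It uses tr -d '\r' instead of sed 's/\r//g', on the assumption that the local sed may not understand the \r escape: GNU sed does, but POSIX does not guarantee it for sed, whereas POSIX tr does accept \r. The name remove_cr and the default .nocr output suffix are hypothetical.

```shell
# Hypothetical helper: remove_cr IN [OUT]
# Copies IN to OUT (default: IN.nocr) with every carriage return deleted.
# Like sed 's/\r//g', this removes ALL CRs, not just those before a LF.
remove_cr() {
  in=$1
  out=${2:-$1.nocr}
  tr -d '\r' < "$in" > "$out"
}
```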

Here is an example.

First, I introduced a couple of CRs and LFs into the example file:

sed 's/$/\r\n/g' abc.txt > abc1.txt

cat -v abc1.txt
QTY0147 000000000000000^M

ROA0112500000000000000422pEUR^M

ROA06X2500000000000000422pSEK^M

CUX212^M SEK4 0000 0000000001000000^M

ROA2123 00000000000000528pEUR^M

ROA411 00000000000000106pEUR^M

PRI01CAL000000000000000 000000000^M

sed 's/\r//g' abc1.txt|awk 'NF'

QTY0147 000000000000000
ROA0112500000000000000422pEUR
ROA06X2500000000000000422pSEK
CUX212 SEK4 0000 0000000001000000
ROA2123 00000000000000528pEUR
ROA411 00000000000000106pEUR
PRI01CAL000000000000000 000000000
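The sed 's/$/\r\n/g' command used above to build the test file relies on GNU sed's handling of \r and \n in the replacement, and because sed still adds its own newline, each line actually ends in CR, LF, LF (hence the blank lines in the cat -v output). A sketch of a more portable way to make a clean CR-LF test file with awk; the name make_crlf is hypothetical. Running od -c on the result shows exactly which line-ending bytes are present.

```shell
# Hypothetical helper: make_crlf IN OUT
# Writes every line of IN to OUT terminated by CR-LF instead of LF.
make_crlf() {
  awk '{ printf "%s\r\n", $0 }' "$1" > "$2"
}
```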

Now, coming back to Perl vs. shell: Perl obviously has a much richer library.
When it comes to speed, the difference is not noticeable for a couple of hundred thousand records.

I am not a Perl guy, but it is amazing to see that all of it can be done in a one-liner.
I tried sed/awk, but still had to add context and clauses.

Maybe it is time to stop procrastinating about learning Perl and actually learn it :)

ASKER CERTIFIED SOLUTION

tel2


Abhimanyu Suri

Muscle reflex :) my fingers automatically go for "grep -i".

For sed |awk 'NF', I misunderstood that everything supposed to be in one long string.

I will definitely start working on Perl.

Thanks "tel2" for your inputs

tel2

No worries, Abhimanyu.

> "For sed |awk 'NF', I misunderstood that everything supposed to be in one long string."
One long line is what you will get when you remove all \r\n and \n.

How would you do it?

tel2

Abhimanyu Suri

Probably something like this:

sed 's/$/\r\n/g' abc.txt > abc1.txt

cat -v abc1.txt

QTY0147 000000000000000 ^M

ROA0112500000000000000422}EUR ^M

ROA06X2500000000000000422}SEK ^M

CUX212 SEK4 0000 0000000001000000 ^M

ROA2123 00000000000000528}EUR ^M

ROA411 00000000000000106}EUR ^M

PRI01CAL000000000000000 000000000 ^M

sed 's/\r//g' abc1.txt |tr '\n' ' ' > abc2.txt

cat abc2.txt

QTY0147 000000000000000 ROA0112500000000000000422}EUR ROA06X2500000000000000422}SEK CUX212 SEK4 0000 0000000001000000 ROA2123 00000000000000528}EUR ROA411 00000000000000106}EUR PRI01CAL000000000000000 000000000

tel2

That's close, Abhimanyu, but the problems with:
sed 's/\r//g' abc1.txt |tr '\n' ' ' > abc2.txt
are:
1. tr will replace '\n' with a space. Try this instead: tr -d '\n'
2. sed will remove ALL '\r' chars even if they don't have a '\n' after them. In most situations this won't be a problem. sed may be able to handle it, but I'm not sure how, so I'd use Perl.
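A quick way to see point 1 with a made-up two-line sample: both pipelines first delete the CRs (tr -d '\r' stands in for sed here, as an assumption about portability), then one turns each newline into a space and the other deletes it. The function names are hypothetical.

```shell
# Made-up sample: two records, each ending in CR-LF.
sample() { printf 'ROA411 1\r\nPRI01 2\r\n'; }

# Both pipelines delete the CRs first; tr -d '\r' stands in for
# sed 's/\r//g' (assumption: not every sed understands \r).
with_spaces() { sample | tr -d '\r' | tr '\n' ' '; }  # each LF becomes a space
joined()      { sample | tr -d '\r' | tr -d '\n'; }   # each LF is deleted
```

Here, with_spaces prints ROA411 1 PRI01 2 followed by a trailing space, while joined prints ROA411 1PRI01 2.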

tel2

Having thought about it more, Abhimanyu...

A simpler way to do this:
sed 's/\r//g' abc1.txt |tr '\n' ' '
would be this:
tr -d '\r\n' <abc1.txt
which deletes any of the listed characters.

But if you want to avoid removing any '\r' which are not followed by '\n', you could do:
sed 's/\r$//g' abc1.txt |tr '\n' ' ' # Note the added '$'
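Where the local sed does not understand \r, the same rule (delete a CR only when it is the last character on the line) can be sketched in awk, which does accept \r in regular expressions. The function name join_lines is hypothetical; it prints every line without its newline, giving the one-long-line result that tr -d '\n' would.

```shell
# Hypothetical helper: join_lines FILE
# Deletes a CR only when it is the last character of a line (the CR of a
# CR-LF pair), then prints all lines back to back with no separator.
join_lines() {
  awk '{ sub(/\r$/, ""); printf "%s", $0 }' "$1"
}
```

A stray CR in the middle of a line is left alone, which is the behaviour the added '$' is meant to give.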

Pradeep0308

ASKER

Thanks both. I have tested both solutions and they are working as expected. Appreciate both of your assistance.

Pradeep0308

ASKER

Thank you both. I have made a split of the points accordingly.

Abhimanyu Suri

Welcome :)

It was a good learning experience for me too, thanks to "tel2".

tel2

A pleasure doing business with you all. It was an interesting task.

Abhimanyu, my last command:
sed 's/\r$//g' abc1.txt | tr '\n' ' ' # Note the added '$'
should have read:
sed 's/\r$//g' abc1.txt | tr -d '\n' # Note the added '$'

Also, regarding doing this kind of task in bash: you shouldn't need to add more code (conditions) for each match (substitution) you need to cater for. You should just be able to grep it from your match.txt file, etc. A faster way could be to store the list of pairs from match.txt in a hash (i.e. an associative array) at the beginning of the script, then just look up the hash after that. I've never used hashes in shell scripts, but I use them a lot in Perl, as they are a very convenient and powerful data structure. Have you used them in bash?
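A minimal sketch of the lookup-table idea. It uses awk rather than bash, because awk arrays are associative (hashes) and awk is available on any POSIX system including AIX; bash 4 and later offer the same thing through declare -A. The match.txt layout (one whitespace-separated code/replacement pair per line, e.g. "P w") and the name apply_pairs are assumptions. The pairs are loaded once, and each ROA record then needs one lookup instead of one condition per code.

```shell
# Hypothetical helper: apply_pairs MATCHFILE DATAFILE
# MATCHFILE (assumed layout): one "code replacement" pair per line, e.g. "P w".
# The pairs are loaded into an awk associative array once; after that, the
# 26th character of each ROA record is translated with a single lookup.
apply_pairs() {
  awk '
    NR == FNR { map[$1] = $2; next }    # first file: build the hash
    /^ROA/ && length($0) >= 26 {
      c = substr($0, 26, 1)
      if (c in map)                     # only translate known codes
        $0 = substr($0, 1, 25) map[c] substr($0, 27)
    }
    { print }                           # print every data record
  ' "$1" "$2"
}
```

For example, apply_pairs match.txt Test.txt prints the translated records with their line endings intact. Because it relies on the NR == FNR idiom, MATCHFILE must not be empty.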