Pradeep0308
asked on
Convert OverPunch
Hi All,
I need to do conversion of overpunch:
Code   Change To
}      p
J      q
K      r
L      s
M      t
N      u
O      v
P      x
Q      y
R      z
I want to write a script that replaces each overpunch value with the correct value. I have included a test file. For example:
IMD02VALUTA, BUNKER, DRIVMEDELSTILLÄGG
QTY0147 000000000000000
ROA0112500000000000000422} EUR
ROA06X2500000000000000422} SEK
CUX212 SEK4 0000 0000000001000000
ROA2123 00000000000000528}EUR
ROA26Z2500000000000004008M SEK
ROA31Z2300000000000005015L SEK
ROA411 00000000000000106}EUR
ROA51Z1 00000000000001006RSEK
PRI01CAL000000000000000 000000000
The overpunch value always appears at position 26 of an ROA record. It should be replaced according to the table above; then ^r^n and ^n should be removed from the file, so that the result looks like the attached TestOutput.txt file.
I want to do this with a Unix shell or awk script and run it on many files in a folder. How can I achieve this?
Thanks
Pradeep
Test.txt
TestOutput.txt
ASKER
Yes, that is right: I want to remove CR-LF and LF. It seems I missed that in the TestOutput.txt file.
I am not sure whether Perl is supported on our system. A sed solution should be fine, as long as I can run it on many files kept in a folder.
Thanks
Pradeep
Hi Pradeep,
Please create and upload a version of TestOutput.txt which is completely correct. This can then be used to ensure the output from our script is the same, and helps to make sure we have understood your requirements.
Please check whether you have Perl loaded on your system by typing:
perl -v
and tell me what output that command gives you. If that command doesn't give you an error message, are you happy to have a Perl solution?
What flavour of UNIX/Linux is it, anyway (e.g. AIX, HPUX, Redhat, CentOS, etc)?
tel2
ASKER
Hi tel2,
I ran this:
perl -v
This is perl, v5.8.8 built for aix-thread-multi
Copyright 1987-2006, Larry Wall
Perl may be copied only under the terms of either the Artistic License or the
GNU General Public License, which may be found in the Perl 5 source kit.
Complete documentation for Perl, including FAQ lists, should be found on
this system using "man perl" or "perldoc perl". If you have access to the
Internet, point your browser at http://www.perl.org/, the Perl Home Page.
We are using AIX Unix flavour. I have tried to upload the file again named TestOutput.txt
TestOutput.txt
Thanks Pradeep,
Would you be happy to accept a Perl solution now that we know that your system has Perl?
ASKER
Yes, I don't see an issue with that.
OK - I'll have a go at it.
Meanwhile, what do you think about Abhimanyu's solution?
ASKER
I am testing that, but the Unix script does not remove the CR-LF or LF line endings.
Yes, I noticed that.
Also, the 2 "grep -i" commands should ideally be just "grep", because case insensitivity isn't required, but that probably wouldn't make any difference with your data.
Hi Pradeep,
OK - back from dinner now.
Looking at the last 4 rows of this:
Code   Change To
O      v
P      x
Q      y
R      z
Did you intentionally exclude "w" from the "Change To" column? If so, why?
ASKER
This is the complete list:
R y
Q x
P w
O v
N u
M t
L s
K r
J q
} p
{ 0
A 1
B 2
C 3
D 4
E 5
F 6
G 7
H 8
I 9
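For illustration only, the complete list above collapses into two character ranges, so a single tr call can express it. This is a minimal sketch, not code from the thread: decode_overpunch is a hypothetical helper name, and it should only ever be fed the one overpunch character, because tr would translate every matching letter it is given.

```shell
#!/bin/sh
# Sketch: translate one overpunch character using the "complete list".
# decode_overpunch is a hypothetical helper name, not from the thread.
decode_overpunch() {
    # '{' and A-I become 0-9; '}' and J-R become p-y.
    # LC_ALL=C keeps the A-I and J-R ranges in plain ASCII order.
    printf '%s' "$1" | LC_ALL=C tr '{A-I}J-R' '0-9p-y'
}

decode_overpunch 'M'; echo    # M is listed as t
decode_overpunch '}'; echo    # } is listed as p
```

Characters that are not in the list (digits, spaces) pass through unchanged.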
Your original post had "P" being changed to "x".
Your latest post has "P" being changed to "w".
Which is correct?
And do you want me to use this "complete list"?
Hi again Pradeep,
I'm using your "complete list" for the following solutions:
Let's assume you want to process all the .txt files in the current directory.
This command will overwrite all the .txt files with the processed version:
perl -i -pe 's/^(ROA.{22})(.)(.+)$/$1.$2=~tr|A-R{}|1-9q-y0p|r.$3/e;s/\r?\n//' *.txt
This command will rename each .txt file from <basename>.txt to <basename>.txt.old and write the output to the original name (i.e. <basename>.txt):
perl -i.old -pe 's/^(ROA.{22})(.)(.+)$/$1.$2=~tr|A-R{}|1-9q-y0p|r.$3/e;s/\r?\n//' *.txt
Now you know why I wanted to use Perl. Should run much faster than most shell scripts with a "read", etc, too, but that would only be significantly noticeable for large input files.
The output will not be quite like the file you provided because of your changes to your list of substitutions.
If neither of the above is adequate, please let me know why not.
tel2
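Since the original question asked for a unix/awk script, here is a hedged awk sketch of the same idea: decode position 26 of ROA records with the "complete list", drop CR and LF, and emit one long line. It is not the Perl one-liner's author's code; convert_overpunch is a hypothetical name. One reason to have an alternative: the r modifier on tr was added in Perl 5.14, so on an older Perl such as the v5.8.8 reported above, the one-liner may not compile as written.

```shell
#!/bin/sh
# Sketch: decode the character at position 26 of ROA records and join all
# lines into one. convert_overpunch is a hypothetical name.
convert_overpunch() {
    LC_ALL=C awk '
    BEGIN {
        from = "{ABCDEFGHI}JKLMNOPQR"   # overpunch codes, per the complete list
        to   = "0123456789pqrstuvwxy"   # their replacements, in the same order
    }
    {
        sub(/\r$/, "")                  # drop a trailing CR (CR-LF input)
        if (substr($0, 1, 3) == "ROA" && length($0) >= 26) {
            i = index(from, substr($0, 26, 1))
            if (i > 0)
                $0 = substr($0, 1, 25) substr(to, i, 1) substr($0, 27)
        }
        printf "%s", $0                 # no newline: records are concatenated
    }'
}
```

It reads standard input and writes standard output, for example: convert_overpunch < Test.txt > Test.out. Unlike perl -i it never overwrites its input, so the originals stay in place until the output has been checked.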
Handling carriage return is pretty simple,
sed 's/\r//g' parse.log
For better readability you can add another function to the script
func_removecarr()
{
sed 's/\r//g' parse.log > parse_wo_carr.log
# once tested and satisfied add below line
# rm -rf parse.log
# Also you may want to append logfiles with some identifier when handling many files together
}
Here is an example
I have introduced a couple of CRs and LFs into the example:
sed 's/$/\r\n/g' abc.txt > abc1.txt
cat -v abc1.txt
QTY0147 000000000000000^M
ROA0112500000000000000422p EUR^M
ROA06X2500000000000000422p SEK^M
CUX212^M SEK4 0000 0000000001000000^M
ROA2123 00000000000000528pEUR^M
ROA411 00000000000000106pEUR^M
PRI01CAL000000000000000 000000000^M
sed 's/\r//g' abc1.txt|awk 'NF'
QTY0147 000000000000000
ROA0112500000000000000422p EUR
ROA06X2500000000000000422p SEK
CUX212 SEK4 0000 0000000001000000
ROA2123 00000000000000528pEUR
ROA411 00000000000000106pEUR
PRI01CAL000000000000000 000000000
Now, coming back to Perl vs shell: Perl obviously has a much richer library.
As for speed, the difference is not noticeable for a couple of hundred thousand records.
I am not a Perl guy, but it is amazing to see that all of it can be done in a one-liner.
I tried sed/awk but still had to put in conditions and clauses.
Maybe it is time to stop procrastinating about learning Perl and actually learn it :)
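The question also asked about bulk files in a folder. A minimal sketch of how the CR removal above might be looped over every .txt file in a directory follows; strip_cr_in_dir and the .nocr suffix are hypothetical choices, not from the thread. It uses tr rather than sed because tr accepts the \r escape on any POSIX system, whereas some sed implementations do not.

```shell
#!/bin/sh
# Sketch: strip CRs from every .txt file in a directory, keeping the
# originals. strip_cr_in_dir and the .nocr suffix are hypothetical choices.
strip_cr_in_dir() {
    dir=${1:-.}                          # default to the current directory
    for f in "$dir"/*.txt; do
        [ -f "$f" ] || continue          # skip if the glob matched nothing
        tr -d '\r' < "$f" > "$f.nocr"    # write the cleaned copy alongside
    done
}
```

Because the output names end in .nocr, they are not matched by *.txt, so running the function twice does not reprocess its own output.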
Muscle reflex :), fingers automatically go with "grep -i"
For sed |awk 'NF', I misunderstood that everything supposed to be in one long string .
I will definitely start working on PERL
Thanks "tel2" for your inputs
No worries, Abhimanyu.
> "For sed |awk 'NF', I misunderstood that everything supposed to be in one long string."
One long line is what you will get when you remove all \r\n and \n.
How would you do it?
tel2
Probably something like this
sed 's/$/\r\n/g' abc.txt > abc1.txt
cat -v abc1.txt
QTY0147 000000000000000 ^M
ROA0112500000000000000422} EUR ^M
ROA06X2500000000000000422} SEK ^M
CUX212 SEK4 0000 0000000001000000 ^M
ROA2123 00000000000000528}EUR ^M
ROA411 00000000000000106}EUR ^M
PRI01CAL000000000000000 000000000 ^M
sed 's/\r//g' abc1.txt |tr '\n' ' ' > abc2.txt
cat abc2.txt
QTY0147 000000000000000 ROA0112500000000000000422} EUR ROA06X2500000000000000422} SEK CUX212 SEK4 0000 0000000001000000 ROA2123 00000000000000528}EUR ROA411 00000000000000106}EUR PRI01CAL000000000000000 000000000
That's close, Abhimanyu, but the problems with:
sed 's/\r//g' abc1.txt |tr '\n' ' ' > abc2.txt
are:
1. tr will replace '\n' with a space. Try this instead: tr -d '\n'
2. sed will remove ALL '\r' chars even if they don't have a '\n' after them. In most situations this won't be a problem. sed may be able to handle it, but I'm not sure how, so I'd use Perl.
Having thought about it more, Abhimanyu...
A simpler way to do this:
sed 's/\r//g' abc1.txt |tr '\n' ' '
would be this:
tr -d '\r\n' <abc1.txt
which deletes any of the listed characters.
But if you want to avoid removing any '\r' which are not followed by '\n', you could do:
sed 's/\r$//g' abc1.txt |tr '\n' ' ' # Note the added '$'
ASKER
Thanks both. I have tested both solutions and they are working as expected. Appreciate both of your assistance.
ASKER
Thank you both. I have made a split of the points accordingly.
Welcome :)
It was a good learning for me too, thanks to "tel2"
A pleasure doing business with you all. It was an interesting task.
Abhimanyu, my last command:
sed 's/\r$//g' abc1.txt | tr '\n' ' ' # Note the added '$'
should have read:
sed 's/\r$//g' abc1.txt | tr -d '\n' # Note the added '$'
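To make the difference between the two approaches concrete, here is a small sketch that wraps each in a function; the function names are hypothetical. It passes sed a literal CR character instead of \r, on the assumption that not every sed understands the \r escape.

```shell
#!/bin/sh
# Sketch: two ways to join lines. Function names are hypothetical.
cr=$(printf '\r')       # a literal CR, because not every sed understands \r

join_drop_all_cr() {    # deletes every CR and LF, wherever they occur
    tr -d '\r\n'
}

join_keep_inner_cr() {  # deletes a CR only at end of line, then every LF
    sed "s/${cr}\$//" | tr -d '\n'
}
```

With input such as 'a<CR>b<CR><LF>c<CR><LF>', the first function yields abc, while the second keeps the CR between a and b because it is not followed by a line feed.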
Also, regarding doing this kind of task in bash: you shouldn't need to add more code (conditions) for each match (substitution) that you need to cater for. You should just be able to grep it from your match.txt file, etc. A faster way could be to store the list of pairs from match.txt in a hash (i.e. associative array) at the beginning of the script, then just look up the hash after that. I've never used hashes in shell scripts, but I use them in Perl a lot as they are a very convenient and powerful data structure. Have you used them in bash?
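As a sketch of the hash idea: bash 4 and later offer associative arrays via declare -A, but awk arrays give the same lookup on any POSIX system, so the example below uses awk. The layout of match.txt is an assumption (one pair per line, code then replacement, separated by whitespace), and convert_with_table is a hypothetical name. The sketch only shows the lookup and leaves line endings alone.

```shell
#!/bin/sh
# Sketch: load code/replacement pairs into an awk array (a hash), then
# look up each ROA record's position-26 character in it.
# convert_with_table and the pairs-file layout are assumptions.
convert_with_table() {
    # $1 = pairs file, one "code replacement" pair per line; data on stdin
    LC_ALL=C awk '
    NR == FNR { map[$1] = $2; next }     # first file: fill the hash
    {
        c = substr($0, 26, 1)            # candidate overpunch character
        if (substr($0, 1, 3) == "ROA" && (c in map))
            $0 = substr($0, 1, 25) map[c] substr($0, 27)
        print
    }' "$1" -
}
```

Adding a new substitution then means adding a line to the pairs file rather than another condition to the script. Note that the NR == FNR idiom assumes the pairs file is not empty.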
> "So this should be replaced with the above table and then ^r^n and ^n should be removed from the file and should look like the TestOutput.txt file attached."
When you say "^r^n and ^n should be removed" are you talking about CR-LF and LF? If so, why are they still in TestOutput.txt?
> "I want to do this through either a unix/awk script and want to do it on bulk files in a folder. How to achieve this?"
Will you accept a Perl solution? Many UNIX/Linux systems come with Perl.
How about a sed solution?
tel2