• Status: Solved

AWK script to count and then replace

Hi All,

I am looking for a simple awk script that counts the segment delimiters in a file and then writes that count into the UNT segment.

For example, here is a sample file:

UNA:+.?*'
UNB+UNOB:4+0050:ZZZ+DBH+20081105:1803+0811051803'
UNG+IFTMIN+0050+3995+20081105:1803+0811051803+UN+D:99AEBL02'
UNH+0811051803+IFTMIN:99A:UNEBL02'
BGM+705+C00005738+5'
CTA+MS+:EnterpriseBatchProcessor'
DTM+137:200811031729:203'
FTX+AEA++9'
CNT+11:22:PCE'
CNT+7:22500:KGM'
CNT+15:0.053:MTQ'
DOC+710++EI+2+0'
LOC+57+DEHAM:::Hamburg'
RFF+FF:C00005738'
TDT+20+125+1+13:OCEAN VESSEL+00009999:172+++:::COSCO LONG BEACH'
DTM+133:20081105:102'
LOC+9+DEHAM:::Hamburg'
LOC+12+SGSIN:::Singapore'
NAD+CN+00004225+11 JOO KOON CIRCLE:JURONG:11 JOO KOON CIRCLE:SINGAPORE+ALFA LAVAL SINGAPORE PTE LTD++++629043'
NAD+CZ+00007306+ALTENAER STRASSE 72-76-58675 HEMER:ALTENAER STRASSE 72-76-58675 HEMER:HH+MATRIX VERTRIEBSSERVICE GMBH'
CTA+IC+:EnterpriseBatchProcessor'
NAD+N1+00012726+Amselstrasse::Amselstrasse:Hamburg:HH+Hanjin Shipping Co. Ltd. Schiffsmak++++20457'
GID+1+22:PA:::PACKET'
FTX+AAA+++MACHINERY PARTS'
MEA+WT+AAE+KGM:22500'
MEA+VOL+AAE+MTQ:0.053'
RFF+BN:HJSHAM123456'
PCI++ALFA LAVAL SINGAPORE'
SGP+HJCU8521230+22'
EQD+CN+HJCU8521230+42G0'
TMD+3:FCL/FCL+FCL'
SEL+123963+SH'
RFF+BN:SDEHAG0000023'
UNT+31+0811051803'
UNE+1+0811051803'
UNZ+1+0811051803'

We need to count the segment delimiters, i.e. the ' character, from the UNH segment through the UNT segment, and then replace the 31 in UNT+31 with the exact count.

Please help. I can provide any other information you need.

Regards
Karan
Asked by: Pankaj_Sachdeva
 
ozo Commented:
awk '/^UNH/,/^UNT/{if( index($0,"'"'"'") ){c++}};{sub(/^UNT\+[0-9]+/,"UNT+" c);print}'
 
Pankaj_Sachdeva (Author) Commented:
Hi, I tried to run the above awk script on the file below:

UNB+UNOB:4+0050:ZZZ+DBH+20081110:1129+0811101129'
UNG+IFTMIN+0050+3995+20081110:1129+0811101129+UN+D:99A:DEBL02'
UNH+0811101129+IFTMIN:D:99A:UN:DEBL02'
BGM+705+C00005752+9'
CTA+MS+:EnterpriseBatchProcessor'
DTM+137:200811041510:203'
FTX+AEA++5'
CNT+11:0:PCE'
CNT+7:700:KGM'
CNT+15:5:MTQ'
LOC+57+DEBRE:::Bremen'
RFF+FF:C00005752'
TDT+20+VO0411086+1+13:OCEAN VESSEL+00009999:172+++:::ANL EXPLORER'
DTM+133:20081106:102'
LOC+9+DEBRE:::Bremen'
LOC+12+CAZZZ:::DUMMY PORT'
NAD+CN+00001142+DSV AIR & SEA INC.:100 Walnut Avenue:Suite 405:Clark:NJ+++++07066'
NAD+CZ+00001062+DSV AIR & SEA GMBH:Schlachte 15/18:LKW ABTEILUNG:Bremen:HB+++++28195'
CTA+IC+:EnterpriseBatchProcessor'
NAD+N1+00012726+Hanjin Shipping Co. Ltd. Schiffsmak:Amselstrasse::Hamburg:HH+++++20457'
GID+1+0:PA:::PACKET'
PIA+5+041106:CC:169'
FTX+AAA+++desc0411086'
MEA+WT+AAE+KGM:700'
MEA+VOL+AAE+MTQ:5'
PCI++M&M0411086'
SGP+CONT0411060+0'
EQD+CN+CONT0411060+2060'
TMD+2:LCL/LCL+LCL'
SEL+seal041108+SH'
UNT+30+0811101129'
UNE+1+0811101129'
UNZ+1+0811101129'

The UNT count is given as 30 here, but if you count the lines from the UNH segment to the UNT segment that contain the segment delimiter ', there are 29.

I ran your awk script and the output was unchanged: it did not change the UNT count to 29.

Thanks for your help..

Regards

Karan
 
ozo Commented:
I was counting the ' at the end of UNT+30+0811101129'
If you don't want to count it, you can reverse the order of the substitution and the count:
awk '{sub(/^UNT\+[0-9]+/,"UNT+" c);print};/^UNH/,/^UNT/{if( index($0,"'"'"'") ){c++}}'
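
The two one-liners differ only in the order of their rules: counting before the substitution includes the UNT line, counting after it does not. A small sketch on a made-up three-segment message shows the effect. The sample data and the function names count_then_sub and sub_then_count are hypothetical and only for illustration; \047 is the octal escape for the apostrophe, used here to avoid the shell quoting.

```shell
# Hypothetical three-segment message, one segment per line.
sample="UNH+1+X'
BGM+1'
UNT+9+1'"

# Count first, then substitute: the UNT line itself is included.
count_then_sub() {
  awk '/^UNH/,/^UNT/ { if (index($0, "\047")) c++ }
       { sub(/^UNT\+[0-9]+/, "UNT+" c); print }'
}

# Substitute first, then count: the UNT line is rewritten before it is counted.
sub_then_count() {
  awk '{ sub(/^UNT\+[0-9]+/, "UNT+" c); print }
       /^UNH/,/^UNT/ { if (index($0, "\047")) c++ }'
}

printf '%s\n' "$sample" | count_then_sub | tail -n 1   # UNT+3+1'
printf '%s\n' "$sample" | sub_then_count | tail -n 1   # UNT+2+1'
```

Both variants rely on each segment being on its own line, so neither changes anything in a file where all segments sit on one line.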
 
Pankaj_Sachdeva (Author) Commented:
I am also attaching a sample file that is not newline-terminated: the whole data comes on a single line, with ' as the segment delimiter.

Thanks

Karan
DBHA.005M3C.IFTMIN.000901115.txt
 
Pankaj_Sachdeva (Author) Commented:
Thanks ozo, but I tried both solutions on the attached file and they still give a UNT count of 30 in the output, whereas it should be 29.

I need to count the segment delimiters ' from the UNH segment through the UNT segment.

Thanks for your help

Karan
 
ozo Commented:
I count 33 in that file
awk '{match($0,/UNH.*UNT/);u=substr($0,RSTART,RLENGTH);sub(/'"'UNT\+[0-9]+/,"'"'"'UNT+"'" gsub(/'"'"'/,FS,$0));print}' < DBHA.005M3C.IFTMIN.000901115.txt
 
Pankaj_Sachdeva (Author) Commented:
Count the segment delimiters, i.e. the ' character, in that file from the UNH segment through the UNT segment.

Please see below:

UNH+0811101129+IFTMIN:D:99A:UN:DEBL02'
BGM+705+C00005752+9'
CTA+MS+:EnterpriseBatchProcessor'
DTM+137:200811041510:203'
FTX+AEA++5'
CNT+11:0:PCE'
CNT+7:700:KGM'
CNT+15:5:MTQ'
LOC+57+DEBRE:::Bremen'
RFF+FF:C00005752'
TDT+20+VO0411086+1+13:OCEAN VESSEL+00009999:172+++:::ANL EXPLORER'
DTM+133:20081106:102'
LOC+9+DEBRE:::Bremen'
LOC+12+CAZZZ:::DUMMY PORT'
NAD+CN+00001142+DSV AIR & SEA INC.:100 Walnut Avenue:Suite 405:Clark:NJ+++++07066'
NAD+CZ+00001062+DSV AIR & SEA GMBH:Schlachte 15/18:LKW ABTEILUNG:Bremen:HB+++++28195'
CTA+IC+:EnterpriseBatchProcessor'
NAD+N1+00012726+Hanjin Shipping Co. Ltd. Schiffsmak:Amselstrasse::Hamburg:HH+++++20457'
GID+1+0:PA:::PACKET'
PIA+5+041106:CC:169'
FTX+AAA+++desc0411086'
MEA+WT+AAE+KGM:700'
MEA+VOL+AAE+MTQ:5'
PCI++M&M0411086'
SGP+CONT0411060+0'
EQD+CN+CONT0411060+2060'
TMD+2:LCL/LCL+LCL'
SEL+seal041108+SH'
UNT+30+0811101129'

Counting manually, the number of segment delimiters ' comes to 29.

Thanks

Karan
 
Pankaj_Sachdeva (Author) Commented:
After the UNT counter is substituted, the rest of the file should come out exactly as it was; only the UNT counter should change, nothing else.
Please help, this is really urgent for me.
Thanks
 
ozo Commented:
I get 29 if we count like http:#a22928072 and 28 if we count like http:#a22928139
 
Pankaj_Sachdeva (Author) Commented:
Can you tell me how you are running this awk script? I am running it as below:
awk '/^UNH/,/^UNT/{if( index($0,"'"'"'") ){c++}};{sub(/^UNT\+[0-9]+/,"UNT+" c);print}'  inputfilename
 
Pankaj_Sachdeva (Author) Commented:
I'm getting a UNT counter of 30 for both. Can you please help?
 
ozo Commented:
cat >  inputfilename <<ENDHERE
UNH+0811101129+IFTMIN:D:99A:UN:DEBL02'
BGM+705+C00005752+9'
CTA+MS+:EnterpriseBatchProcessor'
DTM+137:200811041510:203'
FTX+AEA++5'
CNT+11:0:PCE'
CNT+7:700:KGM'
CNT+15:5:MTQ'
LOC+57+DEBRE:::Bremen'
RFF+FF:C00005752'
TDT+20+VO0411086+1+13:OCEAN VESSEL+00009999:172+++:::ANL EXPLORER'
DTM+133:20081106:102'
LOC+9+DEBRE:::Bremen'
LOC+12+CAZZZ:::DUMMY PORT'
NAD+CN+00001142+DSV AIR & SEA INC.:100 Walnut Avenue:Suite 405:Clark:NJ+++++07066'
NAD+CZ+00001062+DSV AIR & SEA GMBH:Schlachte 15/18:LKW ABTEILUNG:Bremen:HB+++++28195'
CTA+IC+:EnterpriseBatchProcessor'
NAD+N1+00012726+Hanjin Shipping Co. Ltd. Schiffsmak:Amselstrasse::Hamburg:HH+++++20457'
GID+1+0:PA:::PACKET'
PIA+5+041106:CC:169'
FTX+AAA+++desc0411086'
MEA+WT+AAE+KGM:700'
MEA+VOL+AAE+MTQ:5'
PCI++M&M0411086'
SGP+CONT0411060+0'
EQD+CN+CONT0411060+2060'
TMD+2:LCL/LCL+LCL'
SEL+seal041108+SH'
UNT+30+0811101129'
ENDHERE
awk '/^UNH/,/^UNT/{if( index($0,"'"'"'") ){c++}};{sub(/^UNT\+[0-9]+/,"UNT+" c);print}'  inputfilename
UNH+0811101129+IFTMIN:D:99A:UN:DEBL02'
BGM+705+C00005752+9'
CTA+MS+:EnterpriseBatchProcessor'
DTM+137:200811041510:203'
FTX+AEA++5'
CNT+11:0:PCE'
CNT+7:700:KGM'
CNT+15:5:MTQ'
LOC+57+DEBRE:::Bremen'
RFF+FF:C00005752'
TDT+20+VO0411086+1+13:OCEAN VESSEL+00009999:172+++:::ANL EXPLORER'
DTM+133:20081106:102'
LOC+9+DEBRE:::Bremen'
LOC+12+CAZZZ:::DUMMY PORT'
NAD+CN+00001142+DSV AIR & SEA INC.:100 Walnut Avenue:Suite 405:Clark:NJ+++++07066'
NAD+CZ+00001062+DSV AIR & SEA GMBH:Schlachte 15/18:LKW ABTEILUNG:Bremen:HB+++++28195'
CTA+IC+:EnterpriseBatchProcessor'
NAD+N1+00012726+Hanjin Shipping Co. Ltd. Schiffsmak:Amselstrasse::Hamburg:HH+++++20457'
GID+1+0:PA:::PACKET'
PIA+5+041106:CC:169'
FTX+AAA+++desc0411086'
MEA+WT+AAE+KGM:700'
MEA+VOL+AAE+MTQ:5'
PCI++M&M0411086'
SGP+CONT0411060+0'
EQD+CN+CONT0411060+2060'
TMD+2:LCL/LCL+LCL'
SEL+seal041108+SH'
UNT+29+0811101129'
 
Pankaj_Sachdeva (Author) Commented:
I didn't get your last point. Can you tell me how you are running the awk script on the input file?
 
Pankaj_Sachdeva (Author) Commented:
See below: I ran the first awk script you posted on the input file and it gave me this result:
/dfds03/editst6/pankaj >{sub(/^UNT\+[0-9]+/,"UNT+" c);print}' DBHA.005M3C.IFTMIN.000901115                                <
UNB+UNOB:4+0050:ZZZ+DBH+20081110:1129+0811101129'UNG+IFTMIN+0050+3995+20081110:1129+0811101129+UN+D:99A:DEBL02'UNH+0811101129+IFTMIN:D:99A:UN:DEBL02'BGM+705+C00005752+9'CTA+MS+:EnterpriseBatchProcessor'DTM+137:200811041510:203'FTX+AEA++5'CNT+11:0:PCE'CNT+7:700:KGM'CNT+15:5:MTQ'LOC+57+DEBRE:::Bremen'RFF+FF:C00005752'TDT+20+VO0411086+1+13:OCEAN VESSEL+00009999:172+++:::ANL EXPLORER'DTM+133:20081106:102'LOC+9+DEBRE:::Bremen'LOC+12+CAZZZ:::DUMMY PORT'NAD+CN+00001142+DSV AIR & SEA INC.:100 Walnut Avenue:Suite 405:Clark:NJ+++++07066'NAD+CZ+00001062+DSV AIR & SEA GMBH:Schlachte 15/18:LKW ABTEILUNG:Bremen:HB+++++28195'CTA+IC+:EnterpriseBatchProcessor'NAD+N1+00012726+Hanjin Shipping Co. Ltd. Schiffsmak:Amselstrasse::Hamburg:HH+++++20457'GID+1+0:PA:::PACKET'PIA+5+041106:CC:169'FTX+AAA+++desc0411086'MEA+WT+AAE+KGM:700'MEA+VOL+AAE+MTQ:5'PCI++M&M0411086'SGP+CONT0411060+0'EQD+CN+CONT0411060+2060'TMD+2:LCL/LCL+LCL'SEL+seal041108+SH'UNT+30+0811101129'UNE+1+0811101129'UNZ+1+0811101129'
 
ozo Commented:
http:#a22928342 creates inputfilename,
runs awk on inputfilename,
and shows the result

What were the input file and the command used to produce http:#a22928446?
 
Pankaj_Sachdeva (Author) Commented:
The command used to produce the output in http:#a22928446 was:
awk '/^UNH/,/^UNT/{if( index($0,"'"'"'") ){c++}};{sub(/^UNT\+[0-9]+/,"UNT+" c);print}' DBHA.005M3C.IFTMIN.000901115.txt
The input file has been attached for your reference.
Karan

DBHA.005M3C.IFTMIN.000901115.txt
 
ozo Commented:
As I said in http:#a22928238
for a file all on one line, running
awk '{match($0,/UNH.*UNT/);u=substr($0,RSTART,RLENGTH);sub(/'"'UNT\+[0-9]+/,"'"'"'UNT+"'" gsub(/'"'"'/,FS,u));print}' < DBHA.005M3C.IFTMIN.000901115.txt
produces
UNB+UNOB:4+0050:ZZZ+DBH+20081110:1129+0811101129'UNG+IFTMIN+0050+3995+20081110:1129+0811101129+UN+D:99A:DEBL02'UNH+0811101129+IFTMIN:D:99A:UN:DEBL02'BGM+705+C00005752+9'CTA+MS+:EnterpriseBatchProcessor'DTM+137:200811041510:203'FTX+AEA++5'CNT+11:0:PCE'CNT+7:700:KGM'CNT+15:5:MTQ'LOC+57+DEBRE:::Bremen'RFF+FF:C00005752'TDT+20+VO0411086+1+13:OCEAN VESSEL+00009999:172+++:::ANL EXPLORER'DTM+133:20081106:102'LOC+9+DEBRE:::Bremen'LOC+12+CAZZZ:::DUMMY PORT'NAD+CN+00001142+DSV AIR & SEA INC.:100 Walnut Avenue:Suite 405:Clark:NJ+++++07066'NAD+CZ+00001062+DSV AIR & SEA GMBH:Schlachte 15/18:LKW ABTEILUNG:Bremen:HB+++++28195'CTA+IC+:EnterpriseBatchProcessor'NAD+N1+00012726+Hanjin Shipping Co. Ltd. Schiffsmak:Amselstrasse::Hamburg:HH+++++20457'GID+1+0:PA:::PACKET'PIA+5+041106:CC:169'FTX+AAA+++desc0411086'MEA+WT+AAE+KGM:700'MEA+VOL+AAE+MTQ:5'PCI++M&M0411086'SGP+CONT0411060+0'EQD+CN+CONT0411060+2060'TMD+2:LCL/LCL+LCL'SEL+seal041108+SH'UNT+28+0811101129'UNE+1+0811101129'UNZ+1+0811101129'
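
The count in that command comes from the return value of gsub, which is the number of replacements it made on the copied substring. A stripped-down sketch that only prints the two candidate counts, without rewriting anything, might look like this. The function name count_delims is made up for illustration, and \047 is the octal escape for the apostrophe.

```shell
# count_delims: report how many apostrophes lie between UNH and UNT
# in a file whose segments are all on one line.
count_delims() {
  awk '{
    match($0, /UNH.*UNT/)              # greedy: from the first UNH to the last UNT
    u = substr($0, RSTART, RLENGTH)    # work on a copy so $0 stays untouched
    n = gsub(/\047/, "&", u)           # gsub returns the number of apostrophes matched
    print "up to UNT: " n ", including the UNT delimiter: " (n + 1)
  }' "$@"
}
```

Because the matched substring stops at the letters UNT, the delimiter that closes the UNT segment itself is not part of it, which is where the difference of one comes from.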
 
Pankaj_Sachdeva (Author) Commented:
If you split it into newline-terminated segments and then check the count of ' between the UNH and UNT segments, it is 29.
Can you please check that manually? The correct UNT counter should be 29.
 
ozo Commented:
Here's how I count
UNH+0811101129+IFTMIN:D:99A:UN:DEBL02'<1>BGM+705+C00005752+9'<2>CTA+MS+:EnterpriseBatchProcessor'<3>DTM+137:200811041510:203'<4>FTX+AEA++5'<5>CNT+11:0:PCE'<6>CNT+7:700:KGM'<7>CNT+15:5:MTQ'<8>LOC+57+DEBRE:::Bremen'<9>RFF+FF:C00005752'<10>TDT+20+VO0411086+1+13:OCEAN VESSEL+00009999:172+++:::ANL EXPLORER'<11>DTM+133:20081106:102'<12>LOC+9+DEBRE:::Bremen'<13>LOC+12+CAZZZ:::DUMMY PORT'<14>NAD+CN+00001142+DSV AIR & SEA INC.:100 Walnut Avenue:Suite 405:Clark:NJ+++++07066'<15>NAD+CZ+00001062+DSV AIR & SEA GMBH:Schlachte 15/18:LKW ABTEILUNG:Bremen:HB+++++28195'<16>CTA+IC+:EnterpriseBatchProcessor'<17>NAD+N1+00012726+Hanjin Shipping Co. Ltd. Schiffsmak:Amselstrasse::Hamburg:HH+++++20457'<18>GID+1+0:PA:::PACKET'<19>PIA+5+041106:CC:169'<20>FTX+AAA+++desc0411086'<21>MEA+WT+AAE+KGM:700'<22>MEA+VOL+AAE+MTQ:5'<23>PCI++M&M0411086'<24>SGP+CONT0411060+0'<25>EQD+CN+CONT0411060+2060'<26>TMD+2:LCL/LCL+LCL'<27>SEL+seal041108+SH'<28>UNT
 
Pankaj_Sachdeva (Author) Commented:
We need to count the UNT segment delimiter as well, which makes it 29. All the segment delimiters, i.e. ', need to be counted from UNH through the end of UNT, including the UNT one.
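
One possible way to get this inclusive count on a single-line file is to let awk split records on the apostrophe itself, so that every segment becomes one record regardless of line breaks. This is a minimal sketch, not a tested production script. It assumes that no escaped delimiters (?') occur inside the data and that the file ends with a delimiter, optionally followed by a newline. The function name fix_unt is hypothetical.

```shell
# fix_unt: rewrite the count in each UNT segment, treating ' as the record separator.
# Assumption: no escaped apostrophes (?') appear inside segment data.
fix_unt() {
  awk '
    BEGIN { RS = ORS = "\047" }               # \047 is the apostrophe
    /^[\r\n]+$/ { printf "%s", $0; next }     # trailing newline at EOF: copy it as-is
    /^[\r\n]*UNH\+/ { n = 0; inmsg = 1 }      # a message starts: reset the counter
    inmsg { n++ }                             # count every segment from UNH through UNT
    /^[\r\n]*UNT\+/ {
      sub(/UNT\+[0-9]+/, "UNT+" n)            # write the inclusive count
      inmsg = 0
    }
    { print }                                 # everything else passes through unchanged
  ' "$@"
}
```

Usage could look like `fix_unt DBHA.005M3C.IFTMIN.000901115.txt > fixed.txt` (the output name is arbitrary). For the 29 segments from UNH to UNT listed earlier in this thread, this would be expected to write UNT+29 and leave the rest of the file as it was.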
 
Question has a verified solution.
