Pankaj_Sachdeva

AWK script to count and then replace

Hi All,

I am looking for a simple awk script that counts the segment delimiters in a file and then writes that count into the UNT segment.

For example, here is a sample file:

UNA:+.?*'
UNB+UNOB:4+0050:ZZZ+DBH+20081105:1803+0811051803'
UNG+IFTMIN+0050+3995+20081105:1803+0811051803+UN+D:99AEBL02'
UNH+0811051803+IFTMIN:99A:UNEBL02'
BGM+705+C00005738+5'
CTA+MS+:EnterpriseBatchProcessor'
DTM+137:200811031729:203'
FTX+AEA++9'
CNT+11:22:PCE'
CNT+7:22500:KGM'
CNT+15:0.053:MTQ'
DOC+710++EI+2+0'
LOC+57+DEHAM:::Hamburg'
RFF+FF:C00005738'
TDT+20+125+1+13:OCEAN VESSEL+00009999:172+++:::COSCO LONG BEACH'
DTM+133:20081105:102'
LOC+9+DEHAM:::Hamburg'
LOC+12+SGSIN:::Singapore'
NAD+CN+00004225+11 JOO KOON CIRCLE:JURONG:11 JOO KOON CIRCLE:SINGAPORE+ALFA LAVAL SINGAPORE PTE LTD++++629043'
NAD+CZ+00007306+ALTENAER STRASSE 72-76-58675 HEMER:ALTENAER STRASSE 72-76-58675 HEMER:HH+MATRIX VERTRIEBSSERVICE GMBH'
CTA+IC+:EnterpriseBatchProcessor'
NAD+N1+00012726+Amselstrasse::Amselstrasse:Hamburg:HH+Hanjin Shipping Co. Ltd. Schiffsmak++++20457'
GID+1+22:PA:::PACKET'
FTX+AAA+++MACHINERY PARTS'
MEA+WT+AAE+KGM:22500'
MEA+VOL+AAE+MTQ:0.053'
RFF+BN:HJSHAM123456'
PCI++ALFA LAVAL SINGAPORE'
SGP+HJCU8521230+22'
EQD+CN+HJCU8521230+42G0'
TMD+3:FCL/FCL+FCL'
SEL+123963+SH'
RFF+BN:SDEHAG0000023'
UNT+31+0811051803'
UNE+1+0811051803'
UNZ+1+0811051803'

We need to count the segment delimiters (the ' symbol) from the UNH segment to the UNT segment, and then replace the 31 in UNT+31 with the exact count.

Please help. I can provide any other information you need.

Regards
Karan
ozo

awk '/^UNH/,/^UNT/{if( index($0,"'"'"'") ){c++}};{sub(/^UNT\+[0-9]+/,"UNT+" c);print}'
Pankaj_Sachdeva

ASKER

Hi, I ran the above awk script on the file below:

UNB+UNOB:4+0050:ZZZ+DBH+20081110:1129+0811101129'
UNG+IFTMIN+0050+3995+20081110:1129+0811101129+UN+D:99A:DEBL02'
UNH+0811101129+IFTMIN:D:99A:UN:DEBL02'
BGM+705+C00005752+9'
CTA+MS+:EnterpriseBatchProcessor'
DTM+137:200811041510:203'
FTX+AEA++5'
CNT+11:0:PCE'
CNT+7:700:KGM'
CNT+15:5:MTQ'
LOC+57+DEBRE:::Bremen'
RFF+FF:C00005752'
TDT+20+VO0411086+1+13:OCEAN VESSEL+00009999:172+++:::ANL EXPLORER'
DTM+133:20081106:102'
LOC+9+DEBRE:::Bremen'
LOC+12+CAZZZ:::DUMMY PORT'
NAD+CN+00001142+DSV AIR & SEA INC.:100 Walnut Avenue:Suite 405:Clark:NJ+++++07066'
NAD+CZ+00001062+DSV AIR & SEA GMBH:Schlachte 15/18:LKW ABTEILUNG:Bremen:HB+++++28195'
CTA+IC+:EnterpriseBatchProcessor'
NAD+N1+00012726+Hanjin Shipping Co. Ltd. Schiffsmak:Amselstrasse::Hamburg:HH+++++20457'
GID+1+0:PA:::PACKET'
PIA+5+041106:CC:169'
FTX+AAA+++desc0411086'
MEA+WT+AAE+KGM:700'
MEA+VOL+AAE+MTQ:5'
PCI++M&M0411086'
SGP+CONT0411060+0'
EQD+CN+CONT0411060+2060'
TMD+2:LCL/LCL+LCL'
SEL+seal041108+SH'
UNT+30+0811101129'
UNE+1+0811101129'
UNZ+1+0811101129'

The UNT count is given as 30 here; however, the number of lines from the UNH segment to the UNT segment that end with the segment delimiter ' is 29.

I ran your awk script and the output was the same; it didn't change the UNT count to 29.

Thanks for your help.

Regards

Karan
I was counting the ' at the end of UNT+30+0811101129'
If you don't want to count it, you can reverse the order of the substitution and the count:
awk '{sub(/^UNT\+[0-9]+/,"UNT+" c);print};/^UNH/,/^UNT/{if( index($0,"'"'"'") ){c++}}'
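To make the difference between the two one-liners concrete, here is a small sketch run against a three-segment message invented for illustration (one segment per line). The function names count_with_unt and count_without_unt are only labels for this example, and "\047" is used as the octal escape for the single quote to keep the shell quoting simple.

```shell
# Count first, then substitute: the UNT line itself is included in the count.
count_with_unt() {
    awk '/^UNH/,/^UNT/ { if (index($0, "\047")) c++ }   # \047 is the single quote
         { sub(/^UNT\+[0-9]+/, "UNT+" c); print }'
}

# Substitute first, then count: the UNT line is patched before it is counted.
count_without_unt() {
    awk '{ sub(/^UNT\+[0-9]+/, "UNT+" c); print }
         /^UNH/,/^UNT/ { if (index($0, "\047")) c++ }'
}

# Hypothetical sample message, one segment per line.
sample="UNH+1+IFTMIN'
BGM+705+X+9'
UNT+99+1'"

printf '%s\n' "$sample" | count_with_unt      # last line becomes UNT+3+1'
printf '%s\n' "$sample" | count_without_unt   # last line becomes UNT+2+1'
```

Both variants rely on every segment starting on its own line, because ^UNH and ^UNT only match at the start of a line.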
I am also attaching a sample file that is not newline-terminated. The whole data comes in a single line, with ' as the segment delimiter.

Thanks

Karan
DBHA.005M3C.IFTMIN.000901115.txt
Thanks ozo, but I tried both solutions on the attached file and they still give me a UNT count of 30 in the output, whereas it should be 29.

I need to count the segment delimiters ' from the UNH segment to the UNT segment.

Thanks for your help

Karan
I count 33 in that file
awk '{match($0,/UNH.*UNT/);u=substr($0,RSTART,RLENGTH);sub(/'"'UNT\+[0-9]+/,"'"'"'UNT+"'" gsub(/'"'"'/,"&"));print}' < DBHA.005M3C.IFTMIN.000901115.txt
Count the segment delimiters (the ' symbol) in that file, starting from the UNH segment and ending at the UNT segment.

Please see below:

UNH+0811101129+IFTMIN:D:99A:UN:DEBL02'
BGM+705+C00005752+9'
CTA+MS+:EnterpriseBatchProcessor'
DTM+137:200811041510:203'
FTX+AEA++5'
CNT+11:0:PCE'
CNT+7:700:KGM'
CNT+15:5:MTQ'
LOC+57+DEBRE:::Bremen'
RFF+FF:C00005752'
TDT+20+VO0411086+1+13:OCEAN VESSEL+00009999:172+++:::ANL EXPLORER'
DTM+133:20081106:102'
LOC+9+DEBRE:::Bremen'
LOC+12+CAZZZ:::DUMMY PORT'
NAD+CN+00001142+DSV AIR & SEA INC.:100 Walnut Avenue:Suite 405:Clark:NJ+++++07066'
NAD+CZ+00001062+DSV AIR & SEA GMBH:Schlachte 15/18:LKW ABTEILUNG:Bremen:HB+++++28195'
CTA+IC+:EnterpriseBatchProcessor'
NAD+N1+00012726+Hanjin Shipping Co. Ltd. Schiffsmak:Amselstrasse::Hamburg:HH+++++20457'
GID+1+0:PA:::PACKET'
PIA+5+041106:CC:169'
FTX+AAA+++desc0411086'
MEA+WT+AAE+KGM:700'
MEA+VOL+AAE+MTQ:5'
PCI++M&M0411086'
SGP+CONT0411060+0'
EQD+CN+CONT0411060+2060'
TMD+2:LCL/LCL+LCL'
SEL+seal041108+SH'
UNT+30+0811101129'

Counting manually, the number of segment delimiters ' comes to 29.

Thanks

Karan
After substituting the value of the UNT counter, the rest of the file should come out exactly as it was; only the UNT counter should change, nothing else.
Please help, it's really urgent for me.
Thanks
I get 29 if we count like http:#a22928072 and 28 if we count like http:#a22928139
Can you tell me how you are running this awk script? I am running it as below:
awk '/^UNH/,/^UNT/{if( index($0,"'"'"'") ){c++}};{sub(/^UNT\+[0-9]+/,"UNT+" c);print}'  inputfilename
I'm getting a UNT counter of 30 for both. Can you please help?
cat >  inputfilename <<ENDHERE
UNH+0811101129+IFTMIN:D:99A:UN:DEBL02'
BGM+705+C00005752+9'
CTA+MS+:EnterpriseBatchProcessor'
DTM+137:200811041510:203'
FTX+AEA++5'
CNT+11:0:PCE'
CNT+7:700:KGM'
CNT+15:5:MTQ'
LOC+57+DEBRE:::Bremen'
RFF+FF:C00005752'
TDT+20+VO0411086+1+13:OCEAN VESSEL+00009999:172+++:::ANL EXPLORER'
DTM+133:20081106:102'
LOC+9+DEBRE:::Bremen'
LOC+12+CAZZZ:::DUMMY PORT'
NAD+CN+00001142+DSV AIR & SEA INC.:100 Walnut Avenue:Suite 405:Clark:NJ+++++07066'
NAD+CZ+00001062+DSV AIR & SEA GMBH:Schlachte 15/18:LKW ABTEILUNG:Bremen:HB+++++28195'
CTA+IC+:EnterpriseBatchProcessor'
NAD+N1+00012726+Hanjin Shipping Co. Ltd. Schiffsmak:Amselstrasse::Hamburg:HH+++++20457'
GID+1+0:PA:::PACKET'
PIA+5+041106:CC:169'
FTX+AAA+++desc0411086'
MEA+WT+AAE+KGM:700'
MEA+VOL+AAE+MTQ:5'
PCI++M&M0411086'
SGP+CONT0411060+0'
EQD+CN+CONT0411060+2060'
TMD+2:LCL/LCL+LCL'
SEL+seal041108+SH'
UNT+30+0811101129'
ENDHERE
awk '/^UNH/,/^UNT/{if( index($0,"'"'"'") ){c++}};{sub(/^UNT\+[0-9]+/,"UNT+" c);print}'  inputfilename
UNH+0811101129+IFTMIN:D:99A:UN:DEBL02'
BGM+705+C00005752+9'
CTA+MS+:EnterpriseBatchProcessor'
DTM+137:200811041510:203'
FTX+AEA++5'
CNT+11:0:PCE'
CNT+7:700:KGM'
CNT+15:5:MTQ'
LOC+57+DEBRE:::Bremen'
RFF+FF:C00005752'
TDT+20+VO0411086+1+13:OCEAN VESSEL+00009999:172+++:::ANL EXPLORER'
DTM+133:20081106:102'
LOC+9+DEBRE:::Bremen'
LOC+12+CAZZZ:::DUMMY PORT'
NAD+CN+00001142+DSV AIR & SEA INC.:100 Walnut Avenue:Suite 405:Clark:NJ+++++07066'
NAD+CZ+00001062+DSV AIR & SEA GMBH:Schlachte 15/18:LKW ABTEILUNG:Bremen:HB+++++28195'
CTA+IC+:EnterpriseBatchProcessor'
NAD+N1+00012726+Hanjin Shipping Co. Ltd. Schiffsmak:Amselstrasse::Hamburg:HH+++++20457'
GID+1+0:PA:::PACKET'
PIA+5+041106:CC:169'
FTX+AAA+++desc0411086'
MEA+WT+AAE+KGM:700'
MEA+VOL+AAE+MTQ:5'
PCI++M&M0411086'
SGP+CONT0411060+0'
EQD+CN+CONT0411060+2060'
TMD+2:LCL/LCL+LCL'
SEL+seal041108+SH'
UNT+29+0811101129'
I didn't get your last point. Can you tell me how you are running the awk script on the input file?
See below: I ran the first awk script you wrote on the input file and it gave me the following result:
/dfds03/editst6/pankaj >{sub(/^UNT\+[0-9]+/,"UNT+" c);print}' DBHA.005M3C.IFTMIN.000901115                                <
UNB+UNOB:4+0050:ZZZ+DBH+20081110:1129+0811101129'UNG+IFTMIN+0050+3995+20081110:1129+0811101129+UN+D:99A:DEBL02'UNH+0811101129+IFTMIN:D:99A:UN:DEBL02'BGM+705+C00005752+9'CTA+MS+:EnterpriseBatchProcessor'DTM+137:200811041510:203'FTX+AEA++5'CNT+11:0:PCE'CNT+7:700:KGM'CNT+15:5:MTQ'LOC+57+DEBRE:::Bremen'RFF+FF:C00005752'TDT+20+VO0411086+1+13:OCEAN VESSEL+00009999:172+++:::ANL EXPLORER'DTM+133:20081106:102'LOC+9+DEBRE:::Bremen'LOC+12+CAZZZ:::DUMMY PORT'NAD+CN+00001142+DSV AIR & SEA INC.:100 Walnut Avenue:Suite 405:Clark:NJ+++++07066'NAD+CZ+00001062+DSV AIR & SEA GMBH:Schlachte 15/18:LKW ABTEILUNG:Bremen:HB+++++28195'CTA+IC+:EnterpriseBatchProcessor'NAD+N1+00012726+Hanjin Shipping Co. Ltd. Schiffsmak:Amselstrasse::Hamburg:HH+++++20457'GID+1+0:PA:::PACKET'PIA+5+041106:CC:169'FTX+AAA+++desc0411086'MEA+WT+AAE+KGM:700'MEA+VOL+AAE+MTQ:5'PCI++M&M0411086'SGP+CONT0411060+0'EQD+CN+CONT0411060+2060'TMD+2:LCL/LCL+LCL'SEL+seal041108+SH'UNT+30+0811101129'UNE+1+0811101129'UNZ+1+0811101129'
http:#a22928342 creates inputfilename,
runs awk on inputfilename,
and shows the result

What were the input file and command used to produce http:#a22928446?
The command used to produce the output in http:#a22928446 was:
awk '/^UNH/,/^UNT/{if( index($0,"'"'"'") ){c++}};{sub(/^UNT\+[0-9]+/,"UNT+" c);print}' DBHA.005M3C.IFTMIN.000901115.txt
The input file has been attached for your reference.
Karan

DBHA.005M3C.IFTMIN.000901115.txt
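A likely explanation for the unchanged output, given that the attached file is a single line: awk splits input on newlines by default, so the whole interchange is one record that starts with UNB. The patterns ^UNH and ^UNT then never match, the counter is never incremented, and the substitution never fires. The sketch below is one way to check this; describe_records is a hypothetical helper name and the two inputs are shortened, made-up samples.

```shell
# Report how many newline-delimited records awk sees and how many start with UNH.
describe_records() {
    awk '/^UNH/ { hits++ }
         END    { printf "records=%d starting_with_UNH=%d\n", NR, hits }'
}

# Single-line form, as in the attached file: one record, no match for ^UNH.
printf "UNB+A'UNH+1'BGM+2'UNT+3+1'UNZ+1'" | describe_records
# prints: records=1 starting_with_UNH=0

# One segment per line: ^UNH matches once.
printf "UNB+A'\nUNH+1'\nBGM+2'\nUNT+3+1'\nUNZ+1'\n" | describe_records
# prints: records=5 starting_with_UNH=1
```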
ASKER CERTIFIED SOLUTION
ozo
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
If you convert it to newline-terminated form and then count the ' between the UNH and UNT segments, it's 29.
Can you please check that manually? The correct UNT counter should be 29.
Here's how I count
UNH+0811101129+IFTMIN:D:99A:UN:DEBL02'<1>BGM+705+C00005752+9'<2>CTA+MS+:EnterpriseBatchProcessor'<3>DTM+137:200811041510:203'<4>FTX+AEA++5'<5>CNT+11:0:PCE'<6>CNT+7:700:KGM'<7>CNT+15:5:MTQ'<8>LOC+57+DEBRE:::Bremen'<9>RFF+FF:C00005752'<10>TDT+20+VO0411086+1+13:OCEAN VESSEL+00009999:172+++:::ANL EXPLORER'<11>DTM+133:20081106:102'<12>LOC+9+DEBRE:::Bremen'<13>LOC+12+CAZZZ:::DUMMY PORT'<14>NAD+CN+00001142+DSV AIR & SEA INC.:100 Walnut Avenue:Suite 405:Clark:NJ+++++07066'<15>NAD+CZ+00001062+DSV AIR & SEA GMBH:Schlachte 15/18:LKW ABTEILUNG:Bremen:HB+++++28195'<16>CTA+IC+:EnterpriseBatchProcessor'<17>NAD+N1+00012726+Hanjin Shipping Co. Ltd. Schiffsmak:Amselstrasse::Hamburg:HH+++++20457'<18>GID+1+0:PA:::PACKET'<19>PIA+5+041106:CC:169'<20>FTX+AAA+++desc0411086'<21>MEA+WT+AAE+KGM:700'<22>MEA+VOL+AAE+MTQ:5'<23>PCI++M&M0411086'<24>SGP+CONT0411060+0'<25>EQD+CN+CONT0411060+2060'<26>TMD+2:LCL/LCL+LCL'<27>SEL+seal041108+SH'<28>UNT
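A numbered listing like the one above can be generated rather than typed by hand. The following is a minimal sketch, assuming the whole interchange sits on one line; number_terminators is a hypothetical name, and the input is a shortened, made-up sample. Note that /UNH.*UNT/ is greedy and would also match the letters UNH or UNT inside data, so this is only suitable for a quick manual check.

```shell
# Number each terminator between the first UNH and the last UNT on a line.
number_terminators() {
    awk '{
        if (!match($0, /UNH.*UNT/)) next          # no message on this line
        u = substr($0, RSTART, RLENGTH)           # text from UNH through "UNT"
        n = split(u, seg, "\047")                 # \047 is the single quote
        out = ""
        for (i = 1; i < n; i++) out = out seg[i] "\047<" i ">"
        print out seg[n]
    }'
}

printf "UNB+A'UNH+1'BGM+2'SEL+3'UNT+4+1'UNZ+1'\n" | number_terminators
# prints: UNH+1'<1>BGM+2'<2>SEL+3'<3>UNT
```

Counted this way, the terminator that closes the UNT segment itself is not included, which is why the result is one lower than a count that runs to the end of UNT.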
We need to count the UNT segment delimiter as well, which makes it 29. All the segment delimiters (') need to be counted from UNH through the end of UNT, including the UNT one.
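One way to meet this requirement for both single-line and newline-terminated files is to make the segment terminator the awk record separator, so that every record is one segment regardless of line breaks. This is a sketch under stated assumptions, not the accepted solution from this thread: fix_unt_count is a hypothetical name, the sample input is made up, and a quote escaped with the release character (? in the UNA line of the first sample) would be miscounted. No escaped quote appears in the posted data.

```shell
# Treat each quote-terminated segment as one awk record and rewrite the UNT count.
fix_unt_count() {
    awk -v RS="'" -v ORS="'" '
        /^[\r\n]*UNH\+/ { in_msg = 1; count = 0 }    # message header: start counting
        in_msg          { count++ }                   # one record = one terminator
        in_msg && /^[\r\n]*UNT\+/ {                   # message trailer: patch and stop
            sub(/UNT\+[0-9]+/, "UNT+" count)
            in_msg = 0
        }
        /[^\r\n]/ { print; next }                     # segment: re-append terminator
        { printf "%s", $0 }                           # trailing newline, if any
    '
}

printf "UNB+A'UNH+1'BGM+2'SEL+3'UNT+9+1'UNE+1'UNZ+1'" | fix_unt_count
# prints: UNB+A'UNH+1'BGM+2'SEL+3'UNT+4+1'UNE+1'UNZ+1'
```

It would be run as fix_unt_count < inputfilename > outputfilename. Because the count includes both the UNH and the UNT records, the 29-segment message posted above should come out as UNT+29, with every other byte of the file left as it was.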