Watnog

Unix: reformat a file so that it can be awk-ed

Dear Experts,

I need a file in an irregular format fixed so that I can awk it.
The original file looks like below:

CD0E0100_H000 0002D0508F CD0E0100_H0003010101013MSPROXMLT 01JTN VJ 00M 00A 00NNT_TH56CEXPN
FT0Y3600_H000 0013D0508F FT0Y3600_H0003120112013MSPROXMLT 01JTN VJ 01M 01A 01NNT_TH56CEXPO
MSPROXMLH 000 0002D0508F MSPROXMLH 0003010101013MSUPDSEOT 01JTN VJ 01M 01A 01NN O
MSPROXMLH 000 0003D0508F MSPROXMLH 0003020102013VSRFUMVTT 01JTN VJ 01M 01A 01NN O
MSPROXMLH 000 0004D0508F MSPROXMLH 0003030103013FT0Y6600_T01JTN VJ 01M 01A 01NNP_FT56CEXPO
MSPROXMLH 000 0005D0508F MSPROXMLH 0003040104013MSINSMSGT 01JTN V 00 00 00NN
PC0R3420_H000 0002D0508F PC0R3420_H0003010101013MSPROXMLT 01JTN V 00 00 00NNT_TH56CEXP
WO0A0020_H000 0002D0508F WO0A0020_H0003010101013MSPROXMLT 01JTN VJ 01M 01A 01NNT_TH56CEXPO
XGGENEXTH 000 0002D0508F XGGENEXTH 0003010101013MSPROXMLT 01JTN VJ 00M 00A 00NN O


It should be transformed to the format below...

CD0E0100_H000 0002D0508F CD0E0100_H0003010101013MSPROXMLT 01JTN VJ 00M 00A 00NNT_TH56CEXPN
FT0Y3600_H000 0013D0508F FT0Y3600_H0003120112013MSPROXMLT 01JTN VJ 01M 01A 01NNT_TH56CEXPO
MSPROXMLH_000 0002D0508F MSPROXMLH_0003010101013MSUPDSEOT 01JTN VJ 01M 01A 01NN O
MSPROXMLH_000 0003D0508F MSPROXMLH_0003020102013VSRFUMVTT 01JTN VJ 01M 01A 01NN O
MSPROXMLH_000 0004D0508F MSPROXMLH_003030103013FT0Y6600_T 01JTN VJ 01M 01A 01NNP_FT56CEXPO
MSPROXMLH_000 0005D0508F MSPROXMLH_0003040104013MSINSMSGT 01JTN V 00 00 00NN
PC0R3420_H000 0002D0508F PC0R3420_H0003010101013MSPROXMLT 01JTN V 00 00 00NNT_TH56CEXP
WO0A0020_H000 0002D0508F WO0A0020_H0003010101013MSPROXMLT 01JTN VJ 01M 01A 01NNT_TH56CEXPO
XGGENEXTH_000 0002D0508F XGGENEXTH_0003010101013MSPROXMLT 01JTN VJ 00M 00A 00NN O


It comes down to creating a first column of 13 characters, a second one of 10, a third one of 32, and a fourth one of 5.
Lines 1 and 2 represent how the rest of the lines need to look after the conversion.
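
To illustrate why the embedded space matters, here is a small sketch of what awk sees with its default whitespace splitting. It uses two sample lines copied from the post and assumes single-space separators, as they appear here; the variable names are only illustrative.

```shell
# Two sample records from the post: one already regular, one with embedded spaces.
good='CD0E0100_H000 0002D0508F CD0E0100_H0003010101013MSPROXMLT 01JTN VJ 00M 00A 00NNT_TH56CEXPN'
bad='MSPROXMLH 000 0002D0508F MSPROXMLH 0003010101013MSUPDSEOT 01JTN VJ 01M 01A 01NN O'

# awk splits on whitespace by default, so an embedded space shifts every later field.
good_fields=$(printf '%s\n' "$good" | awk '{ print $1, $4 }')
bad_fields=$(printf '%s\n' "$bad" | awk '{ print $1, $4 }')

echo "$good_fields"   # CD0E0100_H000 01JTN
echo "$bad_fields"    # MSPROXMLH MSPROXMLH
```

In the irregular line, the name is cut at the space, so what should be field 4 ends up holding part of field 3.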

Can you have a look please?
As always, many thanks and cheers.
Watnog
arnold

I would forget awk and use perl.

Try the following:
cat file -| perl -e 'while (<STDIN>) {
chomp();
$front=substr($_,0,16);
$back=substr($_,17,len ($_)-16);
$front=~ s/ /_/;
print "$front$back\n";
}'

The important part is to grab the first group: the substring should not capture the space after the first element.

Double-check and adjust the substring offsets for $front and $back so that $front contains only the possible space inside the first element.
Watnog

ASKER

Thanks Arnold.

I get this error

"Undefined subroutine &main::len called at -e line 4, <STDIN> line 1."

W.
ASKER CERTIFIED SOLUTION
arnold

This solution is only available to members.
Watnog

ASKER

Thank you Arnold. Glad you could help me with the perl solution as it gives me the best result.