Unix: reformat a file so that it can be awk-ed

Watnog
Dear Experts,

I need a file in an irregular format fixed so that I can awk it.
The original file looks like below:

CD0E0100_H000 0002D0508F CD0E0100_H0003010101013MSPROXMLT 01JTN VJ 00M 00A 00NNT_TH56CEXPN
FT0Y3600_H000 0013D0508F FT0Y3600_H0003120112013MSPROXMLT 01JTN VJ 01M 01A 01NNT_TH56CEXPO
MSPROXMLH 000 0002D0508F MSPROXMLH 0003010101013MSUPDSEOT 01JTN VJ 01M 01A 01NN O
MSPROXMLH 000 0003D0508F MSPROXMLH 0003020102013VSRFUMVTT 01JTN VJ 01M 01A 01NN O
MSPROXMLH 000 0004D0508F MSPROXMLH 0003030103013FT0Y6600_T01JTN VJ 01M 01A 01NNP_FT56CEXPO
MSPROXMLH 000 0005D0508F MSPROXMLH 0003040104013MSINSMSGT 01JTN V 00 00 00NN
PC0R3420_H000 0002D0508F PC0R3420_H0003010101013MSPROXMLT 01JTN V 00 00 00NNT_TH56CEXP
WO0A0020_H000 0002D0508F WO0A0020_H0003010101013MSPROXMLT 01JTN VJ 01M 01A 01NNT_TH56CEXPO
XGGENEXTH 000 0002D0508F XGGENEXTH 0003010101013MSPROXMLT 01JTN VJ 00M 00A 00NN O


It should be transformed to the format below...

CD0E0100_H000 0002D0508F CD0E0100_H0003010101013MSPROXMLT 01JTN VJ 00M 00A 00NNT_TH56CEXPN
FT0Y3600_H000 0013D0508F FT0Y3600_H0003120112013MSPROXMLT 01JTN VJ 01M 01A 01NNT_TH56CEXPO
MSPROXMLH_000 0002D0508F MSPROXMLH_0003010101013MSUPDSEOT 01JTN VJ 01M 01A 01NN O
MSPROXMLH_000 0003D0508F MSPROXMLH_0003020102013VSRFUMVTT 01JTN VJ 01M 01A 01NN O
MSPROXMLH_000 0004D0508F MSPROXMLH_003030103013FT0Y6600_T 01JTN VJ 01M 01A 01NNP_FT56CEXPO
MSPROXMLH_000 0005D0508F MSPROXMLH_0003040104013MSINSMSGT 01JTN V 00 00 00NN
PC0R3420_H000 0002D0508F PC0R3420_H0003010101013MSPROXMLT 01JTN V 00 00 00NNT_TH56CEXP
WO0A0020_H000 0002D0508F WO0A0020_H0003010101013MSPROXMLT 01JTN VJ 01M 01A 01NNT_TH56CEXPO
XGGENEXTH_000 0002D0508F XGGENEXTH_0003010101013MSPROXMLT 01JTN VJ 00M 00A 00NN O


It comes down to creating a first column of 13 characters, a second one of 10, a third one of 32, and a fourth one of 5.
Lines 1 and 2 represent how the rest of the lines need to look after the conversion.
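
To illustrate the widths described above, here is a minimal sketch that slices a line at fixed positions with awk's substr. It assumes a single space separates the columns (so they start at characters 1, 15, 26 and 59) and that the line is already in the regular layout of lines 1 and 2. The function name split_columns and the "|" delimiter are hypothetical, chosen only for this illustration.

```shell
# Sketch: cut a regular line into its four fixed-width columns.
# Assumed layout: 13 chars, space, 10 chars, space, 32 chars, space, 5 chars.
split_columns() {
  awk '{
    c1 = substr($0, 1, 13)    # first column, 13 characters
    c2 = substr($0, 15, 10)   # second column, 10 characters
    c3 = substr($0, 26, 32)   # third column, 32 characters
    c4 = substr($0, 59, 5)    # fourth column, 5 characters
    print c1 "|" c2 "|" c3 "|" c4
  }'
}
```

Something like `split_columns < file` would then show whether every line already fits the layout; lines that are still irregular come out with visibly shifted columns.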

Can you have a look please?
As always, many thanks and cheers.
Watnog

Distinguished Expert 2017

Commented:
I would forget awk and use perl.

Try the following:
cat file | perl -e 'while (<STDIN>) {
    chomp();
    $front = substr($_, 0, 16);
    $back = substr($_, 17, len($_) - 16);
    $front =~ s/ /_/;
    print "$front$back\n";
}'

The important part is to grab the first group; the substring should not capture the space after the first element.

Double-check and adjust the substring offsets for front and back so that only the possible space within the first 16 characters is replaced.

Author

Commented:
Thanks Arnold.

I get this error

"Undefined subroutine &main::len called at -e line 4, <STDIN> line 1."

W.
Distinguished Expert 2017
Commented:
Try length.

The issue is that your last field is not uniform.

If it were uniform, the simple approach would be to strip the underscores and then reformat:

cat file | sed -e 's/_/ /g' | awk '{ print $1"_"$2" "$3" "$4"_"$5,$6,$7,$8,$9,$10,$11 }'

If you gave the rules, you could use perl to strip out all underscores, split on whitespace, and reassemble the line based on those rules.
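
A rule-based reassembly could look something like the sketch below. It is a variant of the idea rather than the exact recipe: it splits on whitespace but skips the underscore-stripping step, so underscores in the trailing fields survive. The rule itself is an assumption inferred from the stated widths: if the first field is shorter than 13 characters it is joined to the next field with an underscore, and likewise if the third column is shorter than 32 characters. The function name merge_short_fields is hypothetical, and a line whose third column runs into the following field is not handled.

```shell
# Sketch: split on whitespace, then rejoin fields by a width rule.
# Assumed rule: column 1 must be 13 chars wide, column 3 must be 32 chars wide.
merge_short_fields() {
  awk '{
    i = 1
    col1 = $i; i++
    if (length(col1) < 13) { col1 = col1 "_" $i; i++ }   # rejoin a split first column
    col2 = $i; i++
    col3 = $i; i++
    if (length(col3) < 32) { col3 = col3 "_" $i; i++ }   # rejoin a split third column
    out = col1 " " col2 " " col3
    while (i <= NF) { out = out " " $i; i++ }            # remaining fields unchanged
    print out
  }'
}
```

Because awk rebuilds the line from its fields, runs of several spaces would be collapsed to one; that matters only if the real file is padded differently from the sample.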

Author

Commented:
Thank you Arnold. Glad you could help me with the perl solution as it gives me the best result.
