Output Control

Hello,

I have a tab-delimited file containing records like the following examples (no specific characters or row length):

word word word word word word
word word
word word word word
word word word word word
word word word
word

I'm looking for a way to output only 3 words per line. If a line contains more than 3 words, move the rest to a new line. In case a line contains 5 words and 2 words are moved to a new line, then any line with one word in the original file should be appended at the end or beginning of the line to make a 3 word record. In case only 1 word is left over, then any line from the original file containing 2 words should be appended. In case there are not enough 1- or 2-word records left, then ANY 1- or 2-word combinations from either the original or the new file can be appended.

Thank you!
Asked by: faithless1
 
wilcoxon commented:
I think this will do what you want. Look for comments containing XXX for things you can change to alter the behavior. Most of these relate to tabs: your question says the input is tab-delimited, but the example is space-delimited, and you don't say how the output should be delimited.

Given the input specified in johanntagle's comment, it will produce the following output (this is how I interpreted your question):

one two three
four five twenty
six seven eleven
eight nine ten
twelve thirteen fourteen
fifteen sixteen
seventeeen eighteen nineteen

Note that "fifteen sixteen" has only two words even though it is not the last record, because you did not specify what to do when there are no more one- or two-word records left in either file to append.

The script is callable as:

script.pl input_file

#!/usr/bin/perl

use strict;
use warnings;

# XXX - output separator - to use tab instead of space, change " " to "\t"
my $sep = " ";

my @lines;
my @base = (undef, [], []);
my @xtra = (undef, [], []);

# XXX - for testing, uncomment "while (<DATA>)" and comment out "while (<>)"
#while (<DATA>) {
while (<>) {
    chomp;
    # XXX - if you want only tab delimiter, change \s+ to \t+
    my @words = split /\s+/;
    next unless @words; # skip lines with no words at all
    push @lines, [@words];
    my $cnt = scalar @words;
    if ($cnt == 1) {
        push @{$base[1]}, @lines-1;
    } elsif ($cnt == 2) {
        push @{$base[2]}, @lines-1;
    } elsif ($cnt % 3 == 1) {
        push @{$xtra[1]}, @lines-1;
    } elsif ($cnt % 3 == 2) {
        push @{$xtra[2]}, @lines-1;
    }
}

for my $i (0 .. @lines-1) {
    my @words = @{$lines[$i]};
    next unless @words; # skip lines where we removed all words
    # print full 3-word records; >= so an exact multiple of 3 leaves nothing behind
    while (@words >= 3) {
        print join($sep, splice(@words, 0, 3)), "\n";
    }
    next unless @words;
    # how many "extra" words to borrow: 1 leftover word needs 2, 2 leftover words need 1
    my $off = (@words == 1) ? 2 : 1;
    my $line = -1;
    # skip lines we've already seen - check @base first then @xtra
    while (@{$base[$off]} and $line < $i) {
        $line = pop @{$base[$off]};
    }
    while (@{$xtra[$off]} and $line < $i) {
        $line = pop @{$xtra[$off]};
    }
    # only borrow from a later line; earlier lines have already been printed
    if ($line > $i) {
        push @words, splice(@{$lines[$line]}, -$off);
    }
    print join($sep, @words), "\n";
}

__DATA__
one two three four five
six seven
eight nine ten eleven
twelve thirteen fourteen fifteen sixteen
seventeeen eighteen nineteen
twenty

 
TvMpt commented:
Hi.
I'm not used to Perl, but why not read all the lines into one string and split it into an array of words? Then you can easily print 3 array elements at a time, one group per line.
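
For reference, a minimal Perl sketch of that idea. The helper name chunk_words is made up for illustration, and the input is hard-coded from the sample in this thread; a real script would read from <> instead. Note that this only regroups the words sequentially, so it does not try to pair leftover words with specific one- or two-word lines.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): regroup a flat list of
# words into rows of at most $size words each.
sub chunk_words {
    my ($size, @words) = @_;
    my @rows;
    # splice removes up to $size words from the front on each pass
    push @rows, [ splice(@words, 0, $size) ] while @words;
    return @rows;
}

# Sample input copied from this thread; replace with lines read from <>.
my @input = (
    "one two three four five",
    "six seven",
    "eight nine ten eleven",
    "twelve thirteen fourteen fifteen sixteen",
    "seventeeen eighteen nineteen",
    "twenty",
);

# split ' ' splits on any run of whitespace (tabs included) and
# ignores leading whitespace.
my @all = map { split ' ', $_ } @input;

print join(' ', @$_), "\n" for chunk_words(3, @all);
```

With the sample input this should print every line with three words in original order, except the last line, which holds the two remaining words.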
 
johanntagle commented:
To clarify, if your input is like this:

one two three four five
six seven
eight nine ten eleven
twelve thirteen fourteen fifteen sixteen
seventeeen eighteen nineteen
twenty

Will the output be like:
one two three
four five six
seven eight nine
ten eleven twelve
thirteen fourteen fifteen
sixteen seventeen eighteen
nineteen twenty

OR will it be like:
one two three
four five twenty
six seven eight
nine ten eleven
twelve thirteen fourteen
fifteen sixteen seventeeen
eighteen nineteen

In the second example "twenty" got placed to the front because you said "In case a line contains 5 words and 2 words are moved to a new line, then any line with one word in the original file should be appended at the end or beginning of the line to make a 3 word record. "

The first output option should be easy. The second one could get complicated, as you will need to read the whole file first and find lines that will "fit the puzzle".

I'm curious: can you give me an idea of what you need this for?
 
wilcoxon commented:
To explain why the output is done that way...

one two three - first 3 words of first line
four five twenty - last two words of first line plus the only one-word line in the original file
six seven eleven - line two plus the first one-word leftover in the new file ("eleven", the fourth word of line three)
eight nine ten - first 3 words of third line
twelve thirteen fourteen - first 3 words of fourth line
fifteen sixteen - last two words of fourth line - no more single words so left as two words
seventeeen eighteen nineteen - fifth line
 
faithless1 (author) commented:
Excellent, does exactly what I need. Thank you!!!
Question has a verified solution.
