Solved

Perl s

Posted on 2013-11-27
12
468 Views
Last Modified: 2013-12-13
I need to iterate over a stanza of poetry and index each word. Desired result is shown as an example below.

I know how to open and edit files in Perl. I can slurp up the line but I need some RegEx magic to get at the word position and line number in which each word is located. I am familiar with some regex, and I comfortable with looking up the meaning of regular expressions that you build when answering this question.

The known structure of input file: all textfiles have at least 1 stanza, each stanza is only 9 lines (this means that "stanza count"  is 1-to-many, and "line count" should never be more than 9) you do not have to build a wrpper program for this rexular expression but, please be willing to build the output Perl code - this is the end goal, the index output.
 
Example stanza text from "textfile.txt":

LO I the man, whose Muse whilome did maske,
As time her taught, in lowly Shepheards weeds,
Am now enforst a far vnfitter taske,
For trumpets sterne to chaunge mine Oaten reeds,
And sing of Knights and Ladies gentle deeds;
Whose prayses hauing slept in silence long,
Me, all too meane, the sacred Muse areeds
To blazon broad emongst her learned throng:
Fierce warres and faithfull loues shall moralize my song.



Example output for "textfile.txt":

##########
FILENAME: textfile.txt
STANZA COUNT: 1
TOTAL LINE COUNT: 9
TOTAL WORD COUNT: 71
UNIQUE WORDS COUNT: 65

UNIQUE WORDS: lo, i, the, man, whose, muse, did, whilome, maske, as, time, her, taught, in, lowly, shepheards, weeds, am, now, enforst, a, far, vnfitter, taske, for, trumpets, sterne, to, chaunge, mine, oaten, reeds, and, sing, of, knights, ladies, gentle, deeds, prayses, hauing, slept, silence, long, me, all, too, meane, the, sacred, areeds, blazon, broad, emongst, her, learned, throng, fierce, warres, faithful, loues, shall, moralize, my, song

REPEATED WORDS: the, and, her

INDEX OF WORDS: (this means all words of textfile.txt)
(layout is: word, filename, stanza, line, location)

a, textfile.txt, stanza 1, line 3, word 4
and, textfile.txt, stanza 1, line 5, word 1; stanza 1, line 5, word 5; stanza 1, line 9, word 3
all, textfile.txt, stanza 1, line 7, word 2
...
blazon, textfile.txt, stanza 1, line 8, word 2
...
her, textfile.txt, stanza 1, line 2, word 3; stanza 1, line 8, word 5
...
song, textfile.txt, stanza 1, line 9, word 9

##########

Notice that REPEATED WORDS "and", and "her" in this example output are indexed with multiple locations.
suggestions on output are welcomed..

acceptable "INDEX OF WORDS" output could be simplfied as such:

INDEX OF WORD WORDS:
(layout is: word, filename, stanza, line, location)

a, textfile.txt, 1,3,4
and, textfile.txt, 1,5,1; 1,5,5; 1,9,3
all, textfile.txt, 1,7,2
...
blazon, textfile.txt, 1,8,2
...
her, textfile.txt, 1,2,3; 1,8,5
...
song, textfile.txt, 1,9,9

"stanza, line, location" is a set, if a word such as "and" appears in a second stanza then output should look like this:

INDEX OF WORD WORDS:
(layout is: word, filename, stanza, line, location)

...
and, textfile.txt, 1,5,1; 1,5,5; 1,9,3, 2,4,1; 2,6,3; 2,6,7; 3,3,1; 3,6,2
...
in this example output the word "and" appears in stanza 1, 2, and 3 in multiple locations.
0
Comment
Question by:Todd Weaver
  • 9
  • 3
12 Comments
 
LVL 26

Accepted Solution

by:
wilcoxon earned 500 total points
Comment Utility
This is more involved than a regex (maybe possible in regex but it would be a hideously complex and unmaintainable regex)...
use strict;
use warnings;
use File::Slurp qw(slurp);
my $fil = shift or die "Usage: $0 textfile\n";
my @lines = slurp($fil);
my ($stanza, $word_cnt, %uwords);
# loop over all lines
while (@lines) {
    $stanza++;
    # loop over the lines in the stanza
    for my $ln (1..9) {
        my $line = shift @lines or die "ran out of lines mid-stanza";
        # split the line and remove punctuation - can add others to char class
        my @words = map { s{[.,;:]$}{}; $_ } split m{\s+}, $line;
        # loop over words
        for my $i (0..@words-1) {
            $word_cnt++;
            $uwords{$words[$i]} = [] unless exists($uwords{$words[$i]});
            push @{$uwords{$words[$i]}}, [$stanza, $line, $i+1];
        }
    }
    # if there are blank lines between stanzas, uncomment this
    #shift @lines while ($lines[0] =~ m{^\s*$});
}
# output
print "FILENAME: $fil\n",
    "STANZA COUNT: $stanza\n",
    "TOTAL LINE COUNT: ", $stanza*9, "\n",
    "TOTAL WORD COUNT: $word_cnt\n",
    "UNIQUE WORD COUNT: ", scalar(keys %uwords), "\n\n",
    "UNIQUE WORDS: ", join(', ', sort keys %uwords), "\n\n",
    "REPEATED WORDS: ", grep({ @{$uwords{$_}}-1 } sort keys %uwords), "\n\n",
    "INDEX OF WORDS:\n";
foreach my $word (sort keys %uwords) {
    print "$word, $fil, ", join('; ', map { join ',', @$_ } @{$uwords{$word}}), "\n";
}

Open in new window

0
 

Author Comment

by:Todd Weaver
Comment Utility
tested.. excellent work! I am glad that you have posted the entire script. This seems more like a proposal and a prototype solution.  I am very happy with the results.

Thank you for being so very thorough. there's a lot of perl here that I am going to have to look up, such as "map".. I would like to return soon with any questions but I think you have provided me with a complete answer to my question.
0
 

Author Closing Comment

by:Todd Weaver
Comment Utility
tested.. excellent work! I am glad that you have posted the entire script. This seems more like a proposal and a prototype solution.  I am very happy with the results.

Thank you for being so very thorough. there's a lot of perl here that I am going to have to look up, such as "map".. I would like to return soon with any questions but I think you have provided me with a complete answer to my question.
0
 

Author Comment

by:Todd Weaver
Comment Utility
.. nevermind this comment, I was missing a closing brace.
0
 

Author Comment

by:Todd Weaver
Comment Utility
there are blank lines between stanzas; at the instruction to "uncomment"  line 23 if blank lines exist between stanzas does not seem to work properly.

Please help me to skip blank lines in the text.

example text input:

Right well I wote most mighty Soueraine,
That all this famous antique history,
Of some th'aboundance of an idle braine
Will iudged be, and painted forgery,
Rather then matter of iust memory,
Sith none, that breatheth liuing aire, does know,
Where is that happy land of Faery,
Which I so much do vaunt, yet no where show,
But vouch antiquities, which no body can know.

But let that man with better sence aduize,
That of the world least part to vs is red:
And dayly how through hardy enterprize,
Many great Regions are discouered,
Which to late age were neuer mentioned.
Who euer heard of th'Indian Peru?
Or who in venturous vessell measured
The Amazon huge riuer now found trew?
Or fruitfullest Virginia who did euer vew?
0
 
LVL 26

Expert Comment

by:wilcoxon
Comment Utility
Two minor changes:
1) Add chomp => 1 to the slurp call (I could have sworn this was the default behavior).
2) Add a remove blank lines shift line before the stanza loop too (just to be sure).
use strict;
use warnings;
use File::Slurp qw(slurp);
my $fil = shift or die "Usage: $0 textfile\n";
my @lines = slurp($fil, chomp => 1);
my ($stanza, $word_cnt, %uwords);
# loop over all lines
while (@lines) {
    $stanza++;
    # make sure there are no blank lines to start
    shift @lines while ($lines[0] =~ m{^\s*$});
    # loop over the lines in the stanza
    for my $ln (1..9) {
        my $line = shift @lines or die "ran out of lines mid-stanza";
        # split the line and remove punctuation - can add others to char class
        my @words = map { s{[.,;:]$}{}; $_ } split m{\s+}, $line;
        # loop over words
        for my $i (0..@words-1) {
            $word_cnt++;
            $uwords{$words[$i]} = [] unless exists($uwords{$words[$i]});
            push @{$uwords{$words[$i]}}, [$stanza, $line, $i+1];
        }
    }
    # if there are blank lines between stanzas, uncomment this
    shift @lines while ($lines[0] =~ m{^\s*$});
}
# output
print "FILENAME: $fil\n",
    "STANZA COUNT: $stanza\n",
    "TOTAL LINE COUNT: ", $stanza*9, "\n",
    "TOTAL WORD COUNT: $word_cnt\n",
    "UNIQUE WORD COUNT: ", scalar(keys %uwords), "\n\n",
    "UNIQUE WORDS: ", join(', ', sort keys %uwords), "\n\n",
    "REPEATED WORDS: ", grep({ @{$uwords{$_}}-1 } sort keys %uwords), "\n\n",
    "INDEX OF WORDS:\n";
foreach my $word (sort keys %uwords) {
    print "$word, $fil, ", join('; ', map { join ',', @$_ } @{$uwords{$word}}), "\n";
}

Open in new window

0
Why You Should Analyze Threat Actor TTPs

After years of analyzing threat actor behavior, it’s become clear that at any given time there are specific tactics, techniques, and procedures (TTPs) that are particularly prevalent. By analyzing and understanding these TTPs, you can dramatically enhance your security program.

 

Author Comment

by:Todd Weaver
Comment Utility
not the expected output, something has changed. With the lower shift commented the script runs, with the lower shift uncommented - I get and uninitialized error..

the index of words is goofed.

_______________________________________________________________________
INPUT

Right well I wote most mighty Soueraine,
That all this famous antique history,
Of some th'aboundance of an idle braine
Will iudged be, and painted forgery,
Rather then matter of iust memory,
Sith none, that breatheth liuing aire, does know,
Where is that happy land of Faery,
Which I so much do vaunt, yet no where show,
But vouch antiquities, which no body can know.

But let that man with better sence aduize,
That of the world least part to vs is red:
And dayly how through hardy enterprize,
Many great Regions are discouered,
Which to late age were neuer mentioned.
Who euer heard of th'Indian Peru?
Or who in venturous vessell measured
The Amazon huge riuer now found trew?
Or fruitfullest Virginia who did euer vew?



OUTPUT
_______________________________________________________________________
C:\Spenser>perl concordance.pl fq.txt1
FILENAME: fq.txt1
STANZA COUNT: 2
TOTAL LINE COUNT: 18
TOTAL WORD COUNT: 127
UNIQUE WORD COUNT: 110

UNIQUE WORDS: Amazon, And, But, Faery, I, Many, Of, Or, Peru?, Rather, Regions,
Right, Sith, Soueraine, That, The, Virginia, Where, Which, Who, Will, aduize, ag
e, aire, all, an, and, antique, antiquities, are, be, better, body, braine, brea
theth, can, dayly, did, discouered, do, does, enterprize, euer, famous, forgery,
 found, fruitfullest, great, happy, hardy, heard, history, how, huge, idle, in,
is, iudged, iust, know, land, late, least, let, liuing, man, matter, measured, m
emory, mentioned, mighty, most, much, neuer, no, none, now, of, painted, part, r
ed, riuer, sence, show, so, some, th'Indian, th'aboundance, that, the, then, thi
s, through, to, trew?, vaunt, venturous, vessell, vew?, vouch, vs, well, were, w
here, which, who, with, world, wote, yet

REPEATED WORDS: ButIOrThatWhicheuerisknownoofthattowho

INDEX OF WORDS:
,2azon, fq.txt1, 2,The Amazon huge riuer now found trew?
,1d, fq.txt1, 2,And dayly how through hardy enterprize,
,1; 2,But let that man with better sence aduize, ody can know.
,7ery, fq.txt1, 1,Where is that happy land of Faery,
,2; 1,Which I so much do vaunt, yet no where show,ine,
,1ny, fq.txt1, 2,Many great Regions are discouered,
,1, fq.txt1, 1,Of some th'aboundance of an idle braine
,1; 2,Or fruitfullest Virginia who did euer vew?,1d
,6ru?, fq.txt1, 2,Who euer heard of th'Indian Peru?
,1ther, fq.txt1, 1,Rather then matter of iust memory,
,3gions, fq.txt1, 2,Many great Regions are discouered,
,1ght, fq.txt1, 1,Right well I wote most mighty Soueraine,
,1th, fq.txt1, 1,Sith none, that breatheth liuing aire, does know,
,7ueraine, fq.txt1, 1,Right well I wote most mighty Soueraine,
,1; 2,That of the world least part to vs is red: tory,
,1e, fq.txt1, 2,The Amazon huge riuer now found trew?
Virginia, fq.txt1, 2,Or fruitfullest Virginia who did euer vew?,3
,1ere, fq.txt1, 1,Where is that happy land of Faery,
,1; 2,Which to late age were neuer mentioned. t no where show,
,1o, fq.txt1, 2,Who euer heard of th'Indian Peru?
,1ll, fq.txt1, 1,Will iudged be, and painted forgery,
,8uize, fq.txt1, 2,But let that man with better sence aduize,
,4e, fq.txt1, 2,Which to late age were neuer mentioned.
,6re, fq.txt1, 1,Sith none, that breatheth liuing aire, does know,
,2l, fq.txt1, 1,That all this famous antique history,
,5, fq.txt1, 1,Of some th'aboundance of an idle braine
,4d, fq.txt1, 1,Will iudged be, and painted forgery,
,5tique, fq.txt1, 1,That all this famous antique history,
,3tiquities, fq.txt1, 1,But vouch antiquities, which no body can know.
,4e, fq.txt1, 2,Many great Regions are discouered,
,3, fq.txt1, 1,Will iudged be, and painted forgery,
,6tter, fq.txt1, 2,But let that man with better sence aduize,
,6dy, fq.txt1, 1,But vouch antiquities, which no body can know.
,7aine, fq.txt1, 1,Of some th'aboundance of an idle braine
,4eatheth, fq.txt1, 1,Sith none, that breatheth liuing aire, does know,
,7n, fq.txt1, 1,But vouch antiquities, which no body can know.
,2yly, fq.txt1, 2,And dayly how through hardy enterprize,
did, fq.txt1, 2,Or fruitfullest Virginia who did euer vew?,5
,5scouered, fq.txt1, 2,Many great Regions are discouered,
,5, fq.txt1, 1,Which I so much do vaunt, yet no where show,
,7es, fq.txt1, 1,Sith none, that breatheth liuing aire, does know,
,6terprize, fq.txt1, 2,And dayly how through hardy enterprize,
,2; 2,Or fruitfullest Virginia who did euer vew?,6
,4mous, fq.txt1, 1,That all this famous antique history,
,6rgery, fq.txt1, 1,Will iudged be, and painted forgery,
,6und, fq.txt1, 2,The Amazon huge riuer now found trew?
fruitfullest, fq.txt1, 2,Or fruitfullest Virginia who did euer vew?,2
,2eat, fq.txt1, 2,Many great Regions are discouered,
,4ppy, fq.txt1, 1,Where is that happy land of Faery,
,5rdy, fq.txt1, 2,And dayly how through hardy enterprize,
,3ard, fq.txt1, 2,Who euer heard of th'Indian Peru?
,6story, fq.txt1, 1,That all this famous antique history,
,3w, fq.txt1, 2,And dayly how through hardy enterprize,
,3ge, fq.txt1, 2,The Amazon huge riuer now found trew?
,6le, fq.txt1, 1,Of some th'aboundance of an idle braine
,3, fq.txt1, 2,Or who in venturous vessell measured
,9; 2,That of the world least part to vs is red:
,2dged, fq.txt1, 1,Will iudged be, and painted forgery,
,5st, fq.txt1, 1,Rather then matter of iust memory,
,8; 1,But vouch antiquities, which no body can know.re, does know,
,5nd, fq.txt1, 1,Where is that happy land of Faery,
,3te, fq.txt1, 2,Which to late age were neuer mentioned.
,5ast, fq.txt1, 2,That of the world least part to vs is red:
,2t, fq.txt1, 2,But let that man with better sence aduize,
,5uing, fq.txt1, 1,Sith none, that breatheth liuing aire, does know,
,4n, fq.txt1, 2,But let that man with better sence aduize,
,3tter, fq.txt1, 1,Rather then matter of iust memory,
,6asured, fq.txt1, 2,Or who in venturous vessell measured
,6mory, fq.txt1, 1,Rather then matter of iust memory,
,7ntioned, fq.txt1, 2,Which to late age were neuer mentioned.
,6ghty, fq.txt1, 1,Right well I wote most mighty Soueraine,
,5st, fq.txt1, 1,Right well I wote most mighty Soueraine,
,4ch, fq.txt1, 1,Which I so much do vaunt, yet no where show,
,6uer, fq.txt1, 2,Which to late age were neuer mentioned.
,5; 1,But vouch antiquities, which no body can know.e show,
,2ne, fq.txt1, 1,Sith none, that breatheth liuing aire, does know,
,5w, fq.txt1, 2,The Amazon huge riuer now found trew?
,4; 2,Who euer heard of th'Indian Peru?  is red: raine
,5inted, fq.txt1, 1,Will iudged be, and painted forgery,
,6rt, fq.txt1, 2,That of the world least part to vs is red:
,10, fq.txt1, 2,That of the world least part to vs is red:
,4uer, fq.txt1, 2,The Amazon huge riuer now found trew?
,7nce, fq.txt1, 2,But let that man with better sence aduize,
,10w, fq.txt1, 1,Which I so much do vaunt, yet no where show,
,3, fq.txt1, 1,Which I so much do vaunt, yet no where show,
,2me, fq.txt1, 1,Of some th'aboundance of an idle braine
,5'Indian, fq.txt1, 2,Who euer heard of th'Indian Peru?
,3'aboundance, fq.txt1, 1,Of some th'aboundance of an idle braine
,3; 2,But let that man with better sence aduize,  aire, does know,
,3e, fq.txt1, 2,That of the world least part to vs is red:
,2en, fq.txt1, 1,Rather then matter of iust memory,
,3is, fq.txt1, 1,That all this famous antique history,
,4rough, fq.txt1, 2,And dayly how through hardy enterprize,
,2; 2,Which to late age were neuer mentioned.  vs is red:
,7ew?, fq.txt1, 2,The Amazon huge riuer now found trew?
,6unt, fq.txt1, 1,Which I so much do vaunt, yet no where show,
,4nturous, fq.txt1, 2,Or who in venturous vessell measured
,5ssell, fq.txt1, 2,Or who in venturous vessell measured
vew?, fq.txt1, 2,Or fruitfullest Virginia who did euer vew?,7
,2uch, fq.txt1, 1,But vouch antiquities, which no body can know.
,8, fq.txt1, 2,That of the world least part to vs is red:
,2ll, fq.txt1, 1,Right well I wote most mighty Soueraine,
,5re, fq.txt1, 2,Which to late age were neuer mentioned.
,9ere, fq.txt1, 1,Which I so much do vaunt, yet no where show,
,4ich, fq.txt1, 1,But vouch antiquities, which no body can know.
,2; 2,Or fruitfullest Virginia who did euer vew?,4ed
,5th, fq.txt1, 2,But let that man with better sence aduize,
,4rld, fq.txt1, 2,That of the world least part to vs is red:
,4te, fq.txt1, 1,Right well I wote most mighty Soueraine,
,7t, fq.txt1, 1,Which I so much do vaunt, yet no where show,



SCRIPT
_______________________________________________________________________
use strict;
use warnings;
use File::Slurp qw(slurp);
my $fil = shift or die "Usage: $0 textfile\n";
my @lines = slurp($fil, chomp => 1);
my ($stanza, $word_cnt, %uwords);
# loop over all lines
while (@lines) {
    $stanza++;
    # make sure there are no blank lines to start
    shift @lines while ($lines[0] =~ m{^\s*$});
    # loop over the lines in the stanza
    for my $ln (1..9) {
        my $line = shift @lines or die "ran out of lines mid-stanza";
        # split the line and remove punctuation - can add others to char class
        my @words = map { s{[.,;:]$}{}; $_ } split m{\s+}, $line;
        # loop over words
        for my $i (0..@words-1) {
            $word_cnt++;
            $uwords{$words[$i]} = [] unless exists($uwords{$words[$i]});
            push @{$uwords{$words[$i]}}, [$stanza, $line, $i+1];
        }
    }
    # if there are blank lines between stanzas, uncomment this
    # shift @lines while ($lines[0] =~ m{^\s*$});
}
# output
print "FILENAME: $fil\n",
    "STANZA COUNT: $stanza\n",
    "TOTAL LINE COUNT: ", $stanza*9, "\n",
    "TOTAL WORD COUNT: $word_cnt\n",
    "UNIQUE WORD COUNT: ", scalar(keys %uwords), "\n\n",
    "UNIQUE WORDS: ", join(', ', sort keys %uwords), "\n\n",
    "REPEATED WORDS: ", grep({ @{$uwords{$_}}-1 } sort keys %uwords), "\n\n",
    "INDEX OF WORDS:\n";
foreach my $word (sort keys %uwords) {
    print "$word, $fil, ", join('; ', map { join ',', @$_ } @{$uwords{$word}}), "\n";
}
0
 

Author Comment

by:Todd Weaver
Comment Utility
"chomp is screwing up the @lines array, in the code below with the blank lines element inserted and the chomp line commented it works on the text with blank lines provided:




the INPUT:
_______________________________________________________________
Right well I wote most mighty Soueraine,
That all this famous antique history,
Of some th'aboundance of an idle braine
Will iudged be, and painted forgery,
Rather then matter of iust memory,
Sith none, that breatheth liuing aire, does know,
Where is that happy land of Faery,
Which I so much do vaunt, yet no where show,
But vouch antiquities, which no body can know.

But let that man with better sence aduize,
That of the world least part to vs is red:
And dayly how through hardy enterprize,
Many great Regions are discouered,
Which to late age were neuer mentioned.
Who euer heard of th'Indian Peru?
Or who in venturous vessell measured
The Amazon huge riuer now found trew?
Or fruitfullest Virginia who did euer vew?




the SCRIPT (notice @lines is initialized with no CHOMP and "skip blanks" is present):
_______________________________________________________________
use strict;
use warnings;
use File::Slurp qw(slurp);
my $fil = shift or die "Usage: $0 textfile\n";
my @lines = slurp($fil);
# my @lines = slurp($fil, chomp => 1);
my ($stanza, $word_cnt, %uwords);
# loop over all lines of the stanza
while (@lines) {
     # # make sure there are no blank lines to start
     shift @lines while ($lines[0] =~ m{^\s*$});
     # increase stanza count
     $stanza++;
     # loop over the lines in the 9-line stanza
     for my $ln (1..9) {
          my $line = shift @lines or die "ran out of lines mid-stanza";
        # split the line and remove punctuation - can add others to char class
        my @words = map { s{[.,;:]$}{}; $_ } split m{\s+}, $line;
        # loop over words
        for my $i (0..@words-1) {
             $word_cnt++;
             $uwords{$words[$i]} = [] unless exists($uwords{$words[$i]});
             push @{$uwords{$words[$i]}}, [$stanza, $line, $i+1];
         }
     }
}
# output
print "FILENAME: $fil\n",
    "STANZA COUNT: $stanza\n",
    "TOTAL LINE COUNT: ", $stanza*9, "\n",
    "TOTAL WORD COUNT: $word_cnt\n",
    "UNIQUE WORD COUNT: ", scalar(keys %uwords), "\n\n",
    "UNIQUE WORDS: ", join(', ', sort keys %uwords), "\n\n",
    "REPEATED WORDS: ", grep({ @{$uwords{$_}}-1 } sort keys %uwords), "\n\n",
    "INDEX OF WORDS:\n";
foreach my $word (sort keys %uwords) {
    print "$word, $fil, ", join('; ', map { join ',', @$_ } @{$uwords{$word}}), "\n";
}




the OUTPUT (is good, except the CR-comma-location at the end of each index, should be on same line...):
_______________________________________________________________
C:\Spenser>perl concordance.pl fq.txt1
FILENAME: fq.txt1
STANZA COUNT: 2
TOTAL LINE COUNT: 18
TOTAL WORD COUNT: 127
UNIQUE WORD COUNT: 110

UNIQUE WORDS: Amazon, And, But, Faery, I, Many, Of, Or, Peru?, Rather, Regions, Right, Sith, Souerai
ne, That, The, Virginia, Where, Which, Who, Will, aduize, age, aire, all, an, and, antique, antiquit
ies, are, be, better, body, braine, breatheth, can, dayly, did, discouered, do, does, enterprize, eu
er, famous, forgery, found, fruitfullest, great, happy, hardy, heard, history, how, huge, idle, in,
is, iudged, iust, know, land, late, least, let, liuing, man, matter, measured, memory, mentioned, mi
ghty, most, much, neuer, no, none, now, of, painted, part, red, riuer, sence, show, so, some, th'Ind
ian, th'aboundance, that, the, then, this, through, to, trew?, vaunt, venturous, vessell, vew?, vouc
h, vs, well, were, where, which, who, with, world, wote, yet

REPEATED WORDS: ButIOrThatWhicheuerisknownoofthattowho

INDEX OF WORDS:
Amazon, fq.txt1, 2,The Amazon huge riuer now found trew?
,2
And, fq.txt1, 2,And dayly how through hardy enterprize,
,1
But, fq.txt1, 1,But vouch antiquities, which no body can know.
,1; 2,But let that man with better sence aduize,
,1
Faery, fq.txt1, 1,Where is that happy land of Faery,
,7
I, fq.txt1, 1,Right well I wote most mighty Soueraine,
,3; 1,Which I so much do vaunt, yet no where show,
,2
Many, fq.txt1, 2,Many great Regions are discouered,
,1
Of, fq.txt1, 1,Of some th'aboundance of an idle braine
,1
Or, fq.txt1, 2,Or who in venturous vessell measured
,1; 2,Or fruitfullest Virginia who did euer vew?,1
Peru?, fq.txt1, 2,Who euer heard of th'Indian Peru?
,6
Rather, fq.txt1, 1,Rather then matter of iust memory,
,1
Regions, fq.txt1, 2,Many great Regions are discouered,
,3
Right, fq.txt1, 1,Right well I wote most mighty Soueraine,
,1
Sith, fq.txt1, 1,Sith none, that breatheth liuing aire, does know,
,1
Soueraine, fq.txt1, 1,Right well I wote most mighty Soueraine,
,7
That, fq.txt1, 1,That all this famous antique history,
,1; 2,That of the world least part to vs is red:
,1
The, fq.txt1, 2,The Amazon huge riuer now found trew?
,1
Virginia, fq.txt1, 2,Or fruitfullest Virginia who did euer vew?,3
Where, fq.txt1, 1,Where is that happy land of Faery,
,1
Which, fq.txt1, 1,Which I so much do vaunt, yet no where show,
,1; 2,Which to late age were neuer mentioned.
,1
Who, fq.txt1, 2,Who euer heard of th'Indian Peru?
,1
Will, fq.txt1, 1,Will iudged be, and painted forgery,
,1
aduize, fq.txt1, 2,But let that man with better sence aduize,
,8
age, fq.txt1, 2,Which to late age were neuer mentioned.
,4
aire, fq.txt1, 1,Sith none, that breatheth liuing aire, does know,
,6
all, fq.txt1, 1,That all this famous antique history,
,2
an, fq.txt1, 1,Of some th'aboundance of an idle braine
,5
and, fq.txt1, 1,Will iudged be, and painted forgery,
,4
antique, fq.txt1, 1,That all this famous antique history,
,5
antiquities, fq.txt1, 1,But vouch antiquities, which no body can know.
,3
are, fq.txt1, 2,Many great Regions are discouered,
,4
be, fq.txt1, 1,Will iudged be, and painted forgery,
,3
better, fq.txt1, 2,But let that man with better sence aduize,
,6
body, fq.txt1, 1,But vouch antiquities, which no body can know.
,6
braine, fq.txt1, 1,Of some th'aboundance of an idle braine
,7
breatheth, fq.txt1, 1,Sith none, that breatheth liuing aire, does know,
,4
can, fq.txt1, 1,But vouch antiquities, which no body can know.
,7
dayly, fq.txt1, 2,And dayly how through hardy enterprize,
,2
did, fq.txt1, 2,Or fruitfullest Virginia who did euer vew?,5
discouered, fq.txt1, 2,Many great Regions are discouered,
,5
do, fq.txt1, 1,Which I so much do vaunt, yet no where show,
,5
does, fq.txt1, 1,Sith none, that breatheth liuing aire, does know,
,7
enterprize, fq.txt1, 2,And dayly how through hardy enterprize,
,6
euer, fq.txt1, 2,Who euer heard of th'Indian Peru?
,2; 2,Or fruitfullest Virginia who did euer vew?,6
famous, fq.txt1, 1,That all this famous antique history,
,4
forgery, fq.txt1, 1,Will iudged be, and painted forgery,
,6
found, fq.txt1, 2,The Amazon huge riuer now found trew?
,6
fruitfullest, fq.txt1, 2,Or fruitfullest Virginia who did euer vew?,2
great, fq.txt1, 2,Many great Regions are discouered,
,2
happy, fq.txt1, 1,Where is that happy land of Faery,
,4
hardy, fq.txt1, 2,And dayly how through hardy enterprize,
,5
heard, fq.txt1, 2,Who euer heard of th'Indian Peru?
,3
history, fq.txt1, 1,That all this famous antique history,
,6
how, fq.txt1, 2,And dayly how through hardy enterprize,
,3
huge, fq.txt1, 2,The Amazon huge riuer now found trew?
,3
idle, fq.txt1, 1,Of some th'aboundance of an idle braine
,6
in, fq.txt1, 2,Or who in venturous vessell measured
,3
is, fq.txt1, 1,Where is that happy land of Faery,
,2; 2,That of the world least part to vs is red:
,9
iudged, fq.txt1, 1,Will iudged be, and painted forgery,
,2
iust, fq.txt1, 1,Rather then matter of iust memory,
,5
know, fq.txt1, 1,Sith none, that breatheth liuing aire, does know,
,8; 1,But vouch antiquities, which no body can know.
,8
land, fq.txt1, 1,Where is that happy land of Faery,
,5
late, fq.txt1, 2,Which to late age were neuer mentioned.
,3
least, fq.txt1, 2,That of the world least part to vs is red:
,5
let, fq.txt1, 2,But let that man with better sence aduize,
,2
liuing, fq.txt1, 1,Sith none, that breatheth liuing aire, does know,
,5
man, fq.txt1, 2,But let that man with better sence aduize,
,4
matter, fq.txt1, 1,Rather then matter of iust memory,
,3
measured, fq.txt1, 2,Or who in venturous vessell measured
,6
memory, fq.txt1, 1,Rather then matter of iust memory,
,6
mentioned, fq.txt1, 2,Which to late age were neuer mentioned.
,7
mighty, fq.txt1, 1,Right well I wote most mighty Soueraine,
,6
most, fq.txt1, 1,Right well I wote most mighty Soueraine,
,5
much, fq.txt1, 1,Which I so much do vaunt, yet no where show,
,4
neuer, fq.txt1, 2,Which to late age were neuer mentioned.
,6
no, fq.txt1, 1,Which I so much do vaunt, yet no where show,
,8; 1,But vouch antiquities, which no body can know.
,5
none, fq.txt1, 1,Sith none, that breatheth liuing aire, does know,
,2
now, fq.txt1, 2,The Amazon huge riuer now found trew?
,5
of, fq.txt1, 1,Of some th'aboundance of an idle braine
,4; 1,Rather then matter of iust memory,
,4; 1,Where is that happy land of Faery,
,6; 2,That of the world least part to vs is red:
,2; 2,Who euer heard of th'Indian Peru?
,4
painted, fq.txt1, 1,Will iudged be, and painted forgery,
,5
part, fq.txt1, 2,That of the world least part to vs is red:
,6
red, fq.txt1, 2,That of the world least part to vs is red:
,10
riuer, fq.txt1, 2,The Amazon huge riuer now found trew?
,4
sence, fq.txt1, 2,But let that man with better sence aduize,
,7
show, fq.txt1, 1,Which I so much do vaunt, yet no where show,
,10
so, fq.txt1, 1,Which I so much do vaunt, yet no where show,
,3
some, fq.txt1, 1,Of some th'aboundance of an idle braine
,2
th'Indian, fq.txt1, 2,Who euer heard of th'Indian Peru?
,5
th'aboundance, fq.txt1, 1,Of some th'aboundance of an idle braine
,3
that, fq.txt1, 1,Sith none, that breatheth liuing aire, does know,
,3; 1,Where is that happy land of Faery,
,3; 2,But let that man with better sence aduize,
,3
the, fq.txt1, 2,That of the world least part to vs is red:
,3
then, fq.txt1, 1,Rather then matter of iust memory,
,2
this, fq.txt1, 1,That all this famous antique history,
,3
through, fq.txt1, 2,And dayly how through hardy enterprize,
,4
to, fq.txt1, 2,That of the world least part to vs is red:
,7; 2,Which to late age were neuer mentioned.
,2
trew?, fq.txt1, 2,The Amazon huge riuer now found trew?
,7
vaunt, fq.txt1, 1,Which I so much do vaunt, yet no where show,
,6
venturous, fq.txt1, 2,Or who in venturous vessell measured
,4
vessell, fq.txt1, 2,Or who in venturous vessell measured
,5
vew?, fq.txt1, 2,Or fruitfullest Virginia who did euer vew?,7
vouch, fq.txt1, 1,But vouch antiquities, which no body can know.
,2
vs, fq.txt1, 2,That of the world least part to vs is red:
,8
well, fq.txt1, 1,Right well I wote most mighty Soueraine,
,2
were, fq.txt1, 2,Which to late age were neuer mentioned.
,5
where, fq.txt1, 1,Which I so much do vaunt, yet no where show,
,9
which, fq.txt1, 1,But vouch antiquities, which no body can know.
,4
who, fq.txt1, 2,Or who in venturous vessell measured
,2; 2,Or fruitfullest Virginia who did euer vew?,4
with, fq.txt1, 2,But let that man with better sence aduize,
,5
world, fq.txt1, 2,That of the world least part to vs is red:
,4
wote, fq.txt1, 1,Right well I wote most mighty Soueraine,
,4
yet, fq.txt1, 1,Which I so much do vaunt, yet no where show,
,7

C:\Spenser>
0
 
LVL 26

Expert Comment

by:wilcoxon
Comment Utility
Odd.  You must have an older buggy version of File::Slurp.  When I run the script, I provided I get the correct output (your second version but with everything on the right line).  Here's a simple little mod to work with slurp without chomp=>1...
use strict;
use warnings;
use File::Slurp qw(slurp);
my $fil = shift or die "Usage: $0 textfile\n";
#my @lines = slurp($fil, chomp => 1);
my @lines = slurp($fil);
my ($stanza, $word_cnt, %uwords);
# loop over all lines
while (@lines) {
    $stanza++;
    # make sure there are no blank lines to start
    shift @lines while ($lines[0] =~ m{^\s*$});
    # loop over the lines in the stanza
    for my $ln (1..9) {
        my $line = shift @lines or die "ran out of lines mid-stanza";
	chomp $line;
        # split the line and remove punctuation - can add others to char class
        my @words = map { s{[.,;:]$}{}; $_ } split m{\s+}, $line;
        # loop over words
        for my $i (0..@words-1) {
            $word_cnt++;
            $uwords{$words[$i]} = [] unless exists($uwords{$words[$i]});
            push @{$uwords{$words[$i]}}, [$stanza, $line, $i+1];
        }
    }
    # if there are blank lines between stanzas, uncomment this
    # this shouldn't actually be necessary with the first shift @lines above
    shift @lines while (@lines and $lines[0] =~ m{^\s*$});
}
# output
print "FILENAME: $fil\n",
    "STANZA COUNT: $stanza\n",
    "TOTAL LINE COUNT: ", $stanza*9, "\n",
    "TOTAL WORD COUNT: $word_cnt\n",
    "UNIQUE WORD COUNT: ", scalar(keys %uwords), "\n\n",
    "UNIQUE WORDS: ", join(', ', sort keys %uwords), "\n\n",
    "REPEATED WORDS: ", join(', ', grep({ @{$uwords{$_}}-1 } sort keys %uwords)), "\n\n",
    "INDEX OF WORDS:\n";
foreach my $word (sort keys %uwords) {
    print "$word, $fil, ", join('; ', map { join ',', @$_ } @{$uwords{$word}}), "\n";
}

Open in new window

0
 

Author Comment

by:Todd Weaver
Comment Utility
happier to work with your version of Slurp.  how can I know what is the version I have and how can I replace it? I used the internet to find step to install Slurp on my machine - I remember it was a command line entry and it worked on a PPM (I think...)

the INDEX OF WORDS is also bad.., but it is bad on the same line...


C:\Spenser>perl concordance.pl fq.txt1
FILENAME: fq.txt1
STANZA COUNT: 2
TOTAL LINE COUNT: 18
TOTAL WORD COUNT: 127
UNIQUE WORD COUNT: 110

UNIQUE WORDS: Amazon, And, But, Faery, I, Many, Of, Or, Peru?, Rather, Regions, Right, Sith, Souerai
ne, That, The, Virginia, Where, Which, Who, Will, aduize, age, aire, all, an, and, antique, antiquit
ies, are, be, better, body, braine, breatheth, can, dayly, did, discouered, do, does, enterprize, eu
er, famous, forgery, found, fruitfullest, great, happy, hardy, heard, history, how, huge, idle, in,
is, iudged, iust, know, land, late, least, let, liuing, man, matter, measured, memory, mentioned, mi
ghty, most, much, neuer, no, none, now, of, painted, part, red, riuer, sence, show, so, some, th'Ind
ian, th'aboundance, that, the, then, this, through, to, trew?, vaunt, venturous, vessell, vew?, vouc
h, vs, well, were, where, which, who, with, world, wote, yet

REPEATED WORDS: But, I, Or, That, Which, euer, is, know, no, of, that, to, who

INDEX OF WORDS:
,2azon, fq.txt1, 2,The Amazon huge riuer now found trew?
,1d, fq.txt1, 2,And dayly how through hardy enterprize,
,1; 2,But let that man with better sence aduize, ody can know.
,7ery, fq.txt1, 1,Where is that happy land of Faery,
,2; 1,Which I so much do vaunt, yet no where show,ine,
0
 

Author Comment

by:Todd Weaver
Comment Utility
also, pls change INDEX OF WORDS output to:

INDEX OF WORDS:
WORD | FILNAME | STANZA | LOCATION | LINE [; STANZA | LOCATION | LINE; etc...]
Amazon, fq.txt1, 2, 2, The Amazon huge riuer now found trew?
And, fq.txt1, 2, 1, And dayly how through hardy enterprize,
But, fq.txt1, 1, 1, But vouch antiquities, which no body can know.; 2, 1, But let that man with better sence aduize,
Faery, fq.txt1, 1, 7, Where is that happy land of Faery,
I, fq.txt1, 1, 3, Right well I wote most mighty Soueraine,; 1, 2, Which I so much do vaunt, yet no where show,
0
 

Author Comment

by:Todd Weaver
Comment Utility
C:\Spenser>perl -v

This is perl 5, version 16, subversion 3 (v5.16.3) built for MSWin32-x64-multi-thread
(with 1 registered patch, see perl -V for more detail)

Copyright 1987-2012, Larry Wall

Binary build 1603 [296746] provided by ActiveState http://www.ActiveState.com
Built Mar 13 2013 13:31:10

...

(Slurp.pm lives at "C:\Perl64\site\lib\File")

package File::Slurp;

use 5.6.2 ;

use strict;
use warnings ;

use Carp ;
use Exporter ;
use Fcntl qw( :DEFAULT ) ;
use POSIX qw( :fcntl_h ) ;
use Errno ;
#use Symbol ;

use vars qw( @ISA @EXPORT @EXPORT_OK %EXPORT_TAGS $VERSION ) ;
@ISA = qw( Exporter ) ;

$VERSION = '9999.19';
0

Featured Post

Top 6 Sources for Identifying Threat Actor TTPs

Understanding your enemy is essential. These six sources will help you identify the most popular threat actor tactics, techniques, and procedures (TTPs).

Join & Write a Comment

Suggested Solutions

Title # Comments Views Activity
bigHeights  challenge 13 55
Python 2.7 - French characters 6 46
move one pixel 4 38
C# code editing and collaboration 3 39
If you haven’t already, I encourage you to read the first article (http://www.experts-exchange.com/articles/18680/An-Introduction-to-R-Programming-and-R-Studio.html) in my series to gain a basic foundation of R and R Studio.  You will also find the …
This is about my first experience with programming Arduino.
This theoretical tutorial explains exceptions, reasons for exceptions, different categories of exception and exception hierarchy.
The viewer will learn additional member functions of the vector class. Specifically, the capacity and swap member functions will be introduced.

762 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

12 Experts available now in Live!

Get 1:1 Help Now