Solved

Perl s

Posted on 2013-11-27
12
472 Views
Last Modified: 2013-12-13
I need to iterate over a stanza of poetry and index each word. Desired result is shown as an example below.

I know how to open and edit files in Perl. I can slurp up the line but I need some RegEx magic to get at the word position and line number in which each word is located. I am familiar with some regex, and I comfortable with looking up the meaning of regular expressions that you build when answering this question.

The known structure of input file: all textfiles have at least 1 stanza, each stanza is only 9 lines (this means that "stanza count"  is 1-to-many, and "line count" should never be more than 9) you do not have to build a wrpper program for this rexular expression but, please be willing to build the output Perl code - this is the end goal, the index output.
 
Example stanza text from "textfile.txt":

LO I the man, whose Muse whilome did maske,
As time her taught, in lowly Shepheards weeds,
Am now enforst a far vnfitter taske,
For trumpets sterne to chaunge mine Oaten reeds,
And sing of Knights and Ladies gentle deeds;
Whose prayses hauing slept in silence long,
Me, all too meane, the sacred Muse areeds
To blazon broad emongst her learned throng:
Fierce warres and faithfull loues shall moralize my song.



Example output for "textfile.txt":

##########
FILENAME: textfile.txt
STANZA COUNT: 1
TOTAL LINE COUNT: 9
TOTAL WORD COUNT: 71
UNIQUE WORDS COUNT: 65

UNIQUE WORDS: lo, i, the, man, whose, muse, did, whilome, maske, as, time, her, taught, in, lowly, shepheards, weeds, am, now, enforst, a, far, vnfitter, taske, for, trumpets, sterne, to, chaunge, mine, oaten, reeds, and, sing, of, knights, ladies, gentle, deeds, prayses, hauing, slept, silence, long, me, all, too, meane, the, sacred, areeds, blazon, broad, emongst, her, learned, throng, fierce, warres, faithful, loues, shall, moralize, my, song

REPEATED WORDS: the, and, her

INDEX OF WORDS: (this means all words of textfile.txt)
(layout is: word, filename, stanza, line, location)

a, textfile.txt, stanza 1, line 3, word 4
and, textfile.txt, stanza 1, line 5, word 1; stanza 1, line 5, word 5; stanza 1, line 9, word 3
all, textfile.txt, stanza 1, line 7, word 2
...
blazon, textfile.txt, stanza 1, line 8, word 2
...
her, textfile.txt, stanza 1, line 2, word 3; stanza 1, line 8, word 5
...
song, textfile.txt, stanza 1, line 9, word 9

##########

Notice that REPEATED WORDS "and", and "her" in this example output are indexed with multiple locations.
suggestions on output are welcomed..

acceptable "INDEX OF WORDS" output could be simplfied as such:

INDEX OF WORD WORDS:
(layout is: word, filename, stanza, line, location)

a, textfile.txt, 1,3,4
and, textfile.txt, 1,5,1; 1,5,5; 1,9,3
all, textfile.txt, 1,7,2
...
blazon, textfile.txt, 1,8,2
...
her, textfile.txt, 1,2,3; 1,8,5
...
song, textfile.txt, 1,9,9

"stanza, line, location" is a set, if a word such as "and" appears in a second stanza then output should look like this:

INDEX OF WORD WORDS:
(layout is: word, filename, stanza, line, location)

...
and, textfile.txt, 1,5,1; 1,5,5; 1,9,3, 2,4,1; 2,6,3; 2,6,7; 3,3,1; 3,6,2
...
in this example output the word "and" appears in stanza 1, 2, and 3 in multiple locations.
0
Comment
Question by:Todd Weaver
  • 9
  • 3
12 Comments
 
LVL 26

Accepted Solution

by:
wilcoxon earned 500 total points
ID: 39681600
This is more involved than a regex (maybe possible in regex but it would be a hideously complex and unmaintainable regex)...
use strict;
use warnings;
use File::Slurp qw(slurp);
my $fil = shift or die "Usage: $0 textfile\n";
my @lines = slurp($fil);
my ($stanza, $word_cnt, %uwords);
# loop over all lines
while (@lines) {
    $stanza++;
    # loop over the lines in the stanza
    for my $ln (1..9) {
        my $line = shift @lines or die "ran out of lines mid-stanza";
        # split the line and remove punctuation - can add others to char class
        my @words = map { s{[.,;:]$}{}; $_ } split m{\s+}, $line;
        # loop over words
        for my $i (0..@words-1) {
            $word_cnt++;
            $uwords{$words[$i]} = [] unless exists($uwords{$words[$i]});
            push @{$uwords{$words[$i]}}, [$stanza, $line, $i+1];
        }
    }
    # if there are blank lines between stanzas, uncomment this
    #shift @lines while ($lines[0] =~ m{^\s*$});
}
# output
print "FILENAME: $fil\n",
    "STANZA COUNT: $stanza\n",
    "TOTAL LINE COUNT: ", $stanza*9, "\n",
    "TOTAL WORD COUNT: $word_cnt\n",
    "UNIQUE WORD COUNT: ", scalar(keys %uwords), "\n\n",
    "UNIQUE WORDS: ", join(', ', sort keys %uwords), "\n\n",
    "REPEATED WORDS: ", grep({ @{$uwords{$_}}-1 } sort keys %uwords), "\n\n",
    "INDEX OF WORDS:\n";
foreach my $word (sort keys %uwords) {
    print "$word, $fil, ", join('; ', map { join ',', @$_ } @{$uwords{$word}}), "\n";
}

Open in new window

0
 

Author Comment

by:Todd Weaver
ID: 39682104
tested.. excellent work! I am glad that you have posted the entire script. This seems more like a proposal and a prototype solution.  I am very happy with the results.

Thank you for being so very thorough. there's a lot of perl here that I am going to have to look up, such as "map".. I would like to return soon with any questions but I think you have provided me with a complete answer to my question.
0
 

Author Closing Comment

by:Todd Weaver
ID: 39682107
tested.. excellent work! I am glad that you have posted the entire script. This seems more like a proposal and a prototype solution.  I am very happy with the results.

Thank you for being so very thorough. there's a lot of perl here that I am going to have to look up, such as "map".. I would like to return soon with any questions but I think you have provided me with a complete answer to my question.
0
 

Author Comment

by:Todd Weaver
ID: 39690294
.. nevermind this comment, I was missing a closing brace.
0
 

Author Comment

by:Todd Weaver
ID: 39708970
there are blank lines between stanzas; at the instruction to "uncomment"  line 23 if blank lines exist between stanzas does not seem to work properly.

Please help me to skip blank lines in the text.

example text input:

Right well I wote most mighty Soueraine,
That all this famous antique history,
Of some th'aboundance of an idle braine
Will iudged be, and painted forgery,
Rather then matter of iust memory,
Sith none, that breatheth liuing aire, does know,
Where is that happy land of Faery,
Which I so much do vaunt, yet no where show,
But vouch antiquities, which no body can know.

But let that man with better sence aduize,
That of the world least part to vs is red:
And dayly how through hardy enterprize,
Many great Regions are discouered,
Which to late age were neuer mentioned.
Who euer heard of th'Indian Peru?
Or who in venturous vessell measured
The Amazon huge riuer now found trew?
Or fruitfullest Virginia who did euer vew?
0
 
LVL 26

Expert Comment

by:wilcoxon
ID: 39710566
Two minor changes:
1) Add chomp => 1 to the slurp call (I could have sworn this was the default behavior).
2) Add a remove blank lines shift line before the stanza loop too (just to be sure).
use strict;
use warnings;
use File::Slurp qw(slurp);
my $fil = shift or die "Usage: $0 textfile\n";
my @lines = slurp($fil, chomp => 1);
my ($stanza, $word_cnt, %uwords);
# loop over all lines
while (@lines) {
    $stanza++;
    # make sure there are no blank lines to start
    shift @lines while ($lines[0] =~ m{^\s*$});
    # loop over the lines in the stanza
    for my $ln (1..9) {
        my $line = shift @lines or die "ran out of lines mid-stanza";
        # split the line and remove punctuation - can add others to char class
        my @words = map { s{[.,;:]$}{}; $_ } split m{\s+}, $line;
        # loop over words
        for my $i (0..@words-1) {
            $word_cnt++;
            $uwords{$words[$i]} = [] unless exists($uwords{$words[$i]});
            push @{$uwords{$words[$i]}}, [$stanza, $line, $i+1];
        }
    }
    # if there are blank lines between stanzas, uncomment this
    shift @lines while ($lines[0] =~ m{^\s*$});
}
# output
print "FILENAME: $fil\n",
    "STANZA COUNT: $stanza\n",
    "TOTAL LINE COUNT: ", $stanza*9, "\n",
    "TOTAL WORD COUNT: $word_cnt\n",
    "UNIQUE WORD COUNT: ", scalar(keys %uwords), "\n\n",
    "UNIQUE WORDS: ", join(', ', sort keys %uwords), "\n\n",
    "REPEATED WORDS: ", grep({ @{$uwords{$_}}-1 } sort keys %uwords), "\n\n",
    "INDEX OF WORDS:\n";
foreach my $word (sort keys %uwords) {
    print "$word, $fil, ", join('; ', map { join ',', @$_ } @{$uwords{$word}}), "\n";
}

Open in new window

0
Is Your Active Directory as Secure as You Think?

More than 75% of all records are compromised because of the loss or theft of a privileged credential. Experts have been exploring Active Directory infrastructure to identify key threats and establish best practices for keeping data safe. Attend this month’s webinar to learn more.

 

Author Comment

by:Todd Weaver
ID: 39713161
not the expected output, something has changed. With the lower shift commented the script runs, with the lower shift uncommented - I get and uninitialized error..

the index of words is goofed.

_______________________________________________________________________
INPUT

Right well I wote most mighty Soueraine,
That all this famous antique history,
Of some th'aboundance of an idle braine
Will iudged be, and painted forgery,
Rather then matter of iust memory,
Sith none, that breatheth liuing aire, does know,
Where is that happy land of Faery,
Which I so much do vaunt, yet no where show,
But vouch antiquities, which no body can know.

But let that man with better sence aduize,
That of the world least part to vs is red:
And dayly how through hardy enterprize,
Many great Regions are discouered,
Which to late age were neuer mentioned.
Who euer heard of th'Indian Peru?
Or who in venturous vessell measured
The Amazon huge riuer now found trew?
Or fruitfullest Virginia who did euer vew?



OUTPUT
_______________________________________________________________________
C:\Spenser>perl concordance.pl fq.txt1
FILENAME: fq.txt1
STANZA COUNT: 2
TOTAL LINE COUNT: 18
TOTAL WORD COUNT: 127
UNIQUE WORD COUNT: 110

UNIQUE WORDS: Amazon, And, But, Faery, I, Many, Of, Or, Peru?, Rather, Regions,
Right, Sith, Soueraine, That, The, Virginia, Where, Which, Who, Will, aduize, ag
e, aire, all, an, and, antique, antiquities, are, be, better, body, braine, brea
theth, can, dayly, did, discouered, do, does, enterprize, euer, famous, forgery,
 found, fruitfullest, great, happy, hardy, heard, history, how, huge, idle, in,
is, iudged, iust, know, land, late, least, let, liuing, man, matter, measured, m
emory, mentioned, mighty, most, much, neuer, no, none, now, of, painted, part, r
ed, riuer, sence, show, so, some, th'Indian, th'aboundance, that, the, then, thi
s, through, to, trew?, vaunt, venturous, vessell, vew?, vouch, vs, well, were, w
here, which, who, with, world, wote, yet

REPEATED WORDS: ButIOrThatWhicheuerisknownoofthattowho

INDEX OF WORDS:
,2azon, fq.txt1, 2,The Amazon huge riuer now found trew?
,1d, fq.txt1, 2,And dayly how through hardy enterprize,
,1; 2,But let that man with better sence aduize, ody can know.
,7ery, fq.txt1, 1,Where is that happy land of Faery,
,2; 1,Which I so much do vaunt, yet no where show,ine,
,1ny, fq.txt1, 2,Many great Regions are discouered,
,1, fq.txt1, 1,Of some th'aboundance of an idle braine
,1; 2,Or fruitfullest Virginia who did euer vew?,1d
,6ru?, fq.txt1, 2,Who euer heard of th'Indian Peru?
,1ther, fq.txt1, 1,Rather then matter of iust memory,
,3gions, fq.txt1, 2,Many great Regions are discouered,
,1ght, fq.txt1, 1,Right well I wote most mighty Soueraine,
,1th, fq.txt1, 1,Sith none, that breatheth liuing aire, does know,
,7ueraine, fq.txt1, 1,Right well I wote most mighty Soueraine,
,1; 2,That of the world least part to vs is red: tory,
,1e, fq.txt1, 2,The Amazon huge riuer now found trew?
Virginia, fq.txt1, 2,Or fruitfullest Virginia who did euer vew?,3
,1ere, fq.txt1, 1,Where is that happy land of Faery,
,1; 2,Which to late age were neuer mentioned. t no where show,
,1o, fq.txt1, 2,Who euer heard of th'Indian Peru?
,1ll, fq.txt1, 1,Will iudged be, and painted forgery,
,8uize, fq.txt1, 2,But let that man with better sence aduize,
,4e, fq.txt1, 2,Which to late age were neuer mentioned.
,6re, fq.txt1, 1,Sith none, that breatheth liuing aire, does know,
,2l, fq.txt1, 1,That all this famous antique history,
,5, fq.txt1, 1,Of some th'aboundance of an idle braine
,4d, fq.txt1, 1,Will iudged be, and painted forgery,
,5tique, fq.txt1, 1,That all this famous antique history,
,3tiquities, fq.txt1, 1,But vouch antiquities, which no body can know.
,4e, fq.txt1, 2,Many great Regions are discouered,
,3, fq.txt1, 1,Will iudged be, and painted forgery,
,6tter, fq.txt1, 2,But let that man with better sence aduize,
,6dy, fq.txt1, 1,But vouch antiquities, which no body can know.
,7aine, fq.txt1, 1,Of some th'aboundance of an idle braine
,4eatheth, fq.txt1, 1,Sith none, that breatheth liuing aire, does know,
,7n, fq.txt1, 1,But vouch antiquities, which no body can know.
,2yly, fq.txt1, 2,And dayly how through hardy enterprize,
did, fq.txt1, 2,Or fruitfullest Virginia who did euer vew?,5
,5scouered, fq.txt1, 2,Many great Regions are discouered,
,5, fq.txt1, 1,Which I so much do vaunt, yet no where show,
,7es, fq.txt1, 1,Sith none, that breatheth liuing aire, does know,
,6terprize, fq.txt1, 2,And dayly how through hardy enterprize,
,2; 2,Or fruitfullest Virginia who did euer vew?,6
,4mous, fq.txt1, 1,That all this famous antique history,
,6rgery, fq.txt1, 1,Will iudged be, and painted forgery,
,6und, fq.txt1, 2,The Amazon huge riuer now found trew?
fruitfullest, fq.txt1, 2,Or fruitfullest Virginia who did euer vew?,2
,2eat, fq.txt1, 2,Many great Regions are discouered,
,4ppy, fq.txt1, 1,Where is that happy land of Faery,
,5rdy, fq.txt1, 2,And dayly how through hardy enterprize,
,3ard, fq.txt1, 2,Who euer heard of th'Indian Peru?
,6story, fq.txt1, 1,That all this famous antique history,
,3w, fq.txt1, 2,And dayly how through hardy enterprize,
,3ge, fq.txt1, 2,The Amazon huge riuer now found trew?
,6le, fq.txt1, 1,Of some th'aboundance of an idle braine
,3, fq.txt1, 2,Or who in venturous vessell measured
,9; 2,That of the world least part to vs is red:
,2dged, fq.txt1, 1,Will iudged be, and painted forgery,
,5st, fq.txt1, 1,Rather then matter of iust memory,
,8; 1,But vouch antiquities, which no body can know.re, does know,
,5nd, fq.txt1, 1,Where is that happy land of Faery,
,3te, fq.txt1, 2,Which to late age were neuer mentioned.
,5ast, fq.txt1, 2,That of the world least part to vs is red:
,2t, fq.txt1, 2,But let that man with better sence aduize,
,5uing, fq.txt1, 1,Sith none, that breatheth liuing aire, does know,
,4n, fq.txt1, 2,But let that man with better sence aduize,
,3tter, fq.txt1, 1,Rather then matter of iust memory,
,6asured, fq.txt1, 2,Or who in venturous vessell measured
,6mory, fq.txt1, 1,Rather then matter of iust memory,
,7ntioned, fq.txt1, 2,Which to late age were neuer mentioned.
,6ghty, fq.txt1, 1,Right well I wote most mighty Soueraine,
,5st, fq.txt1, 1,Right well I wote most mighty Soueraine,
,4ch, fq.txt1, 1,Which I so much do vaunt, yet no where show,
,6uer, fq.txt1, 2,Which to late age were neuer mentioned.
,5; 1,But vouch antiquities, which no body can know.e show,
,2ne, fq.txt1, 1,Sith none, that breatheth liuing aire, does know,
,5w, fq.txt1, 2,The Amazon huge riuer now found trew?
,4; 2,Who euer heard of th'Indian Peru?  is red: raine
,5inted, fq.txt1, 1,Will iudged be, and painted forgery,
,6rt, fq.txt1, 2,That of the world least part to vs is red:
,10, fq.txt1, 2,That of the world least part to vs is red:
,4uer, fq.txt1, 2,The Amazon huge riuer now found trew?
,7nce, fq.txt1, 2,But let that man with better sence aduize,
,10w, fq.txt1, 1,Which I so much do vaunt, yet no where show,
,3, fq.txt1, 1,Which I so much do vaunt, yet no where show,
,2me, fq.txt1, 1,Of some th'aboundance of an idle braine
,5'Indian, fq.txt1, 2,Who euer heard of th'Indian Peru?
,3'aboundance, fq.txt1, 1,Of some th'aboundance of an idle braine
,3; 2,But let that man with better sence aduize,  aire, does know,
,3e, fq.txt1, 2,That of the world least part to vs is red:
,2en, fq.txt1, 1,Rather then matter of iust memory,
,3is, fq.txt1, 1,That all this famous antique history,
,4rough, fq.txt1, 2,And dayly how through hardy enterprize,
,2; 2,Which to late age were neuer mentioned.  vs is red:
,7ew?, fq.txt1, 2,The Amazon huge riuer now found trew?
,6unt, fq.txt1, 1,Which I so much do vaunt, yet no where show,
,4nturous, fq.txt1, 2,Or who in venturous vessell measured
,5ssell, fq.txt1, 2,Or who in venturous vessell measured
vew?, fq.txt1, 2,Or fruitfullest Virginia who did euer vew?,7
,2uch, fq.txt1, 1,But vouch antiquities, which no body can know.
,8, fq.txt1, 2,That of the world least part to vs is red:
,2ll, fq.txt1, 1,Right well I wote most mighty Soueraine,
,5re, fq.txt1, 2,Which to late age were neuer mentioned.
,9ere, fq.txt1, 1,Which I so much do vaunt, yet no where show,
,4ich, fq.txt1, 1,But vouch antiquities, which no body can know.
,2; 2,Or fruitfullest Virginia who did euer vew?,4ed
,5th, fq.txt1, 2,But let that man with better sence aduize,
,4rld, fq.txt1, 2,That of the world least part to vs is red:
,4te, fq.txt1, 1,Right well I wote most mighty Soueraine,
,7t, fq.txt1, 1,Which I so much do vaunt, yet no where show,



SCRIPT
_______________________________________________________________________
use strict;
use warnings;
use File::Slurp qw(slurp);
my $fil = shift or die "Usage: $0 textfile\n";
my @lines = slurp($fil, chomp => 1);
my ($stanza, $word_cnt, %uwords);
# loop over all lines
while (@lines) {
    $stanza++;
    # make sure there are no blank lines to start
    shift @lines while ($lines[0] =~ m{^\s*$});
    # loop over the lines in the stanza
    for my $ln (1..9) {
        my $line = shift @lines or die "ran out of lines mid-stanza";
        # split the line and remove punctuation - can add others to char class
        my @words = map { s{[.,;:]$}{}; $_ } split m{\s+}, $line;
        # loop over words
        for my $i (0..@words-1) {
            $word_cnt++;
            $uwords{$words[$i]} = [] unless exists($uwords{$words[$i]});
            push @{$uwords{$words[$i]}}, [$stanza, $line, $i+1];
        }
    }
    # if there are blank lines between stanzas, uncomment this
    # shift @lines while ($lines[0] =~ m{^\s*$});
}
# output
print "FILENAME: $fil\n",
    "STANZA COUNT: $stanza\n",
    "TOTAL LINE COUNT: ", $stanza*9, "\n",
    "TOTAL WORD COUNT: $word_cnt\n",
    "UNIQUE WORD COUNT: ", scalar(keys %uwords), "\n\n",
    "UNIQUE WORDS: ", join(', ', sort keys %uwords), "\n\n",
    "REPEATED WORDS: ", grep({ @{$uwords{$_}}-1 } sort keys %uwords), "\n\n",
    "INDEX OF WORDS:\n";
foreach my $word (sort keys %uwords) {
    print "$word, $fil, ", join('; ', map { join ',', @$_ } @{$uwords{$word}}), "\n";
}
0
 

Author Comment

by:Todd Weaver
ID: 39713281
"chomp is screwing up the @lines array, in the code below with the blank lines element inserted and the chomp line commented it works on the text with blank lines provided:




the INPUT:
_______________________________________________________________
Right well I wote most mighty Soueraine,
That all this famous antique history,
Of some th'aboundance of an idle braine
Will iudged be, and painted forgery,
Rather then matter of iust memory,
Sith none, that breatheth liuing aire, does know,
Where is that happy land of Faery,
Which I so much do vaunt, yet no where show,
But vouch antiquities, which no body can know.

But let that man with better sence aduize,
That of the world least part to vs is red:
And dayly how through hardy enterprize,
Many great Regions are discouered,
Which to late age were neuer mentioned.
Who euer heard of th'Indian Peru?
Or who in venturous vessell measured
The Amazon huge riuer now found trew?
Or fruitfullest Virginia who did euer vew?




the SCRIPT (notice @lines is initialized with no CHOMP and "skip blanks" is present):
_______________________________________________________________
use strict;
use warnings;
use File::Slurp qw(slurp);
my $fil = shift or die "Usage: $0 textfile\n";
my @lines = slurp($fil);
# my @lines = slurp($fil, chomp => 1);
my ($stanza, $word_cnt, %uwords);
# loop over all lines of the stanza
while (@lines) {
     # # make sure there are no blank lines to start
     shift @lines while ($lines[0] =~ m{^\s*$});
     # increase stanza count
     $stanza++;
     # loop over the lines in the 9-line stanza
     for my $ln (1..9) {
          my $line = shift @lines or die "ran out of lines mid-stanza";
        # split the line and remove punctuation - can add others to char class
        my @words = map { s{[.,;:]$}{}; $_ } split m{\s+}, $line;
        # loop over words
        for my $i (0..@words-1) {
             $word_cnt++;
             $uwords{$words[$i]} = [] unless exists($uwords{$words[$i]});
             push @{$uwords{$words[$i]}}, [$stanza, $line, $i+1];
         }
     }
}
# output
print "FILENAME: $fil\n",
    "STANZA COUNT: $stanza\n",
    "TOTAL LINE COUNT: ", $stanza*9, "\n",
    "TOTAL WORD COUNT: $word_cnt\n",
    "UNIQUE WORD COUNT: ", scalar(keys %uwords), "\n\n",
    "UNIQUE WORDS: ", join(', ', sort keys %uwords), "\n\n",
    "REPEATED WORDS: ", grep({ @{$uwords{$_}}-1 } sort keys %uwords), "\n\n",
    "INDEX OF WORDS:\n";
foreach my $word (sort keys %uwords) {
    print "$word, $fil, ", join('; ', map { join ',', @$_ } @{$uwords{$word}}), "\n";
}




the OUTPUT (is good, except the CR-comma-location at the end of each index, should be on same line...):
_______________________________________________________________
C:\Spenser>perl concordance.pl fq.txt1
FILENAME: fq.txt1
STANZA COUNT: 2
TOTAL LINE COUNT: 18
TOTAL WORD COUNT: 127
UNIQUE WORD COUNT: 110

UNIQUE WORDS: Amazon, And, But, Faery, I, Many, Of, Or, Peru?, Rather, Regions, Right, Sith, Souerai
ne, That, The, Virginia, Where, Which, Who, Will, aduize, age, aire, all, an, and, antique, antiquit
ies, are, be, better, body, braine, breatheth, can, dayly, did, discouered, do, does, enterprize, eu
er, famous, forgery, found, fruitfullest, great, happy, hardy, heard, history, how, huge, idle, in,
is, iudged, iust, know, land, late, least, let, liuing, man, matter, measured, memory, mentioned, mi
ghty, most, much, neuer, no, none, now, of, painted, part, red, riuer, sence, show, so, some, th'Ind
ian, th'aboundance, that, the, then, this, through, to, trew?, vaunt, venturous, vessell, vew?, vouc
h, vs, well, were, where, which, who, with, world, wote, yet

REPEATED WORDS: ButIOrThatWhicheuerisknownoofthattowho

INDEX OF WORDS:
Amazon, fq.txt1, 2,The Amazon huge riuer now found trew?
,2
And, fq.txt1, 2,And dayly how through hardy enterprize,
,1
But, fq.txt1, 1,But vouch antiquities, which no body can know.
,1; 2,But let that man with better sence aduize,
,1
Faery, fq.txt1, 1,Where is that happy land of Faery,
,7
I, fq.txt1, 1,Right well I wote most mighty Soueraine,
,3; 1,Which I so much do vaunt, yet no where show,
,2
Many, fq.txt1, 2,Many great Regions are discouered,
,1
Of, fq.txt1, 1,Of some th'aboundance of an idle braine
,1
Or, fq.txt1, 2,Or who in venturous vessell measured
,1; 2,Or fruitfullest Virginia who did euer vew?,1
Peru?, fq.txt1, 2,Who euer heard of th'Indian Peru?
,6
Rather, fq.txt1, 1,Rather then matter of iust memory,
,1
Regions, fq.txt1, 2,Many great Regions are discouered,
,3
Right, fq.txt1, 1,Right well I wote most mighty Soueraine,
,1
Sith, fq.txt1, 1,Sith none, that breatheth liuing aire, does know,
,1
Soueraine, fq.txt1, 1,Right well I wote most mighty Soueraine,
,7
That, fq.txt1, 1,That all this famous antique history,
,1; 2,That of the world least part to vs is red:
,1
The, fq.txt1, 2,The Amazon huge riuer now found trew?
,1
Virginia, fq.txt1, 2,Or fruitfullest Virginia who did euer vew?,3
Where, fq.txt1, 1,Where is that happy land of Faery,
,1
Which, fq.txt1, 1,Which I so much do vaunt, yet no where show,
,1; 2,Which to late age were neuer mentioned.
,1
Who, fq.txt1, 2,Who euer heard of th'Indian Peru?
,1
Will, fq.txt1, 1,Will iudged be, and painted forgery,
,1
aduize, fq.txt1, 2,But let that man with better sence aduize,
,8
age, fq.txt1, 2,Which to late age were neuer mentioned.
,4
aire, fq.txt1, 1,Sith none, that breatheth liuing aire, does know,
,6
all, fq.txt1, 1,That all this famous antique history,
,2
an, fq.txt1, 1,Of some th'aboundance of an idle braine
,5
and, fq.txt1, 1,Will iudged be, and painted forgery,
,4
antique, fq.txt1, 1,That all this famous antique history,
,5
antiquities, fq.txt1, 1,But vouch antiquities, which no body can know.
,3
are, fq.txt1, 2,Many great Regions are discouered,
,4
be, fq.txt1, 1,Will iudged be, and painted forgery,
,3
better, fq.txt1, 2,But let that man with better sence aduize,
,6
body, fq.txt1, 1,But vouch antiquities, which no body can know.
,6
braine, fq.txt1, 1,Of some th'aboundance of an idle braine
,7
breatheth, fq.txt1, 1,Sith none, that breatheth liuing aire, does know,
,4
can, fq.txt1, 1,But vouch antiquities, which no body can know.
,7
dayly, fq.txt1, 2,And dayly how through hardy enterprize,
,2
did, fq.txt1, 2,Or fruitfullest Virginia who did euer vew?,5
discouered, fq.txt1, 2,Many great Regions are discouered,
,5
do, fq.txt1, 1,Which I so much do vaunt, yet no where show,
,5
does, fq.txt1, 1,Sith none, that breatheth liuing aire, does know,
,7
enterprize, fq.txt1, 2,And dayly how through hardy enterprize,
,6
euer, fq.txt1, 2,Who euer heard of th'Indian Peru?
,2; 2,Or fruitfullest Virginia who did euer vew?,6
famous, fq.txt1, 1,That all this famous antique history,
,4
forgery, fq.txt1, 1,Will iudged be, and painted forgery,
,6
found, fq.txt1, 2,The Amazon huge riuer now found trew?
,6
fruitfullest, fq.txt1, 2,Or fruitfullest Virginia who did euer vew?,2
great, fq.txt1, 2,Many great Regions are discouered,
,2
happy, fq.txt1, 1,Where is that happy land of Faery,
,4
hardy, fq.txt1, 2,And dayly how through hardy enterprize,
,5
heard, fq.txt1, 2,Who euer heard of th'Indian Peru?
,3
history, fq.txt1, 1,That all this famous antique history,
,6
how, fq.txt1, 2,And dayly how through hardy enterprize,
,3
huge, fq.txt1, 2,The Amazon huge riuer now found trew?
,3
idle, fq.txt1, 1,Of some th'aboundance of an idle braine
,6
in, fq.txt1, 2,Or who in venturous vessell measured
,3
is, fq.txt1, 1,Where is that happy land of Faery,
,2; 2,That of the world least part to vs is red:
,9
iudged, fq.txt1, 1,Will iudged be, and painted forgery,
,2
iust, fq.txt1, 1,Rather then matter of iust memory,
,5
know, fq.txt1, 1,Sith none, that breatheth liuing aire, does know,
,8; 1,But vouch antiquities, which no body can know.
,8
land, fq.txt1, 1,Where is that happy land of Faery,
,5
late, fq.txt1, 2,Which to late age were neuer mentioned.
,3
least, fq.txt1, 2,That of the world least part to vs is red:
,5
let, fq.txt1, 2,But let that man with better sence aduize,
,2
liuing, fq.txt1, 1,Sith none, that breatheth liuing aire, does know,
,5
man, fq.txt1, 2,But let that man with better sence aduize,
,4
matter, fq.txt1, 1,Rather then matter of iust memory,
,3
measured, fq.txt1, 2,Or who in venturous vessell measured
,6
memory, fq.txt1, 1,Rather then matter of iust memory,
,6
mentioned, fq.txt1, 2,Which to late age were neuer mentioned.
,7
mighty, fq.txt1, 1,Right well I wote most mighty Soueraine,
,6
most, fq.txt1, 1,Right well I wote most mighty Soueraine,
,5
much, fq.txt1, 1,Which I so much do vaunt, yet no where show,
,4
neuer, fq.txt1, 2,Which to late age were neuer mentioned.
,6
no, fq.txt1, 1,Which I so much do vaunt, yet no where show,
,8; 1,But vouch antiquities, which no body can know.
,5
none, fq.txt1, 1,Sith none, that breatheth liuing aire, does know,
,2
now, fq.txt1, 2,The Amazon huge riuer now found trew?
,5
of, fq.txt1, 1,Of some th'aboundance of an idle braine
,4; 1,Rather then matter of iust memory,
,4; 1,Where is that happy land of Faery,
,6; 2,That of the world least part to vs is red:
,2; 2,Who euer heard of th'Indian Peru?
,4
painted, fq.txt1, 1,Will iudged be, and painted forgery,
,5
part, fq.txt1, 2,That of the world least part to vs is red:
,6
red, fq.txt1, 2,That of the world least part to vs is red:
,10
riuer, fq.txt1, 2,The Amazon huge riuer now found trew?
,4
sence, fq.txt1, 2,But let that man with better sence aduize,
,7
show, fq.txt1, 1,Which I so much do vaunt, yet no where show,
,10
so, fq.txt1, 1,Which I so much do vaunt, yet no where show,
,3
some, fq.txt1, 1,Of some th'aboundance of an idle braine
,2
th'Indian, fq.txt1, 2,Who euer heard of th'Indian Peru?
,5
th'aboundance, fq.txt1, 1,Of some th'aboundance of an idle braine
,3
that, fq.txt1, 1,Sith none, that breatheth liuing aire, does know,
,3; 1,Where is that happy land of Faery,
,3; 2,But let that man with better sence aduize,
,3
the, fq.txt1, 2,That of the world least part to vs is red:
,3
then, fq.txt1, 1,Rather then matter of iust memory,
,2
this, fq.txt1, 1,That all this famous antique history,
,3
through, fq.txt1, 2,And dayly how through hardy enterprize,
,4
to, fq.txt1, 2,That of the world least part to vs is red:
,7; 2,Which to late age were neuer mentioned.
,2
trew?, fq.txt1, 2,The Amazon huge riuer now found trew?
,7
vaunt, fq.txt1, 1,Which I so much do vaunt, yet no where show,
,6
venturous, fq.txt1, 2,Or who in venturous vessell measured
,4
vessell, fq.txt1, 2,Or who in venturous vessell measured
,5
vew?, fq.txt1, 2,Or fruitfullest Virginia who did euer vew?,7
vouch, fq.txt1, 1,But vouch antiquities, which no body can know.
,2
vs, fq.txt1, 2,That of the world least part to vs is red:
,8
well, fq.txt1, 1,Right well I wote most mighty Soueraine,
,2
were, fq.txt1, 2,Which to late age were neuer mentioned.
,5
where, fq.txt1, 1,Which I so much do vaunt, yet no where show,
,9
which, fq.txt1, 1,But vouch antiquities, which no body can know.
,4
who, fq.txt1, 2,Or who in venturous vessell measured
,2; 2,Or fruitfullest Virginia who did euer vew?,4
with, fq.txt1, 2,But let that man with better sence aduize,
,5
world, fq.txt1, 2,That of the world least part to vs is red:
,4
wote, fq.txt1, 1,Right well I wote most mighty Soueraine,
,4
yet, fq.txt1, 1,Which I so much do vaunt, yet no where show,
,7

C:\Spenser>
0
 
LVL 26

Expert Comment

by:wilcoxon
ID: 39713294
Odd.  You must have an older buggy version of File::Slurp.  When I run the script, I provided I get the correct output (your second version but with everything on the right line).  Here's a simple little mod to work with slurp without chomp=>1...
use strict;
use warnings;
use File::Slurp qw(slurp);
my $fil = shift or die "Usage: $0 textfile\n";
#my @lines = slurp($fil, chomp => 1);
my @lines = slurp($fil);
my ($stanza, $word_cnt, %uwords);
# loop over all lines
while (@lines) {
    $stanza++;
    # make sure there are no blank lines to start
    shift @lines while ($lines[0] =~ m{^\s*$});
    # loop over the lines in the stanza
    for my $ln (1..9) {
        my $line = shift @lines or die "ran out of lines mid-stanza";
	chomp $line;
        # split the line and remove punctuation - can add others to char class
        my @words = map { s{[.,;:]$}{}; $_ } split m{\s+}, $line;
        # loop over words
        for my $i (0..@words-1) {
            $word_cnt++;
            $uwords{$words[$i]} = [] unless exists($uwords{$words[$i]});
            push @{$uwords{$words[$i]}}, [$stanza, $line, $i+1];
        }
    }
    # if there are blank lines between stanzas, uncomment this
    # this shouldn't actually be necessary with the first shift @lines above
    shift @lines while (@lines and $lines[0] =~ m{^\s*$});
}
# output
print "FILENAME: $fil\n",
    "STANZA COUNT: $stanza\n",
    "TOTAL LINE COUNT: ", $stanza*9, "\n",
    "TOTAL WORD COUNT: $word_cnt\n",
    "UNIQUE WORD COUNT: ", scalar(keys %uwords), "\n\n",
    "UNIQUE WORDS: ", join(', ', sort keys %uwords), "\n\n",
    "REPEATED WORDS: ", join(', ', grep({ @{$uwords{$_}}-1 } sort keys %uwords)), "\n\n",
    "INDEX OF WORDS:\n";
foreach my $word (sort keys %uwords) {
    print "$word, $fil, ", join('; ', map { join ',', @$_ } @{$uwords{$word}}), "\n";
}

Open in new window

0
 

Author Comment

by:Todd Weaver
ID: 39713305
happier to work with your version of Slurp.  how can I know what is the version I have and how can I replace it? I used the internet to find step to install Slurp on my machine - I remember it was a command line entry and it worked on a PPM (I think...)

the INDEX OF WORDS is also bad.., but it is bad on the same line...


C:\Spenser>perl concordance.pl fq.txt1
FILENAME: fq.txt1
STANZA COUNT: 2
TOTAL LINE COUNT: 18
TOTAL WORD COUNT: 127
UNIQUE WORD COUNT: 110

UNIQUE WORDS: Amazon, And, But, Faery, I, Many, Of, Or, Peru?, Rather, Regions, Right, Sith, Souerai
ne, That, The, Virginia, Where, Which, Who, Will, aduize, age, aire, all, an, and, antique, antiquit
ies, are, be, better, body, braine, breatheth, can, dayly, did, discouered, do, does, enterprize, eu
er, famous, forgery, found, fruitfullest, great, happy, hardy, heard, history, how, huge, idle, in,
is, iudged, iust, know, land, late, least, let, liuing, man, matter, measured, memory, mentioned, mi
ghty, most, much, neuer, no, none, now, of, painted, part, red, riuer, sence, show, so, some, th'Ind
ian, th'aboundance, that, the, then, this, through, to, trew?, vaunt, venturous, vessell, vew?, vouc
h, vs, well, were, where, which, who, with, world, wote, yet

REPEATED WORDS: But, I, Or, That, Which, euer, is, know, no, of, that, to, who

INDEX OF WORDS:
,2azon, fq.txt1, 2,The Amazon huge riuer now found trew?
,1d, fq.txt1, 2,And dayly how through hardy enterprize,
,1; 2,But let that man with better sence aduize, ody can know.
,7ery, fq.txt1, 1,Where is that happy land of Faery,
,2; 1,Which I so much do vaunt, yet no where show,ine,
0
 

Author Comment

by:Todd Weaver
ID: 39713310
also, pls change INDEX OF WORDS output to:

INDEX OF WORDS:
WORD | FILNAME | STANZA | LOCATION | LINE [; STANZA | LOCATION | LINE; etc...]
Amazon, fq.txt1, 2, 2, The Amazon huge riuer now found trew?
And, fq.txt1, 2, 1, And dayly how through hardy enterprize,
But, fq.txt1, 1, 1, But vouch antiquities, which no body can know.; 2, 1, But let that man with better sence aduize,
Faery, fq.txt1, 1, 7, Where is that happy land of Faery,
I, fq.txt1, 1, 3, Right well I wote most mighty Soueraine,; 1, 2, Which I so much do vaunt, yet no where show,
0
 

Author Comment

by:Todd Weaver
ID: 39713316
C:\Spenser>perl -v

This is perl 5, version 16, subversion 3 (v5.16.3) built for MSWin32-x64-multi-thread
(with 1 registered patch, see perl -V for more detail)

Copyright 1987-2012, Larry Wall

Binary build 1603 [296746] provided by ActiveState http://www.ActiveState.com
Built Mar 13 2013 13:31:10

...

(Slurp.pm lives at "C:\Perl64\site\lib\File")

package File::Slurp;

use 5.6.2 ;

use strict;
use warnings ;

use Carp ;
use Exporter ;
use Fcntl qw( :DEFAULT ) ;
use POSIX qw( :fcntl_h ) ;
use Errno ;
#use Symbol ;

use vars qw( @ISA @EXPORT @EXPORT_OK %EXPORT_TAGS $VERSION ) ;
@ISA = qw( Exporter ) ;

$VERSION = '9999.19';
0

Featured Post

Is Your Active Directory as Secure as You Think?

More than 75% of all records are compromised because of the loss or theft of a privileged credential. Experts have been exploring Active Directory infrastructure to identify key threats and establish best practices for keeping data safe. Attend this month’s webinar to learn more.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Does the idea of dealing with bits scare or confuse you? Does it seem like a waste of time in an age where we all have terabytes of storage? If so, you're missing out on one of the core tools in every professional programmer's toolbox. Learn how to …
A short article about problems I had with the new location API and permissions in Marshmallow
The viewer will learn how to pass data into a function in C++. This is one step further in using functions. Instead of only printing text onto the console, the function will be able to perform calculations with argumentents given by the user.
The viewer will be introduced to the technique of using vectors in C++. The video will cover how to define a vector, store values in the vector and retrieve data from the values stored in the vector.

930 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

11 Experts available now in Live!

Get 1:1 Help Now