**Listing words in an input file along with their line numbers**

Greetings all. I am writing a Perl program that reads a text file called 'sample'. The program should list every word in 'sample' along with the numbers of the lines on which that word occurs. A word is defined as beginning with a letter, followed by zero or more letters, and terminating in zero or one punctuation characters or in 's (any terminating punctuation such as periods, semicolons, etc. should be removed). Output should be sorted and converted to all uppercase (which the code and shell script I wrote already take care of). Unfortunately, two things are wrong with my program: it doesn't meet the definition of a 'word' stated above, and it prints a separate line for every occurrence of the same word. For example, say the contents of the input file 'sample' are the following:

Here's an easy game to play.
Here's an easy thing to say:

the correct output should be:

AN 1 2
EASY 1 2
GAME 1
HERE'S 1 2
PLAY 1
SAY 2
THING 2
TO 1 2

but my program gives the following output:
AN 1
AN 2
GAME 1
HERE'S 1
HERE'S 2

etc.

Could anyone please help me fix these problems?
Here is my code:

$count = 0;
open(STREAM, "sample");
while(<STREAM>)
{
   $count++;
   uc;
   (@words) = split(/\s/);
   foreach $word (@words)
   {
      $word = uc $word;
      print "$word\t $count\n";
   }
}
close(STREAM);

FYI: my shell script is the following, and it works (my program is named tt.pl):
 <sample | perl tt.pl | sort > output.tt
Asked by: ninjacookies

yoric commented:
This should do it for you...

$count = 0;
open(STREAM, "sample") or die "Cannot open sample: $!";
while(<STREAM>) {
   $count++;
   (@words) = split(/\s/);
   foreach $word (@words) {
      $word = uc $word;
      # Throw out everything except letters and apostrophes
      $word =~ s/[^A-Za-z']//g;
      # Skip tokens left empty (e.g. from repeated spaces or stray punctuation)
      next if $word eq '';
      # Record the word being found in this line
      $wc{$word}{$count} = 1;
   }
}

# For each word found...
foreach (sort keys %wc) {
  print "$_\t";
  # For each line it was found in...
  foreach (sort { $a <=> $b } keys %{$wc{$_}}) {
    print "$_ ";
  }
  print "\n";
}

close(STREAM);
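
The substitution above strips every non-letter, which is looser than the definition of a word given in the question. A minimal sketch of a stricter check follows, assuming the definition means: one or more letters, an optional 's, and at most one trailing punctuation character, which is dropped. The helper name normalize_word is hypothetical, and tokens that do not fit are simply rejected, which is one possible reading of the requirement.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the uppercased word, or undef when the token
# does not fit the definition of a word assumed above.
sub normalize_word {
    my ($token) = @_;
    # Letters, an optional 's, then at most one punctuation character
    return unless $token =~ /^([A-Za-z]+(?:'s)?)[[:punct:]]?$/;
    return uc $1;
}

# Try a few whitespace-separated tokens
for my $token ("Here's", "play.", "say:", "x2", "wait...") {
    my $word = normalize_word($token);
    print defined $word ? "$token -> $word\n" : "$token -> (rejected)\n";
}
```

With this reading, "Here's" becomes HERE'S, "play." becomes PLAY, and tokens such as "x2" or "wait..." are rejected rather than trimmed.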
 
ninjacookies (author) commented:
Adjusted points to 16
 
Mindo commented:
yoric answered this question while I was still working on a solution for you. Choose whichever one suits you best :-) I think mine is shorter and simpler to understand 8-?

#!/usr/local/bin/perl

open(F, "< sample");

%words = ();

while(<F>)
{
   while(/([A-Za-z'-]+)/g)
   {
      $word = uc($1);
      $words{$word} .= "$. ";
   }
}

foreach $key (sort keys %words)
{
  print "$key = $words{$key} \n";
}

close(F);
 
yoric commented:
Mindo's solution certainly is cleaner and easier to follow (I learned a few things! Thanks, Mindo!), but it doesn't correctly handle the case when there are two of the same word in the same line.
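
To illustrate the point: appending "$. " to a string records the line number once per occurrence, so a word that appears twice on line 1 is listed as "1 1". A rough sketch of one way to keep the regex-scanning loop while storing each line number only once is below. It takes the lines as arguments instead of reading a file so that it runs on its own, and the helper name index_words is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: maps each uppercased word to a hash whose keys are the
# line numbers it appears on, so a repeat within one line is stored once.
sub index_words {
    my (@lines) = @_;
    my %seen;
    my $line_no = 0;
    for my $line (@lines) {
        $line_no++;
        # Same scanning pattern as the loop above
        while ($line =~ /([A-Za-z'-]+)/g) {
            $seen{ uc($1) }{$line_no} = 1;
        }
    }
    return \%seen;
}

# "to" and "be" both occur twice on the first line
my $index = index_words("to be or not to be\n", "to say\n");
for my $word (sort keys %$index) {
    my @line_nos = sort { $a <=> $b } keys %{ $index->{$word} };
    print "$word = @line_nos\n";
}
```

Here TO is reported on lines 1 and 2 and BE on line 1 only, because the inner hash keeps one key per line number.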
 
ninjacookiesAuthor Commented:
Great job fellaz...and thanks once again....gotta give the nod to yoric...he was first plus he did point out an interesting minor minor (microscopic) detail of Mindo's design...nevertheless great job guys and thanks once agian for the help!!
, ninjac