

How to count words ... ozo please help...?

Posted on 2002-03-30
Medium Priority
Last Modified: 2010-03-05
Question by:sdesar
LVL 19

Expert Comment

by:Kim Ryan
ID: 6907937
I wrote a CPAN module that will analyze text and report many statistics, including the number of words. You can download it from http://www.cpan.org/modules/by-module/Lingua/KIMRYAN/Lingua-EN-Fathom-1.06.tar.gz

use Lingua::EN::Fathom;

my $text = Lingua::EN::Fathom->new();
$text->analyse_block($string);    # $string holds the text to analyse
my $num_words = $text->num_words;

Author Comment

ID: 6908211
Oops, sorry, the entire question did not get posted.
I am using Lingua::EN::Fathom in my code. Thanks for this module; it works great for finding the best words.

Here's my question:
I have 3 words, i.e. "navigate, among, most", that are in $key1_splited.
I need to add the words in $key1_splited to %uniq_words and also display their counts.

How can I do that?

Here's the script...

sub dumpKeywords {
    my $self = shift;
    my $dir = shift;   # sort by either alpha, or num
    $dir = "alpha" unless $dir;
    my $len = shift;
    $len = 0 unless $len;

    my %uniq_words = %{$self->{STEMCOUNT}};
    my $word;
    my $ret;

    my $key1_splited = $self->{EXTRAKEY};   # array ref holding the extra words

    my @list = sort keys %uniq_words;

    if ($dir eq 'num') {
        @list = sort { $uniq_words{$b} <=> $uniq_words{$a} } keys %uniq_words;
        if ($len) {
            splice @list, $len;
        }
    }

    my $tmp;
    my $var = ref($key1_splited);
    my $size = scalar(@{$key1_splited});
    print "THE the type is $var and size is $size <br>";

    # append the extra words to the list (this adds the words, not their counts)
    foreach $tmp (@{$key1_splited}) {
        print "Tmp: $tmp <br>";
        push(@list, $tmp);
    }
    print "List pushed: @list <br>";

    $ret = "<TABLE>\n";
    foreach $word (@list) {
        # outputs the word and frequency
        $ret .= "<TR><TD ALIGN=right>" . $uniq_words{$word} . "</TD><TD>$word</TD></TR>\n";
        ## print OUT ("$word\n"); # prints just the words
    }
    $ret .= "</TABLE>\n";
    return $ret;
}

Currently, the output of this script looks like this-
40 user
30 inform
21 access
17 expert
14 individu
14 coher
12 cost
12 weight
11 item
10 docum
9 present
9 brows
8 structur
--Here's 3 additional words entered by the user and their counts
1 navigate
2 among
3 most

I need to add a count to the 3 additional words.
Therefore, how can I add these 3 words to uniq_words and count their occurrences in the text document?

here's the site-

Here's the
--- the script has a lot of print statements for
debugging purposes.

The Url that you can enter there for analysis purposes
is -

At present, the code automatically finds the top 10
keywords using the Fathom module. I need to modify the code so it also finds
the 3 additional keywords that the user enters in the
input box.

Eagerly awaiting a response,
Thanks in advance for your time and efforts.

LVL 84

Accepted Solution

ozo earned 400 total points
ID: 6909102
$count = () = /\b\Q$word\E\b/gi;
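
To illustrate how the one-liner above works, here is a minimal sketch. The sample text in $text is made up for the example, and the match is bound to $text explicitly instead of relying on $_ as the original line does. Assigning the list of matches to the empty list () and then reading that assignment in scalar context yields the number of matches.

```perl
use strict;
use warnings;

# Hypothetical sample text; in the real script this would be the analysed document.
my $text = "Users navigate among pages. Most users navigate quickly; most do not read.";

# The three extra words from the question.
my @extra = qw(navigate among most);

my %count;
for my $word (@extra) {
    # In list context, /g returns every match. Assigning that list to ()
    # and evaluating the assignment in scalar context gives the match count.
    # \Q...\E escapes any regex metacharacters in $word; /i ignores case;
    # \b restricts the match to whole words.
    my $n = () = $text =~ /\b\Q$word\E\b/gi;
    $count{$word} = $n;
}

printf "%d %s\n", $count{$_}, $_ for @extra;
```

Because of the \b anchors, this counts whole words only, so "most" would not be counted inside "almost".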

Author Comment

ID: 6909462
The 3 EXTRA  words are in -
my $key1_splited  = $self->{EXTRAKEY};

I need to know if I can replace - $word in -
$count = () = /\b\Q$word\E\b/gi;

Previously, I added $key1_splited  to @list.

This gave me the words but NOT the count.
This outputs

How can I add 3 extra words to uniq_words and display the count ?
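
One possible approach, sketched under assumptions: loop over the words in $key1_splited, count each one in the document text with the accepted one-liner, store the result in %uniq_words, and then push the word onto @list. The hash contents and the $text variable below are hypothetical stand-ins for $self->{STEMCOUNT} and for the analysed document, which the posted script does not show.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for $self->{STEMCOUNT}, $self->{EXTRAKEY} and the document text.
my %uniq_words   = (user => 40, inform => 30, access => 21);
my $key1_splited = [qw(navigate among most)];
my $text         = "Most users navigate among pages; most navigate by search.";

# Existing stems, sorted by descending count.
my @list = sort { $uniq_words{$b} <=> $uniq_words{$a} } keys %uniq_words;

# Count each extra word in the text, record it in the hash, then list it.
for my $word (@$key1_splited) {
    my $n = () = $text =~ /\b\Q$word\E\b/gi;
    $uniq_words{$word} = $n;
    push @list, $word;
}

# Build the same kind of table that dumpKeywords returns.
my $ret = "<TABLE>\n";
for my $word (@list) {
    $ret .= "<TR><TD ALIGN=right>$uniq_words{$word}</TD><TD>$word</TD></TR>\n";
}
$ret .= "</TABLE>\n";
print $ret;
```

Note that the keys of %uniq_words are stems while the extra words are unstemmed, so an extra word that happens to equal an existing stem would overwrite that stem's count and appear twice in the table. A separate hash for the extra words would avoid that.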

Awaiting a response,

Author Comment

ID: 6909489
I think to make it work.. I need to add the EXTRAKEY to this routine-
# Get the top n Stem Keywords.  Also generate the equivalent array
# of real keywords (which will have more than n keys, and display unstemmed)
sub getStemKeywords {
     my $self = shift;
     my $len = shift;
     my $stems = $self->{STEMS};
     my $stemcount = $self->{STEMCOUNT};
     ##   my $key1_splited  = $self->{EXTRAKEY};
     my @list = sort { $stemcount->{$b} <=> $stemcount->{$a} }  keys %$stemcount;
     splice @list, $len;
     ##  my @key1_splited;
     ##  my $tmp;
     ##  my $var = ref($key1_splited);

     ##  my $size = scalar(@{$key1_splited});
     ##  print "THE the type is $var and size is $size <br>";

     # now find all the words in the other list
     my @klist = ();
     ## foreach $tmp (@{$key1_splited}){
     ##    print "Tmp: $tmp <br>";
     ##    push (@klist, $tmp);
     ## }
     for (keys %$stems) {
          my $w = $_;
          for (@list) {
               # keep every real word whose stem is one of the top stems
               if ($stems->{$w} eq $_) {
                    push @klist, $w;
               }
          }
     }

     return( \@list, \@klist );
}

I tried a push(@klist, $tmp), but that just adds the keyword to the list; it does NOT count it.

How can I modify the above function so it returns
return( \@list, \@klist, \@extralist );
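
A rough sketch of a three-list return, written as a standalone function rather than a method so it can run on its own. The name get_stem_keywords, its argument order, and the sample hashes are all hypothetical; only the sorting and stem-matching logic follow the posted routine.

```perl
use strict;
use warnings;

# Hypothetical standalone variant of getStemKeywords (no $self).
# $extra is an array ref such as the one stored in EXTRAKEY.
sub get_stem_keywords {
    my ($stems, $stemcount, $extra, $len) = @_;

    # Top $len stems by descending count.
    my @list = sort { $stemcount->{$b} <=> $stemcount->{$a} } keys %$stemcount;
    splice @list, $len if $len < @list;

    # Real (unstemmed) words whose stem is among the top stems.
    my %is_top = map { $_ => 1 } @list;
    my @klist  = grep { $is_top{ $stems->{$_} } } sort keys %$stems;

    # The extra words are passed through as a third list.
    my @extralist = @{ $extra || [] };

    return (\@list, \@klist, \@extralist);
}

my %stems     = (users => 'user', user => 'user', information => 'inform', costs => 'cost');
my %stemcount = (user => 40, inform => 30, cost => 12);

my ($top, $words, $extras) =
    get_stem_keywords(\%stems, \%stemcount, [qw(navigate among most)], 2);

print "stems:  @$top\n";
print "words:  @$words\n";
print "extras: @$extras\n";
```

The caller would then need a third target on the left-hand side of the assignment. Returning the extra words this way still does not count them; the counting has to happen against the document text separately.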

And then I think I will be able to use it in-
sub txtAnalyze {
# Now, do a re-count based on stemmed words
     my $fathom = $self->{FATHOM};
     my %uniq_words = $fathom->unique_words;
     my %keycount;

     for (keys %uniq_words) {
          my $tmp1 = $uniq_words{$_};
          my $tmp2 = $stemhash{$_};
          ###  print "COUNT: $tmp1  STEMHASH : $tmp2 <br>";
          # add this word's count to the total for its stem
          $keycount{$stemhash{$_}} += $uniq_words{$_};
     }
     $self->{STEMCOUNT} = \%keycount;

     # Now, get the top 10 keywords

     ($self->{STEMKEYWORDS}, $self->{KEYWORDS}) = $self->getStemKeywords(10);
}


Awaiting suggestions...
