Solved

How to count words ... ozo please help...?

Posted on 2002-03-30
5
225 Views
Last Modified: 2010-03-05
0
Comment
Question by:sdesar
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 3
5 Comments
 
LVL 19

Expert Comment

by:Kim Ryan
ID: 6907937
I wrote a CPAN module that will analyze text and report many statitiscs, including the number of words. You can download it from http://www.cpan.org/modules/by-module/Lingua/KIMRYAN/Lingua-EN-Fathom-1.06.tar.gz

use Lingua::EN::Fathom;

my $text = new Lingua::EN::Fathom;
$text->analyse_file("sample.txt");
$num_words = $text->num_words;
0
 

Author Comment

by:sdesar
ID: 6908211
oops sorry .. the entire question did not get posted...
I am using use Lingua::EN::Fathom... in my code. Thanks for theis module... it works great to fid the best words.

here's my question-
I have 3 words ie "navigate, among, most" that are in $key1_splited  
I need to add the $key1_splited to %uniq_words and also display its count.

How can I do that?

Here's the script...

sub dumpKeywords {
    my $self = shift;
    my $dir = shift;   # sort by either alpha, or num
    $dir = "alpha" unless $dir;
    my $len = shift;
    $len = 0 unless $len;

    my %uniq_words = %{$self->{STEMCOUNT}};
    my $word;
    my $ret;

       my $key1_splited  = $self->{EXTRAKEY};

    my @list = sort keys %uniq_words;

    if($dir eq 'num') {
         @list = sort { $uniq_words{$b} <=> $uniq_words{$a} }  keys %uniq_words;
    }
            if($len) {
         splice @list, $len;
    }
my @key1_splited;
my $tmp;
my $var = ref($key1_splited);

 my $size = scalar(@{$key1_splited});
 print "THE the type is $var and size is $size <br>";

foreach $tmp (@{$key1_splited}){
 print "Tmp: $tmp <br>";
  push (@list, $tmp);
}
print "List pushed: @list <br>";

    $ret = "<TABLE>\n";
    foreach $word ( @list )
    {      
         $ret .= "<TR><TD ALIGN=right>" . $uniq_words{$word}. "</TD><TD>$word</TD></TR>\n"; # outputs
the word and frequency.
    ##                  print OUT ("$word\n"); # prints just the words

    }
    $ret .= "</TABLE>\n";
    return $ret;
}

Currently, the output of this script looks like this-
40 user
30 inform
21 access
17 expert
14 individu
14 coher
12 cost
12 weight
11 item
10 docum
9 present
9 brows
8 structur
--Here's 3 additional words entered by the user and their counts
1 navigate
2 among
3 most

I need to add a count to the additional 3 words.
Therfore, how can I add these 3 words  and count them in the text document...to uniq_words.

here's the site- http://208.56.56.72/blaT/frame1.html

Here's the
code-http://textseem.ehost4u.com/blaT/PP3-29.txt
--- the script has a lot of print statemnets for
debugging purposes.

The Url that you can enter there for analysis purposes
is - http://208.56.56.72/blaT/Access1.html


At present, the code automatically finds the top 10
keywords... it uses FATHOM module.  I need to modify the code so it also finds
the 3 additional keywords that the user enters in the
input box.



Eagerly awaiting a reponse,
Thanks in advance for your time and efforts.

0
 
LVL 84

Accepted Solution

by:
ozo earned 100 total points
ID: 6909102
$count = () = /\b\Q$word\E\b/gi;
0
 

Author Comment

by:sdesar
ID: 6909462
The 3 EXTRA  words are in -
my $key1_splited  = $self->{EXTRAKEY};

I need to know if I can replace - $word in -
$count = () = /\b\Q$word\E\b/gi;

Previously, I added $key1_splited  to @list.

This gave me the words by NOT the count.
This outputs
COUNT:$uniq_words{$word}
WORD:$word

How can I add 3 extra words to uniq_words and display the count ?

Awaiting a response,
Thanks
0
 

Author Comment

by:sdesar
ID: 6909489
I think to make it work.. I need to add the EXTRAKEY to this routine-
# Get the top n Stem Keywords.  Also generate the equivalent array
# of real keywords (which will have more than n keys, and display unstemmed)
sub getStemKeywords {
     my $self = shift;
     my $len = shift;
     my $stems = $self->{STEMS};
     my $stemcount = $self->{STEMCOUNT};
     ##   my $key1_splited  = $self->{EXTRAKEY};
     my @list = sort { $stemcount->{$b} <=> $stemcount->{$a} }  keys %$stemcount;
     splice @list, $len;
      ##  my @key1_splited;
     ##   my $tmp;
  ##my $var = ref($key1_splited);

  ##my $size = scalar(@{$key1_splited});
 ## print "THE the type is $var and size is $size <br>";

     # now find all the words in the other list
     my @klist = ();
       ## foreach $tmp (@{$key1_splited}){
       ##    print "Tmp: $tmp <br>";
       ##    push (@klist, $tmp);
       ## }
     for (keys %$stems) {
          my $w = $_;
          for (@list) {
               if($stems->{$w} eq $_) {
                                push @klist, $w;
                    last;
               }
          }
     }

     return( \@list, \@klist );
}

I tried to do a push (@klist, $tmp);.. but that just adds the keyword to @list, but it does NOT count.

How can I modify the above funtion so it returns
return( \@list, \@klist, \@extralist );


And then I think I will be able to use it in-
sub txtAnalyze {
................
# Now, do a re-count based on stemmed words
     my $fathom = $self->{FATHOM};
     my %uniq_words = $fathom->unique_words;
     my %keycount;

     for (keys %uniq_words) {
              my $tmp1 = $uniq_words{$_};
              my $tmp2 = $stemhash{$_};
       
        ###      print "COUNT: $tmp1  STEMHASH : $tmp2 <br>";
             
          $keycount{$stemhash{$_}} += $uniq_words{$_};
                 }
     $self->{STEMCOUNT} = \%keycount;


     # Now, get the top 10 keywords

     ($self->{STEMKEYWORDS}, $self->{KEYWORDS}) = $self->getStemKeywords(10);

...}


Awaiting suggestions...
Thanks
0

Featured Post

Independent Software Vendors: We Want Your Opinion

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

On Microsoft Windows, if  when you click or type the name of a .pl file, you get an error "is not recognized as an internal or external command, operable program or batch file", then this means you do not have the .pl file extension associated with …
Checking the Alert Log in AWS RDS Oracle can be a pain through their user interface.  I made a script to download the Alert Log, look for errors, and email me the trace files.  In this article I'll describe what I did and share my script.
Explain concepts important to validation of email addresses with regular expressions. Applies to most languages/tools that uses regular expressions. Consider email address RFCs: Look at HTML5 form input element (with type=email) regex pattern: T…

738 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question