Solved

How to count words ... ozo please help...?

Posted on 2002-03-30
Last Modified: 2010-03-05
Question by:sdesar
LVL 19

Expert Comment

by:Kim Ryan
ID: 6907937
I wrote a CPAN module that analyzes text and reports many statistics, including the number of words. You can download it from http://www.cpan.org/modules/by-module/Lingua/KIMRYAN/Lingua-EN-Fathom-1.06.tar.gz

use Lingua::EN::Fathom;

my $text = Lingua::EN::Fathom->new();
$text->analyse_file("sample.txt");   # read and analyse the file
my $num_words = $text->num_words;    # total number of words found

Author Comment

by:sdesar
ID: 6908211
Oops, sorry... the entire question did not get posted.
I am using Lingua::EN::Fathom in my code. Thanks for this module; it works great to find the best words.

Here's my question:
I have 3 words, i.e. "navigate", "among", "most", that are in $key1_splited.
I need to add the words in $key1_splited to %uniq_words and also display their counts.

How can I do that?

Here's the script...

sub dumpKeywords {
    my $self = shift;
    my $dir = shift;   # sort by either alpha, or num
    $dir = "alpha" unless $dir;
    my $len = shift;
    $len = 0 unless $len;

    my %uniq_words = %{$self->{STEMCOUNT}};
    my $word;
    my $ret;

       my $key1_splited  = $self->{EXTRAKEY};

    my @list = sort keys %uniq_words;

    if($dir eq 'num') {
         @list = sort { $uniq_words{$b} <=> $uniq_words{$a} }  keys %uniq_words;
    }
            if($len) {
         splice @list, $len;
    }
my @key1_splited;
my $tmp;
my $var = ref($key1_splited);

 my $size = scalar(@{$key1_splited});
 print "THE the type is $var and size is $size <br>";

foreach $tmp (@{$key1_splited}){
 print "Tmp: $tmp <br>";
  push (@list, $tmp);
}
print "List pushed: @list <br>";

    $ret = "<TABLE>\n";
    foreach $word ( @list )
    {      
         # Output the word and its frequency.
         $ret .= "<TR><TD ALIGN=right>" . $uniq_words{$word}. "</TD><TD>$word</TD></TR>\n";
    ##                  print OUT ("$word\n"); # prints just the words

    }
    $ret .= "</TABLE>\n";
    return $ret;
}

Currently, the output of this script looks like this-
40 user
30 inform
21 access
17 expert
14 individu
14 coher
12 cost
12 weight
11 item
10 docum
9 present
9 brows
8 structur
--Here's 3 additional words entered by the user and their counts
1 navigate
2 among
3 most

I need to add a count to the additional 3 words.
Therefore, how can I add these 3 words to uniq_words and count them in the text document?

Here's the site: http://208.56.56.72/blaT/frame1.html

Here's the code: http://textseem.ehost4u.com/blaT/PP3-29.txt
(The script has a lot of print statements for debugging purposes.)

The URL that you can enter there for analysis purposes is http://208.56.56.72/blaT/Access1.html

At present, the code automatically finds the top 10 keywords using the Fathom module. I need to modify the code so that it also finds the 3 additional keywords that the user enters in the input box.



Eagerly awaiting a response,
Thanks in advance for your time and efforts.

LVL 84

Accepted Solution

by:
ozo earned 100 total points
ID: 6909102
$count = () = /\b\Q$word\E\b/gi;
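
A minimal sketch of how the one-liner above can be used. The empty list in the middle forces the match into list context, and assigning that list assignment to a scalar yields the number of matches. The count_word helper and the sample text are hypothetical, made up for illustration; they are not part of the original script.

```perl
use strict;
use warnings;

# Hypothetical helper: count whole-word, case-insensitive occurrences
# of $word in $text.
sub count_word {
    my ($text, $word) = @_;
    # \Q...\E quotes any regex metacharacters in $word.
    # "() =" forces list context, so /g returns every match; the outer
    # scalar assignment then receives the number of matches.
    my $count = () = $text =~ /\b\Q$word\E\b/gi;
    return $count;
}

# Sample text, assumed for illustration only.
my $text = "Most users navigate among pages; most never navigate back.";

for my $word (qw(navigate among most)) {
    printf "%d %s\n", count_word($text, $word), $word;
}
```

When no string is bound with =~, as in the original one-liner, the match runs against $_.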

Author Comment

by:sdesar
ID: 6909462
The 3 EXTRA words are in:
my $key1_splited  = $self->{EXTRAKEY};

I need to know whether I can use them in place of $word in:
$count = () = /\b\Q$word\E\b/gi;

Previously, I added $key1_splited to @list.

This gave me the words but NOT the count.
This outputs
COUNT:$uniq_words{$word}
WORD:$word

How can I add the 3 extra words to uniq_words and display their counts?
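
One possible approach, sketched under assumptions: %uniq_words appears to be keyed by stems, so an unstemmed extra word pushed onto @list has no entry there and its count prints empty. The sketch below counts each extra word in the text and stores the result in the hash before display. The sample hash, word list and $text are hypothetical stand-ins for $self->{STEMCOUNT}, $self->{EXTRAKEY} and the analysed document.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the real data in the script.
my %uniq_words   = (user => 40, inform => 30, access => 21);
my $key1_splited = [qw(navigate among most)];
my $text         = "Most users navigate among pages; most never navigate back.";

# Add each extra word to the hash, counting it in the text only when
# the hash does not already hold a count for it.
for my $word (@{$key1_splited}) {
    next if exists $uniq_words{$word};
    my $count = () = $text =~ /\b\Q$word\E\b/gi;
    $uniq_words{$word} = $count;
}

# Every word now has a defined count; sort by count, then alphabetically.
for my $word (sort { $uniq_words{$b} <=> $uniq_words{$a} || $a cmp $b }
              keys %uniq_words) {
    print "COUNT:$uniq_words{$word} WORD:$word\n";
}
```

Working on a copy of the hash, as dumpKeywords already does, keeps the extra words out of the stored stem counts.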

Awaiting a response,
Thanks

Author Comment

by:sdesar
ID: 6909489
I think that to make it work, I need to add the EXTRAKEY to this routine:
# Get the top n Stem Keywords.  Also generate the equivalent array
# of real keywords (which will have more than n keys, and display unstemmed)
sub getStemKeywords {
     my $self = shift;
     my $len = shift;
     my $stems = $self->{STEMS};
     my $stemcount = $self->{STEMCOUNT};
     ##   my $key1_splited  = $self->{EXTRAKEY};
     my @list = sort { $stemcount->{$b} <=> $stemcount->{$a} }  keys %$stemcount;
     splice @list, $len;
      ##  my @key1_splited;
     ##   my $tmp;
  ##my $var = ref($key1_splited);

  ##my $size = scalar(@{$key1_splited});
 ## print "THE the type is $var and size is $size <br>";

     # now find all the words in the other list
     my @klist = ();
       ## foreach $tmp (@{$key1_splited}){
       ##    print "Tmp: $tmp <br>";
       ##    push (@klist, $tmp);
       ## }
     for (keys %$stems) {
          my $w = $_;
          for (@list) {
               if($stems->{$w} eq $_) {
                                push @klist, $w;
                    last;
               }
          }
     }

     return( \@list, \@klist );
}

I tried a push (@klist, $tmp); but that just adds the keyword to the list; it does NOT count it.

How can I modify the above function so that it returns
return( \@list, \@klist, \@extralist );
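
A rough sketch of such a three-list variant, written as a plain function rather than a method so that it runs on its own. The name get_stem_keywords, the argument order and the sample data are assumptions for illustration; $extra plays the role of $self->{EXTRAKEY}.

```perl
use strict;
use warnings;

# $stems maps word => stem, $stemcount maps stem => count, $extra is an
# array reference of user-entered words, $len is the number of top stems.
sub get_stem_keywords {
    my ($stems, $stemcount, $extra, $len) = @_;

    # Top $len stems by descending count.
    my @list = sort { $stemcount->{$b} <=> $stemcount->{$a} } keys %$stemcount;
    splice @list, $len if $len < @list;

    # Real (unstemmed) words whose stem is one of the top stems.
    my %top   = map { $_ => 1 } @list;
    my @klist = grep { exists $top{ $stems->{$_} } } sort keys %$stems;

    # Extra words that are not already in the keyword list.
    my %seen      = map { $_ => 1 } @klist;
    my @extralist = grep { !$seen{$_} } @{ $extra || [] };

    return (\@list, \@klist, \@extralist);
}

# Sample data, assumed for illustration only.
my %stems     = (users => 'user', user => 'user', access => 'access', most => 'most');
my %stemcount = (user => 40, access => 21, most => 3);

my ($top, $keys, $extras) =
    get_stem_keywords(\%stems, \%stemcount, [qw(navigate among most)], 2);
print "stems: @$top\nkeywords: @$keys\nextra: @$extras\n";
```

The caller would then take three values instead of two, and the extra list still needs its counts computed separately, since this routine only selects words.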


And then I think I will be able to use it in-
sub txtAnalyze {
................
# Now, do a re-count based on stemmed words
     my $fathom = $self->{FATHOM};
     my %uniq_words = $fathom->unique_words;
     my %keycount;

     for (keys %uniq_words) {
              my $tmp1 = $uniq_words{$_};
              my $tmp2 = $stemhash{$_};
       
        ###      print "COUNT: $tmp1  STEMHASH : $tmp2 <br>";
             
          $keycount{$stemhash{$_}} += $uniq_words{$_};
                 }
     $self->{STEMCOUNT} = \%keycount;


     # Now, get the top 10 keywords

     ($self->{STEMKEYWORDS}, $self->{KEYWORDS}) = $self->getStemKeywords(10);

...}


Awaiting suggestions...
Thanks