Solved

How to count words ... ozo please help...?

Posted on 2002-03-30
Last Modified: 2010-03-05
Question by:sdesar

Expert Comment

by:Kim Ryan
ID: 6907937
I wrote a CPAN module that will analyze text and report many statistics, including the number of words. You can download it from http://www.cpan.org/modules/by-module/Lingua/KIMRYAN/Lingua-EN-Fathom-1.06.tar.gz

use Lingua::EN::Fathom;

my $text = Lingua::EN::Fathom->new;   # direct method call instead of indirect object syntax
$text->analyse_file("sample.txt");
my $num_words = $text->num_words;

Author Comment

by:sdesar
ID: 6908211
Oops, sorry, the entire question did not get posted.
I am using Lingua::EN::Fathom in my code. Thanks for this module; it works great to find the best words.

Here's my question:
I have 3 words, e.g. "navigate, among, most", that are in $key1_splited.
I need to add the words in $key1_splited to %uniq_words and also display their counts.

How can I do that?

Here's the script...

sub dumpKeywords {
    my $self = shift;
    my $dir = shift;   # sort by either alpha, or num
    $dir = "alpha" unless $dir;
    my $len = shift;
    $len = 0 unless $len;

    my %uniq_words = %{$self->{STEMCOUNT}};
    my $word;
    my $ret;

       my $key1_splited  = $self->{EXTRAKEY};

    my @list = sort keys %uniq_words;

    if($dir eq 'num') {
         @list = sort { $uniq_words{$b} <=> $uniq_words{$a} }  keys %uniq_words;
    }
    if($len) {
         splice @list, $len;
    }
    my @key1_splited;
    my $tmp;
    my $var = ref($key1_splited);

    my $size = scalar(@{$key1_splited});
    print "THE the type is $var and size is $size <br>";

    foreach $tmp (@{$key1_splited}) {
         print "Tmp: $tmp <br>";
         push (@list, $tmp);
    }
    print "List pushed: @list <br>";

    $ret = "<TABLE>\n";
    foreach $word ( @list )
    {      
         # outputs the word and frequency.
         $ret .= "<TR><TD ALIGN=right>" . $uniq_words{$word} . "</TD><TD>$word</TD></TR>\n";
    ##                  print OUT ("$word\n"); # prints just the words

    }
    $ret .= "</TABLE>\n";
    return $ret;
}

Currently, the output of this script looks like this-
40 user
30 inform
21 access
17 expert
14 individu
14 coher
12 cost
12 weight
11 item
10 docum
9 present
9 brows
8 structur
--Here's 3 additional words entered by the user and their counts
1 navigate
2 among
3 most

I need to add a count to the additional 3 words.
Therefore, how can I add these 3 words to uniq_words and count their occurrences in the text document?

here's the site- http://208.56.56.72/blaT/frame1.html

Here's the code: http://textseem.ehost4u.com/blaT/PP3-29.txt
(the script has a lot of print statements for debugging purposes).

The URL that you can enter there for analysis purposes is http://208.56.56.72/blaT/Access1.html


At present, the code automatically finds the top 10 keywords; it uses the Fathom module. I need to modify the code so it also finds the 3 additional keywords that the user enters in the input box.



Eagerly awaiting a response,
Thanks in advance for your time and efforts.


Accepted Solution

by:
ozo earned 100 total points
ID: 6909102
$count = () = /\b\Q$word\E\b/gi;
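
For readers unfamiliar with the idiom above: a global match in list context returns every match, and assigning that list to an empty list and then to a scalar yields the number of matches. The following is only a minimal sketch; the helper name count_word and the sample text are made up for illustration, and it matches against an explicit string instead of $_.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): count whole-word,
# case-insensitive occurrences of $word in $text.
sub count_word {
    my ($text, $word) = @_;
    # "() = ..." forces list context on the /g match; assigning that
    # list assignment to a scalar gives the number of matches.
    my $count = () = $text =~ /\b\Q$word\E\b/gi;
    return $count;
}

# Made-up sample text for illustration.
my $sample = "Most users navigate among pages; most of them browse.";
print count_word($sample, "most"), "\n";
print count_word($sample, "navigate"), "\n";
```

The \Q...\E pair quotes any regex metacharacters in the word, and \b keeps "user" from matching inside "users". Note that \b only behaves as expected when the word starts and ends with a word character.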

Author Comment

by:sdesar
ID: 6909462
The 3 EXTRA  words are in -
my $key1_splited  = $self->{EXTRAKEY};

Can I substitute the extra words for $word in
$count = () = /\b\Q$word\E\b/gi;

Previously, I added $key1_splited to @list.

This gave me the words but NOT the count.
This outputs
COUNT:$uniq_words{$word}
WORD:$word

How can I add the 3 extra words to uniq_words and display their counts?

Awaiting a response,
Thanks
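
One possible way to combine the accepted one-liner with the extra words is to loop over the array reference, count each word in the raw text, and store the result in the same hash that dumpKeywords reads. This is a sketch under assumptions: the sub name add_extra_counts is invented, and the hash, word list and text below are stand-ins for $self->{STEMCOUNT}, $self->{EXTRAKEY} and the analysed document.

```perl
use strict;
use warnings;

# Stand-ins (assumed data) for $self->{STEMCOUNT}, $self->{EXTRAKEY}
# and the text of the analysed document.
my %uniq_words   = (user => 40, inform => 30);
my $key1_splited = [ 'navigate', 'among', 'most' ];
my $text         = "Most users navigate among documents. Among them, most browse.";

# Hypothetical helper: count each extra word in the raw text and
# store the count under that word in the counts hash.
sub add_extra_counts {
    my ($counts, $extras, $body) = @_;
    for my $word (@$extras) {
        my $n = () = $body =~ /\b\Q$word\E\b/gi;
        $counts->{$word} = $n;   # overwrites an existing entry with the same key
    }
    return $counts;
}

add_extra_counts(\%uniq_words, $key1_splited, $text);

# Print count and word, highest count first, ties in alphabetical order.
for my $word (sort { $uniq_words{$b} <=> $uniq_words{$a} || $a cmp $b } keys %uniq_words) {
    print "$uniq_words{$word} $word\n";
}
```

Because the existing keys are stems ("individu", "coher") while the extra words are unstemmed, the extra counts here come from the raw text rather than from the stem counts. Whether that is the desired behaviour is a design choice the thread does not settle.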

Author Comment

by:sdesar
ID: 6909489
I think that, to make it work, I need to add the EXTRAKEY words to this routine:
# Get the top n Stem Keywords.  Also generate the equivalent array
# of real keywords (which will have more than n keys, and display unstemmed)
sub getStemKeywords {
     my $self = shift;
     my $len = shift;
     my $stems = $self->{STEMS};
     my $stemcount = $self->{STEMCOUNT};
     ##   my $key1_splited  = $self->{EXTRAKEY};
     my @list = sort { $stemcount->{$b} <=> $stemcount->{$a} }  keys %$stemcount;
     splice @list, $len;
      ##  my @key1_splited;
     ##   my $tmp;
  ##my $var = ref($key1_splited);

  ##my $size = scalar(@{$key1_splited});
 ## print "THE the type is $var and size is $size <br>";

     # now find all the words in the other list
     my @klist = ();
       ## foreach $tmp (@{$key1_splited}){
       ##    print "Tmp: $tmp <br>";
       ##    push (@klist, $tmp);
       ## }
     for (keys %$stems) {
          my $w = $_;
          for (@list) {
               if($stems->{$w} eq $_) {
                    push @klist, $w;
                    last;
               }
          }
     }

     return( \@list, \@klist );
}

I tried to do a push (@klist, $tmp); but that just adds the keyword to @list; it does NOT count it.

How can I modify the above function so it returns
return( \@list, \@klist, \@extralist );


And then I think I will be able to use it in-
sub txtAnalyze {
................
# Now, do a re-count based on stemmed words
     my $fathom = $self->{FATHOM};
     my %uniq_words = $fathom->unique_words;
     my %keycount;

     for (keys %uniq_words) {
              my $tmp1 = $uniq_words{$_};
              my $tmp2 = $stemhash{$_};
       
        ###      print "COUNT: $tmp1  STEMHASH : $tmp2 <br>";
             
          $keycount{$stemhash{$_}} += $uniq_words{$_};
                 }
     $self->{STEMCOUNT} = \%keycount;


     # Now, get the top 10 keywords

     ($self->{STEMKEYWORDS}, $self->{KEYWORDS}) = $self->getStemKeywords(10);

...}


Awaiting suggestions...
Thanks
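
A rough sketch of the three-list return asked about above, written as a plain function so it runs on its own. The name get_stem_keywords, its argument order and the sample data are assumptions for illustration; in the real script the data would come from $self->{STEMS}, $self->{STEMCOUNT} and $self->{EXTRAKEY}.

```perl
use strict;
use warnings;

# Sketch only: $stems maps word => stem, $stemcount maps stem => count,
# $extra is an array ref of user-entered words, $len is how many stems to keep.
sub get_stem_keywords {
    my ($stems, $stemcount, $extra, $len) = @_;

    # Top stems by count, ties broken alphabetically.
    my @list = sort { $stemcount->{$b} <=> $stemcount->{$a} || $a cmp $b } keys %$stemcount;
    splice @list, $len if $len < @list;

    # Real (unstemmed) words whose stem is among the top stems.
    my %is_top = map { $_ => 1 } @list;
    my @klist  = grep { $is_top{ $stems->{$_} } } sort keys %$stems;

    # Extra words are passed through as a separate list, so the caller
    # can count them independently of the stem ranking.
    my @extralist = @{ $extra || [] };

    return (\@list, \@klist, \@extralist);
}

# Made-up sample data.
my %stems     = (users => 'user', user => 'user', information => 'inform', costs => 'cost');
my %stemcount = (user => 40, inform => 30, cost => 12);

my ($top, $words, $extras) =
    get_stem_keywords(\%stems, \%stemcount, [ 'navigate', 'among', 'most' ], 2);

print "stems: @$top\n";
print "words: @$words\n";
print "extra: @$extras\n";
```

In txtAnalyze the caller would then take three values from the call instead of two; the name of the third slot on $self is left open here because the thread does not define one. The lookup hash replaces the nested loop from the original routine but selects the same words.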