• Status: Solved

Calculating IDF

I need some help calculating the IDF.

I wrote this code to loop through an array of terms and check whether each term is in a hash that contains some text information.

My hash has only 5 texts, and the terms in the hash do not repeat. But when I print the number of times a term appears in the documents hash, I get some crazy number (like 4000 or 380) and not 5, 1, or 2.

Here is the loop where I look at the terms and search through the hash.

The @doc array contains the text information, so each element of the array holds one text. @docTemp is used to split a text from the @doc array into terms and store each term as an element of the array.

For example:
$doc[1] = "The mouse is black";

for ($counter = 0; $counter <= $#terms; $counter++){
   $nDocs = 0;
   for ($count = 0; $count <= $#doc; $count++){
      @docTemp = split(/\s+/, $doc[$count]);
      ###################################
      # STORE THE DOCUMENTS INTO A HASH #
      ###################################
      for my $word (@docTemp){
         $docHash{$word}++;
      }
      for my $key ( keys %docHash ) {
         ################################
         # CHECK IF TERM IS IN THE HASH #
         ################################
         if ($terms[$counter] == $key){
            $nDocs++;
         }
      }  
   }
   print $terms[$counter], " ", $nDocs, "\n";
}
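
A likely contributor to the inflated counts in the loop above is the comparison operator. In Perl, == compares numerically, and a non-numeric string such as "mouse" evaluates to 0 in numeric context, so any two ordinary words compare as equal; string equality is tested with eq. A minimal sketch, using made-up values rather than the original data:

```perl
use strict;
use warnings;

# Hypothetical sample values, not taken from the original data.
my $term = "mouse";
my $key  = "black";

# Numeric comparison: both strings numify to 0, so the test is true.
# With warnings enabled Perl would also report "isn't numeric" here,
# so the warning is silenced for this one expression.
my $numeric_match = do { no warnings 'numeric'; $term == $key ? 1 : 0 };

# String comparison: true only when the two strings are identical.
my $string_match = $term eq $key ? 1 : 0;

print "==: $numeric_match, eq: $string_match\n";   # ==: 1, eq: 0
```

With ==, every key in %docHash would match every term, which would make $nDocs grow with the size of the hash rather than with the number of documents.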


Asked by: Ennio
 
Ennio (Author) commented:
I made some changes to the code, and now every count comes out as 6.

Here are the changes to the code.
for ($counter = 0; $counter <= $#terms; $counter++){
   $nDocs = 0;
   for ($count = 0; $count <= $#doc; $count++){
      @docTemp = split(/\s+/, $doc[$count]);
      ###################################
      # STORE THE DOCUMENTS INTO A HASH #
      ###################################
      for my $word (@docTemp){
         $docHash{$word}++;
      }
      
      if (exists $docHash{$terms[$counter]}){
         $nDocs++;
      }
   }
   print $terms[$counter], " ", $nDocs, "\n";
}

 
ozo commented:
Where did $terms[$counter] come from?

If exists $docHash{$terms[$counter]} is true the first time through the for ($count = 0; $count <= $#doc; $count++) loop,
it will also be true the next time through the loop.
Is that what you want?
 
Ennio (Author) commented:
@terms is an array that contains the terms that I'm searching for in the hash.

I only want the exists $docHash{$terms[$counter]} test to be true if $terms[$counter] is actually in the hash; if it is not, I don't want it to count.
 
ozo commented:
exists $docHash{$terms[$counter]} is true when $terms[$counter] is in the %docHash hash.
Since you only accumulate entries in the hash, once a term is in the hash, it will always be in the hash.
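
The accumulation described here can be seen in a small sketch. The two documents and the term below are hypothetical, chosen only to show the difference between a hash shared across all documents and one created fresh for each document:

```perl
use strict;
use warnings;

# Hypothetical documents for illustration only.
my @doc  = ("The mouse is black", "The cat is white");
my $term = "mouse";

# One hash shared by every document and never cleared.
my %docHash;
my @seen_shared;
for my $text (@doc) {
    $docHash{$_}++ for split /\s+/, $text;
    push @seen_shared, exists $docHash{$term} ? 1 : 0;
}

# A new, empty hash for each document.
my @seen_fresh;
for my $text (@doc) {
    my %words;
    $words{$_}++ for split /\s+/, $text;
    push @seen_fresh, exists $words{$term} ? 1 : 0;
}

print "shared hash: @seen_shared\n";   # shared hash: 1 1
print "fresh hash:  @seen_fresh\n";    # fresh hash:  1 0
```

With the shared hash, "mouse" is still reported as present when the second document is processed, even though that document does not contain it.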
 
Ennio (Author) commented:
What would be the best way to do this? Should I check and delete the key from the hash?
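
One possible approach, offered as a sketch rather than as the thread's accepted answer, is to avoid deleting keys altogether: declare the word hash inside the document loop so it starts empty for each text, count how many documents contain each term, and then compute the IDF from that count. The sample data is hypothetical, and the formula log(N / df) is the common textbook definition, which may differ from the variant the asker needs. Matching here is case-sensitive, and the // operator assumes Perl 5.10 or later.

```perl
use strict;
use warnings;

# Hypothetical data; the real @doc and @terms come from the asker's program.
my @doc   = ("The mouse is black", "The cat is white", "A black cat");
my @terms = ("mouse", "black", "dog");

# Document frequency: term => number of documents that contain it.
my %nDocs;
for my $text (@doc) {
    # Fresh hash per document, so earlier documents cannot leak in.
    my %seen = map { $_ => 1 } split /\s+/, $text;
    for my $term (@terms) {
        $nDocs{$term}++ if exists $seen{$term};
    }
}

my $N = scalar @doc;   # total number of documents
my %idf;
for my $term (@terms) {
    my $df = $nDocs{$term} // 0;
    # log() is the natural logarithm; IDF is left undefined when the
    # term occurs in no document, to avoid dividing by zero.
    $idf{$term} = $df ? log($N / $df) : undef;
    printf "%s df=%d idf=%s\n", $term, $df,
        defined $idf{$term} ? sprintf("%.3f", $idf{$term}) : "n/a";
}
```

Because each term is counted at most once per document, the document frequency can never exceed the number of documents.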
Question has a verified solution.
