# Calculating IDF

I need some help calculating the IDF.

I wrote this code that loops through an array of terms and checks whether each term is in a hash that contains some text information.

My hash holds only 5 texts, and the terms in the hash do not repeat. But when I print the number of times a term appears in the documents hash, I get some crazy number (like 4000 or 380) and not 5, 1, or 2.

Here is the loop where I go over the terms and search through the hash.

The `@doc` array contains the text information, one text per element. `@docTemp` takes the words from one element of `@doc` and stores each word as a separate element.

For example:
`$doc[1] = "The mouse is black"`
```perl
for ($counter = 0; $counter <= $#terms; $counter++) {
    $nDocs = 0;
    for ($count = 0; $count <= $#doc; $count++) {
        @docTemp = split(/\s+/, $doc[$count]);

        # Store the documents into a hash
        for my $word (@docTemp) {
            $docHash{$word}++;
        }

        # Check if term is in the hash
        for my $key (keys %docHash) {
            if ($terms[$counter] == $key) {
                $nDocs++;
            }
        }
    }
    print $terms[$counter], " ", $nDocs, "\n";
}
```
Author Commented:
I made some changes to the code, and now I get 6 for every term.

Here are the changes:
```perl
for ($counter = 0; $counter <= $#terms; $counter++) {
    $nDocs = 0;
    for ($count = 0; $count <= $#doc; $count++) {
        @docTemp = split(/\s+/, $doc[$count]);

        # Store the documents into a hash
        for my $word (@docTemp) {
            $docHash{$word}++;
        }

        if (exists $docHash{$terms[$counter]}) {
            $nDocs++;
        }
    }
    print $terms[$counter], " ", $nDocs, "\n";
}
```
Commented:
Where did `$terms[$counter]` come from?

If `exists $docHash{$terms[$counter]}` is true on one pass through the `for ($count = 0; $count <= $#doc; $count++)` loop, it will also be true on every later pass.
Is that what you want?
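
To make the effect concrete, here is a minimal sketch. The two sample documents are made up for illustration and are not taken from your data; the point is only that a hash declared once and never cleared keeps the words of earlier documents.

```perl
use strict;
use warnings;

# Hypothetical sample data, not from the original post.
my @doc = ("the mouse is black", "the cat is white");

my %docHash;   # declared once and never cleared, as in the posted loop
my @seen;

for my $text (@doc) {
    $docHash{$_}++ for split /\s+/, $text;

    # "mouse" occurs only in the first document, yet the lookup still
    # succeeds while processing the second one, because the keys added
    # for the first document are still in the hash.
    push @seen, (exists $docHash{mouse}) ? 1 : 0;
}

print "@seen\n";   # prints "1 1"
```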
Author Commented:
`@terms` is an array that contains the terms I'm searching for in the hash.

I only want the `if (exists $docHash{$terms[$counter]})` branch to run when `$terms[$counter]` is in the hash too; if it is not, I don't want it counted.
Commented:
`exists $docHash{$terms[$counter]}` is true when `$terms[$counter]` is a key in the `%docHash` hash.
Since you only ever add entries to the hash, once a term is in the hash it will always be in the hash.
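
One possible way around this is to give each document its own hash of words, so a term is counted at most once per document. The sketch below is only an illustration under some assumptions: the sample `@doc` and `@terms` data are made up, the names `@docWords`, `%df`, and `%idf` are hypothetical, and the IDF is taken to be the common `log(N / df)` form, which may not be the variant you need.

```perl
use strict;
use warnings;

# Hypothetical sample data, not from the original post.
my @doc   = ("the mouse is black", "the cat is white", "the black cat");
my @terms = ("mouse", "black", "the", "dog");

# Build one hash of distinct words per document, once, up front.
my @docWords;
for my $text (@doc) {
    my %words = map { $_ => 1 } split /\s+/, $text;
    push @docWords, \%words;
}

my (%df, %idf);
for my $term (@terms) {
    # grep in scalar context returns the number of documents containing the term.
    my $nDocs = grep { exists $_->{$term} } @docWords;
    $df{$term} = $nDocs;

    # Assumed definition: idf = log(N / df), natural log.
    # Left undefined when the term appears in no document.
    $idf{$term} = $nDocs ? log(scalar(@doc) / $nDocs) : undef;

    my $shown = defined $idf{$term} ? sprintf("%.3f", $idf{$term}) : "n/a";
    print "$term $nDocs $shown\n";
}
```

With this sample data the document counts come out as 1, 2, 3, and 0 rather than growing with each pass. As a side note, the first version of your loop compared words with `==`, which is Perl's numeric comparison; two non-numeric strings both evaluate to 0 and therefore compare equal, so `eq` is the operator to use for string equality.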