# Calculating TF

I have a question. I'm creating a script that will calculate the TF (term frequency) of the words in a document. I have the document stored in a vector.

My question is: when I print the vector, I get the number of times each word appears in the text. For example, the word "college" appears 10 times. Should I keep the raw counts like this, or should I scale them to values between 0 and 1?

I ask because everywhere I look, the TF is given as something like 0.1 or 0.9.

Here is the code I have to calculate the frequency.
```perl
# Get the word frequency from the text.
my %wordcount;    # maps each word to the number of times it appears
for my $word (@$words) {
    $wordcount{$word}++;
}
```
Commented:
Keeping the TF between 0 and 1 means it is normalized. To do this, divide every count by the highest frequency in the document.

Whether or not you do this will depend on how you are using it.
Author Commented:
So should I divide it by the highest frequency or by the number of words in the text?

Commented:
By the highest frequency.

For example, if you had this:
college: 10
apple: 6
letter: 16
You would divide each by 16, because it is the highest frequency, getting:
college: .625
apple: .375
letter: 1.0
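
As a minimal sketch of that division in Perl, assuming the raw counts already sit in a hash like the `%wordcount` from the question (the sample data and the names `$max_count` and `%tf` are only for illustration):

```perl
use strict;
use warnings;
use List::Util qw(max);

# Raw counts from the example above; in the real script this would be
# the %wordcount hash filled by the counting loop.
my %wordcount = (college => 10, apple => 6, letter => 16);

# Highest raw frequency in the document (16 here).
my $max_count = max(values %wordcount);

# Divide every count by the highest one, so the most frequent word gets 1.
my %tf = map { $_ => $wordcount{$_} / $max_count } keys %wordcount;

# Print the normalized values in alphabetical order.
printf "%s: %.3f\n", $_, $tf{$_} for sort keys %tf;
```

This assumes the hash is not empty; with no words at all there is no highest frequency to divide by.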

Author Commented:
OK, thanks. I asked because I was looking somewhere and they said to divide by the total number of terms in the text. That makes sense now.

:)
Commented:
Well, again it depends on what you are looking for, but you could divide each by the total number of terms in the text.
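
A rough sketch of that alternative, assuming for illustration that the whole text consists of just three distinct words with counts 10, 6, and 16 (so 32 terms in total); the names `$total` and `%tf` are made up for the example:

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Illustrative counts; in the real script this would be %wordcount.
my %wordcount = (college => 10, apple => 6, letter => 16);

# Total number of term occurrences in the text (10 + 6 + 16 = 32).
my $total = sum(values %wordcount);

# Divide every count by the total, so all the values together add up to 1.
my %tf = map { $_ => $wordcount{$_} / $total } keys %wordcount;

# Print the normalized values in alphabetical order.
printf "%s: %.4f\n", $_, $tf{$_} for sort keys %tf;
```

With this version the values show each word's share of the whole text, while dividing by the highest frequency shows each word relative to the most common one.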
