# Calculating TF

I'm creating a script that will calculate the term frequency (TF) of the words in a document. I have the document stored in a vector.

When I print the vector, I get the raw count of each word; for example, the word "college" appears 10 times in the text. Should I keep the counts as they are, or should I scale them to values between 0 and 1?

I ask because everywhere I look, the TF is given as a value like 0.1 or 0.9.

Here is the code I have to calculate the frequency.
``````
# Get the word frequency from the text.
for my $word (@$words) {
    $wordcount{$word}++;
}
``````

Commented:
Keeping the TF between 0 and 1 means it is normalized. To do this, divide every count by the highest count in the document.

Whether or not you do this will depend on how you are using it.
Author Commented:
So should I divide by the highest frequency or by the number of words in the text?

Commented:
By the highest frequency.

For example, if you had this:
college: 10
apple: 6
letter: 16
You would divide each by 16, because it is the highest frequency, getting:
college: .625
apple: .375
letter: 1.0
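
The division above could be sketched in Perl roughly as follows. This assumes the raw counts sit in a hash like the `%wordcount` from the question; `normalize_tf` is a hypothetical helper name chosen for illustration, and `max` comes from the core `List::Util` module.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Hypothetical helper (not from the original thread): takes a hash ref of
# raw word counts and returns a new hash ref with every count divided by
# the highest count, so the most frequent word gets 1.0.
sub normalize_tf {
    my ($counts) = @_;
    return {} unless %$counts;          # nothing to normalize
    my $max = max(values %$counts);     # highest raw frequency
    my %tf  = map { $_ => $counts->{$_} / $max } keys %$counts;
    return \%tf;
}

# Sample counts taken from the example above.
my %wordcount = (college => 10, apple => 6, letter => 16);
my $tf = normalize_tf(\%wordcount);

# Print each word with its normalized TF, sorted alphabetically.
printf "%s: %.3f\n", $_, $tf->{$_} for sort keys %$tf;
```

Returning a new hash instead of modifying `%wordcount` in place keeps the raw counts available in case they are needed later.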
