

Calculating TF

Posted on 2008-11-11
Medium Priority
Last Modified: 2012-05-05
I'm creating a script that will calculate the TF (term frequency) of a document. I have the document stored in a vector.

My question: when I print the vector, I get the number of times each word appears in the text; for example, the word "college" appears 10 times. Should I keep the raw counts like this, or should I scale the numbers to between 0 and 1?

I ask because everywhere I look, the TF is given as a value such as 0.1 or 0.9.

Here is the code I have to calculate the frequency.
#Get the word frequency from the text.
for my $word (@$words){


Question by: Ennio

Expert Comment

ID: 22932032
Keeping the TF between 0 and 1 means it is normalized. One way to do this is to divide every count by the highest frequency.

Whether or not you do this will depend on how you are using it.

Author Comment

ID: 22932053
So should I divide it by the highest frequency or by the number of words in the text?


Accepted Solution

Adam314 earned 2000 total points
ID: 22932084
By the highest frequency.

For example, if you had this:
    college: 10
    apple: 6
    letter: 16
You would divide each by 16, because it is the highest frequency, getting:
    college: .625
    apple: .375
    letter: 1.0
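
A minimal Perl sketch of the division above, assuming the counts are already in a hash. The names %count and %tf_max are made up for illustration and are not taken from the question's script.

```perl
use strict;
use warnings;

# Raw counts from the example above (%count is an assumed name).
my %count = (college => 10, apple => 6, letter => 16);

# Find the highest raw frequency.
my $max = 0;
for my $n (values %count) {
    $max = $n if $n > $max;
}

# Divide every count by the maximum; the most frequent word becomes 1.
my %tf_max = map { $_ => $count{$_} / $max } keys %count;

# Print the normalized values in alphabetical order.
printf "%s: %.3f\n", $_, $tf_max{$_} for sort keys %tf_max;
```

With these counts it prints apple: 0.375, college: 0.625 and letter: 1.000.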

Author Comment

ID: 22932103
OK, thanks. I was looking somewhere and they said to divide by the total number of terms in the text. That makes sense now.


Expert Comment

ID: 22932164
Well, again, it depends on what you are looking for, but you could also divide each count by the total number of terms in the text.
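
For comparison, a minimal sketch of that second option, dividing by the total number of terms. The word list and the names %seen and %tf_len are hypothetical; only the $words array reference comes from the question's loop.

```perl
use strict;
use warnings;

# Hypothetical tokenized document; $words is an array reference as in the question.
my $words = [qw(college apple college letter apple college)];

# Count how many times each word occurs (%seen is an assumed name).
my %seen;
$seen{$_}++ for @$words;

# Total number of terms in the document, not the number of distinct words.
my $total = scalar @$words;

# Divide each count by the document length; the resulting values sum to 1.
my %tf_len = map { $_ => $seen{$_} / $total } keys %seen;

# Print the relative frequencies in alphabetical order.
printf "%s: %.3f\n", $_, $tf_len{$_} for sort keys %tf_len;
```

Here the list has 6 terms, so it prints apple: 0.333, college: 0.500 and letter: 0.167. Dividing by the highest frequency always gives the top word a value of 1, while dividing by the total makes each value that word's share of the whole document, which is easier to compare across documents of different lengths.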
