Solved

Need help with the k-nearest-neighbour algorithm in Perl

Posted on 2009-05-06
Last Modified: 2012-06-27
Can anyone explain what each line of the code below does?
I'm hoping to modify it so that it can determine whether an email is spam or not. If I'm not mistaken, it uses the k-nearest-neighbour algorithm to tell the difference. Thanks.

P.S. I have attached a text file with samples of spam and non-spam email to show how they differ.
use strict;
use warnings;
use Search::VectorSpace;

# These are just some simple sample documents
my @docs = (
    "The cat in the hat",
    "A cat is a fine pet.",
    "Dogs and cats make good pets.",
);

# Build our search engine, with a suitable threshold
my $engine = Search::VectorSpace->new( docs => \@docs, threshold => 0.1 );

# Index the documents
$engine->build_index();

# Answer queries: read one query per line from standard input (or from files named on the command line)
print "Next query?\n";
while ( my $query = <> ) {
    chomp $query;    # drop the trailing newline
    next unless length $query;

    # search() returns a hash of matching document => relevance score
    my %results = $engine->search( $query );

    # Print the matches, highest relevance first
    foreach my $result ( sort { $results{$b} <=> $results{$a} } keys %results ) {
        print "Relevance: ", $results{$result}, "\n";
        print $result, "\n\n";
    }
    print "Next query?\n";
}
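
For the spam part of the question: the code above only ranks documents by relevance to a query, so a k-nearest-neighbour classifier would still need labelled examples and a vote among the closest matches. The sketch below shows one way that could look in plain Perl, without Search::VectorSpace. It is only an illustration under stated assumptions, not the module's implementation: the training messages, the helper names (vectorize, cosine, knn_classify) and the choice of k = 3 are all made up for this example, and real emails would need far more training data.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Turn a text into a bag-of-words vector: lowercased word => count
sub vectorize {
    my ($text) = @_;
    my %vec;
    $vec{ lc $1 }++ while $text =~ /(\w+)/g;
    return \%vec;
}

# Cosine similarity between two sparse vectors stored as hash refs
sub cosine {
    my ( $u, $v ) = @_;
    my $dot = sum( 0, map { $u->{$_} * ( $v->{$_} // 0 ) } keys %$u );
    my $nu  = sqrt( sum( 0, map { $_**2 } values %$u ) );
    my $nv  = sqrt( sum( 0, map { $_**2 } values %$v ) );
    return ( $nu && $nv ) ? $dot / ( $nu * $nv ) : 0;
}

# Label a text by majority vote among its k most similar training examples
sub knn_classify {
    my ( $text, $training, $k ) = @_;
    my $q      = vectorize($text);
    my @ranked = sort { $b->{sim} <=> $a->{sim} }
                 map  { +{ label => $_->{label}, sim => cosine( $q, $_->{vec} ) } }
                 @$training;
    $k = @ranked if $k > @ranked;
    my %votes;
    $votes{ $_->{label} }++ for @ranked[ 0 .. $k - 1 ];
    my ($winner) = sort { $votes{$b} <=> $votes{$a} || $a cmp $b } keys %votes;
    return $winner;
}

# Hypothetical labelled examples (assumption: invented for this sketch)
my @training = map { +{ label => $_->[0], vec => vectorize( $_->[1] ) } } (
    [ spam => "Win free money now click here to claim your prize" ],
    [ spam => "Cheap pills free offer click now" ],
    [ spam => "You have won a free prize claim your money" ],
    [ ham  => "Meeting moved to Monday please see the attached agenda" ],
    [ ham  => "Here are the project notes from the meeting" ],
    [ ham  => "Can you review the attached report before Monday" ],
);

print knn_classify( "Claim your free prize money now",  \@training, 3 ), "\n";
print knn_classify( "Please review the meeting agenda", \@training, 3 ), "\n";
```

With these toy examples the first message shares words only with the spam samples and the second only with the non-spam ones, so the two votes come out as spam and ham respectively. To try it on real mail, the @training list could be filled from labelled files such as the attached spam.txt.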


spam.txt
Question by:ychet
Accepted Solution

by: ozo (earned 500 total points)
ID: 24323888
