  • Status: Solved
  • Priority: Medium
  • Security: Public

Need help with the k-nearest-neighbour algorithm in Perl

Can anyone explain each line of the code below and what it does?
I'm hoping to modify the code so that it can determine which emails are spam and which are not. If I'm not mistaken, it uses the k-nearest-neighbour algorithm to tell the difference. Thanks.

P.S. I have attached a text file with samples of spam and non-spam emails to show the difference.
use strict;
use warnings;
use Search::VectorSpace;
 
# These are just some simple sample documents
my @docs = (
	"The cat in the hat",
	"A cat is a fine pet.",
	"Dogs and cats make good pets.");
 
# Build our search engine; the threshold is the minimum relevance
# score a document needs in order to be returned
my $engine = Search::VectorSpace->new( docs => \@docs, threshold => 0.1 );
 
# Index the documents
$engine->build_index();
 
# Answer queries read from standard input, one per line
print "Next query?\n";
while ( my $query = <> ) {
    chomp $query;    # strip the trailing newline
    # %results maps each matching document to its relevance score
    my %results = $engine->search( $query );
    # Print the matches from most to least relevant
    foreach my $result ( sort { $results{$b} <=> $results{$a} } keys %results ) {
        print "Relevance: ", $results{$result}, "\n";
        print $result, "\n\n";
    }
    print "Next query?\n";
}


spam.txt
Asked by: ychet
1 Solution
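
The posted script only ranks documents by relevance to a query; it does not assign labels. To classify emails as spam or not, a k-nearest-neighbour approach would compare a new email against emails that are already labelled and take a majority vote among the k most similar ones. Below is a minimal sketch of that idea in plain Perl, assuming simple word counts and cosine similarity. It does not use Search::VectorSpace, and the names vectorize, cosine and knn_classify, the tiny training set, and the labels "spam" and "ham" are all made up for illustration.

```perl
use strict;
use warnings;

# Turn text into a bag-of-words vector: lowercased word => count.
sub vectorize {
    my ($text) = @_;
    my %vec;
    $vec{ lc($1) }++ while $text =~ /([A-Za-z']+)/g;
    return \%vec;
}

# Cosine similarity between two sparse vectors (hash references).
sub cosine {
    my ($u, $v) = @_;
    my ($dot, $nu, $nv) = (0, 0, 0);
    for my $word (keys %$u) {
        $nu  += $u->{$word} ** 2;
        $dot += $u->{$word} * $v->{$word} if exists $v->{$word};
    }
    $nv += $_ ** 2 for values %$v;
    return 0 unless $nu && $nv;    # an empty vector matches nothing
    return $dot / sqrt($nu * $nv);
}

# Label a text by majority vote among its k most similar training documents.
sub knn_classify {
    my ($text, $training, $k) = @_;
    my $query  = vectorize($text);
    my @ranked = sort { $b->{sim} <=> $a->{sim} }
                 map  { +{ label => $_->{label},
                           sim   => cosine($query, $_->{vec}) } }
                 @$training;
    $k = @ranked if $k > @ranked;
    my %votes;
    $votes{ $_->{label} }++ for @ranked[0 .. $k - 1];
    # Most votes wins; ties are broken alphabetically by label.
    my ($winner) = sort { $votes{$b} <=> $votes{$a} || $a cmp $b } keys %votes;
    return $winner;
}

# Hypothetical labelled examples; real training data would be whole emails.
my @training = map { +{ label => $_->[0], vec => vectorize($_->[1]) } } (
    [ spam => "Win free money now, click here to claim your prize" ],
    [ spam => "Cheap pills, free offer, click now" ],
    [ spam => "You have won a free prize, claim your money" ],
    [ ham  => "Meeting moved to Monday, please see the agenda" ],
    [ ham  => "Can you review my Perl script before the meeting" ],
    [ ham  => "Lunch on Monday? Please let me know" ],
);

for my $email ( "Claim your free prize money now",
                "Please review the agenda before Monday" ) {
    print knn_classify($email, \@training, 3), ": $email\n";
}
```

With real mail you would load the labelled messages from files such as the attached spam.txt, and probably weight or drop very common words, since raw counts let words like "the" dominate the similarity. An odd k avoids most tied votes when there are only two labels.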
