Solved

Need help with the k-nearest neighbour algorithm in Perl

Posted on 2009-05-06
Last Modified: 2012-06-27
Can anyone explain what each line of the code below does?
I am hoping to modify it so that it can determine which emails are spam and which are not. If I am not mistaken, it uses the k-nearest neighbour algorithm to tell them apart. Thanks.

P.S. I have attached a text file with samples of spam and non-spam email to show how they differ.
use Search::VectorSpace;

# These are just some simple sample documents
my @docs = (
    "The cat in the hat",
    "A cat is a fine pet.",
    "Dogs and cats make good pets.");

# Build our search engine, with a suitable threshold
my $engine = Search::VectorSpace->new( docs => \@docs, threshold=>'0.1');

# Index the documents
$engine->build_index();

# Answer queries
print "Next query?\n";
while ( my $query = <> ) {
    my %results = $engine->search( $query );
    foreach my $result ( sort { $results{$b} <=> $results{$a} } keys %results ) {
        print "Relevance: ", $results{$result}, "\n";
        print $result, "\n\n";
    }
    print "Next query?\n";
}
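
For comparison, here is a minimal sketch of what a k-nearest neighbour spam check could look like in plain Perl. It does not use Search::VectorSpace, which as far as I can tell ranks documents against a query rather than voting on a label. The helper names (word_vector, cosine, knn_classify) and the tiny training set are made up for illustration and are not taken from spam.txt. The idea: turn each message into a word-count vector, score the new message against every labelled example with cosine similarity, and let the k closest examples vote.

```perl
use strict;
use warnings;

# Turn a text into a bag-of-words vector: { word => count }.
sub word_vector {
    my ($text) = @_;
    my %vec;
    $vec{$_}++ for lc($text) =~ /\w+/g;
    return \%vec;
}

# Cosine similarity between two sparse vectors (hash refs).
sub cosine {
    my ($u, $v) = @_;
    my ($dot, $nu, $nv) = (0, 0, 0);
    $dot += $u->{$_} * ($v->{$_} // 0) for keys %$u;
    $nu  += $_ ** 2 for values %$u;
    $nv  += $_ ** 2 for values %$v;
    return 0 unless $nu && $nv;
    return $dot / sqrt($nu * $nv);
}

# Label $text by majority vote among its $k most similar training examples.
# $training is an array ref of [ label, text ] pairs.
sub knn_classify {
    my ($text, $training, $k) = @_;
    my $query  = word_vector($text);
    my @scored = sort { $b->[0] <=> $a->[0] }
                 map  { [ cosine($query, word_vector($_->[1])), $_->[0] ] }
                 @$training;
    $k = @scored if $k > @scored;
    my %votes;
    $votes{ $_->[1] }++ for @scored[0 .. $k - 1];
    # Highest vote count wins; ties fall back to alphabetical label order.
    my ($best) = sort { $votes{$b} <=> $votes{$a} || $a cmp $b } keys %votes;
    return $best;
}

# Hypothetical labelled examples, invented for this sketch.
my @training = (
    [ spam => "Win cash now, claim your free prize today" ],
    [ spam => "Cheap pills, free offer, buy now" ],
    [ spam => "You are a winner, claim your free cash" ],
    [ ham  => "Meeting moved to Monday, please see the agenda" ],
    [ ham  => "Can you review my Perl script before the meeting" ],
    [ ham  => "Lunch on Friday? Let me know if you are free" ],
);

print knn_classify("Claim your free cash prize now", \@training, 3), "\n";
print knn_classify("Please review the agenda before Monday", \@training, 3), "\n";
```

With k = 3 and this toy data, the first message should come out as spam and the second as ham. For real mail you would want many more labelled examples, some stop-word removal, and probably an odd k so that two-way votes cannot tie.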

spam.txt
Question by:ychet
2 Comments
 
Accepted Solution
by: ozo (LVL 84), earned 500 total points
ID: 24323888
