Solved

SpamAssassin and Bayes how-to

Posted on 2006-06-14
Last Modified: 2008-02-01
We are running SpamAssassin on our mail server, which then forwards the legitimate mail to our Exchange server. A lot of spam has been getting through to users, and we would like to put an end to this. I've read that the Bayes filter is a great way to do so. Right now, messages marked as spam go into an Exchange mailbox called spam-trap. I need better documentation and a better understanding of Bayes. It seems that the training tool requires the mail to be available locally, but mine is on the Exchange server.

Any information would be helpful.
Question by:shankshank
 
LVL 6

Accepted Solution

by:
Sanktwo earned 500 total points
ID: 16923625
For a detailed description, look up "bayesian inference" in Wikipedia, but here is a potted version.
Bayesian inference means testing a hypothesis by weighing the evidence for the probability of it being true or not.
For example, if an email contains the words "weight" and "slimming", does it have a higher probability of being spam than one that doesn't?
For most of us the answer is "yes", unless you happen to be a site dealing in weight loss.
A Bayesian filter requires data about which words and phrases imply that a message is spam (positives) and which words imply that it is not spam (negatives). Each mail is then weighted to say whether, on balance, it is more likely to be spam than not. Usually you are permitted to set the threshold at which it decides that an email is spam.
Thus Bayesian inference requires:
1. That you "train" the filter to match your circumstances (although it might come with some pre-trained data), i.e. you tell it what is spam and what is not for quite a large number of messages.
2. That you choose a spam score threshold relevant to your business needs.
3. That the spammers are not sufficiently clever to pad the spam with "good" words and phrases so your filter cannot decide.
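
To make the idea concrete, here is a toy sketch in Python of the train-then-score cycle described above. It is a plain naive Bayes classifier that I wrote for illustration, not SpamAssassin's actual algorithm; the class name TinyBayesFilter, the sample messages and the 0.5 threshold are all assumptions. It returns a probability between 0 and 1 rather than a 1-100 score.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class TinyBayesFilter:
    """Toy naive Bayes spam filter (illustrative only, not SpamAssassin's code)."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    def train(self, text, label):
        """Record the distinct words of one message known to be 'spam' or 'ham'."""
        self.message_counts[label] += 1
        self.word_counts[label].update(set(tokenize(text)))

    def spam_probability(self, text):
        """Return the estimated probability (0.0 to 1.0) that text is spam."""
        n_spam = self.message_counts["spam"]
        n_ham = self.message_counts["ham"]
        # Prior odds from how much spam vs. ham was seen (+1 avoids division by zero).
        log_odds = math.log((n_spam + 1) / (n_ham + 1))
        for word in set(tokenize(text)):
            # Laplace smoothing (+1 / +2) so unseen words never give a zero probability.
            p_word_spam = (self.word_counts["spam"][word] + 1) / (n_spam + 2)
            p_word_ham = (self.word_counts["ham"][word] + 1) / (n_ham + 2)
            log_odds += math.log(p_word_spam) - math.log(p_word_ham)
        # Clamp before converting log-odds to a probability to avoid overflow.
        log_odds = max(-50.0, min(50.0, log_odds))
        return 1.0 / (1.0 + math.exp(-log_odds))


bayes = TinyBayesFilter()
bayes.train("Lose weight fast with our slimming pills", "spam")
bayes.train("Cheap slimming tea, weight loss guaranteed", "spam")
bayes.train("Meeting agenda for Monday attached", "ham")
bayes.train("Please review the quarterly budget report", "ham")

THRESHOLD = 0.5  # assumed cut-off; tune it to your own needs
for message in ("New slimming offer: lose weight today",
                "Agenda for the budget meeting"):
    score = bayes.spam_probability(message)
    print(message, "->", "spam" if score > THRESHOLD else "not spam")
```

With only four training messages the numbers mean very little, which is the point of (1): the filter is only as good as the amount of mail you have told it about.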

You say that "messages marked as spam go into spam-trap". If this is already a Bayesian inference filter, then the messages that still get through may be from good spammers who know (3) above, and you are unlikely to gain much by adding another Bayesian inference engine.

All you need for a Bayesian inference test is the full body of the email and the training data of the Bayesian inference engine. The engine assigns the email a score (say 1-100), and you decide the level at which it counts as "spam".

Does that help?
 
LVL 6

Expert Comment

by:Sanktwo
ID: 16923712
I should add that you should take all steps to avoid confirming to spammers which email addresses in your domain are valid, e.g.
1. Do NOT bounce messages which are addressed to users not in your domain.
2. Do not permit the use of HTML viewers which download images (such as older versions of Outlook Express) since image names containing database references are used to validate email addresses.
3. Tell all your users NEVER to attempt to "unsubscribe from spam mailing lists" (they just validate the address and continue sending).

Although it might upset your users and customers somewhat, try to avoid real names for all your new users. Pick something like Peter-405-Jones rather than Peter.Jones, since spammers often try dictionary attacks to identify valid email addresses.

Ensure that your website does not contain harvestable email addresses, i.e. use images or other obscuring techniques.
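
As one example of an "other obscuring technique", the address can be written into the page as HTML character entities. The Python sketch below is only an illustration with a made-up address at example.com; it will stop only the most naive harvesters, since anything that decodes entities sees the plain address.

```python
import html


def obfuscate_email(address):
    """Encode every character as a decimal HTML entity, e.g. 'a' becomes '&#97;'."""
    return "".join(f"&#{ord(char)};" for char in address)


# Hypothetical address, used only for illustration.
encoded = obfuscate_email("peter-405-jones@example.com")
print(encoded)

# Browsers decode the entities, so visitors still see the plain address.
print(html.unescape(encoded))
```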
 
LVL 5

Author Comment

by:shankshank
ID: 16934070
Thanks for the info. So how do I train SpamAssassin?
 
LVL 6

Expert Comment

by:Sanktwo
ID: 16934146
 
LVL 5

Author Comment

by:shankshank
ID: 16934176
And how is this?
http://wiki.apache.org/spamassassin/CustomRulesets

I am running FreeBSD and was unable to install the sa-blacklist DNS blacklist by following the instructions at http://www.sa-blacklist.stearns.org/sa-blacklist/README

 
LVL 6

Expert Comment

by:Sanktwo
ID: 16935023
I am not a SpamAssassin guru, nor have I ever used FreeBSD (maybe I should?). Perhaps someone else can assist you in configuring SpamAssassin to trap more spam on FreeBSD. However, it would help if you gave more information about your problems.

You seem to have already given up on training the Bayesian filter because your spam mailbox is on the Exchange server. Although I don't know MS Exchange either, surely it is possible to get it to dump a big chunk of spam mails, with headers, into a text file and then feed that to the BSD system manually. You should not have to do that very often to improve the filtering, nor does it have to happen in real time.
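
For what it is worth, SpamAssassin's training tool is sa-learn, and it can read an mbox file directly. Below is a minimal Python sketch of the "feed it back manually" step, assuming the exported mail has already been converted to mbox format. The file names spam-trap.mbox and good-mail.mbox are hypothetical, and the script only prints the commands unless you call train_from_mbox yourself.

```python
import subprocess


def sa_learn_command(kind, mbox_path):
    """Build the sa-learn argument list for an mbox file of 'spam' or 'ham'."""
    if kind not in ("spam", "ham"):
        raise ValueError("kind must be 'spam' or 'ham'")
    return ["sa-learn", f"--{kind}", "--mbox", mbox_path]


def train_from_mbox(kind, mbox_path):
    """Run sa-learn on the file; raises CalledProcessError if sa-learn fails."""
    subprocess.run(sa_learn_command(kind, mbox_path), check=True)


# Hypothetical file names for mail exported from the Exchange mailboxes.
for kind, path in (("spam", "spam-trap.mbox"), ("ham", "good-mail.mbox")):
    # Print the command instead of running it, so the sketch is safe to try.
    print(" ".join(sa_learn_command(kind, path)))
```

Run the training as the same user that SpamAssassin scans mail as, since the Bayes database is stored per user. As far as I know, the Bayes rules stay inactive until the database holds a minimum number of both spam and non-spam messages (200 of each by default), so feed it good mail as well as spam.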

As for the DNS blacklist, you say that you were unable to install it, but you need to give any Experts Exchange guru more details. What was the problem? Did you get an error message from somewhere? Could you not set up the cron job to fetch a regular copy, or was there some problem with SpamAssassin? Also, if you are using Sendmail, Exim or Postfix as the Mail Transfer Agent, then you can use the DNS block directly there (though less effectively than with SpamAssassin, since that can look at contents as well). It would be good if you could describe the mail path, i.e. what is your MTA offering SMTP access, how does this relate to SpamAssassin, and how does the mail subsequently get to the Exchange server. That will help your potential helpers.
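
In case it helps to see what a DNS block check does in general, here is a rough Python sketch of the usual DNS blocklist convention: reverse the octets of the sender's IPv4 address, append the list's zone, and look the name up; if it resolves, the address is listed. The zone dnsbl.example.org is a placeholder, not a real list, and I have not checked how sa-blacklist itself is distributed, so treat this purely as an illustration.

```python
import ipaddress
import socket


def dnsbl_query_name(ip, zone):
    """Return the DNS name to look up for an IPv4 address in a blocklist zone."""
    address = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(address).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip, zone):
    """Return True if the blocklist zone has an entry for this address."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        # No such name: the address is not listed (or the lookup failed).
        return False


# Placeholder zone and a documentation-range IP; no network lookup is made here.
print(dnsbl_query_name("192.0.2.99", "dnsbl.example.org"))
# prints 99.2.0.192.dnsbl.example.org
```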