SpamAssassin and Bayes how-to

Posted on 2006-06-14
Last Modified: 2008-02-01
We are running SpamAssassin on our mail server, which then forwards the legitimate mail to our Exchange server. A lot of spam is still getting through to users, and we would like to put an end to this. I've read that the Bayes filter is a great way to do it. Right now, messages marked as spam go into an Exchange mailbox called spam-trap. I need better documentation and a better understanding of Bayes. It seems as if the training tool requires the mail to be available locally, but mine is on the Exchange server.

Any information would be helpful.
Question by:shankshank

Accepted Solution

Sanktwo earned 500 total points
ID: 16923625
For a detailed description, look up "Bayesian inference" on Wikipedia, but here is a potted version.
Bayesian inference means testing a hypothesis by weighing the evidence for and against it to estimate the probability that it is true.
For example, if an email contains the words "weight" and "slimming", is it more likely to be spam than one that doesn't?
For most of us the answer is "yes", unless you happen to run a site dealing in weight loss.
A Bayesian filter requires data about which words and phrases suggest that a message is spam (positives) and which suggest that it is not (negatives). Each mail is then weighted to say whether, on balance, it is more likely to be spam than not. Usually you can set the threshold at which the filter decides that an email is spam.
Thus Bayesian inference requires that:
1. You "train" the filter to match your circumstances (although it might come with some pre-training data), i.e. you tell it what is spam and what is not for quite a large number of messages.
2. You choose a spam score threshold relevant to your business needs.
3. The spammers are not clever enough to pad the spam with "good" words and phrases so that your filter cannot decide.

You say that "messages marked as spam go into spam-trap". If this is already the work of a Bayesian inference filter, then the messages that still get through may be from skilled spammers who exploit (3) above, and you are unlikely to gain much by adding another Bayesian inference engine.

All you need for a Bayesian inference test is the full body of the email and the training data of the Bayesian inference engine. The engine allocates a score (say 1-100), and you decide the level at which a message is treated as "spam".
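
To make the idea concrete, here is a minimal sketch of a word-based Bayesian scorer in Python. It is a toy naive Bayes model written purely for illustration and is not SpamAssassin's actual algorithm; the class name, the 0-100 scale and the default threshold of 50 are all assumptions made for this example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class ToyBayesFilter:
    """Toy naive Bayes spam scorer (illustrative only, not SpamAssassin)."""

    def __init__(self):
        # Number of trained messages of each kind that contain a given word.
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        # Number of trained messages of each kind.
        self.message_counts = {"spam": 0, "ham": 0}

    def train(self, text, label):
        """Record one message known to be 'spam' or 'ham' (not spam)."""
        self.word_counts[label].update(set(tokenize(text)))
        self.message_counts[label] += 1

    def score(self, text):
        """Return the estimated spam probability scaled to 0-100."""
        n_spam = self.message_counts["spam"]
        n_ham = self.message_counts["ham"]
        # Start from the prior odds of spam, with add-one smoothing.
        log_odds = math.log((n_spam + 1) / (n_ham + 1))
        for word in set(tokenize(text)):
            # Smoothed chance of seeing this word in a spam / ham message.
            p_spam = (self.word_counts["spam"][word] + 1) / (n_spam + 2)
            p_ham = (self.word_counts["ham"][word] + 1) / (n_ham + 2)
            log_odds += math.log(p_spam / p_ham)
        return 100.0 / (1.0 + math.exp(-log_odds))

    def is_spam(self, text, threshold=50.0):
        """Apply the threshold you chose for your own circumstances."""
        return self.score(text) >= threshold
```

Training on a few messages containing "weight" and "slimming" as spam, and a few ordinary business messages as ham, pushes the score of a new "slimming" email well above that of a meeting reminder. Raising or lowering `threshold` trades missed spam against false positives.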

Does that help?

Expert Comment

ID: 16923712
I should add that you should take all steps to avoid validating email addresses in your domain to spammers, e.g.
1. Do NOT bounce messages which are addressed to non-existent users in your domain.
2. Do not permit the use of HTML viewers which download images automatically (such as older versions of Outlook Express), since image names containing database references are used to validate email addresses.
3. Tell all your users NEVER to attempt to "unsubscribe" from spam mailing lists (the spammers just validate the address and continue sending).

Although it might upset your users and customers somewhat, try to avoid plain real names in the addresses of all your new users. Pick something like Peter-405-Jones rather than Peter.Jones, since spammers often try dictionary attacks to identify valid email addresses.

Ensure that your website does not contain harvestable email addresses, i.e. use images or other obscuring techniques.
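
As one example of an "other obscuring technique", the sketch below encodes every character of an address as a decimal HTML entity, so the page source no longer contains the literal address while a browser still displays it normally. The function name and the sample address are made up for illustration, and this is likely to stop only naive harvesters, since anything that decodes entities will still see the address.

```python
import html


def obfuscate_email(address):
    """Encode each character as a decimal HTML entity, e.g. 'a' -> '&#97;'."""
    return "".join(f"&#{ord(ch)};" for ch in address)


# Hypothetical address, used only for this example.
encoded = obfuscate_email("peter-405-jones@example.com")
print(encoded)                 # what goes into the page source
print(html.unescape(encoded))  # what a browser would display
```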

Author Comment

ID: 16934070
Thanks for the info. So how do I train SpamAssassin?

Expert Comment

ID: 16934146

Author Comment

ID: 16934176
and how is this?

I am running FreeBSD and was unable to install this DNS blacklist, sa-blacklist, by following the instructions provided.


Expert Comment

ID: 16935023
I am not a SpamAssassin guru, nor have I ever used FreeBSD (maybe I should?). Perhaps someone else can assist you in configuring SpamAssassin to trap more spam on FreeBSD. However, it would help if you gave more information about your problems.

You seem to have already given up on training the Bayesian filter because your spam mailbox is on the Exchange server. Although I also don't know MS Exchange, surely it is possible to get it to dump a big chunk of spam mails, with headers, into a text file and then feed that back to the BSD system manually. You should not have to do that very often to improve the filtering, nor does it have to happen in real time.
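
The "feed that back" step could look roughly like the sketch below, which wraps SpamAssassin's `sa-learn` training tool from Python. It assumes the mail exported from Exchange has been converted to mbox format (that conversion is not shown), that `sa-learn` is on the PATH, and that it is run as the user whose Bayes database SpamAssassin reads. The helper names and file paths are hypothetical.

```python
import subprocess


def build_sa_learn_command(mbox_path, is_spam):
    """Build the sa-learn command line for one mbox file of spam or ham."""
    kind = "--spam" if is_spam else "--ham"
    return ["sa-learn", kind, "--mbox", mbox_path]


def train_from_mbox(mbox_path, is_spam):
    """Run sa-learn on the mbox file and return whatever it prints."""
    command = build_sa_learn_command(mbox_path, is_spam)
    result = subprocess.run(command, check=True, capture_output=True, text=True)
    return result.stdout


# Hypothetical usage on the mail server (paths are examples only):
#   train_from_mbox("/tmp/spam-trap.mbox", is_spam=True)
#   train_from_mbox("/tmp/good-mail.mbox", is_spam=False)
```

Note that the filter needs examples of legitimate mail (ham) as well as spam, so a dump of good mail should be fed in the same way.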

In terms of the DNS blacklist, you say that you were unable to install it, but you need to give any Experts Exchange guru more details. What was the problem? Did you get an error message from somewhere? Could you not set up the cron job to fetch a regular copy, or was there some problem with SpamAssassin? Also, if you are using Sendmail, Exim or Postfix as the Mail Transfer Agent, then you can use the DNS block list directly there (though less effectively than with SpamAssassin, since that can look at contents as well). It would be good if you could describe the mail path, i.e. what is your MTA offering SMTP access, how does this relate to SpamAssassin, and how does the mail subsequently get to the Exchange server. That will help your potential helpers.
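
For background, a conventional IP-based DNS block list is queried by reversing the octets of the sending server's IP address, appending the list's zone name and doing an ordinary DNS lookup: if a record comes back, the address is listed. The sketch below illustrates that general mechanism only; it is not a description of how sa-blacklist in particular works, and the zone `dnsbl.example.org` is a placeholder rather than a real list.

```python
import ipaddress
import socket


def dnsbl_query_name(ip, zone):
    """Return the DNS name to look up for an IPv4 address in a block list zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def is_listed(ip, zone):
    """Return True if the block list zone has a record for this address."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        # No such name: the address is not on the list (or the lookup failed).
        return False


# Hypothetical usage (placeholder zone, needs network access):
#   is_listed("192.0.2.10", "dnsbl.example.org")
```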
