Go Premium for a chance to win a PS4. Enter to Win


SPAMASSASSIN and bayes how to

Posted on 2006-06-14
Medium Priority
Last Modified: 2008-02-01
We are running Spamassassin here on our mail server. Then the legitimate mail gets forwarded to our exchange server. We have had alot of spam come through to users, and would like to put an end to this. I've read that the bayes filter is a great way. Right now messages marked as spam go into a exchange mailbox called spam-trap. I need better documentation and understanding of bayes. It seems as if the tool it uses requires the mail to be available locally, but mine is on the exchange server.

Any information wo uld be helpful
Question by:shankshank
  • 4
  • 2

Accepted Solution

Sanktwo earned 1500 total points
ID: 16923625
For a detailed description, look up "bayesian inference" in Wikipedia, but here is a potted version.
Bayesian inference means to test a hypothesis based upon weighting evidence as to the probability of it being true or not.
For example, if an email contains the word "Weight" and "slimming", does it have a higher probability of being spam than one that doesn't?
For most of us the answer is "yes" unless you happen to be a site dealing in weight loss.
A Bayesian based filter requires data regarding what words and phrases imply that the message is spam (positives). Also, what words imply that it is not spam (negatives). Each mail is then weighted to say whether, on balance, it is more likely to be spam than not. Usually you are permitted to set the threshold at which it decides that an email is spam.
Thus Bayesian inference requires:
1. That you "train" the filter to match your circumstances (although it might come with some pre-training data) i.e. you tell it what is spam and what is not for quite a large number of messages
2. you choose a spam score threshold relevant to your business needs.
3. That the spammers are not sufficiently clever to pad the spam with "good" words and phrases so your filter cannot decide.

You say that "messages marked as spam go into spam-trap" - if this is already a bayesian inference filter, then those messages left may be from good spammers who know (3) above and you are unlikely to gain much by adding another Bayesian inference engine.

All you need for a Bayesian inference test is the full body of the email and the training data of the bayesian inference engine. This will allocate a score (say 1-100) and you decide the level at which it is decided that it is "spam".

Does that help?

Expert Comment

ID: 16923712
I should add that you should take all steps to avoid validating email addresses to spammers in your domain e.g.
1. do NOT bounce messages which are addressed to users not in your domain.
2. Do not permit the use of HTML viewers which download images (such as older versions of Outlook Express) since image names containing database references are used to validate email addresses.
3. Tell all your users NEVER to attempt to "unsubscribe from spam mailing lists" (they just validate the address and continue sending).

Although it might upset your user and customers somewhat, try to avoid real names for all your new users. Pick something like Peter-405-Jones rather than Peter.Jones since spammers often try dictionary attacks to identify valid email addresses.

Ensure that you website does not contain harvestable email addresses i.e. use images or other obscuring techniques.

Author Comment

ID: 16934070
Thanks for the info. So how do I train spamassassin
Industry Leaders: We Want Your Opinion!

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!


Expert Comment

ID: 16934146

Author Comment

ID: 16934176
and how is this?

I am running freebsd and was unable to install according to instructions provided http://www.sa-blacklist.stearns.org/sa-blacklist/README for this dns blacklist : sa-blacklist


Expert Comment

ID: 16935023
I am not a spamassassin guru nor have I ever used freebsd (maybe I should?). Perhaps someone else can assist you in configuring  spamassassin to trap more spam on freebsd. However, it would be nice if you gave more information about your problems.

You seem to have already given up on training the bayesian filter because your spam mailbox is on the Exchange server. Although I also don't know MS Exchange, surely it is possible to get it to dump a big chunk of spam mails with headers in a text file then feed that back to the bsd system manually. You should not have to do that too frequently to improve the filtering nor does it have to be real time.

In terms of the dns blacklist, you say that you were unable to install, but you need to give any Experts-exchange guru more details. What was the problem? Did you get an error message from somewhere? Could you not set up the cron to get a regular copy or was there some problem with Spamassassin? Also, if you are using Sendmail, Exim, Postfix as the Mail Transfer Agent, then you can use the dns block directly there (though less effectively than with Spamassassin since that can look at contents as well). It would be good if you could describe the mail path i.e. what is your MTA offering SMTP access, how does this relate to Spamassassin and how does the mail subsequently get to the Exchange server. That will help your potential helpers.

Featured Post

Free Tool: Port Scanner

Check which ports are open to the outside world. Helps make sure that your firewall rules are working as intended.

One of a set of tools we are providing to everyone as a way of saying thank you for being a part of the community.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

We are happy to announce a brand new addition to our line of acclaimed email signature management products – CodeTwo Email Signatures for Office 365.
We aren’t perfect, just like everyone else.  Check out the email errors our community caught and learn the top errors every email marketer should avoid.
This video shows how to remove a single email address from the Outlook 2010 Auto Suggestion memory. NOTE: For Outlook 2016 and 2013 perform the exact same steps. Open a new email: Click the New email button in Outlook. Start typing the address: …
Many of my clients call in with monstrous Gmail overloading issues with Outlook. A quick tip is to turn off the All Mail and Important folders from synching. Here is a quick video I made to show you how to turn off these and other folders in Gmail s…

885 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question