SpamAssassin and Bayes how-to

Posted on 2006-06-14
Medium Priority
Last Modified: 2008-02-01
We are running SpamAssassin on our mail server, which then forwards the legitimate mail to our Exchange server. A lot of spam still gets through to users, and we would like to put an end to this. I've read that the Bayes filter is a great way to do so. Right now, messages marked as spam go into an Exchange mailbox called spam-trap. I need better documentation and a better understanding of Bayes. It seems as if the training tool requires the mail to be available locally, but mine is on the Exchange server.

Any information would be helpful.
Question by:shankshank

Accepted Solution

Sanktwo earned 1500 total points
ID: 16923625
For a detailed description, look up "Bayesian inference" in Wikipedia, but here is a potted version.
Bayesian inference means testing a hypothesis by weighing the evidence to estimate the probability that it is true.
For example, if an email contains the words "weight" and "slimming", does it have a higher probability of being spam than one that doesn't?
For most of us the answer is "yes", unless you happen to be a site dealing in weight loss.
A Bayesian filter requires data about which words and phrases imply that a message is spam (positives) and which imply that it is not spam (negatives). Each mail is then weighted to say whether, on balance, it is more likely to be spam than not. Usually you are permitted to set the threshold at which the filter decides that an email is spam.
Thus Bayesian inference requires:
1. That you "train" the filter to match your circumstances (although it might come with some pre-training data), i.e. you tell it what is spam and what is not for quite a large number of messages.
2. That you choose a spam score threshold relevant to your business needs.
3. That the spammers are not sufficiently clever to pad the spam with "good" words and phrases so that your filter cannot decide.
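
To make the three points above concrete, here is a minimal toy sketch in Python. It is not SpamAssassin's actual algorithm; the class name `TinyBayes`, the sample messages, and the 0.5 threshold are all made up for illustration. It only shows training, scoring against a threshold, and how padding with "good" words drags the score down.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class TinyBayes:
    """Toy naive Bayes scorer (illustrative only, not SpamAssassin's code)."""

    def __init__(self):
        # Number of trained messages of each kind that contain each word.
        self.words = {"spam": Counter(), "ham": Counter()}
        self.messages = {"spam": 0, "ham": 0}

    def train(self, label, text):
        """Record one message as 'spam' or 'ham' (point 1: training)."""
        self.messages[label] += 1
        self.words[label].update(set(tokenize(text)))

    def spam_probability(self, text):
        """Return a value between 0 and 1; higher means more spam-like."""
        n_spam, n_ham = self.messages["spam"], self.messages["ham"]
        # Sum log-odds rather than multiplying many small probabilities.
        log_odds = math.log((n_spam + 1) / (n_ham + 1))
        for word in set(tokenize(text)):
            # Add-one smoothing so unseen words never give a zero probability.
            p_spam = (self.words["spam"][word] + 1) / (n_spam + 2)
            p_ham = (self.words["ham"][word] + 1) / (n_ham + 2)
            log_odds += math.log(p_spam / p_ham)
        return 1 / (1 + math.exp(-log_odds))


bayes = TinyBayes()
bayes.train("spam", "lose weight fast with slimming pills")
bayes.train("spam", "slimming weight loss offer")
bayes.train("ham", "meeting agenda for monday")
bayes.train("ham", "quarterly report attached")

THRESHOLD = 0.5  # point 2: chosen by you, hypothetical value here

plain = bayes.spam_probability("weight slimming offer")
padded = bayes.spam_probability(
    "weight slimming meeting agenda monday report quarterly"
)
print("plain spam flagged:", plain >= THRESHOLD)
print("padded spam flagged:", padded >= THRESHOLD)  # point 3: padding helps spammers
```

With this tiny training set the padded message scores lower than the plain one, which is exactly the weakness described in point 3.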

You say that "messages marked as spam go into spam-trap". If this is already the result of a Bayesian inference filter, then the messages that still get through may be from good spammers who know (3) above, and you are unlikely to gain much by adding another Bayesian inference engine.

All you need for a Bayesian inference test is the full body of the email and the training data of the Bayesian inference engine. The engine allocates a score (say 1-100), and you decide the level above which a message is treated as "spam".

Does that help?

Expert Comment

ID: 16923712
I should add that you should take all steps to avoid validating email addresses in your domain to spammers, e.g.
1. Do NOT bounce messages which are addressed to users not in your domain.
2. Do not permit the use of HTML viewers which download images automatically (such as older versions of Outlook Express), since image names containing database references are used to validate email addresses.
3. Tell all your users NEVER to attempt to "unsubscribe" from spam mailing lists (the spammers just validate the address and continue sending).

Although it might upset your users and customers somewhat, try to avoid plain real names for all your new users. Pick something like Peter-405-Jones rather than Peter.Jones, since spammers often try dictionary attacks to identify valid email addresses.

Ensure that your website does not contain harvestable email addresses, i.e. use images or other obscuring techniques.
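
As one example of an "other obscuring technique", the Python sketch below rewrites an address as HTML numeric character references, so the page source no longer contains the literal address while browsers still display it. The function name and the sample address are made up for illustration, and this only defeats naive harvesters: a scraper that decodes entities will still find the address.

```python
import html


def obfuscate_email(address):
    """Encode every character as an HTML numeric character reference."""
    return "".join(f"&#{ord(char)};" for char in address)


encoded = obfuscate_email("peter-405-jones@example.com")  # hypothetical address
print(encoded)

# A browser (or html.unescape) turns the references back into the address.
print(html.unescape(encoded))
```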

Author Comment

ID: 16934070
Thanks for the info. So how do I train SpamAssassin?

Expert Comment

ID: 16934146

Author Comment

ID: 16934176
and how is this?

I am running FreeBSD and was unable to install it according to the instructions provided at http://www.sa-blacklist.stearns.org/sa-blacklist/README for this DNS blacklist: sa-blacklist


Expert Comment

ID: 16935023
I am not a SpamAssassin guru, nor have I ever used FreeBSD (maybe I should?). Perhaps someone else can assist you in configuring SpamAssassin to trap more spam on FreeBSD. However, it would help if you gave more information about your problems.

You seem to have already given up on training the Bayesian filter because your spam mailbox is on the Exchange server. Although I also don't know MS Exchange, surely it is possible to get it to dump a large batch of spam mails, with headers, into a text file and then feed that back to the BSD system manually. You should not have to do that very often to improve the filtering, and it does not have to happen in real time.
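
A rough Python sketch of that "dump and feed back" step follows. It assumes the export from Exchange ends up as an mbox-format text file on the SpamAssassin host (how you produce that file from Exchange is not covered here), and that SpamAssassin's `sa-learn` tool is on the PATH. The file name `spam-trap.mbox` and the helper names are made up; `--spam`, `--ham` and `--mbox` are options of `sa-learn`.

```python
import mailbox
import subprocess


def count_messages(mbox_path):
    """Count the messages in an mbox file as a sanity check on the export."""
    box = mailbox.mbox(mbox_path, create=False)  # fail if the file is missing
    try:
        return len(box)
    finally:
        box.close()


def build_sa_learn_command(mbox_path, is_spam):
    """Build the sa-learn command line for one mbox file."""
    kind = "--spam" if is_spam else "--ham"
    return ["sa-learn", kind, "--mbox", mbox_path]


def train_from_mbox(mbox_path, is_spam):
    """Run sa-learn on the file; raises CalledProcessError if it fails."""
    print(f"Feeding {count_messages(mbox_path)} messages to sa-learn")
    subprocess.run(build_sa_learn_command(mbox_path, is_spam), check=True)
```

Calling `train_from_mbox("spam-trap.mbox", is_spam=True)` would then train on the spam batch. A similar export of known-good mail with `is_spam=False` matters just as much, since the filter needs examples of both kinds. Run it as the same user that SpamAssassin runs as, so the training lands in the Bayes database that the filter actually reads.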

In terms of the DNS blacklist, you say that you were unable to install it, but you need to give any Experts Exchange guru more details. What was the problem? Did you get an error message from somewhere? Could you not set up the cron job to fetch a regular copy, or was there some problem with SpamAssassin? Also, if you are using Sendmail, Exim or Postfix as the Mail Transfer Agent, then you can use the DNS block directly there (though less effectively than with SpamAssassin, since that can look at contents as well). It would be good if you could describe the mail path, i.e. what MTA is offering SMTP access, how it relates to SpamAssassin, and how the mail subsequently gets to the Exchange server. That will help your potential helpers.
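
For background, a conventional DNS blocklist is queried by reversing the octets of the sending IP address, appending the list's zone, and doing an ordinary DNS lookup: an answer means "listed", no answer means "not listed". The Python sketch below illustrates that general convention only. The zone `dnsbl.example.org` is a placeholder, and nothing here is specific to how sa-blacklist is installed or used.

```python
import ipaddress
import socket


def dnsbl_query_name(ip, zone):
    """Build the DNS name to look up for an IPv4 address in a blocklist zone."""
    address = ipaddress.IPv4Address(ip)  # raises ValueError on bad input
    reversed_octets = ".".join(reversed(str(address).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip, zone):
    """Return True if the blocklist zone has an entry for this address."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        # No such name: the address is not on the list (or DNS failed).
        return False


# Placeholder zone and a documentation-range IP address.
print(dnsbl_query_name("192.0.2.99", "dnsbl.example.org"))
```

An MTA or SpamAssassin does the equivalent lookup for the connecting host, which is why a working DNS resolver on the mail server is a precondition for any list of this kind.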
