Solved

SpamAssassin and Bayes: how to

Posted on 2006-06-14
Last Modified: 2008-02-01
We are running SpamAssassin on our mail server, which then forwards the legitimate mail to our Exchange server. A lot of spam has been getting through to users, and we would like to put an end to this. I've read that the Bayes filter is a great way to do that. Right now, messages marked as spam go into an Exchange mailbox called spam-trap. I need better documentation and a better understanding of Bayes. It seems the training tool requires the mail to be available locally, but mine is on the Exchange server.

Any information would be helpful.
Question by:shankshank
6 Comments
 
LVL 6

Accepted Solution

by:
Sanktwo earned 500 total points
ID: 16923625
For a detailed description, look up "bayesian inference" in Wikipedia, but here is a potted version.
Bayesian inference means estimating the probability that a hypothesis is true by weighing the evidence for and against it.
For example, if an email contains the words "weight" and "slimming", does it have a higher probability of being spam than one that doesn't?
For most of us the answer is "yes" unless you happen to be a site dealing in weight loss.
A Bayesian based filter requires data regarding what words and phrases imply that the message is spam (positives). Also, what words imply that it is not spam (negatives). Each mail is then weighted to say whether, on balance, it is more likely to be spam than not. Usually you are permitted to set the threshold at which it decides that an email is spam.
Thus Bayesian inference requires:
1. That you "train" the filter to match your circumstances (although it might come with some pre-training data), i.e. you tell it what is spam and what is not for quite a large number of messages.
2. That you choose a spam score threshold relevant to your business needs.
3. That the spammers are not clever enough to pad the spam with "good" words and phrases so that your filter cannot decide.
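
To make the three points above concrete, here is a minimal sketch of a word-based Bayesian filter in Python. It is only an illustration of the idea, not SpamAssassin's own Bayes implementation, which differs in detail. The class name TinyBayes, the tokenize helper, the four training messages and the 0.5 threshold are all made up for this example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class TinyBayes:
    """Toy naive Bayes spam filter (hypothetical, for illustration only)."""

    def __init__(self):
        self.spam_counts = Counter()  # number of spam messages containing each word
        self.ham_counts = Counter()   # number of non-spam messages containing each word
        self.spam_msgs = 0
        self.ham_msgs = 0

    def train(self, text, is_spam):
        """Point 1: tell the filter what is spam and what is not."""
        tokens = set(tokenize(text))
        if is_spam:
            self.spam_counts.update(tokens)
            self.spam_msgs += 1
        else:
            self.ham_counts.update(tokens)
            self.ham_msgs += 1

    def spam_probability(self, text):
        """Combine per-word evidence into one probability between 0 and 1."""
        # Start from the prior odds, then add each word's log-likelihood ratio.
        # The +1 / +2 terms are Laplace smoothing so unseen words do not give zero.
        log_odds = math.log((self.spam_msgs + 1) / (self.ham_msgs + 1))
        for tok in set(tokenize(text)):
            p_spam = (self.spam_counts[tok] + 1) / (self.spam_msgs + 2)
            p_ham = (self.ham_counts[tok] + 1) / (self.ham_msgs + 2)
            log_odds += math.log(p_spam / p_ham)
        return 1 / (1 + math.exp(-log_odds))


def is_spam(filt, text, threshold=0.5):
    """Point 2: the threshold is a policy choice, not part of the maths."""
    return filt.spam_probability(text) >= threshold


def build_example_filter():
    """Train on four invented messages, two spam and two legitimate."""
    filt = TinyBayes()
    filt.train("cheap weight slimming pills", is_spam=True)
    filt.train("slimming offer buy now", is_spam=True)
    filt.train("meeting agenda for monday", is_spam=False)
    filt.train("project budget review monday", is_spam=False)
    return filt
```

With this toy training set, "weight slimming offer" scores well above 0.5 and "monday project meeting" well below it. Point 3 also shows up: a message containing both sets of words ("weight slimming offer monday project meeting") lands close to 0.5, because the padded "good" words cancel the "bad" ones.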

You say that "messages marked as spam go into spam-trap". If that filtering is already done by a Bayesian inference filter, then the spam that still gets through may come from good spammers who know (3) above, and you are unlikely to gain much by adding another Bayesian inference engine.

All you need for a Bayesian inference test is the full body of the email and the training data of the bayesian inference engine. This will allocate a score (say 1-100) and you decide the level at which it is decided that it is "spam".

Does that help?
 
LVL 6

Expert Comment

by:Sanktwo
ID: 16923712
I should add that you should take all steps to avoid confirming to spammers which email addresses in your domain are valid, e.g.
1. Do NOT bounce messages which are addressed to non-existent users in your domain.
2. Do not permit the use of HTML viewers which download images automatically (such as older versions of Outlook Express), since image names containing database references are used to validate email addresses.
3. Tell all your users NEVER to attempt to "unsubscribe" from spam mailing lists (the spammers just validate the address and continue sending).

Although it might upset your users and customers somewhat, try to avoid real names for all your new users. Pick something like Peter-405-Jones rather than Peter.Jones, since spammers often try dictionary attacks to identify valid email addresses.

Ensure that your website does not contain harvestable email addresses, i.e. use images or other obscuring techniques.
 
LVL 5

Author Comment

by:shankshank
ID: 16934070
Thanks for the info. So how do I train SpamAssassin?
 
LVL 6

Expert Comment

by:Sanktwo
ID: 16934146
 
LVL 5

Author Comment

by:shankshank
ID: 16934176
And how is this?
http://wiki.apache.org/spamassassin/CustomRulesets

I am running FreeBSD and was unable to install the sa-blacklist DNS blacklist following the instructions provided at http://www.sa-blacklist.stearns.org/sa-blacklist/README

 
LVL 6

Expert Comment

by:Sanktwo
ID: 16935023
I am not a SpamAssassin guru, nor have I ever used FreeBSD (maybe I should?). Perhaps someone else can assist you in configuring SpamAssassin to trap more spam on FreeBSD. However, it would help if you gave more information about your problems.

You seem to have already given up on training the Bayesian filter because your spam mailbox is on the Exchange server. Although I don't know MS Exchange either, surely it is possible to get it to dump a big chunk of spam mails, with headers, into a text file and then feed that back to the FreeBSD system manually. You should not have to do that very often to improve the filtering, nor does it have to happen in real time.
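
As a rough sketch of that "feed it back manually" step: SpamAssassin ships a command-line trainer, sa-learn, which accepts --spam or --ham together with --mbox to read messages from an mbox-format file. The Python wrapper below only builds and runs that command. It assumes you have somehow exported the spam-trap mailbox to an mbox file (how to do that from Exchange is not covered here), and the file names and function names are hypothetical.

```python
import subprocess


def build_sa_learn_command(mbox_path, is_spam):
    """Return the sa-learn argument list for one mbox file.

    --spam / --ham tells sa-learn which class the messages belong to;
    --mbox says the file holds many messages in mbox format.
    """
    kind = "--spam" if is_spam else "--ham"
    return ["sa-learn", kind, "--mbox", mbox_path]


def train_from_mbox(mbox_path, is_spam):
    """Run sa-learn on the file; raises CalledProcessError if it fails.

    Requires sa-learn to be installed and on PATH, and should be run as
    the same user whose Bayes database SpamAssassin reads.
    """
    return subprocess.run(build_sa_learn_command(mbox_path, is_spam), check=True)


# Example (hypothetical file names, not executed here):
#   train_from_mbox("spam-trap.mbox", is_spam=True)
#   train_from_mbox("known-good.mbox", is_spam=False)
```

The filter needs examples of legitimate mail as well as spam. As far as I know, SpamAssassin's default configuration does not use the Bayes results at all until it has learned from a couple of hundred messages of each kind, so one small batch of spam alone will not change anything.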

In terms of the DNS blacklist, you say that you were unable to install it, but you need to give any Experts Exchange guru more details. What was the problem? Did you get an error message from somewhere? Could you not set up the cron job to fetch a regular copy, or was there some problem with SpamAssassin? Also, if you are using Sendmail, Exim or Postfix as the Mail Transfer Agent, then you can use the DNS block directly there (though less effectively than with SpamAssassin, since that can look at contents as well). It would be good if you could describe the mail path, i.e. which MTA is offering SMTP access, how this relates to SpamAssassin, and how the mail subsequently gets to the Exchange server. That will help your potential helpers.
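
For background, here is a small sketch of how a DNS-based block list is usually queried, whether by the MTA or by a content filter. It illustrates the general convention only and is not specific to sa-blacklist: the connecting IPv4 address is reversed octet by octet, the list's zone name is appended, and an ordinary DNS lookup is made; an answer means "listed" and a failed lookup means "not listed". The zone dnsbl.example.org is a placeholder, not a real list.

```python
import ipaddress
import socket


def dnsbl_query_name(ip, zone):
    """Build the DNS name to look up for an IPv4 address in a block list zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on bad input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip, zone):
    """Return True if the lookup resolves, False if the name does not exist.

    This performs a real DNS query, so it needs network access and a
    real block list zone rather than the placeholder used below.
    """
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        return False


# The name that would be queried for 192.0.2.10 (a documentation address):
example_name = dnsbl_query_name("192.0.2.10", "dnsbl.example.org")
```

Here example_name is "10.2.0.192.dnsbl.example.org". An MTA can reject at SMTP time on such a lookup, whereas SpamAssassin typically just adds to the message's score, which is the trade-off described above.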
