Solved

# SpamAssassin and Bayes

Posted on 2014-08-24
Medium Priority
240 Views
I am in the process of learning about SpamAssassin and Bayes. I have been doing a lot of reading on the internet, but I am not sure I am grasping something here. If someone could help me understand, that would be great.

First, I know there are many reasons why a message could or could not get trapped by a spam filter. However, I am trying to figure this one part out. If a clearly spam message has BAYES_00 in the header (which means Bayes thinks the message is ham), does this mean that at some point SpamAssassin/Bayes learned this message as ham, either via some automatic means or by someone running sa-learn --ham on the spam message (for some unknown reason)? If so, is there any way to know for sure it was learned as ham by Bayes? Or is that what BAYES_00 means?

Is it possible to have BAYES_00 in the header even though the message was never learned as either ham or spam, so that it has BAYES_00 for other reasons? I am not sure if I am understanding this right or even verbalizing it right. I appreciate any help in understanding.
Question by:camstutz

LVL 84

Accepted Solution

David Johnson, CD, MVP earned 2000 total points
ID: 40282495
It can mean either that the user has told it that it is ham OR that the message doesn't fulfill the rules to classify it as spam.
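
To illustrate the second case: BAYES_00 is reported when the classifier's estimated spam probability for the message falls in the lowest bucket, which can happen purely because its tokens resemble previously learned ham, whether or not this exact message was ever trained. Below is a minimal Python sketch of the probability-to-rule mapping. The bucket edges are as I recall them from the stock `23_bayes.cf` and the handling of exact boundary values is an assumption, so check them against your installed rules.

```python
# Bucket edges as recalled from SpamAssassin's stock 23_bayes.cf.
# Treat them as an assumption and verify against your own rules directory.
BAYES_BUCKETS = [
    ("BAYES_00", 0.00, 0.01),
    ("BAYES_05", 0.01, 0.05),
    ("BAYES_20", 0.05, 0.20),
    ("BAYES_40", 0.20, 0.40),
    ("BAYES_50", 0.40, 0.60),
    ("BAYES_60", 0.60, 0.80),
    ("BAYES_80", 0.80, 0.95),
    ("BAYES_95", 0.95, 0.99),
    ("BAYES_99", 0.99, 1.00),
]


def bayes_rule_for(probability: float) -> str:
    """Return the BAYES_xx rule name for an estimated spam probability."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    for name, low, high in BAYES_BUCKETS:
        # Assumption: lower edge inclusive, upper edge exclusive.
        if low <= probability < high:
            return name
    # Only probability == 1.0 reaches this point.
    return BAYES_BUCKETS[-1][0]


if __name__ == "__main__":
    for p in (0.005, 0.5, 0.97):
        print(p, bayes_rule_for(p))
```

So a message never seen before can still land in BAYES_00 if most of its tokens have mainly appeared in ham.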

Author Comment

ID: 40282525
Hello David,

Thank you for answering. I'm glad I am not completely crazy; I did think that might be one possibility. If I may, I have a few more questions.

Is there any way to know for sure which was the case? Does it really matter? It seems as if the answer is the same either way: learn it as spam.

Note: we do have other scoring in place (AWL, Razor, etc.). I've read that if Bayes becomes heavily weighted toward spam or ham (via its learned messages, as seen with sa-learn --dump magic), it could produce this outcome as well. Would a difference of 3000 (favoring ham) mean that the database is off?
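
For anyone wanting to put that difference in proportion, here is a rough Python sketch that reads the `nspam` and `nham` counters out of `sa-learn --dump magic` output. The sample text and its numbers are invented for illustration (chosen so the gap is 3000), the column layout is as I remember it, and the 200-message minimum reflects what I understand to be the defaults of `bayes_min_spam_num` and `bayes_min_ham_num`.

```python
import re

# Invented sample in the layout I recall from `sa-learn --dump magic`.
# The counts are illustrative only, not taken from the thread.
SAMPLE_DUMP = """\
0.000          0          3          0  non-token data: bayes db version
0.000          0       4200          0  non-token data: nspam
0.000          0       7200          0  non-token data: nham
0.000          0     150000          0  non-token data: ntokens
"""

MAGIC_LINE = re.compile(r"non-token data:\s*(.+?)\s*$")


def parse_magic(dump_text):
    """Map each 'non-token data' label to its integer value (third column)."""
    counts = {}
    for line in dump_text.splitlines():
        match = MAGIC_LINE.search(line)
        if not match:
            continue
        fields = line.split()
        counts[match.group(1)] = int(fields[2])
    return counts


def training_summary(counts, min_each=200):
    """Summarise ham/spam balance; min_each is the assumed default minimum."""
    nspam = counts["nspam"]
    nham = counts["nham"]
    total = nspam + nham
    return {
        "nspam": nspam,
        "nham": nham,
        "ham_minus_spam": nham - nspam,
        "ham_share": (nham / total) if total else None,
        "bayes_active": nspam >= min_each and nham >= min_each,
    }


if __name__ == "__main__":
    print(training_summary(parse_magic(SAMPLE_DUMP)))
```

The absolute gap says little by itself: 3000 out of a few thousand learned messages is a much stronger tilt than 3000 out of several hundred thousand.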

Is it possible to create a query with sa-learn or spamassassin to see if and when the message was learned as ham, or whether it simply did not score enough points? If you use sa-learn --ham or --spam, it would just relearn the message as one or the other.
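
One partial answer that needs no extra tooling: the `X-Spam-Status` header that SpamAssassin adds normally carries an `autolearn=` field (values such as `ham`, `spam`, or `no`) recording what happened when that message was scanned. The Python sketch below pulls it out of a message. The sample message is made up, the header layout is assumed to be the default template, and the field says nothing about any manual sa-learn run made afterwards.

```python
import re
from email import message_from_string

# Made-up message; the X-Spam-Status layout is assumed to be the default one.
SAMPLE_MESSAGE = (
    "From: sender@example.com\n"
    "Subject: sample\n"
    "X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,HTML_MESSAGE,\n"
    "\tSPF_PASS autolearn=ham version=3.3.1\n"
    "\n"
    "body\n"
)

# key=value pairs; a value runs until the next " key=" or the end of the header.
FIELD = re.compile(r"(\w+)=(.*?)(?=\s+\w+=|$)")


def parse_spam_status(raw_message):
    """Extract verdict, score, rule hits and autolearn state from a message."""
    msg = message_from_string(raw_message)
    header = msg.get("X-Spam-Status")
    if header is None:
        return None
    # Collapse folded continuation lines into single spaces.
    unfolded = " ".join(header.split())
    fields = dict(FIELD.findall(unfolded))
    tests = [t.strip() for t in fields.get("tests", "").split(",") if t.strip()]
    return {
        "is_spam": unfolded.lower().startswith("yes"),
        "score": float(fields["score"]) if "score" in fields else None,
        "tests": tests,
        "autolearn": fields.get("autolearn"),
    }


if __name__ == "__main__":
    print(parse_spam_status(SAMPLE_MESSAGE))
```

If a spam message shows both `BAYES_00` and `autolearn=ham`, that copy was fed to Bayes as ham at scan time; `autolearn=no` with `BAYES_00` points to the tokens simply looking ham-like.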

LVL 13

Expert Comment

ID: 40282533
I would suggest trying MailScanner, which can show you on what basis a message has been declared ham or spam. It has a web panel where you can clearly see the score of each mail, which can help with further classification of the message.

Ty/SA

Author Comment

ID: 40282851
Thanks for the information, Sandy. Unfortunately, it is not an option at this point. However, I do have the ability to read through the message headers (which include SpamAssassin scoring for our clients' emails). I neglected to mention that we run email servers for our clients. At the moment, I don't think implementing MailScanner is possible.

I appreciate the suggestion though.

LVL 13

Expert Comment

ID: 40282853
OK, but if you read the message headers, they also have a spam score indication if Razor/Pyzor/DCC are in place and configured.

Ty/SA

Author Comment

ID: 40282872
I do see that, although not every message has a Razor indication. However, using our current implementation of SpamAssassin (MailScanner looks like a replacement), I was just wondering if there was a way to answer a few more of the questions I have :)
Pretty much, I get that if it has BAYES_00, the best course of action is to learn it as spam, regardless of the aforementioned reasoning. However, I guess I wanted to drill into mail filtering a bit more and figure out if Bayes truly learned this spam as ham once. Oh well, I guess I'll spend a few moments with the spamc / spamassassin man pages. Maybe it isn't possible either.

LVL 13

Expert Comment

ID: 40282905
AFAIK, with BAYES_00 it treats the message as spam only when the spam score comes to more than 20, which is the default behaviour, hence it is always suggested to configure multiple rules with your own definitions. The interface is intelligent, but it still needs a little more fine-tuning to recognize exactly what is spam. For corporate mail it is a bit tough for the system to judge based on BAYES_00 alone.

Not sure, but check whether it has auto-generated mail IDs like ham@domain and spam@domain (as in Zimbra), which are used for manually feeding the spam filter so it learns which is spam and which is ham.

Hope I'm clear here. :)

TY/SA

Author Comment

ID: 40284662
Thanks for the comment Sandy,

Author Comment

ID: 40307786
I'm reading that the default threshold for ham autolearning is 0.1. However, I notice that many spam messages are getting autolearned as ham with a score of 0.8. How do I check to see what is wrong?
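
A possible explanation, as far as I understand the AutoLearnThreshold plugin documentation: the score compared against the threshold is not the final score in the header. Bayes rules and AWL are left out of the autolearn calculation, so a message can show 0.8 overall and still fall under 0.1 for autolearning. The Python sketch below is a simplification with invented rule scores. It ignores the switch to the non-Bayes scoreset and the extra header/body point requirements for spam autolearning, and the 12.0 spam threshold is the default as I recall it.

```python
# Defaults as I recall them; check bayes_auto_learn_threshold_nonspam and
# bayes_auto_learn_threshold_spam in your own configuration.
NONSPAM_THRESHOLD = 0.1
SPAM_THRESHOLD = 12.0


def is_ignored_for_autolearn(rule_name):
    """Simplified: only BAYES_* and AWL are excluded in this sketch."""
    return rule_name.startswith("BAYES_") or rule_name == "AWL"


def autolearn_score(hits):
    """Sum the scores of the rules that count toward autolearning."""
    return sum(
        score for rule, score in hits.items()
        if not is_ignored_for_autolearn(rule)
    )


def autolearn_decision(hits):
    """Return 'ham', 'spam' or 'no' under the simplified model above."""
    score = autolearn_score(hits)
    if score < NONSPAM_THRESHOLD:
        return "ham"
    if score > SPAM_THRESHOLD:
        return "spam"
    return "no"


if __name__ == "__main__":
    # Hypothetical hits: the scores are invented to add up to roughly 0.8.
    hits = {"BAYES_00": -1.9, "AWL": 2.699, "HTML_MESSAGE": 0.001}
    print("header score:", round(sum(hits.values()), 3))
    print("autolearn score:", autolearn_score(hits))
    print("decision:", autolearn_decision(hits))
```

Comparing the `tests=` list in the header with this kind of breakdown should show whether AWL is what pushes the visible score up while the autolearn score stays near zero.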

Author Comment

ID: 40334361
David: thanks for your answer. I do receive a lot of spam email that doesn't hit any scoring and says BAYES_00. Is there any way to deal with this? I appreciate your previous answer, and the only thing I can think of is to learn it as spam for them or clear the Bayes DB for them.

I understand in theory, if Bayes was not available at the time, and for some reason the net tests were also unavailable... but to not even match local rules? That seems possible, but unlikely. Either something is wrong or something is misconfigured.

LVL 84

Expert Comment

ID: 40334379
The more you use it, the better it gets.
