SpamAssassin and Bayes

I am in the process of learning about SpamAssassin and Bayes. I have been doing a lot of reading on the internet, but I am not sure I am grasping something here. If someone could help me understand, that would be great.

First, I know there are many reasons why a message could or could not get trapped by a spam filter. However, I am trying to figure this one part out. If a clearly spam message has BAYES_00 in the header (which means Bayes thinks the message is ham), does this mean that at some point SpamAssassin/Bayes learned this message as ham, either via some automatic means or by someone running sa-learn --ham on the spam message (for some unknown reason)? If so, is there any way to know for sure it was learned as ham by Bayes? Or is that what BAYES_00 means?

Is it possible to have BAYES_00 in the header even though the message was never learned as either ham or spam, so that it has BAYES_00 for other reasons? I am not sure if I am understanding this right or even verbalizing it right. I appreciate any help in understanding.
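
For reference, here is a minimal sketch of how the BAYES_xx rule names relate to the probability the Bayes classifier computes. The ranges below are the ones I understand the stock rule set to use; they are an assumption, so check them against the rule files of your own installation. The function name bayes_rule_for and the boundary handling are made up for this illustration.

```python
# Probability ranges for the BAYES_* rules as I understand the stock rule set.
# Treat these as an assumption and verify against your installed rules.
BAYES_BUCKETS = [
    ("BAYES_00", 0.00, 0.01),
    ("BAYES_05", 0.01, 0.05),
    ("BAYES_20", 0.05, 0.20),
    ("BAYES_40", 0.20, 0.40),
    ("BAYES_50", 0.40, 0.60),
    ("BAYES_60", 0.60, 0.80),
    ("BAYES_80", 0.80, 0.95),
    ("BAYES_95", 0.95, 0.99),
    ("BAYES_99", 0.99, 1.00),
]


def bayes_rule_for(probability):
    """Return the BAYES_* rule name whose range contains the spam probability."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    for name, low, high in BAYES_BUCKETS:
        # Hypothetical boundary choice: lower bound inclusive, upper exclusive.
        if low <= probability < high:
            return name
    # Only probability == 1.0 falls through the loop.
    return BAYES_BUCKETS[-1][0]


if __name__ == "__main__":
    for p in (0.003, 0.5, 0.997):
        print(p, bayes_rule_for(p))
```

Read this way, BAYES_00 only says that the computed spam probability was under about 1%. It does not by itself say how the classifier arrived there.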

ASKER CERTIFIED SOLUTION

David Johnson, CD


camstutz

ASKER

Hello David,

Thank you for answering. I'm glad I am not completely crazy; I did think that might be one possibility. If I may, I have a few more questions.

Is there any way to know for sure which was the case? Does it really matter? It seems as if the answer is the same either way: learn it as spam.

Note: we do have other scoring in place (AWL, Razor, etc.). I've read that if Bayes becomes heavily weighted toward spam or ham (via its learned tokens, seen via sa-learn --dump magic), it could produce this outcome as well. Would a difference of 3000 (favoring ham) mean that the database is off?

Is it possible to create a query with sa-learn or spamassassin to see if and when the message was learned as ham, or whether it just did not score enough points? If you use sa-learn --ham or --spam, it would just relearn the message as one or the other.
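
To put a number on the ham/spam balance, the output of sa-learn --dump magic can be summarized with a few lines of Python. This is only a sketch: the sample text below is made up to match the 3000 difference mentioned above, the column layout is what I recall the command printing (the count in the third column, a "non-token data:" label at the end), and parse_dump_magic is a hypothetical helper name.

```python
# Illustrative sample only; paste the real output of `sa-learn --dump magic` here.
SAMPLE_OUTPUT = """\
0.000          0          3          0  non-token data: bayes db version
0.000          0       4200          0  non-token data: nspam
0.000          0       7200          0  non-token data: nham
0.000          0     150000          0  non-token data: ntokens
"""


def parse_dump_magic(output):
    """Collect the 'non-token data' counters into a dict keyed by label."""
    counts = {}
    for line in output.splitlines():
        if "non-token data:" not in line:
            continue
        fields, label = line.split("non-token data:", 1)
        columns = fields.split()
        if len(columns) < 3:
            continue  # not the layout this sketch assumes
        counts[label.strip()] = int(columns[2])
    return counts


if __name__ == "__main__":
    counts = parse_dump_magic(SAMPLE_OUTPUT)
    nspam, nham = counts["nspam"], counts["nham"]
    print("spam learned:", nspam)
    print("ham learned:", nham)
    print("ham share of learned messages:", round(nham / (nham + nspam), 3))
```

A raw difference is hard to judge without the totals, so looking at the share of ham among all learned messages may be more informative than the difference alone.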

Sandy

I would suggest you try MailScanner, which can clarify on what basis a message has been declared ham or spam. It has a web panel where you can clearly see the score of each mail, which can help with further classification of messages.

Ty/SA

camstutz

ASKER

Thanks for the information, Sandy. Unfortunately, it is not an option at this point. However, I do have the ability to read through the message headers (which include SpamAssassin scoring) for our clients' emails. I neglected to mention that we run email servers for our clients. At the moment, I don't think implementing MailScanner is possible.
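
Since the headers are all I have to work with, here is a rough sketch of pulling the list of rules that fired out of the X-Spam-Status header with Python's standard email module. The sample message, its rule names and its version string are invented for illustration, and spam_status_tests is a hypothetical helper name; the exact header layout can differ between installations.

```python
import email
import re

# Invented sample message; the tests= list is folded across two lines,
# as long headers often are.
RAW_MESSAGE = (
    "From: sender@example.com\n"
    "Subject: hello\n"
    "X-Spam-Status: No, score=0.8 required=5.0 tests=BAYES_00,HTML_MESSAGE,\n"
    "\tRDNS_NONE autolearn=ham version=3.3.1\n"
    "\n"
    "message body\n"
)


def spam_status_tests(raw_message):
    """Return the rule names listed in tests= of the X-Spam-Status header."""
    msg = email.message_from_string(raw_message)
    status = msg.get("X-Spam-Status", "")
    # Collapse folded header lines into single spaces.
    status = re.sub(r"\s+", " ", status)
    # Capture everything after tests= up to the next key=value field.
    match = re.search(r"tests=(.*?)(?=\s+\w+=|$)", status)
    if not match:
        return []
    names = [name.strip() for name in match.group(1).split(",")]
    return [name for name in names if name and name.lower() != "none"]


if __name__ == "__main__":
    print(spam_status_tests(RAW_MESSAGE))
```

Running this over a batch of misclassified messages makes it easier to see whether BAYES_00 is the only rule hitting or whether other rules are also failing to fire.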

I appreciate the suggestion though.

Sandy

OK, but even if you read the message headers, they also carry a spam score indication if Razor/Pyzor/DCC are in place and configured.

Ty/SA

camstutz

ASKER

I do see that, though not every message has a Razor indication. However, using our current implementation of SpamAssassin (MailScanner looks like a replacement), I was just wondering if there was a way to answer a few more of the questions I have :)
Pretty much, I get that if it has BAYES_00, the best course of action is to learn it as spam, regardless of the aforementioned reasoning. However, I guess I wanted to drill into mail filtering a bit more and figure out if Bayes truly learned this spam as ham once. Oh well, I guess I'll spend a few moments with the spamc / spamassassin man pages. Maybe it isn't possible either.
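
One thing that helped me think about this: a Bayes filter scores the tokens in a message, not the message as a whole, so a message can end up with a very low spam probability without ever having been learned itself. The toy classifier below illustrates that idea only. It is a plain naive Bayes with add-one smoothing, not SpamAssassin's actual algorithm, and every name and sample text in it is made up.

```python
import math
from collections import Counter


class ToyBayes:
    """Tiny naive Bayes over whitespace-separated tokens (illustration only)."""

    def __init__(self):
        self.ham_tokens = Counter()
        self.spam_tokens = Counter()
        self.nham = 0
        self.nspam = 0

    def learn(self, text, is_spam):
        tokens = set(text.lower().split())
        if is_spam:
            self.spam_tokens.update(tokens)
            self.nspam += 1
        else:
            self.ham_tokens.update(tokens)
            self.nham += 1

    def spam_probability(self, text):
        # Equal priors; per-token likelihoods with add-one smoothing.
        log_spam = math.log(0.5)
        log_ham = math.log(0.5)
        for token in set(text.lower().split()):
            log_spam += math.log((self.spam_tokens[token] + 1) / (self.nspam + 2))
            log_ham += math.log((self.ham_tokens[token] + 1) / (self.nham + 2))
        # Normalize in a numerically safe way.
        top = max(log_spam, log_ham)
        spam = math.exp(log_spam - top)
        ham = math.exp(log_ham - top)
        return spam / (spam + ham)


if __name__ == "__main__":
    bayes = ToyBayes()
    for _ in range(20):
        bayes.learn("quarterly invoice report meeting", is_spam=False)
        bayes.learn("cheap pills bonus winner", is_spam=True)
    # This message was never learned, but it reuses tokens seen only in ham.
    print(bayes.spam_probability("quarterly invoice offer"))
```

In this toy setup the unseen message comes out well under 0.01, simply because two of its three tokens were only ever seen in ham. So a very low Bayes probability does not prove the message itself was once learned as ham.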

Sandy

AFAIK, BAYES_00 treats the message as spam when the spam score comes to more than 20, which is the default behavior; hence it is always suggested to configure multiple rules with your own definitions. The system is intelligent, but it still needs a little more fine-tuning to recognize spam exactly. In corporate mail it is a bit tough for the system to judge based on BAYES_00.

Not sure, but check whether it has auto-generated mail IDs like ham@domain and spam@domain (as in Zimbra), which are used to manually feed the spam filter so it learns which is spam and which is ham.

Hope I'm clear here. :)

TY/SA

camstutz

ASKER

Thanks for the comment, Sandy.

camstutz

ASKER

I'm reading that the default threshold for ham autolearning is 0.1; however, I notice that many spam messages are getting autolearned as ham with a 0.8 score. How do I check to see what is wrong?
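
To find the affected messages, a small script can flag every X-Spam-Status value that shows autolearn=ham together with a displayed score above the threshold. This is a sketch under assumptions: the sample header values are invented, the 0.1 threshold is the default mentioned above, and the function names are hypothetical. As far as I understand the autolearn documentation, the score used for the autolearn decision ignores the Bayes rules and some other rules, so it can legitimately differ from the displayed score.

```python
import re

# Default ham autolearn threshold as mentioned above; adjust to your config.
NONSPAM_THRESHOLD = 0.1

# Invented sample header values for illustration.
SAMPLE_STATUSES = [
    "No, score=0.8 required=5.0 tests=BAYES_00,HTML_MESSAGE autolearn=ham version=3.3.1",
    "No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham version=3.3.1",
    "Yes, score=14.2 required=5.0 tests=BAYES_99,RAZOR2_CHECK autolearn=spam version=3.3.1",
    "No, score=2.3 required=5.0 tests=BAYES_50 autolearn=no version=3.3.1",
]


def parse_status(status):
    """Return (score, autolearn) from one X-Spam-Status value, or None for missing parts."""
    status = " ".join(status.split())  # undo header folding
    score = re.search(r"score=(-?\d+(?:\.\d+)?)", status)
    autolearn = re.search(r"autolearn=(\w+)", status)
    return (
        float(score.group(1)) if score else None,
        autolearn.group(1) if autolearn else None,
    )


def suspicious_ham_autolearns(statuses, threshold=NONSPAM_THRESHOLD):
    """List (score, status) pairs autolearned as ham with a displayed score above threshold."""
    flagged = []
    for status in statuses:
        score, autolearn = parse_status(status)
        if autolearn == "ham" and score is not None and score > threshold:
            flagged.append((score, status))
    return flagged


if __name__ == "__main__":
    for score, status in suspicious_ham_autolearns(SAMPLE_STATUSES):
        print(score, status)
```

Comparing the tests= list of the flagged messages should show which rules contribute to the displayed score but may be left out of the autolearn decision.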

camstutz

ASKER

David: thanks for your answer. I do receive a lot of spam email that doesn't hit any scoring rules and shows BAYES_00. Is there any way to deal with this? I appreciate your previous answer, and the only thing I can think of is to learn it as spam for them or clear the Bayes DB for them.

I understand that in theory, Bayes might not have been available at the time, and for some reason the net tests were also unavailable... but to not even match local rules? That seems possible, but unlikely. Either something is wrong or something is misconfigured.

David Johnson, CD

The more you use it, the better it gets.