Solved

Bayes not working in spamassassin

Posted on 2014-12-19
Last Modified: 2015-01-12
I have the same version of SpamAssassin running on two different Slackware computers. I believe I have enabled the Bayesian classifier on both. One host gives me Bayes scores in the message header, for example:
X-Spam-Report:
        *  0.0 HTML_MESSAGE BODY: HTML included in message
        *  2.0 BAYES_50 BODY: Bayes spam probability is 40 to 60%
        *      [score: 0.5000]

The other host's spamassassin never does. I'm quite sure the recalcitrant host has been trained with plenty of spam and ham.

Any ideas why Bayes is not kicking in on this host? What can I check?
Question by:jmarkfoley
LVL 22

Expert Comment

by:robocat
ID: 40510539
Do you get an X-Spam-Report at all?
 
LVL 1

Author Comment

by:jmarkfoley
ID: 40514345
robocat: > Do you get a X-Spam-Report at all?

Yes, here's a complete report from a recent message. Notice no mention of Bayes.
X-Spam-Status: No, score=1.3 required=5.0 tests=AWL,DATE_IN_PAST_03_06,
        HTML_MESSAGE,SPF_HELO_PASS,SPF_PASS,T_RP_MATCHES_RCVD autolearn=no
        version=3.3.2
X-Spam-Report:
        * -0.0 SPF_HELO_PASS SPF: HELO matches SPF record
        * -0.0 T_RP_MATCHES_RCVD Envelope sender domain matches handover relay
        *      domain
        * -0.0 SPF_PASS SPF: sender matches SPF record
        *  1.1 DATE_IN_PAST_03_06 Date: is 3 to 6 hours before Received: date
        *  0.0 HTML_MESSAGE BODY: HTML included in message
        *  0.2 AWL AWL: From: address is in the auto white-list
X-Spam-Level: *
X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on

 
LVL 22

Assisted Solution

by:robocat
robocat earned 250 total points
ID: 40514531
Run

spamassassin -D --lint

and check for any messages involving "bayes"
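
A minimal sketch of that check, assuming a POSIX shell. SpamAssassin writes its debug output to stderr, so it has to be merged into stdout before it can be filtered. `bayes_lines` is a hypothetical helper name, and the guard only lets the script run harmlessly on a machine where spamassassin is not installed.

```shell
#!/bin/sh
# Keep only the debug lines that mention Bayes (case-insensitive).
# bayes_lines is a hypothetical helper name, not part of SpamAssassin.
bayes_lines() {
    grep -i 'bayes'
}

# Debug output goes to stderr, so merge it into stdout before filtering.
if command -v spamassassin >/dev/null 2>&1; then
    spamassassin -D --lint 2>&1 | bayes_lines
fi
```

Saving the filtered lines to a file on each host makes it easy to compare the two machines afterwards.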
 
LVL 1

Author Comment

by:jmarkfoley
ID: 40515034
Here are all the bayes related messages from the lint run on the non-working host:
Dec 23 10:50:05.770 [26285] dbg: plugin: loading Mail::SpamAssassin::Plugin::Bayes from @INC
Dec 23 10:50:05.893 [26285] dbg: config: fixed relative path: /var/lib/spamassassin/3.003002/updates_spamassassin_org/23_bayes.cf
Dec 23 10:50:05.893 [26285] dbg: config: using "/var/lib/spamassassin/3.003002/updates_spamassassin_org/23_bayes.cf" for included file
Dec 23 10:50:05.893 [26285] dbg: config: read file /var/lib/spamassassin/3.003002/updates_spamassassin_org/23_bayes.cf
Dec 23 10:50:07.189 [26285] dbg: plugin: Mail::SpamAssassin::Plugin::Bayes=HASH(0x97a7300) implements 'learner_new', priority 0
Dec 23 10:50:07.189 [26285] dbg: bayes: learner_new self=Mail::SpamAssassin::Plugin::Bayes=HASH(0x97a7300), bayes_store_module=Mail::SpamAssassin::BayesStore::DBM
Dec 23 10:50:07.204 [26285] dbg: bayes: learner_new: got store=Mail::SpamAssassin::BayesStore::DBM=HASH(0x9957380)
Dec 23 10:50:07.212 [26285] dbg: plugin: Mail::SpamAssassin::Plugin::Bayes=HASH(0x97a7300) implements 'learner_is_scan_available', priority 0
Dec 23 10:50:07.212 [26285] dbg: bayes: tie-ing to DB file R/O /root/.spamassassin/bayes_toks
Dec 23 10:50:07.212 [26285] dbg: bayes: tie-ing to DB file R/O /root/.spamassassin/bayes_seen
Dec 23 10:50:07.213 [26285] dbg: bayes: found bayes db version 3
Dec 23 10:50:07.213 [26285] dbg: bayes: DB journal sync: last sync: 0
Dec 23 10:50:07.244 [26285] dbg: bayes: DB journal sync: last sync: 0
Dec 23 10:50:07.244 [26285] dbg: bayes: corpus size: nspam = 679, nham = 1212
Dec 23 10:50:07.269 [26285] dbg: bayes: score = 0.346512389874618
Dec 23 10:50:07.269 [26285] dbg: bayes: DB expiry: tokens in DB: 113590, Expiry max size: 150000, Oldest atime: 1405796976, Newest atime: 1419005450, Last expire: 0, Current time: 1419349807
Dec 23 10:50:07.269 [26285] dbg: bayes: DB journal sync: last sync: 0
Dec 23 10:50:07.269 [26285] dbg: bayes: untie-ing
Dec 23 10:50:07.661 [26285] dbg: rules: ran eval rule BAYES_40 ======> got hit (1)
Dec 23 10:50:07.764 [26285] dbg: check: tests=BAYES_40,MISSING_DATE,MISSING_HEADERS,NO_RECEIVED,NO_RELAYS
Dec 23 10:50:07.764 [26285] dbg: timing: total 2019 ms - init: 1469 (72.8%), parse: 0.49 (0.0%), extract_message_metadata: 0.86 (0.0%), get_uri_detail_list: 0.78 (0.0%), tests_pri_-1000: 7 (0.3%), compile_gen: 107 (5.3%), compile_eval: 26 (1.3%), tests_pri_-950: 6 (0.3%), tests_pri_-900: 11 (0.6%), tests_pri_-400: 29 (1.4%), check_bayes: 26 (1.3%), tests_pri_0: 412 (20.4%), tests_pri_500: 78 (3.8%), tests_pri_1000: 3 (0.2%)

I don't see anything particularly suspicious. Here is the same lint command on the host where bayes is working:
Dec 23 11:06:39.168 [7123] dbg: plugin: loading Mail::SpamAssassin::Plugin::Bayes from @INC
Dec 23 11:06:39.276 [7123] dbg: config: fixed relative path: /var/lib/spamassassin/3.003002/updates_spamassassin_org/23_bayes.cf
Dec 23 11:06:39.276 [7123] dbg: config: using "/var/lib/spamassassin/3.003002/updates_spamassassin_org/23_bayes.cf" for included file
Dec 23 11:06:39.276 [7123] dbg: config: read file /var/lib/spamassassin/3.003002/updates_spamassassin_org/23_bayes.cf
Dec 23 11:06:40.001 [7123] dbg: plugin: Mail::SpamAssassin::Plugin::Bayes=HASH(0x97f5ef8) implements 'learner_new', priority 0
Dec 23 11:06:40.001 [7123] dbg: bayes: learner_new self=Mail::SpamAssassin::Plugin::Bayes=HASH(0x97f5ef8), bayes_store_module=Mail::SpamAssassin::BayesStore::DBM
Dec 23 11:06:40.009 [7123] dbg: bayes: learner_new: got store=Mail::SpamAssassin::BayesStore::DBM=HASH(0x9a471f0)
Dec 23 11:06:40.009 [7123] dbg: plugin: Mail::SpamAssassin::Plugin::Bayes=HASH(0x97f5ef8) implements 'learner_is_scan_available', priority 0
Dec 23 11:06:40.009 [7123] dbg: bayes: tie-ing to DB file R/O /root/.spamassassin/bayes_toks
Dec 23 11:06:40.010 [7123] dbg: bayes: tie-ing to DB file R/O /root/.spamassassin/bayes_seen
Dec 23 11:06:40.010 [7123] dbg: bayes: found bayes db version 3
Dec 23 11:06:40.010 [7123] dbg: bayes: opportunistic call attempt skipped, found fresh running expire magic token
Dec 23 11:06:40.025 [7123] dbg: bayes: opportunistic call attempt skipped, found fresh running expire magic token
Dec 23 11:06:40.025 [7123] dbg: bayes: corpus size: nspam = 3783, nham = 5179
Dec 23 11:06:40.045 [7123] dbg: bayes: score = 0.484370339736726
Dec 23 11:06:40.045 [7123] dbg: bayes: opportunistic call attempt skipped, found fresh running expire magic token
Dec 23 11:06:40.045 [7123] dbg: bayes: untie-ing
Dec 23 11:06:40.220 [7123] dbg: rules: ran eval rule BAYES_50 ======> got hit (1)
Dec 23 11:06:40.274 [7123] dbg: check: tests=BAYES_50,MISSING_DATE,MISSING_HEADERS,NO_RECEIVED,NO_RELAYS
Dec 23 11:06:40.275 [7123] dbg: timing: total 1132 ms - init: 868 (76.7%), parse: 0.50 (0.0%), extract_message_metadata: 0.66 (0.1%), get_uri_detail_list: 0.68 (0.1%), tests_pri_-1000: 4 (0.4%), compile_gen: 75 (6.6%), compile_eval: 13 (1.1%), tests_pri_-950: 3 (0.2%), tests_pri_-900: 3 (0.3%), tests_pri_-400: 24 (2.1%), check_bayes: 21 (1.8%), tests_pri_0: 180 (15.9%), tests_pri_500: 47 (4.2%)

This is a diff between the lint runs on the not working host (<) and the working host (>)
12,17c12,16
<  dbg: bayes: DB journal sync: last sync: 0
<  dbg: bayes: DB journal sync: last sync: 0
<  dbg: bayes: corpus size: nspam = 679, nham = 1212
<  dbg: bayes: score = 0.346512389874618
<  dbg: bayes: DB expiry: tokens in DB: 113590, Expiry max size: 150000, Oldest atime: 1405796976, Newest atime: 1419005450, Last expire: 0, Current time: 1419349807
<  dbg: bayes: DB journal sync: last sync: 0
---
>  dbg: bayes: opportunistic call attempt skipped, found fresh running expire magic token
>  dbg: bayes: opportunistic call attempt skipped, found fresh running expire magic token
>  dbg: bayes: corpus size: nspam = 3783, nham = 5179
>  dbg: bayes: score = 0.484370339736726
>  dbg: bayes: opportunistic call attempt skipped, found fresh running expire magic token
19,21c18,20
<  dbg: rules: ran eval rule BAYES_40 ======> got hit (1)
<  dbg: check: tests=BAYES_40,MISSING_DATE,MISSING_HEADERS,NO_RECEIVED,NO_RELAYS
<  dbg: timing: total 2019 ms - init: 1469 (72.8%), parse: 0.49 (0.0%), extract_message_metadata: 0.86 (0.0%), get_uri_detail_list: 0.78 (0.0%), tests_pri_-1000: 7 (0.3%), compile_gen: 107 (5.3%), compile_eval: 26 (1.3%), tests_pri_-950: 6 (0.3%), tests_pri_-900: 11 (0.6%), tests_pri_-400: 29 (1.4%), check_bayes: 26 (1.3%), tests_pri_0: 412 (20.4%), tests_pri_500: 78 (3.8%), tests_pri_1000: 3 (0.2%)
---
>  dbg: rules: ran eval rule BAYES_50 ======> got hit (1)
>  dbg: check: tests=BAYES_50,MISSING_DATE,MISSING_HEADERS,NO_RECEIVED,NO_RELAYS
>  dbg: timing: total 1132 ms - init: 868 (76.7%), parse: 0.50 (0.0%), extract_message_metadata: 0.66 (0.1%), get_uri_detail_list: 0.68 (0.1%), tests_pri_-1000: 4 (0.4%), compile_gen: 75 (6.6%), compile_eval: 13 (1.1%), tests_pri_-950: 3 (0.2%), tests_pri_-900: 3 (0.3%), tests_pri_-400: 24 (2.1%), check_bayes: 21 (1.8%), tests_pri_0: 180 (15.9%), tests_pri_500: 47 (4.2%)

The main differences I see are that the working host shows no "DB journal sync: last sync: 0" or "Last expire: 0" messages, and the non-working host shows no "opportunistic call attempt skipped" messages.

Do these messages tell you anything?
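
For anyone repeating this comparison, here is a rough sketch of how such a diff can be produced. It assumes the debug lines carry the "Mon DD HH:MM:SS.mmm [pid]" prefix seen in the lint output above; `strip_prefix` and the file names are made up for illustration. It also blanks out the HASH(0x...) addresses, which differ on every run.

```shell
#!/bin/sh
# Remove the leading "Dec 23 10:50:05.770 [26285] " timestamp and PID, and
# normalise HASH(0x...) addresses, so two runs differ only where the messages
# themselves differ. strip_prefix is a hypothetical helper name.
strip_prefix() {
    sed -E -e 's/^[A-Z][a-z]{2} +[0-9]+ [0-9:.]+ \[[0-9]+\] //' \
           -e 's/HASH\(0x[0-9a-f]+\)/HASH/g'
}

# Hypothetical file names holding the saved "bayes" lines from each host.
if [ -f notworking.log ] && [ -f working.log ]; then
    strip_prefix < notworking.log > notworking.txt
    strip_prefix < working.log > working.txt
    # diff exits non-zero when the files differ; that is expected here.
    diff notworking.txt working.txt || true
fi
```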
 
LVL 64

Accepted Solution

by:
btan earned 250 total points
ID: 40521805
I see "autolearn=no" in the X-Spam-Status header.
If a message has already been learned by SpamAssassin, then that message will not be learned again. Therefore, if you run a message through SpamAssassin to see why it was classified as spam or ham, and it has already been learned, you will always get the result "autolearn=no". (To see this more clearly, use the "-D" flag, and you will see debug output explaining that the message has already been learned.)
https://wiki.apache.org/spamassassin/AutolearningNotWorking

I am wondering whether it needs to relearn.
If you find that SA never seems to learn messages, try using sa-learn --dump magic to find out more about your database. The line "nham" is the number of ham messages SA has learned, and the line "nspam" is the number of spam messages SA has learned.
http://wiki.apache.org/spamassassin/BayesNotWorking
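
A small sketch of that check. It assumes the usual `sa-learn --dump magic` layout, where each line ends with the item name and the third column holds its value. The sample lines are illustrative, built from the counts in the debug output above rather than taken from a real dump, and `magic_value` is a hypothetical helper. The threshold of 200 is SpamAssassin's default minimum for both counts (`bayes_min_spam_num` and `bayes_min_ham_num`) before Bayes is used.

```shell
#!/bin/sh
# Print the value column for one "non-token data" item from dump-magic text.
# magic_value is a hypothetical helper; column 3 is assumed to hold the value.
magic_value() {
    awk -v name="$1" '$NF == name { print $3 }'
}

# Illustrative sample in the assumed layout (counts taken from the thread).
sample='0.000          0        679          0  non-token data: nspam
0.000          0       1212          0  non-token data: nham'

nspam=$(printf '%s\n' "$sample" | magic_value nspam)
nham=$(printf '%s\n' "$sample" | magic_value nham)

# 200 is the default minimum for both counts before Bayes is used.
if [ "$nspam" -ge 200 ] && [ "$nham" -ge 200 ]; then
    echo "enough training data: nspam=$nspam nham=$nham"
else
    echo "not enough training data: nspam=$nspam nham=$nham"
fi
```

Against a real database the same helper would be fed from `sa-learn --dump magic | magic_value nspam`, run as the user whose database you want to inspect.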

The messages that differ between the working and non-working hosts all relate to Bayes journal syncing and token expiry.
SpamAssassin can sync the journal and expire the DB tokens either manually or opportunistically. A journal sync is due if --sync is passed to sa-learn (manual), or when the conditions listed in the sa-learn documentation are met (opportunistic).
See the Expiration section, and also "Getting started" and "Effective Learning": http://spamassassin.apache.org/full/3.3.x/doc/sa-learn.html
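
The atime values in the non-working host's "DB expiry" debug line are Unix epoch seconds, which are hard to read at a glance. A quick sketch that turns them into whole-day intervals, using the numbers copied from that line; `days_between` is a hypothetical helper and the arithmetic is plain integer division.

```shell
#!/bin/sh
# Values copied from the "DB expiry" debug line of the non-working host.
oldest_atime=1405796976
newest_atime=1419005450
current_time=1419349807

# Whole days between two epoch timestamps (integer division).
days_between() {
    echo $(( ($2 - $1) / 86400 ))
}

echo "token span: $(days_between "$oldest_atime" "$newest_atime") days"
echo "newest token atime: $(days_between "$newest_atime" "$current_time") days before the lint run"
```

This only shows how old the recorded access times are. It does not by itself say why Bayes hits were missing from the headers.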
 
LVL 22

Expert Comment

by:robocat
ID: 40531525
Can you run SpamAssassin manually in debug mode?

spamassassin -D bayes <sometestspam.txt
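
When running this by hand it may also be worth confirming which Bayes database the command opens. A sketch, assuming the "tie-ing to DB file" debug message format shown in the lint output above; `bayes_db_paths` is a hypothetical helper. In the lint runs above both hosts opened files under /root/.spamassassin. A mail filter running under another account could be using a different database, but that is only an assumption, since the thread does not say how mail is scanned.

```shell
#!/bin/sh
# Print the unique database paths named in "tie-ing to DB file" debug lines.
# bayes_db_paths is a hypothetical helper name.
bayes_db_paths() {
    sed -n 's/.*bayes: tie-ing to DB file [^ ]* //p' | sort -u
}

# Send the debug output (stderr) down the pipe and discard the rewritten message.
if command -v spamassassin >/dev/null 2>&1 && [ -f sometestspam.txt ]; then
    spamassassin -D bayes < sometestspam.txt 2>&1 >/dev/null | bayes_db_paths
fi
```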
 
LVL 1

Author Comment

by:jmarkfoley
ID: 40537693
Sorry for the delay (flu). Will test this afternoon.
 
LVL 1

Author Comment

by:jmarkfoley
ID: 40545409
Well, it just started working all of a sudden! Perhaps I didn't have enough spam/ham in the database, but I was sure I did. I guess this one is solved.