Bayes not working in spamassassin

I have the same version of spamassassin running on two different Slackware computers. I believe I have enabled the Bayesian classifier on both. One host does give me Bayes scores in the message header, for example:
X-Spam-Report:
        *  0.0 HTML_MESSAGE BODY: HTML included in message
        *  2.0 BAYES_50 BODY: Bayes spam probability is 40 to 60%
        *      [score: 0.5000]

The other host's spamassassin never does. I'm quite sure the recalcitrant host has been trained with plenty of spam and ham.

Any ideas why Bayes is not kicking in on this host? What can I check?
jmarkfoleyAsked:
 
btanCommented:
I saw there is an "autolearn=no" in the X-Spam report.
If a message has already been learned by SpamAssassin, then that message will not be learned again. Therefore, if you run a message through SpamAssassin to see why it was classified as spam or ham, and it has already been learned, you will always get the result "autolearn=no". (To see this more clearly, use the "-D" flag, and you will see debug output explaining that the message has already been learned.)
https://wiki.apache.org/spamassassin/AutolearningNotWorking

I wonder whether it needs to relearn.
If you find that SA never seems to learn messages, try using sa-learn --dump magic to find out more about your database. The line "nham" is the number of ham messages SA has learned, and the line "nspam" is the number of spam messages SA has learned.
http://wiki.apache.org/spamassassin/BayesNotWorking
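
As a rough sketch of the check described above, the helper below reads `sa-learn --dump magic` output and reports the nspam and nham counts. Two things here are assumptions rather than anything stated in this thread: that the count sits in the third whitespace-separated column of each "non-token data" line, and that the threshold is 200 (which, if I recall the Mail::SpamAssassin::Conf documentation correctly, is the default for bayes_min_spam_num and bayes_min_ham_num). The function name is made up for illustration.

```shell
# Read `sa-learn --dump magic` output on stdin and report whether the
# learned corpus reaches the minimum needed before Bayes rules can fire.
# Assumption: the count is the third whitespace-separated column.
bayes_corpus_check() {
    min="${1:-200}"   # assumed default of bayes_min_spam_num / bayes_min_ham_num
    awk -v min="$min" '
        /non-token data: nspam$/ { nspam = $3 }
        /non-token data: nham$/  { nham  = $3 }
        END {
            status = ((nspam + 0 >= min + 0) && (nham + 0 >= min + 0)) ? "enough" : "too-few"
            printf "nspam=%d nham=%d %s\n", nspam, nham, status
        }'
}

# Typical use, run as the same user whose Bayes database is being scanned:
#   sa-learn --dump magic | bayes_corpus_check
```

Counts like the nspam = 679, nham = 1212 reported later in this thread would pass this check, so a small corpus alone would not explain the missing BAYES_* rules.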

The debug messages that differ between the working and non-working hosts all relate to the expiry of Bayes tokens learned from the messages seen.
SpamAssassin can sync the journal and expire the DB tokens either manually or opportunistically. A journal sync is due if --sync is passed to sa-learn (manual), or if the conditions listed in the sa-learn documentation hold (opportunistic).
(See the Expiration section; the "Getting started" and "Effective Learning" sections are also worth reading.) http://spamassassin.apache.org/full/3.3.x/doc/sa-learn.html
 
robocatCommented:
Do you get an X-Spam-Report at all?
 
jmarkfoleyAuthor Commented:
robocat: > Do you get a X-Spam-Report at all?

Yes, here's a complete report from a recent message. Notice no mention of Bayes.
X-Spam-Status: No, score=1.3 required=5.0 tests=AWL,DATE_IN_PAST_03_06,
        HTML_MESSAGE,SPF_HELO_PASS,SPF_PASS,T_RP_MATCHES_RCVD autolearn=no
        version=3.3.2
X-Spam-Report:
        * -0.0 SPF_HELO_PASS SPF: HELO matches SPF record
        * -0.0 T_RP_MATCHES_RCVD Envelope sender domain matches handover relay
        *      domain
        * -0.0 SPF_PASS SPF: sender matches SPF record
        *  1.1 DATE_IN_PAST_03_06 Date: is 3 to 6 hours before Received: date
        *  0.0 HTML_MESSAGE BODY: HTML included in message
        *  0.2 AWL AWL: From: address is in the auto white-list
X-Spam-Level: *
X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on

 
robocatCommented:
Run

spamassassin -D --lint

and check for any messages involving "bayes"
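
One possible way to do that filtering, sketched under a couple of assumptions: the debug output goes to stderr (hence the 2>&1 in the usage comment), and each debug line starts with a "Mon DD HH:MM:SS.mmm [pid]" prefix like the ones pasted further down in this thread. The function name and the output file names are hypothetical. Stripping the prefix makes the output from two hosts easier to diff.

```shell
# Keep only Bayes-related debug lines and drop the leading
# "Mon DD HH:MM:SS.mmm [pid] " prefix so that output from two hosts
# can be compared with diff.
bayes_debug_lines() {
    grep -i 'bayes' |
        sed -E 's/^[A-Z][a-z]{2} +[0-9]+ [0-9:.]+ \[[0-9]+\] //'
}

# Typical use on each host (file names are placeholders):
#   spamassassin -D --lint 2>&1 | bayes_debug_lines > bayes-hostA.txt
#   diff bayes-hostA.txt bayes-hostB.txt
```

Object addresses such as HASH(0x...) will still differ between hosts, so some diff noise is expected.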
 
jmarkfoleyAuthor Commented:
Here are all the bayes related messages from the lint run on the non-working host:
Dec 23 10:50:05.770 [26285] dbg: plugin: loading Mail::SpamAssassin::Plugin::Bayes from @INC
Dec 23 10:50:05.893 [26285] dbg: config: fixed relative path: /var/lib/spamassassin/3.003002/updates_spamassassin_org/23_bayes.cf
Dec 23 10:50:05.893 [26285] dbg: config: using "/var/lib/spamassassin/3.003002/updates_spamassassin_org/23_bayes.cf" for included file
Dec 23 10:50:05.893 [26285] dbg: config: read file /var/lib/spamassassin/3.003002/updates_spamassassin_org/23_bayes.cf
Dec 23 10:50:07.189 [26285] dbg: plugin: Mail::SpamAssassin::Plugin::Bayes=HASH(0x97a7300) implements 'learner_new', priority 0
Dec 23 10:50:07.189 [26285] dbg: bayes: learner_new self=Mail::SpamAssassin::Plugin::Bayes=HASH(0x97a7300), bayes_store_module=Mail::SpamAssassin::BayesStore::DBM
Dec 23 10:50:07.204 [26285] dbg: bayes: learner_new: got store=Mail::SpamAssassin::BayesStore::DBM=HASH(0x9957380)
Dec 23 10:50:07.212 [26285] dbg: plugin: Mail::SpamAssassin::Plugin::Bayes=HASH(0x97a7300) implements 'learner_is_scan_available', priority 0
Dec 23 10:50:07.212 [26285] dbg: bayes: tie-ing to DB file R/O /root/.spamassassin/bayes_toks
Dec 23 10:50:07.212 [26285] dbg: bayes: tie-ing to DB file R/O /root/.spamassassin/bayes_seen
Dec 23 10:50:07.213 [26285] dbg: bayes: found bayes db version 3
Dec 23 10:50:07.213 [26285] dbg: bayes: DB journal sync: last sync: 0
Dec 23 10:50:07.244 [26285] dbg: bayes: DB journal sync: last sync: 0
Dec 23 10:50:07.244 [26285] dbg: bayes: corpus size: nspam = 679, nham = 1212
Dec 23 10:50:07.269 [26285] dbg: bayes: score = 0.346512389874618
Dec 23 10:50:07.269 [26285] dbg: bayes: DB expiry: tokens in DB: 113590, Expiry max size: 150000, Oldest atime: 1405796976, Newest atime: 1419005450, Last expire: 0, Current time: 1419349807
Dec 23 10:50:07.269 [26285] dbg: bayes: DB journal sync: last sync: 0
Dec 23 10:50:07.269 [26285] dbg: bayes: untie-ing
Dec 23 10:50:07.661 [26285] dbg: rules: ran eval rule BAYES_40 ======> got hit (1)
Dec 23 10:50:07.764 [26285] dbg: check: tests=BAYES_40,MISSING_DATE,MISSING_HEADERS,NO_RECEIVED,NO_RELAYS
Dec 23 10:50:07.764 [26285] dbg: timing: total 2019 ms - init: 1469 (72.8%), parse: 0.49 (0.0%), extract_message_metadata: 0.86 (0.0%), get_uri_detail_list: 0.78 (0.0%), tests_pri_-1000: 7 (0.3%), compile_gen: 107 (5.3%), compile_eval: 26 (1.3%), tests_pri_-950: 6 (0.3%), tests_pri_-900: 11 (0.6%), tests_pri_-400: 29 (1.4%), check_bayes: 26 (1.3%), tests_pri_0: 412 (20.4%), tests_pri_500: 78 (3.8%), tests_pri_1000: 3 (0.2%)

I don't see anything particularly suspicious. Here is the same lint command on the host where bayes is working:
Dec 23 11:06:39.168 [7123] dbg: plugin: loading Mail::SpamAssassin::Plugin::Bayes from @INC
Dec 23 11:06:39.276 [7123] dbg: config: fixed relative path: /var/lib/spamassassin/3.003002/updates_spamassassin_org/23_bayes.cf
Dec 23 11:06:39.276 [7123] dbg: config: using "/var/lib/spamassassin/3.003002/updates_spamassassin_org/23_bayes.cf" for included file
Dec 23 11:06:39.276 [7123] dbg: config: read file /var/lib/spamassassin/3.003002/updates_spamassassin_org/23_bayes.cf
Dec 23 11:06:40.001 [7123] dbg: plugin: Mail::SpamAssassin::Plugin::Bayes=HASH(0x97f5ef8) implements 'learner_new', priority 0
Dec 23 11:06:40.001 [7123] dbg: bayes: learner_new self=Mail::SpamAssassin::Plugin::Bayes=HASH(0x97f5ef8), bayes_store_module=Mail::SpamAssassin::BayesStore::DBM
Dec 23 11:06:40.009 [7123] dbg: bayes: learner_new: got store=Mail::SpamAssassin::BayesStore::DBM=HASH(0x9a471f0)
Dec 23 11:06:40.009 [7123] dbg: plugin: Mail::SpamAssassin::Plugin::Bayes=HASH(0x97f5ef8) implements 'learner_is_scan_available', priority 0
Dec 23 11:06:40.009 [7123] dbg: bayes: tie-ing to DB file R/O /root/.spamassassin/bayes_toks
Dec 23 11:06:40.010 [7123] dbg: bayes: tie-ing to DB file R/O /root/.spamassassin/bayes_seen
Dec 23 11:06:40.010 [7123] dbg: bayes: found bayes db version 3
Dec 23 11:06:40.010 [7123] dbg: bayes: opportunistic call attempt skipped, found fresh running expire magic token
Dec 23 11:06:40.025 [7123] dbg: bayes: opportunistic call attempt skipped, found fresh running expire magic token
Dec 23 11:06:40.025 [7123] dbg: bayes: corpus size: nspam = 3783, nham = 5179
Dec 23 11:06:40.045 [7123] dbg: bayes: score = 0.484370339736726
Dec 23 11:06:40.045 [7123] dbg: bayes: opportunistic call attempt skipped, found fresh running expire magic token
Dec 23 11:06:40.045 [7123] dbg: bayes: untie-ing
Dec 23 11:06:40.220 [7123] dbg: rules: ran eval rule BAYES_50 ======> got hit (1)
Dec 23 11:06:40.274 [7123] dbg: check: tests=BAYES_50,MISSING_DATE,MISSING_HEADERS,NO_RECEIVED,NO_RELAYS
Dec 23 11:06:40.275 [7123] dbg: timing: total 1132 ms - init: 868 (76.7%), parse: 0.50 (0.0%), extract_message_metadata: 0.66 (0.1%), get_uri_detail_list: 0.68 (0.1%), tests_pri_-1000: 4 (0.4%), compile_gen: 75 (6.6%), compile_eval: 13 (1.1%), tests_pri_-950: 3 (0.2%), tests_pri_-900: 3 (0.3%), tests_pri_-400: 24 (2.1%), check_bayes: 21 (1.8%), tests_pri_0: 180 (15.9%), tests_pri_500: 47 (4.2%)

This is a diff between the lint runs on the not working host (<) and the working host (>)
12,17c12,16
<  dbg: bayes: DB journal sync: last sync: 0
<  dbg: bayes: DB journal sync: last sync: 0
<  dbg: bayes: corpus size: nspam = 679, nham = 1212
<  dbg: bayes: score = 0.346512389874618
<  dbg: bayes: DB expiry: tokens in DB: 113590, Expiry max size: 150000, Oldest atime: 1405796976, Newest atime: 1419005450, Last expire: 0, Current time: 1419349807
<  dbg: bayes: DB journal sync: last sync: 0
---
>  dbg: bayes: opportunistic call attempt skipped, found fresh running expire magic token
>  dbg: bayes: opportunistic call attempt skipped, found fresh running expire magic token
>  dbg: bayes: corpus size: nspam = 3783, nham = 5179
>  dbg: bayes: score = 0.484370339736726
>  dbg: bayes: opportunistic call attempt skipped, found fresh running expire magic token
19,21c18,20
<  dbg: rules: ran eval rule BAYES_40 ======> got hit (1)
<  dbg: check: tests=BAYES_40,MISSING_DATE,MISSING_HEADERS,NO_RECEIVED,NO_RELAYS
<  dbg: timing: total 2019 ms - init: 1469 (72.8%), parse: 0.49 (0.0%), extract_message_metadata: 0.86 (0.0%), get_uri_detail_list: 0.78 (0.0%), tests_pri_-1000: 7 (0.3%), compile_gen: 107 (5.3%), compile_eval: 26 (1.3%), tests_pri_-950: 6 (0.3%), tests_pri_-900: 11 (0.6%), tests_pri_-400: 29 (1.4%), check_bayes: 26 (1.3%), tests_pri_0: 412 (20.4%), tests_pri_500: 78 (3.8%), tests_pri_1000: 3 (0.2%)
---
>  dbg: rules: ran eval rule BAYES_50 ======> got hit (1)
>  dbg: check: tests=BAYES_50,MISSING_DATE,MISSING_HEADERS,NO_RECEIVED,NO_RELAYS
>  dbg: timing: total 1132 ms - init: 868 (76.7%), parse: 0.50 (0.0%), extract_message_metadata: 0.66 (0.1%), get_uri_detail_list: 0.68 (0.1%), tests_pri_-1000: 4 (0.4%), compile_gen: 75 (6.6%), compile_eval: 13 (1.1%), tests_pri_-950: 3 (0.2%), tests_pri_-900: 3 (0.3%), tests_pri_-400: 24 (2.1%), check_bayes: 21 (1.8%), tests_pri_0: 180 (15.9%), tests_pri_500: 47 (4.2%)

The main differences I see are that the working host shows no "DB journal sync: last sync: 0" or "Last expire: 0" messages, and the not-working host shows no "opportunistic call attempt skipped" messages.

Do these messages tell you anything?
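
The "DB expiry" line carries Unix timestamps, so one thing that can be read off it is the gap between the newest token access time and the time of the lint run. The helper below is only a sketch for doing that arithmetic (the function name is invented); whether the gap has anything to do with the problem is not established in this thread.

```shell
# Print the gap between "Newest atime" and "Current time" taken from a
# "bayes: DB expiry:" debug line on stdin, in seconds and in whole days.
bayes_atime_gap() {
    sed -n -E 's/.*Newest atime: ([0-9]+),.*Current time: ([0-9]+).*/\1 \2/p' |
    while read -r newest now; do
        gap=$(( now - newest ))
        echo "gap_seconds=$gap gap_days=$(( gap / 86400 ))"
    done
}

# Typical use:
#   spamassassin -D --lint 2>&1 | bayes_atime_gap
```

For the line pasted above, 1419349807 - 1419005450 = 344357 seconds, which is just under four days.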
 
robocatCommented:
Can you run Spamassassin manually in debug mode?

spamassassin -D bayes <sometestspam.txt
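
To pick the interesting parts out of that debug output, something like the following might help. It is a sketch that assumes the line formats seen in the lint output earlier in this thread ("bayes: score = ..." and "rules: ran eval rule BAYES_nn ... got hit"); the function name is hypothetical. Note that the "ran eval rule" line belongs to the rules debug channel, so with only the bayes channel enabled the rule will probably show as n/a.

```shell
# Summarize the Bayes outcome from SpamAssassin debug output on stdin:
# the raw Bayes score and, if the line is present, the BAYES_nn rule hit.
bayes_result() {
    awk '
        /dbg: bayes: score = / { score = $NF }
        /ran eval rule BAYES_[0-9]+ .*got hit/ {
            for (i = 1; i <= NF; i++)
                if ($i ~ /^BAYES_[0-9]+$/) rule = $i
        }
        END {
            if (score == "")
                print "no bayes score found"
            else
                print "score=" score " rule=" (rule == "" ? "n/a" : rule)
        }'
}

# Typical use (debug goes to stderr; the rewritten message is discarded):
#   spamassassin -D bayes < sometestspam.txt 2>&1 >/dev/null | bayes_result
```

If this prints "no bayes score found" for real mail but the lint run shows a score, it would suggest the scan is not reaching the Bayes database at all.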
 
jmarkfoleyAuthor Commented:
Sorry for the delay -- flu. Will test this afternoon.
 
jmarkfoleyAuthor Commented:
Well, it just started working all of a sudden! Perhaps I didn't have enough spam/ham in the database, but I was sure I did. I guess this one is solved.