Solved

Bayes not working in spamassassin

Posted on 2014-12-19
9
258 Views
Last Modified: 2015-01-12
I have the same version of spamassassin running on two different Slackware computer. I believe I have enabled the Bayesian classifier on both. One host does give me Bayes scores in the message header, for example:
X-Spam-Report:
        *  0.0 HTML_MESSAGE BODY: HTML included in message
        *  2.0 BAYES_50 BODY: Bayes spam probability is 40 to 60%
        *      [score: 0.5000]

Open in new window

The other host's spamassassin never does. I'm quite sure the recalcitrant host has been trained with plenty of spam and ham.

Any ideas why Bayes in not kicking in on this host? What can I check?
0
Comment
Question by:jmarkfoley
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 4
  • 3
9 Comments
 
LVL 21

Expert Comment

by:robocat
ID: 40510539
Do you get a X-Spam-Report at all?
0
 
LVL 1

Author Comment

by:jmarkfoley
ID: 40514345
robocat: > Do you get a X-Spam-Report at all?

Yes, here's a complete report from a recent message. Notice no mention of Bayes.
X-Spam-Status: No, score=1.3 required=5.0 tests=AWL,DATE_IN_PAST_03_06,
        HTML_MESSAGE,SPF_HELO_PASS,SPF_PASS,T_RP_MATCHES_RCVD autolearn=no
        version=3.3.2
X-Spam-Report:
        * -0.0 SPF_HELO_PASS SPF: HELO matches SPF record
        * -0.0 T_RP_MATCHES_RCVD Envelope sender domain matches handover relay
        *      domain
        * -0.0 SPF_PASS SPF: sender matches SPF record
        *  1.1 DATE_IN_PAST_03_06 Date: is 3 to 6 hours before Received: date
        *  0.0 HTML_MESSAGE BODY: HTML included in message
        *  0.2 AWL AWL: From: address is in the auto white-list
X-Spam-Level: *
X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on

Open in new window

0
 
LVL 21

Assisted Solution

by:robocat
robocat earned 250 total points
ID: 40514531
Run

spamassassin -D --lint

and check for any messages involving "bayes"
0
Enterprise Mobility and BYOD For Dummies

Like “For Dummies” books, you can read this in whatever order you choose and learn about mobility and BYOD; and how to put a competitive mobile infrastructure in place. Developed for SMBs and large enterprises alike, you will find helpful use cases, planning, and implementation.

 
LVL 1

Author Comment

by:jmarkfoley
ID: 40515034
Here are all the bayes related messages from the lint run on the non-working host:
Dec 23 10:50:05.770 [26285] dbg: plugin: loading Mail::SpamAssassin::Plugin::Bayes from @INC
Dec 23 10:50:05.893 [26285] dbg: config: fixed relative path: /var/lib/spamassassin/3.003002/updates_spamassassin_org/23_bayes.cf
Dec 23 10:50:05.893 [26285] dbg: config: using "/var/lib/spamassassin/3.003002/updates_spamassassin_org/23_bayes.cf" for included file
Dec 23 10:50:05.893 [26285] dbg: config: read file /var/lib/spamassassin/3.003002/updates_spamassassin_org/23_bayes.cf
Dec 23 10:50:07.189 [26285] dbg: plugin: Mail::SpamAssassin::Plugin::Bayes=HASH(0x97a7300) implements 'learner_new', priority 0
Dec 23 10:50:07.189 [26285] dbg: bayes: learner_new self=Mail::SpamAssassin::Plugin::Bayes=HASH(0x97a7300), bayes_store_module=Mail::SpamAssassin::BayesStore::DBM
Dec 23 10:50:07.204 [26285] dbg: bayes: learner_new: got store=Mail::SpamAssassin::BayesStore::DBM=HASH(0x9957380)
Dec 23 10:50:07.212 [26285] dbg: plugin: Mail::SpamAssassin::Plugin::Bayes=HASH(0x97a7300) implements 'learner_is_scan_available', priority 0
Dec 23 10:50:07.212 [26285] dbg: bayes: tie-ing to DB file R/O /root/.spamassassin/bayes_toks
Dec 23 10:50:07.212 [26285] dbg: bayes: tie-ing to DB file R/O /root/.spamassassin/bayes_seen
Dec 23 10:50:07.213 [26285] dbg: bayes: found bayes db version 3
Dec 23 10:50:07.213 [26285] dbg: bayes: DB journal sync: last sync: 0
Dec 23 10:50:07.244 [26285] dbg: bayes: DB journal sync: last sync: 0
Dec 23 10:50:07.244 [26285] dbg: bayes: corpus size: nspam = 679, nham = 1212
Dec 23 10:50:07.269 [26285] dbg: bayes: score = 0.346512389874618
Dec 23 10:50:07.269 [26285] dbg: bayes: DB expiry: tokens in DB: 113590, Expiry max size: 150000, Oldest atime: 1405796976, Newest atime: 1419005450, Last expire: 0, Current time: 1419349807
Dec 23 10:50:07.269 [26285] dbg: bayes: DB journal sync: last sync: 0
Dec 23 10:50:07.269 [26285] dbg: bayes: untie-ing
Dec 23 10:50:07.661 [26285] dbg: rules: ran eval rule BAYES_40 ======> got hit (1)
Dec 23 10:50:07.764 [26285] dbg: check: tests=BAYES_40,MISSING_DATE,MISSING_HEADERS,NO_RECEIVED,NO_RELAYS
Dec 23 10:50:07.764 [26285] dbg: timing: total 2019 ms - init: 1469 (72.8%), parse: 0.49 (0.0%), extract_message_metadata: 0.86 (0.0%), get_uri_detail_list: 0.78 (0.0%), tests_pri_-1000: 7 (0.3%), compile_gen: 107 (5.3%), compile_eval: 26 (1.3%), tests_pri_-950: 6 (0.3%), tests_pri_-900: 11 (0.6%), tests_pri_-400: 29 (1.4%), check_bayes: 26 (1.3%), tests_pri_0: 412 (20.4%), tests_pri_500: 78 (3.8%), tests_pri_1000: 3 (0.2%)

Open in new window

I don't see anything particularly suspicious. Here is the same lint command on the host where bayes is working:
Dec 23 11:06:39.168 [7123] dbg: plugin: loading Mail::SpamAssassin::Plugin::Bayes from @INC
Dec 23 11:06:39.276 [7123] dbg: config: fixed relative path: /var/lib/spamassassin/3.003002/updates_spamassassin_org/23_bayes.cf
Dec 23 11:06:39.276 [7123] dbg: config: using "/var/lib/spamassassin/3.003002/updates_spamassassin_org/23_bayes.cf" for included file
Dec 23 11:06:39.276 [7123] dbg: config: read file /var/lib/spamassassin/3.003002/updates_spamassassin_org/23_bayes.cf
Dec 23 11:06:40.001 [7123] dbg: plugin: Mail::SpamAssassin::Plugin::Bayes=HASH(0x97f5ef8) implements 'learner_new', priority 0
Dec 23 11:06:40.001 [7123] dbg: bayes: learner_new self=Mail::SpamAssassin::Plugin::Bayes=HASH(0x97f5ef8), bayes_store_module=Mail::SpamAssassin::BayesStore::DBM
Dec 23 11:06:40.009 [7123] dbg: bayes: learner_new: got store=Mail::SpamAssassin::BayesStore::DBM=HASH(0x9a471f0)
Dec 23 11:06:40.009 [7123] dbg: plugin: Mail::SpamAssassin::Plugin::Bayes=HASH(0x97f5ef8) implements 'learner_is_scan_available', priority 0
Dec 23 11:06:40.009 [7123] dbg: bayes: tie-ing to DB file R/O /root/.spamassassin/bayes_toks
Dec 23 11:06:40.010 [7123] dbg: bayes: tie-ing to DB file R/O /root/.spamassassin/bayes_seen
Dec 23 11:06:40.010 [7123] dbg: bayes: found bayes db version 3
Dec 23 11:06:40.010 [7123] dbg: bayes: opportunistic call attempt skipped, found fresh running expire magic token
Dec 23 11:06:40.025 [7123] dbg: bayes: opportunistic call attempt skipped, found fresh running expire magic token
Dec 23 11:06:40.025 [7123] dbg: bayes: corpus size: nspam = 3783, nham = 5179
Dec 23 11:06:40.045 [7123] dbg: bayes: score = 0.484370339736726
Dec 23 11:06:40.045 [7123] dbg: bayes: opportunistic call attempt skipped, found fresh running expire magic token
Dec 23 11:06:40.045 [7123] dbg: bayes: untie-ing
Dec 23 11:06:40.220 [7123] dbg: rules: ran eval rule BAYES_50 ======> got hit (1)
Dec 23 11:06:40.274 [7123] dbg: check: tests=BAYES_50,MISSING_DATE,MISSING_HEADERS,NO_RECEIVED,NO_RELAYS
Dec 23 11:06:40.275 [7123] dbg: timing: total 1132 ms - init: 868 (76.7%), parse: 0.50 (0.0%), extract_message_metadata: 0.66 (0.1%), get_uri_detail_list: 0.68 (0.1%), tests_pri_-1000: 4 (0.4%), compile_gen: 75 (6.6%), compile_eval: 13 (1.1%), tests_pri_-950: 3 (0.2%), tests_pri_-900: 3 (0.3%), tests_pri_-400: 24 (2.1%), check_bayes: 21 (1.8%), tests_pri_0: 180 (15.9%), tests_pri_500: 47 (4.2%)

Open in new window

This is a diff between the lint runs on the not working host (<) and the working host (>)
12,17c12,16
<  dbg: bayes: DB journal sync: last sync: 0
<  dbg: bayes: DB journal sync: last sync: 0
<  dbg: bayes: corpus size: nspam = 679, nham = 1212
<  dbg: bayes: score = 0.346512389874618
<  dbg: bayes: DB expiry: tokens in DB: 113590, Expiry max size: 150000, Oldest atime: 1405796976, Newest atime
: 1419005450, Last expire: 0, Current time: 1419349807
<  dbg: bayes: DB journal sync: last sync: 0
---
>  dbg: bayes: opportunistic call attempt skipped, found fresh running expire magic token
>  dbg: bayes: opportunistic call attempt skipped, found fresh running expire magic token
>  dbg: bayes: corpus size: nspam = 3783, nham = 5179
>  dbg: bayes: score = 0.484370339736726
>  dbg: bayes: opportunistic call attempt skipped, found fresh running expire magic token
19,21c18,20
<  dbg: rules: ran eval rule BAYES_40 ======> got hit (1)
<  dbg: check: tests=BAYES_40,MISSING_DATE,MISSING_HEADERS,NO_RECEIVED,NO_RELAYS
<  dbg: timing: total 2019 ms - init: 1469 (72.8%), parse: 0.49 (0.0%), extract_message_metadata: 0.86 (0.0%),
get_uri_detail_list: 0.78 (0.0%), tests_pri_-1000: 7 (0.3%), compile_gen: 107 (5.3%), compile_eval: 26 (1.3%),
tests_pri_-950: 6 (0.3%), tests_pri_-900: 11 (0.6%), tests_pri_-400: 29 (1.4%), check_bayes: 26 (1.3%), tests_p
ri_0: 412 (20.4%), tests_pri_500: 78 (3.8%), tests_pri_1000: 3 (0.2%)
---
>  dbg: rules: ran eval rule BAYES_50 ======> got hit (1)
>  dbg: check: tests=BAYES_50,MISSING_DATE,MISSING_HEADERS,NO_RECEIVED,NO_RELAYS
>  dbg: timing: total 1132 ms - init: 868 (76.7%), parse: 0.50 (0.0%), extract_message_metadata: 0.66 (0.1%), g
et_uri_detail_list: 0.68 (0.1%), tests_pri_-1000: 4 (0.4%), compile_gen: 75 (6.6%), compile_eval: 13 (1.1%), te
sts_pri_-950: 3 (0.2%), tests_pri_-900: 3 (0.3%), tests_pri_-400: 24 (2.1%), check_bayes: 21 (1.8%), tests_pri_
0: 180 (15.9%), tests_pri_500: 47 (4.2%)

Open in new window

The main differences I see are no "DB journal sync: last sync: 0" and "Last expire, 0" messages in the working host, and no "opportunistic call attempt skipped" in the not-working host.

Do these messages tell you anything?
0
 
LVL 63

Accepted Solution

by:
btan earned 250 total points
ID: 40521805
I saw there is a "autolearn=no" in the X-Spam report.
If a message has already been learned by SpamAssassin, then that message will not be learned again. Therefore, if you run a message through SpamAssassin to see why it was classified as spam or ham, and it has already been learned, you will always get the result "autolearn=no". (To see this more clearly, use the "-D" flag, and you will see debug output explaining that the message has already been learned.)
https://wiki.apache.org/spamassassin/AutolearningNotWorking

i am thinking if it need to relearn
If you find that SA never seems to learn messages, try using sa-learn --dump magic to find out more about your database. The line "nham" is the number of ham messages SA has learned, and the line "nspam" is the number of spam messages SA has learned.
http://wiki.apache.org/spamassassin/BayesNotWorking

Both messages shared in the working and non-working difference are pertaining to Bayes expiration of its tokens learnt in the messages seen.
SpamAssassin can sync the journal and expire the DB tokens either manually or opportunistically. A journal sync is due if --sync is passed to sa-learn (manual), or if the following is true (opportunistic)
(see Expiration section and good to check out the "Getting started" and "Effective Learning") http://spamassassin.apache.org/full/3.3.x/doc/sa-learn.html
0
 
LVL 21

Expert Comment

by:robocat
ID: 40531525
Can you run Spamassassin manually in debug mode?

spamassassin -D bayes <sometestspam.txt
0
 
LVL 1

Author Comment

by:jmarkfoley
ID: 40537693
sorry for the delay -- flu. Will test this afternoon
0
 
LVL 1

Author Comment

by:jmarkfoley
ID: 40545409
Well, it just started working all of a sudden! Perhaps I didn't have enough spam/ham in the database, but I was sure I did. I guess this one is solved.
0

Featured Post

Increase your protection from Zero Day threats!

Running two Antivirus' is never a good idea.
Taking advantage of Multiple Security layers on the other hand can often save your hide.
See which top notch security software brands have been proven to happily coexist together.
Reduce your chances of becoming a statistic.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

This article demonstrates probably the easiest way to configure domain-wide tier isolation within Active Directory. If you do not know tier isolation read https://technet.microsoft.com/en-us/windows-server-docs/security/securing-privileged-access/s…
Auditing domain password hashes is a commonly overlooked but critical requirement to ensuring secure passwords practices are followed. Methods exist to extract hashes directly for a live domain however this article describes a process to extract u…
In this Micro Video tutorial you will learn the basics about Database Availability Groups and How to configure one using a live Exchange Server Environment. The video tutorial explains the basics of the Exchange server Database Availability grou…
The Email Laundry PDF encryption service allows companies to send confidential encrypted  emails to anybody. The PDF document can also contain attachments that are embedded in the encrypted PDF. The password is randomly generated by The Email Laundr…

759 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question