Solved

Bayes not working in spamassassin

Posted on 2014-12-19
Last Modified: 2015-01-12
I have the same version of SpamAssassin running on two different Slackware computers. I believe I have enabled the Bayesian classifier on both. One host does give me Bayes scores in the message header, for example:
X-Spam-Report:
        *  0.0 HTML_MESSAGE BODY: HTML included in message
        *  2.0 BAYES_50 BODY: Bayes spam probability is 40 to 60%
        *      [score: 0.5000]


The other host's SpamAssassin never does. I'm quite sure the recalcitrant host has been trained with plenty of spam and ham.

Any ideas why Bayes is not kicking in on this host? What can I check?
Question by:jmarkfoley
9 Comments
 
LVL 22

Expert Comment

by:robocat
ID: 40510539
Do you get an X-Spam-Report at all?
 
LVL 1

Author Comment

by:jmarkfoley
ID: 40514345
robocat: > Do you get a X-Spam-Report at all?

Yes, here's a complete report from a recent message. Notice no mention of Bayes.
X-Spam-Status: No, score=1.3 required=5.0 tests=AWL,DATE_IN_PAST_03_06,
        HTML_MESSAGE,SPF_HELO_PASS,SPF_PASS,T_RP_MATCHES_RCVD autolearn=no
        version=3.3.2
X-Spam-Report:
        * -0.0 SPF_HELO_PASS SPF: HELO matches SPF record
        * -0.0 T_RP_MATCHES_RCVD Envelope sender domain matches handover relay
        *      domain
        * -0.0 SPF_PASS SPF: sender matches SPF record
        *  1.1 DATE_IN_PAST_03_06 Date: is 3 to 6 hours before Received: date
        *  0.0 HTML_MESSAGE BODY: HTML included in message
        *  0.2 AWL AWL: From: address is in the auto white-list
X-Spam-Level: *
X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on
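
One way to confirm what the header above shows is to pull the tests= list out of X-Spam-Status and look for BAYES_* entries. The following is a minimal sketch, not a definitive tool: the function name bayes_tests_in_status is made up for illustration, and the sample text is the X-Spam-Status header quoted in this comment.

```python
import re

# X-Spam-Status header as quoted in this thread (non-working host).
STATUS = """X-Spam-Status: No, score=1.3 required=5.0 tests=AWL,DATE_IN_PAST_03_06,
        HTML_MESSAGE,SPF_HELO_PASS,SPF_PASS,T_RP_MATCHES_RCVD autolearn=no
        version=3.3.2"""


def bayes_tests_in_status(header_text):
    """Return the BAYES_* rule names listed in an X-Spam-Status header.

    Hypothetical helper for illustration; not part of SpamAssassin.
    """
    # Unfold continuation lines (a line break followed by whitespace).
    unfolded = re.sub(r"\r?\n[ \t]+", " ", header_text)
    # Folding can leave a space after a comma inside the tests list.
    unfolded = re.sub(r",\s+", ",", unfolded)
    match = re.search(r"tests=(\S+)", unfolded)
    if match is None:
        return []
    names = [name for name in match.group(1).split(",") if name]
    return [name for name in names if name.startswith("BAYES_")]


print(bayes_tests_in_status(STATUS))  # prints []
```

An empty list means no Bayes rule contributed to the score for that message, which matches the report above.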

 
LVL 22

Assisted Solution

by:robocat
robocat earned 1000 total points
ID: 40514531
Run

spamassassin -D --lint

and check for any messages involving "bayes"

 
LVL 1

Author Comment

by:jmarkfoley
ID: 40515034
Here are all the Bayes-related messages from the lint run on the non-working host:
Dec 23 10:50:05.770 [26285] dbg: plugin: loading Mail::SpamAssassin::Plugin::Bayes from @INC
Dec 23 10:50:05.893 [26285] dbg: config: fixed relative path: /var/lib/spamassassin/3.003002/updates_spamassassin_org/23_bayes.cf
Dec 23 10:50:05.893 [26285] dbg: config: using "/var/lib/spamassassin/3.003002/updates_spamassassin_org/23_bayes.cf" for included file
Dec 23 10:50:05.893 [26285] dbg: config: read file /var/lib/spamassassin/3.003002/updates_spamassassin_org/23_bayes.cf
Dec 23 10:50:07.189 [26285] dbg: plugin: Mail::SpamAssassin::Plugin::Bayes=HASH(0x97a7300) implements 'learner_new', priority 0
Dec 23 10:50:07.189 [26285] dbg: bayes: learner_new self=Mail::SpamAssassin::Plugin::Bayes=HASH(0x97a7300), bayes_store_module=Mail::SpamAssassin::BayesStore::DBM
Dec 23 10:50:07.204 [26285] dbg: bayes: learner_new: got store=Mail::SpamAssassin::BayesStore::DBM=HASH(0x9957380)
Dec 23 10:50:07.212 [26285] dbg: plugin: Mail::SpamAssassin::Plugin::Bayes=HASH(0x97a7300) implements 'learner_is_scan_available', priority 0
Dec 23 10:50:07.212 [26285] dbg: bayes: tie-ing to DB file R/O /root/.spamassassin/bayes_toks
Dec 23 10:50:07.212 [26285] dbg: bayes: tie-ing to DB file R/O /root/.spamassassin/bayes_seen
Dec 23 10:50:07.213 [26285] dbg: bayes: found bayes db version 3
Dec 23 10:50:07.213 [26285] dbg: bayes: DB journal sync: last sync: 0
Dec 23 10:50:07.244 [26285] dbg: bayes: DB journal sync: last sync: 0
Dec 23 10:50:07.244 [26285] dbg: bayes: corpus size: nspam = 679, nham = 1212
Dec 23 10:50:07.269 [26285] dbg: bayes: score = 0.346512389874618
Dec 23 10:50:07.269 [26285] dbg: bayes: DB expiry: tokens in DB: 113590, Expiry max size: 150000, Oldest atime: 1405796976, Newest atime: 1419005450, Last expire: 0, Current time: 1419349807
Dec 23 10:50:07.269 [26285] dbg: bayes: DB journal sync: last sync: 0
Dec 23 10:50:07.269 [26285] dbg: bayes: untie-ing
Dec 23 10:50:07.661 [26285] dbg: rules: ran eval rule BAYES_40 ======> got hit (1)
Dec 23 10:50:07.764 [26285] dbg: check: tests=BAYES_40,MISSING_DATE,MISSING_HEADERS,NO_RECEIVED,NO_RELAYS
Dec 23 10:50:07.764 [26285] dbg: timing: total 2019 ms - init: 1469 (72.8%), parse: 0.49 (0.0%), extract_message_metadata: 0.86 (0.0%), get_uri_detail_list: 0.78 (0.0%), tests_pri_-1000: 7 (0.3%), compile_gen: 107 (5.3%), compile_eval: 26 (1.3%), tests_pri_-950: 6 (0.3%), tests_pri_-900: 11 (0.6%), tests_pri_-400: 29 (1.4%), check_bayes: 26 (1.3%), tests_pri_0: 412 (20.4%), tests_pri_500: 78 (3.8%), tests_pri_1000: 3 (0.2%)


I don't see anything particularly suspicious. Here is the same lint command on the host where bayes is working:
Dec 23 11:06:39.168 [7123] dbg: plugin: loading Mail::SpamAssassin::Plugin::Bayes from @INC
Dec 23 11:06:39.276 [7123] dbg: config: fixed relative path: /var/lib/spamassassin/3.003002/updates_spamassassin_org/23_bayes.cf
Dec 23 11:06:39.276 [7123] dbg: config: using "/var/lib/spamassassin/3.003002/updates_spamassassin_org/23_bayes.cf" for included file
Dec 23 11:06:39.276 [7123] dbg: config: read file /var/lib/spamassassin/3.003002/updates_spamassassin_org/23_bayes.cf
Dec 23 11:06:40.001 [7123] dbg: plugin: Mail::SpamAssassin::Plugin::Bayes=HASH(0x97f5ef8) implements 'learner_new', priority 0
Dec 23 11:06:40.001 [7123] dbg: bayes: learner_new self=Mail::SpamAssassin::Plugin::Bayes=HASH(0x97f5ef8), bayes_store_module=Mail::SpamAssassin::BayesStore::DBM
Dec 23 11:06:40.009 [7123] dbg: bayes: learner_new: got store=Mail::SpamAssassin::BayesStore::DBM=HASH(0x9a471f0)
Dec 23 11:06:40.009 [7123] dbg: plugin: Mail::SpamAssassin::Plugin::Bayes=HASH(0x97f5ef8) implements 'learner_is_scan_available', priority 0
Dec 23 11:06:40.009 [7123] dbg: bayes: tie-ing to DB file R/O /root/.spamassassin/bayes_toks
Dec 23 11:06:40.010 [7123] dbg: bayes: tie-ing to DB file R/O /root/.spamassassin/bayes_seen
Dec 23 11:06:40.010 [7123] dbg: bayes: found bayes db version 3
Dec 23 11:06:40.010 [7123] dbg: bayes: opportunistic call attempt skipped, found fresh running expire magic token
Dec 23 11:06:40.025 [7123] dbg: bayes: opportunistic call attempt skipped, found fresh running expire magic token
Dec 23 11:06:40.025 [7123] dbg: bayes: corpus size: nspam = 3783, nham = 5179
Dec 23 11:06:40.045 [7123] dbg: bayes: score = 0.484370339736726
Dec 23 11:06:40.045 [7123] dbg: bayes: opportunistic call attempt skipped, found fresh running expire magic token
Dec 23 11:06:40.045 [7123] dbg: bayes: untie-ing
Dec 23 11:06:40.220 [7123] dbg: rules: ran eval rule BAYES_50 ======> got hit (1)
Dec 23 11:06:40.274 [7123] dbg: check: tests=BAYES_50,MISSING_DATE,MISSING_HEADERS,NO_RECEIVED,NO_RELAYS
Dec 23 11:06:40.275 [7123] dbg: timing: total 1132 ms - init: 868 (76.7%), parse: 0.50 (0.0%), extract_message_metadata: 0.66 (0.1%), get_uri_detail_list: 0.68 (0.1%), tests_pri_-1000: 4 (0.4%), compile_gen: 75 (6.6%), compile_eval: 13 (1.1%), tests_pri_-950: 3 (0.2%), tests_pri_-900: 3 (0.3%), tests_pri_-400: 24 (2.1%), check_bayes: 21 (1.8%), tests_pri_0: 180 (15.9%), tests_pri_500: 47 (4.2%)


This is a diff between the lint runs on the non-working host (<) and the working host (>):
12,17c12,16
<  dbg: bayes: DB journal sync: last sync: 0
<  dbg: bayes: DB journal sync: last sync: 0
<  dbg: bayes: corpus size: nspam = 679, nham = 1212
<  dbg: bayes: score = 0.346512389874618
<  dbg: bayes: DB expiry: tokens in DB: 113590, Expiry max size: 150000, Oldest atime: 1405796976, Newest atime: 1419005450, Last expire: 0, Current time: 1419349807
<  dbg: bayes: DB journal sync: last sync: 0
---
>  dbg: bayes: opportunistic call attempt skipped, found fresh running expire magic token
>  dbg: bayes: opportunistic call attempt skipped, found fresh running expire magic token
>  dbg: bayes: corpus size: nspam = 3783, nham = 5179
>  dbg: bayes: score = 0.484370339736726
>  dbg: bayes: opportunistic call attempt skipped, found fresh running expire magic token
19,21c18,20
<  dbg: rules: ran eval rule BAYES_40 ======> got hit (1)
<  dbg: check: tests=BAYES_40,MISSING_DATE,MISSING_HEADERS,NO_RECEIVED,NO_RELAYS
<  dbg: timing: total 2019 ms - init: 1469 (72.8%), parse: 0.49 (0.0%), extract_message_metadata: 0.86 (0.0%), get_uri_detail_list: 0.78 (0.0%), tests_pri_-1000: 7 (0.3%), compile_gen: 107 (5.3%), compile_eval: 26 (1.3%), tests_pri_-950: 6 (0.3%), tests_pri_-900: 11 (0.6%), tests_pri_-400: 29 (1.4%), check_bayes: 26 (1.3%), tests_pri_0: 412 (20.4%), tests_pri_500: 78 (3.8%), tests_pri_1000: 3 (0.2%)
---
>  dbg: rules: ran eval rule BAYES_50 ======> got hit (1)
>  dbg: check: tests=BAYES_50,MISSING_DATE,MISSING_HEADERS,NO_RECEIVED,NO_RELAYS
>  dbg: timing: total 1132 ms - init: 868 (76.7%), parse: 0.50 (0.0%), extract_message_metadata: 0.66 (0.1%), get_uri_detail_list: 0.68 (0.1%), tests_pri_-1000: 4 (0.4%), compile_gen: 75 (6.6%), compile_eval: 13 (1.1%), tests_pri_-950: 3 (0.2%), tests_pri_-900: 3 (0.3%), tests_pri_-400: 24 (2.1%), check_bayes: 21 (1.8%), tests_pri_0: 180 (15.9%), tests_pri_500: 47 (4.2%)


The main differences I see are that the working host shows no "DB journal sync: last sync: 0" or "Last expire: 0" messages, and the non-working host shows no "opportunistic call attempt skipped" messages.

Do these messages tell you anything?
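
For reading the lint output above, it may help to see how a raw Bayes score maps to the BAYES_* rule that fires. The sketch below uses the score ranges from the stock 23_bayes.cf shipped with SpamAssassin 3.3.x as I understand them; the exact boundary handling (lower bound exclusive, upper bound inclusive) is an assumption, and bayes_rule_for is a hypothetical name.

```python
# Score ranges for the BAYES_* rules (believed to match 23_bayes.cf in 3.3.x).
BAYES_BUCKETS = [
    ("BAYES_00", 0.00, 0.01),
    ("BAYES_05", 0.01, 0.05),
    ("BAYES_20", 0.05, 0.20),
    ("BAYES_40", 0.20, 0.40),
    ("BAYES_50", 0.40, 0.60),
    ("BAYES_60", 0.60, 0.80),
    ("BAYES_80", 0.80, 0.95),
    ("BAYES_95", 0.95, 0.99),
    ("BAYES_99", 0.99, 1.00),
]


def bayes_rule_for(score):
    """Return the BAYES_* rule name whose range contains the score, or None."""
    for name, low, high in BAYES_BUCKETS:
        # Assumption: lower bound is exclusive (except at 0), upper is inclusive.
        if (low == 0.0 or score > low) and score <= high:
            return name
    return None


# Scores taken from the two lint runs in this thread.
print(bayes_rule_for(0.346512389874618))  # prints BAYES_40
print(bayes_rule_for(0.484370339736726))  # prints BAYES_50
```

Both lint runs are therefore consistent: each host computed a score for the lint test message and hit the matching rule, so the classifier itself was available to the account that ran the lint.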
 
LVL 64

Accepted Solution

by:btan
btan earned 1000 total points
ID: 40521805
I see there is an "autolearn=no" in the X-Spam-Status header.
If a message has already been learned by SpamAssassin, then that message will not be learned again. Therefore, if you run a message through SpamAssassin to see why it was classified as spam or ham, and it has already been learned, you will always get the result "autolearn=no". (To see this more clearly, use the "-D" flag, and you will see debug output explaining that the message has already been learned.)
https://wiki.apache.org/spamassassin/AutolearningNotWorking

I am wondering whether it needs to be retrained.
If you find that SA never seems to learn messages, try using sa-learn --dump magic to find out more about your database. The line "nham" is the number of ham messages SA has learned, and the line "nspam" is the number of spam messages SA has learned.
http://wiki.apache.org/spamassassin/BayesNotWorking
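
As a rough illustration of that check: SpamAssassin does not use Bayes for scoring until the database holds at least bayes_min_spam_num spam and bayes_min_ham_num ham messages, both of which default to 200. The sketch below reads the counts from the "corpus size" debug line quoted earlier in this thread; the helper names are hypothetical. Note that the lint run above opened /root/.spamassassin/bayes_toks, so these counts describe root's database; if mail is scanned under a different account, that account's database may hold different counts.

```python
import re

MIN_SPAM = 200  # default value of bayes_min_spam_num
MIN_HAM = 200   # default value of bayes_min_ham_num

# Debug line from the non-working host's lint run in this thread.
DEBUG_LINE = "Dec 23 10:50:07.244 [26285] dbg: bayes: corpus size: nspam = 679, nham = 1212"


def corpus_counts(debug_text):
    """Return (nspam, nham) from a 'corpus size' debug line, or None if absent."""
    match = re.search(r"corpus size: nspam = (\d+), nham = (\d+)", debug_text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def bayes_can_score(nspam, nham, min_spam=MIN_SPAM, min_ham=MIN_HAM):
    """True when both counts reach the configured minimums."""
    return nspam >= min_spam and nham >= min_ham


counts = corpus_counts(DEBUG_LINE)
print(counts)                    # prints (679, 1212)
print(bayes_can_score(*counts))  # prints True
```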

Both of the messages that differ between the working and non-working hosts relate to Bayes journal syncing and to expiry of the tokens learned from the messages seen.
SpamAssassin can sync the journal and expire the DB tokens either manually or opportunistically. A journal sync is due if --sync is passed to sa-learn (manual), or if the opportunistic conditions listed in the sa-learn documentation are met.
(See the "Expiration" section; "Getting started" and "Effective Learning" are also worth checking out.) http://spamassassin.apache.org/full/3.3.x/doc/sa-learn.html
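
To make the "DB expiry" debug line from the non-working host easier to read, the sketch below parses it and converts the epoch timestamps. It is only an illustration with hypothetical helper names. Comparing the token count with "Expiry max size" is one of the expiry conditions as I understand the sa-learn documentation, and reading "Last expire: 0" as "no expiry recorded yet" is an assumption.

```python
import re
from datetime import datetime, timezone

# "DB expiry" debug line from the non-working host's lint run in this thread.
EXPIRY_LINE = (
    "dbg: bayes: DB expiry: tokens in DB: 113590, Expiry max size: 150000, "
    "Oldest atime: 1405796976, Newest atime: 1419005450, Last expire: 0, "
    "Current time: 1419349807"
)

EXPIRY_RE = re.compile(
    r"tokens in DB: (\d+), Expiry max size: (\d+), Oldest atime: (\d+), "
    r"Newest atime: (\d+), Last expire: (\d+), Current time: (\d+)"
)


def parse_expiry(line):
    """Return the six numeric fields of a 'DB expiry' line as a dict, or None."""
    match = EXPIRY_RE.search(line)
    if match is None:
        return None
    keys = ("tokens", "max_size", "oldest_atime", "newest_atime", "last_expire", "now")
    return dict(zip(keys, (int(value) for value in match.groups())))


def to_utc(epoch_seconds):
    """Format an epoch timestamp as a UTC date and time string."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc).strftime("%Y-%m-%d %H:%M:%S")


def describe(info):
    """Summarise the parsed fields in human-readable terms."""
    return {
        "over_max_size": info["tokens"] > info["max_size"],
        "newest_token_utc": to_utc(info["newest_atime"]),
        "days_since_newest_token": round((info["now"] - info["newest_atime"]) / 86400, 1),
        "never_expired": info["last_expire"] == 0,  # assumption: 0 means none recorded
    }


print(describe(parse_expiry(EXPIRY_LINE)))
```

For the line above, the token count is below the maximum size, so a size-triggered expiry would not be expected on that host yet.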
 
LVL 22

Expert Comment

by:robocat
ID: 40531525
Can you run SpamAssassin manually in debug mode?

spamassassin -D bayes <sometestspam.txt
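
The debug output of such a run goes to stderr, so it can be captured to a file (for example by appending 2> debug.log to the command) and then summarised. The sketch below is one hedged way to pick out the lines that matter when comparing two hosts: which database files were opened, the computed score, and the rule that hit. The function name is hypothetical, and the sample lines are copied from the non-working host's lint run earlier in this thread.

```python
import re

# A few debug lines copied from the non-working host's lint run in this thread.
SAMPLE_DEBUG = """\
Dec 23 10:50:07.212 [26285] dbg: bayes: tie-ing to DB file R/O /root/.spamassassin/bayes_toks
Dec 23 10:50:07.212 [26285] dbg: bayes: tie-ing to DB file R/O /root/.spamassassin/bayes_seen
Dec 23 10:50:07.269 [26285] dbg: bayes: score = 0.346512389874618
Dec 23 10:50:07.269 [26285] dbg: bayes: untie-ing
Dec 23 10:50:07.661 [26285] dbg: rules: ran eval rule BAYES_40 ======> got hit (1)
"""


def summarize_bayes_debug(debug_text):
    """Collect the Bayes DB paths, score, and rule hit from SpamAssassin debug text."""
    summary = {"db_files": [], "score": None, "rule": None}
    for line in debug_text.splitlines():
        db = re.search(r"bayes: tie-ing to DB file \S+ (\S+)", line)
        if db:
            summary["db_files"].append(db.group(1))
            continue
        score = re.search(r"bayes: score = ([0-9.]+)", line)
        if score:
            summary["score"] = float(score.group(1))
            continue
        rule = re.search(r"ran eval rule (BAYES_\d+) =+> got hit", line)
        if rule:
            summary["rule"] = rule.group(1)
    return summary


print(summarize_bayes_debug(SAMPLE_DEBUG))
```

If the database paths differ between the manual run and the account that scans incoming mail, the two may be using different Bayes databases.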
 
LVL 1

Author Comment

by:jmarkfoley
ID: 40537693
Sorry for the delay; I had the flu. I will test this afternoon.
 
LVL 1

Author Comment

by:jmarkfoley
ID: 40545409
Well, it just started working all of a sudden! Perhaps I didn't have enough spam/ham in the database, but I was sure I did. I guess this one is solved.
Question has a verified solution.