icsbudapest (Hungary)


Debian/Postfix/SpamAssassin Bayes log error

This has shown up in my Debian/Postfix/SpamAssassin system log:

/etc/cron.daily/amavisd-new:
bayes: unknown packing format for bayes db, please re-learn: 70 at /usr/share/perl5/Mail/SpamAssassin/BayesStore/DBM.pm line 1875.

Any idea what this is?

grblades (United Kingdom)

What version of spamassassin are you running?

Can you run "sa-learn --dump magic" while logged in as the same user that SpamAssassin runs as? You should get something like:

# sa-learn --dump magic
0.000          0          3          0  non-token data: bayes db version
0.000          0       2853          0  non-token data: nspam
0.000          0       1110          0  non-token data: nham
0.000          0     153157          0  non-token data: ntokens
0.000          0  971861971          0  non-token data: oldest atime
0.000          0 1186176350          0  non-token data: newest atime
0.000          0 1186176637          0  non-token data: last journal sync atime
0.000          0 1186164585          0  non-token data: last expiry atime
0.000          0   11059200          0  non-token data: last expire atime delta
0.000          0      15468          0  non-token data: last expire reduction count
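If you only want the handful of counters that matter here, a short awk filter can pull them out of the "sa-learn --dump magic" output. This is a minimal sketch, assuming the column layout shown above (the value is the third field and the counter's name follows "non-token data:"); the function name summarize_magic is hypothetical, and the "amavis" user in the usage line is an assumption for a Debian amavisd-new setup.

```shell
# Hypothetical helper: read 'sa-learn --dump magic' output on stdin and print
# the Bayes DB version and the number of spam and ham messages learned.
# Assumes the layout shown above: the value is field 3, and the counter's
# name follows "non-token data: ".
summarize_magic() {
  awk '
    /non-token data: bayes db version/ { print "db_version=" $3 }
    /non-token data: nspam/            { print "nspam=" $3 }
    /non-token data: nham/             { print "nham=" $3 }
  '
}

# Usage (the "amavis" user is an assumption for Debian amavisd-new):
#   su -s /bin/sh amavis -c 'sa-learn --dump magic' | summarize_magic
```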

Can you also post an "ls -la" of the directory containing the Bayes database files? If you don't know where they are, running 'locate bayes_toks' should find them, as long as the locate utility is installed.
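If locate is not installed, or its index is out of date, find can do the same search directly. A minimal sketch; the function name find_bayes_dbs is hypothetical, and /var/lib/amavis in the usage line is only an assumed starting point (the amavis user's home directory on Debian).

```shell
# Hypothetical helper: list every bayes_toks file below the given directories
# (defaults to / when no directory is given). Errors such as
# "Permission denied" are discarded.
find_bayes_dbs() {
  if [ "$#" -eq 0 ]; then
    set -- /
  fi
  find "$@" -type f -name 'bayes_toks*' 2>/dev/null
}

# Usage: list the directory holding the first match.
# /var/lib/amavis is an assumption; adjust it for your setup.
#   ls -la "$(dirname "$(find_bayes_dbs /var/lib/amavis | head -n 1)")"
```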

icsbudapest (asker)

Thanks for the response, grblades.

SpamAssassin version 3.1.7-deb

With sa-learn --dump magic I got:
0.000          0          3          0  non-token data: bayes db version
0.000          0      31333          0  non-token data: nspam
0.000          0       2621          0  non-token data: nham
0.000          0     218145          0  non-token data: ntokens
0.000          0 1181239397          0  non-token data: oldest atime
0.000          0 1186385258          0  non-token data: newest atime
0.000          0 1186385296          0  non-token data: last journal sync atime
0.000          0 1181626547          0  non-token data: last expiry atime
0.000          0     473964          0  non-token data: last expire atime delta
0.000          0          0          0  non-token data: last expire reduction count

I was also able to do an ls -la of the Bayes directory (as the user that runs SA).

ASKER CERTIFIED SOLUTION
grblades (United Kingdom)

(The text of this solution is only available to Experts Exchange members.)

icsbudapest (asker)

I will try out your suggestions and report back.

The spam filtering has been pretty good. I'm really not a spam specialist, and to be honest this is the first dedicated spam filter box I have built myself (I administered one for some time at a micro-company). It basically just filters out spam and viruses and passes the rest on to our internal server.

I configured it basically the same way as this guide, with some minor changes: http://www200.pair.com/mecham/spam/spamfilter20061118.html#applications

grblades (United Kingdom)

That's a very well-written guide.

Rather than amavisd-new I use MailScanner (http://www.mailscanner.info), which in my opinion is much better because it does a lot more, but it is more complex to set up. Amavis is probably better for a system with few users.

Have a look at http://sanesecurity.co.uk/. They provide unofficial additional signatures for ClamAV, which allow it to detect a lot of phishing, stock-image and stock-PDF spam as viruses.

Have a look at http://www.rulesemporium.com/. They have some very good additional rules, which you can either download manually (they are updated very infrequently) or have sa-update download automatically.
There is also an ImageInfo plugin, which detects certain image characteristics. SpamAssassin 3.2 and later include it by default, so if you are planning to upgrade, don't bother installing it now.
A newer plugin, PDFInfo, is also very good. It has only just come out of beta, so it is updated fairly frequently.

Here is a script that downloads the KAM rules, which are updated several times a week and catch the latest stock, greeting-card and other recent spam.
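#!/bin/bash
cd /etc/mail/spamassassin || exit 1
cp -f KAM.cf /tmp/    # keep a backup of the current rules
wget -N http://www.peregrinehw.com/downloads/SpamAssassin/contrib/KAM.cf
# Restart spamd only if the rules pass a lint check; otherwise put the backup back
if spamassassin --lint; then
    /etc/init.d/spamd restart
else
    cp -f /tmp/KAM.cf .
fi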

icsbudapest (asker)

Thanks so much for the extra tips on rules. I will go through that information as soon as I have a chance.

I finally got around to upgrading SpamAssassin; I'm now running v2.1. I'm still getting the error, though. It pops up when I run:

sa-learn --force-expire                                
bayes: synced databases from journal in 0 seconds: 95 unique entries (113 total entries)
bayes: unknown packing format for bayes db, please re-learn: 70 at /usr/share/perl5/Mail/SpamAssassin/BayesStore/DBM.pm line 1879.
expired old bayes database entries in 9 seconds
218262 entries kept, 0 deleted
token frequency: 1-occurrence tokens: 0.00%
token frequency: less than 8 occurrences: 0.00%


Well, I gave up. I ran sa-learn --clear to clear out my Bayes database (after creating a backup) and started with a fresh one. Not a big deal, since this server has only been up a few months, but hopefully it won't happen again. I will have to "re-learn" it, but that shouldn't be a problem. We'll see whether the database clear or the SpamAssassin upgrade causes any new problems, but at least I won't have the database problem.

grblades (United Kingdom)

I assume you mean 3.2.1 and not 2.1?

Can you post a directory listing of the bayes directory?

You might have to back up, clear and restore the Bayes database in case it is corrupted. Please post a directory listing first: if you have an old database format, you can use 'sa-learn --import' instead.
sa-learn --backup > /tmp/bayes.backup
sa-learn --clear
sa-learn --restore /tmp/bayes.backup
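The backup, clear and restore steps can also be wrapped so that the database is only cleared once the backup has actually produced some output, which avoids wiping the only copy if the backup step fails. A minimal sketch, to be run as the user SpamAssassin runs as; the function name rebuild_bayes is hypothetical and the default path is the /tmp/bayes.backup used in this thread.

```shell
# Hypothetical helper: back up, clear and restore the Bayes DB, stopping
# before --clear if the backup command fails or writes nothing.
# Optional argument: backup file path (default /tmp/bayes.backup).
rebuild_bayes() {
  local backup=${1:-/tmp/bayes.backup}
  if ! sa-learn --backup > "$backup"; then
    echo "backup failed; database left untouched" >&2
    return 1
  fi
  if [ ! -s "$backup" ]; then
    echo "backup is empty; database left untouched" >&2
    return 1
  fi
  sa-learn --clear || return 1
  sa-learn --restore "$backup"
}
```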

icsbudapest (asker)

Yeah, I did mean 3.2.1.

Here is my bayes dir:
# ls -al /usr/share/perl5/Mail/SpamAssassin/BayesStore/
total 196
drwxr-xr-x  2 root root  4096 2007-08-27 10:47 .
drwxr-xr-x 10 root root  4096 2007-08-27 10:47 ..
-rw-r--r--  1 root root 58788 2007-06-08 14:55 DBM.pm
-rw-r--r--  1 root root 27275 2007-06-08 14:55 MySQL.pm
-rw-r--r--  1 root root 26668 2007-06-08 14:55 PgSQL.pm
-rw-r--r--  1 root root  1917 2007-06-08 14:55 SDBM.pm
-rw-r--r--  1 root root 58595 2007-06-08 14:55 SQL.pm

But as I said, I already backed it up and then cleared it, and it is currently running with a fresh one. I hesitate to restore the old one. Is there any reason to restore it other than the fact that it has a month of "experience"?

grblades (United Kingdom)

That directory just holds the Perl code used to manage Bayes. The Bayes database directory should contain files like:

[root@gbhome .spamassassin]# ls -la
total 5164
drwx------    2 spam     spam         4096 Aug 27 06:25 .
drwx------    6 spam     spam         4096 Aug 24 11:18 ..
-rw-------    1 spam     spam       659456 Aug 27 15:25 auto-whitelist
-rw-------    1 spam     spam            6 Aug 27 15:25 auto-whitelist.mutex
-rw-------    1 spam     spam        36168 Aug 27 15:25 bayes_journal
-rw-------    1 spam     spam         1656 Aug 27 15:25 bayes.mutex
-rw-------    1 spam     spam       651264 Aug 27 15:25 bayes_seen
-rw-------    1 spam     spam      5398528 Aug 27 15:25 bayes_toks
-rw-rw-r--    1 spam     spam         1487 May 10 18:26 user_prefs

There is no problem with not importing the old data, apart from having to wait until it has learned at least 200 spam and 200 non-spam messages (the defaults for bayes_min_spam_num and bayes_min_ham_num) before Bayes starts being used, so you will probably let more spam through in the meantime.
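To check whether a database has learned enough to be used, the nspam and nham counters from "sa-learn --dump magic" can be compared with the activation thresholds, which come from the bayes_min_spam_num and bayes_min_ham_num settings (200 each by default). A minimal sketch; the function name bayes_active is hypothetical, and the 200/200 defaults assume those settings have not been changed in your configuration.

```shell
# Hypothetical helper: succeed (exit 0) if the given spam and ham counts meet
# the Bayes activation thresholds. The 200/200 defaults assume stock
# bayes_min_spam_num / bayes_min_ham_num; pass arguments 3 and 4 to override.
bayes_active() {
  local nspam=$1 nham=$2 min_spam=${3:-200} min_ham=${4:-200}
  [ "$nspam" -ge "$min_spam" ] && [ "$nham" -ge "$min_ham" ]
}

# Usage with the counts posted earlier in this thread (31333 spam, 2621 ham):
#   bayes_active 31333 2621 && echo "Bayes is in use"
```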

A database backup (sa-learn --backup) stores the data in human-readable form, so a restore does not simply copy back a potentially corrupt database.

icsbudapest (asker)

After the update and the Bayes "clear", the error message seems to be gone.
Thanks a lot for your help. I think I have a good handle on the Bayes DB now.
I'm running into a different problem now, though: many messages produce errors like "Aug 30 17:10:02 icsb postfix/smtp[15617]: 8DB1CA008C: lost connection with b.mx.mail.yahoo.com[66.196.97.250] while sending message body". I will start a new question about that.

grblades (United Kingdom)

That error is nothing to worry about. It just means the remote mail server was very slow to respond after the initial connection. Postfix will try another of their mail servers, or try again later.
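If you want to confirm that these errors are spread across destinations rather than piling up on one, you can count them per remote host from the mail log. A minimal sketch; the function name lost_connections_by_host is hypothetical, and /var/log/mail.log (the usual Debian location) is an assumption.

```shell
# Hypothetical helper: read Postfix log lines on stdin and count
# "lost connection with HOST[IP] ..." events per remote host,
# most frequent first.
lost_connections_by_host() {
  sed -n 's/.*lost connection with \([^[]*\)\[.*/\1/p' | sort | uniq -c | sort -rn
}

# Usage (log path is an assumption; Debian normally logs mail here):
#   lost_connections_by_host < /var/log/mail.log
```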

icsbudapest (asker)

Just a last note: the error messages were actually caused by my antivirus (ClamAV) no longer working properly after the upgrade. ClamAV was upgraded, and clamd had to be re-added to the amavisd group. The messages kept looping through the AV until they gave up or got pushed through, but that really is another topic for another day. :)
Thanks a lot, grblades; you exceed your reputation for excellent help.
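For a ClamAV problem like the one described above, a quick way to confirm that the clamd user really is in the amavis group is to check its group list. A minimal sketch; the function name user_in_group is hypothetical, and the user name clamav, the group amavis and the init script clamav-daemon are the usual Debian names, assumed here rather than taken from this thread.

```shell
# Hypothetical helper: succeed if USER ($1) is a member of GROUP ($2).
# A nonexistent user simply makes the check fail.
user_in_group() {
  id -nG "$1" 2>/dev/null | tr ' ' '\n' | grep -qx "$2"
}

# Usage on a Debian amavisd-new + ClamAV box (names are assumptions):
#   user_in_group clamav amavis || adduser clamav amavis
#   /etc/init.d/clamav-daemon restart
```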