KSpam not learning spam patterns for one user

I have kSpam on my Domino server, and until now no one had complained of receiving multiple spam messages. One user, who actually set up the kSpam configuration, is now receiving the majority of the spam. How should I configure the Bayesian configuration document to make this agent work effectively? In addition to this configuration, should I also create rules manually? If so, how should the rules be numbered?

Below is the current Bayesian configuration:
General Tab:
Instance ID
This is a string of six characters or less, should be the same for every server in your organisation (KS_IID).      Rescon
Default action
This is the default action to be taken when one of the hard coded rules is matched (KS_DEFAULTACTION).      0 - Accept
Default probability increase
The default probability increase, only used if increase probability is selected for the default action (KS_DEFAULT_PROB_INC).      0%
Mark messages with a reason field?
Add KS_REASON item to an email if a rule is matched (KS_MARK).      Yes
Reload configuration every hour?
(KS_RELOAD).      Yes
Show statistics?
Log statistics under smtp.kSpam.* (KS_STATS).      Yes
Minimum "From:" header length?
Minimum length of the From: header (KS_MIN_FROM_LENGTH).      0
Maximum numbers in sender's username?
Maximum number of integers in the sender's username (KS_MAX_FROM_INTS).      0
First character in "From:" header must not be a number?
(KS_FILTER_FROM_INT).      No
Other forms to scan?
Forms other than Memo and Reply delimited by commas (KS_INTERESTING_FORMS).
Add recipients list to copied and denied messages?
Add KS_RECIPIENTS readers field to denied messages, username in email address must be included in the recipients username field in their person document (KS_RECIPIENTS).      No
Copied mail database
Database to copy copied messages to, default is mailspam.nsf (KS_COPIED_DB).
Turn on debugging?
Create log in ks_debug.txt (KS_DEBUG).      No

Bayesian Tab:
Bayesian filter enabled
Enable the Bayesian filter (KS_BAYESIAN_FILTER).      Yes
Token reload period
Period of time between recalculating probabilities (KS_BL_PERIOD).      360
Probability boundary
Boundary probability at which email is considered spam (KS_BAYESIAN_BOUNDARY).      90%
Mark messages with token list and probability?
Add KS_BL_PROB and KS_BL_TOKENS to incoming emails (KS_BAYESIAN_MARK).      Yes
Good message ratio
Ratio of good emails passing through the server before an email is copied to the good mail database (KS_BAYESIAN_RATIO).      5
Bayesian action
Action to take when an email is probably spam (KS_BAYESIAN_ACTION).      3 - Copy & Deny
Mark with
Text to mark emails with if the default Bayesian action is mark (KS_BAYESIAN_ACTION_MARK_WITH).
Tokens to ignore
Tokens to ignore when calculating probabilities (KS_BL_IGNORE).
Dump token lists to file
Write token lists to files goodlist.txt and spamlist.txt (KS_BL_DUMPLISTS).      No
Preparation setting 1
All emails that pass all rules without being matched are placed in the mailgood.nsf database (KS_BAYESIAN_PREP).      No
Preparation setting 2
All emails with a probability greater than 90% are copied to mailspam.nsf, all emails with a probability of less than 10% are copied to mailgood.nsf (KS_BAYESIAN_PREP_2).      No
Turn on debugging?
Create bload.txt log file (KS_BL_DEBUG).      No
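
As a rough illustration of how the two thresholds in this configuration interact, the sketch below models the 90% probability boundary and the description of Preparation setting 2. It is a model built from the configuration text only, not kSpam's code: the function names are hypothetical, and whether kSpam treats a probability exactly at the boundary as spam is an assumption.

```python
BOUNDARY = 0.90  # KS_BAYESIAN_BOUNDARY from the configuration above


def is_spam(probability, boundary=BOUNDARY):
    """Return True if the message would be considered spam.

    Assumption: a probability exactly at the boundary counts as spam.
    """
    return probability >= boundary


def prep2_target(probability):
    """Hypothetical model of Preparation setting 2 (KS_BAYESIAN_PREP_2).

    Above 90% the message is copied to mailspam.nsf, below 10% it is
    copied to mailgood.nsf, and anything in between is copied nowhere.
    """
    if probability > 0.90:
        return "mailspam.nsf"
    if probability < 0.10:
        return "mailgood.nsf"
    return None


for p in (0.05, 0.50, 0.95):
    print(p, is_spam(p), prep2_target(p))
```

With Preparation setting 2 switched off, as it is here, only the rules and the manual buttons feed the two training databases.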

Sjef Bosman

How many good and bad mails have been collected in mailgood and mailspam?

jahhan

ASKER

Yesterday mailgood collected 93 documents and mailspam 27.

Sjef Bosman

Apparently, the totals in both databases should reach at least 1000 apiece before good predictions can be expected.

jahhan

ASKER

This database has been in production since 6/2006. Each database should have collected 1000 documents by now.

jahhan

ASKER

I checked the number of documents for each database. Mailgood has 898 while mailspam has 30.

Sjef Bosman

Mailspam only 30? That's not even close to 1000, I'd say... I assume you looked in the Database Properties, 2nd tab, for the number of documents. I also assume that you activated the Bayesian filter only recently, and that the other rules you have work very well for you. Most rules, do they have Deny&Copy, or only Deny? I suppose the latter, otherwise you'd have had a large collection right now.

If you could post some of those rules in another EE-question, http:Q_22023216.html "kSpam standard rules", I'd be very grateful. The info there could also be interesting to you.

jahhan

ASKER

SJEF,
The configuration document was created on 6/19.
The rules are set to deny&copy, Bayesian filter enabled.

Sjef Bosman

Please verify the content of mailspam.nsf, in the database properties window. If there are only 30 mails in it, the Bayesian filter cannot function. You could manually add spam from your spam-folder in the mail database.
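
To see why 30 spam mails is too few, here is a minimal sketch of a per-token spam probability in the style commonly used by Bayesian mail filters. The formula and the function name are assumptions for illustration; the thread does not show how kSpam actually computes its token probabilities. The token counts (a word seen in 9 good mails and in 2 or 3 spam mails) are invented, while the corpus sizes 898 and 30 are the ones reported above.

```python
def token_spam_probability(spam_hits, good_hits, n_spam, n_good):
    """Estimate P(spam | token) from how often the token appears in each corpus."""
    if n_spam == 0 or n_good == 0:
        return 0.5  # no evidence on one side: stay neutral
    spam_freq = spam_hits / n_spam
    good_freq = good_hits / n_good
    if spam_freq + good_freq == 0:
        return 0.5  # token never seen in either corpus
    p = spam_freq / (spam_freq + good_freq)
    return min(0.99, max(0.01, p))  # clamp so no single token is decisive


N_GOOD, N_SPAM = 898, 30  # document counts reported for mailgood and mailspam

two_hits = token_spam_probability(2, 9, N_SPAM, N_GOOD)
three_hits = token_spam_probability(3, 9, N_SPAM, N_GOOD)
print(f"{two_hits:.3f} {three_hits:.3f}")
```

With so small a spam corpus, one additional spam mail containing the token moves the estimate from about 0.87 to about 0.91, which is across the 90% boundary. With thousands of spam mails, a single message barely changes the estimate.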

jahhan

ASKER

I was actually thinking about adding spam to speed up the process. Does it matter if there are priority rules starting with 200, or should I start from the number 1?

Sjef Bosman

Rule numbers are relevant only to the sequencing in the view, and hence to the order in which they are processed.
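
In other words, only the relative order of the numbers matters, not where the numbering starts. A small sketch of that idea, assuming rules are processed in ascending number order (the rule names are hypothetical, and this is not kSpam code):

```python
def processing_order(rules):
    """Return rule names in the order they would be processed.

    Assumption: lower rule numbers are processed first.
    """
    return [name for number, name in sorted(rules)]


# The same three rules, numbered from 200 and numbered from 1.
from_200 = [(220, "deny pills"), (200, "allow known sender"), (210, "deny watches")]
from_1 = [(3, "deny pills"), (1, "allow known sender"), (2, "deny watches")]

same_order = processing_order(from_200) == processing_order(from_1)
print(processing_order(from_200), same_order)
```

Leaving gaps between the numbers (200, 210, 220) makes it possible to slot a new rule in later without renumbering the others.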

Yes, by all means, add all the spam you can find! Adding duplicates is useless, by the way...

jahhan

ASKER

I am not familiar with regex formulas. I was able to access the site you referenced and add in those formulas. The original rules in my database were rules that identified email addresses. Are there any other rules I can use that you did not specify?

Sjef Bosman

More rules? That's why I created that question... :-))

Rules based on email addresses are of very little use. I know that even my personal address has been used to send spam, and I'd hate to be blocked for that reason. Best practice: some rules denying the most obvious spam (private parts, watches, pills, etc), and the rest is for the Bayesian filter. Very important: check the mailspam database regularly, at least once a day, even more often during the first few days or weeks (every hour!). Open the mailspam.nsf database, select ALL mails, then deselect good mails and click on Confirm as Spam. The rest is good, so select them and click Move to Good, so they will be delivered.

jahhan

ASKER

The thread that contains the regular expressions: should I set my rules up in that same order?

jahhan

ASKER

I now have 4000 documents in mailgood and 10000 in mailspam.

Sjef Bosman

Ah, that must be enough for kSpam to find the characteristics of spam. After re-evaluation of the mails, things should be a lot better now. What kSpam does not offer unfortunately is a standard option for users to move accidentally delivered spam from their Inbox to mailspam. Maybe you could ask them to do that themselves, that they drop spam in the mailspam database?

You set kSpam to recalculate pretty often, I'd change that, because the re-evaluation of all mails is a very time-consuming process and it hogs the processor.

By the way, the problem for the user, does it still exist?

jahhan

ASKER

Which line needs to be adjusted to reduce the time spent recalculating? The problem for the user still exists: I am getting a few "yo sir" and "It's me" messages.

jahhan

ASKER

I counted 20 spam messages in the mailgood database.

jahhan

ASKER

Should the messages that are located in the new spam mail view be confirmed?

Sjef Bosman

To minimize the time, change the token reload period (KS_BL_PERIOD) from 360 seconds to 3600.
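
The effect of a longer period on how often the token probabilities are rebuilt is simple arithmetic. The sketch assumes the period is in seconds, as stated in this reply, and that a recalculation starts once per period; the function name is hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def recalculations_per_day(period_seconds):
    """Number of full recalculations started in one day for a given period."""
    return SECONDS_PER_DAY // period_seconds


before = recalculations_per_day(360)   # every 6 minutes
after = recalculations_per_day(3600)   # every hour
print(before, after)
```

Under those assumptions the server goes from 240 re-evaluations a day to 24, a tenfold reduction in that processor load.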

There are buttons at the top of the mailspam.nsf database. They operate on selected messages. You have to find the good ones, select them, and click on Move to good. The rest can be confirmed as spam. In the mailgood database, you have to do something similar. Select the spam messages and click on Move to mailspam.

jahhan

ASKER

Made the change.
Ok, so now I have updated an allow rule that contained an email address in the "From contains:" field.
The original line read janedoe@test.com
I have updated the field to read #REGEX#:(?i)([janedoe@test.com][_W\s])

Is this the correct syntax to use?
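
One thing that can be checked independently of kSpam: in regular expressions, square brackets define a character class, so [janedoe@test.com] matches any single one of those characters rather than the whole address. The sketch below uses Python's re module as a stand-in, which is an assumption, since the thread does not say which regex dialect kSpam uses. The kSpam-specific #REGEX#: prefix is left out, and the "literal" pattern is only one possible alternative, not a confirmed kSpam rule.

```python
import re

# The pattern from the post: one character from the first class,
# followed by an underscore, a letter W (either case, because of (?i)),
# or whitespace.
posted = re.compile(r"(?i)([janedoe@test.com][_W\s])")

# Assumed alternative: match the address literally, with the dot escaped
# so that it does not stand for "any character".
literal = re.compile(r"(?i)janedoe@test\.com")

samples = [
    "janedoe@test.com",             # the address on its own
    "Jane Doe <JaneDoe@Test.com>",  # a typical From: value
    "some text",                    # unrelated text
]

for text in samples:
    print(repr(text), bool(posted.search(text)), bool(literal.search(text)))
```

In this stand-in, the posted pattern fails on the bare address (nothing in it is followed by an underscore, a W, or whitespace) yet matches unrelated text such as "some text", while the literal pattern matches only strings that contain the address.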

jahhan

ASKER

Today I only see two spammed messages residing in mailgood. The subject line is "name" Check This. Should I use the regex syntax described in my previous post?

ASKER CERTIFIED SOLUTION

Sjef Bosman

This solution is only available to members.

Sjef Bosman

Thanks! By the way, version 1.6 of kSpam is now available.

jahhan

ASKER

thanks