Solved

KSpam not learning spam patterns for one user

Posted on 2006-11-30
Last Modified: 2013-12-18
I have kSpam on my Domino server, and until now no one had complained of receiving multiple spam messages.  One user, the one who actually set up the kSpam configuration, is now receiving the majority of the spam.  How should I configure the Bayesian configuration document to make this agent work effectively?  In addition to this configuration, should I also create rules manually?  If so, how should the rules be numbered?

Below is the current Bayesian configuration:
General Tab:
Instance ID
This is a string of six characters or less, should be the same for every server in your organisation (KS_IID).      Rescon
Default action
This is the default action to be taken when one of the hard coded rules is matched (KS_DEFAULTACTION).      0 - Accept
Default probability increase
The default probability increase, only used if increase probability is selected for the default action (KS_DEFAULT_PROB_INC).      0%
Mark messages with a reason field?
Add KS_REASON item to an email if a rule is matched (KS_MARK).      Yes
Reload configuration every hour?
(KS_RELOAD).      Yes
Show statistics?
Log statistics under smtp.kSpam.* (KS_STATS).      Yes
Minimum "From:" header length?
Minimum length of the From: header (KS_MIN_FROM_LENGTH).      0
Maximum numbers in sender's username?
Maximum number of integers in the sender's username (KS_MAX_FROM_INTS).      0
First character in "From:" header must not be a number?
(KS_FILTER_FROM_INT).      No
Other forms to scan?
Forms other than Memo and Reply delimited by commas (KS_INTERESTING_FORMS).      
Add recipients list to copied and denied messages?
Add KS_RECIPIENTS readers field to denied messages; the username in the email address must be included in the recipient's username field in their person document (KS_RECIPIENTS).      No
Copied mail database
Database to copy copied messages to, default is mailspam.nsf (KS_COPIED_DB).      
Turn on debugging?
Create log in ks_debug.txt (KS_DEBUG).      No


Bayesian Tab:
Bayesian filter enabled
Enable the Bayesian filter (KS_BAYESIAN_FILTER).      Yes
Token reload period
Period of time between recalculating probabilities (KS_BL_PERIOD).      360
Probability boundary
Boundary probability at which email is considered spam (KS_BAYESIAN_BOUNDARY).      90%
Mark messages with token list and probability?
Add KS_BL_PROB and KS_BL_TOKENS to incoming emails (KS_BAYESIAN_MARK).      Yes
Good message ratio
Ratio of good emails passing through the server before an email is copied to the good mail database (KS_BAYESIAN_RATIO).      5
Bayesian action
Action to take when an email is probably spam (KS_BAYESIAN_ACTION).      3 - Copy & Deny
 Mark with
Text to mark emails with if the default Bayesian action is mark (KS_BAYESIAN_ACTION_MARK_WITH).      
Tokens to ignore
Tokens to ignore when calculating probabilities (KS_BL_IGNORE).      
Dump token lists to file
Write token lists to files goodlist.txt and spamlist.txt (KS_BL_DUMPLISTS).      No
Preparation setting 1
All emails that pass all rules without being matched are placed in the mailgood.nsf database (KS_BAYESIAN_PREP).      No
Preparation setting 2
All emails with a probability greater than 90% are copied to mailspam.nsf, all emails with a probability of less than 10% are copied to mailgood.nsf (KS_BAYESIAN_PREP_2).      No
Turn on debugging?
Create bload.txt log file (KS_BL_DEBUG).      No
Question by:jahhan
24 Comments
 
LVL 46

Expert Comment

by:Sjef Bosman
ID: 18048711
How many good and bad mails have been collected in mailgood and mailspam?
0
 

Author Comment

by:jahhan
ID: 18056308
Yesterday mailgood collected 93 documents and mailspam 27.
0
 
LVL 46

Expert Comment

by:Sjef Bosman
ID: 18057424
Apparently, the totals in both databases should reach at least 1000 apiece before good predictions can be expected.
0
 

Author Comment

by:jahhan
ID: 18058081
This database has been in production since 6/2006.  Each database should have collected 1000 documents by now.
0
 

Author Comment

by:jahhan
ID: 18058110
I checked the number of documents for each database.  Mailgood has 898 while mailspam has 30.
0
 
LVL 46

Expert Comment

by:Sjef Bosman
ID: 18059124
Mailspam only 30? That's not even close to 1000, I'd say... I assume you looked in the Database Properties, 2nd tab, for the number of documents. I also assume that you activated the Bayesian filter only recently, and that the other rules you have work very well for you. Most rules, do they have Deny&Copy, or only Deny? I suppose the latter, otherwise you'd have had a large collection right now.

If you could post some of those rules in another EE-question, http:Q_22023216.html "kSpam standard rules", I'd be very grateful. The info there could also be interesting to you.
0
 

Author Comment

by:jahhan
ID: 18068883
SJEF,
The configuration document was created on 6/19.
The rules are set to deny&copy, Bayesian filter enabled.
0
 
LVL 46

Expert Comment

by:Sjef Bosman
ID: 18069086
Please verify the content of mailspam.nsf, in the database properties window. If there are only 30 mails in it, the Bayesian filter cannot function. You could manually add spam from your spam-folder in the mail database.
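As a rough illustration of why the size of each collection matters, here is a minimal sketch of a generic naive Bayes token score. kSpam's actual formula is not described in this thread, so the function names, the clamping to 0.01..0.99, and the formula itself are assumptions; only the 90% boundary and the 30 / 898 document counts come from the posts above.

```python
from math import prod

BOUNDARY = 0.90  # "Probability boundary" from the configuration (90%)

def token_spam_probability(spam_hits, good_hits, n_spam, n_good):
    """Estimate P(spam | token) from how often a token appears in each collection.

    Hypothetical helper: a generic estimate, not kSpam's documented formula.
    """
    spam_rate = spam_hits / n_spam
    good_rate = good_hits / n_good
    if spam_rate + good_rate == 0:
        return 0.5  # token never seen: no evidence either way
    p = spam_rate / (spam_rate + good_rate)
    return min(0.99, max(0.01, p))  # clamp so one token cannot decide alone

def message_spam_probability(token_probs):
    """Combine per-token probabilities into one score for the whole message."""
    spam = prod(token_probs)
    good = prod(1 - p for p in token_probs)
    return spam / (spam + good)

def is_spam(token_probs):
    """Apply the configured boundary to the combined score."""
    return message_spam_probability(token_probs) >= BOUNDARY

# With 30 spam documents against 898 good ones, most tokens in a new spam
# message have never been seen in mailspam and score a neutral 0.5.
print(token_spam_probability(0, 0, 30, 898))  # 0.5
print(is_spam([0.5, 0.5, 0.5]))               # False
print(is_spam([0.99, 0.99]))                  # True
```

With so few spam samples, a new spam message mostly contains tokens the filter has no opinion about, so the combined score stays near 0.5 and never reaches the boundary.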
0
 

Author Comment

by:jahhan
ID: 18069178
I was actually thinking about adding spam to speed up the process.  Does it matter if the priority rules start at 200, or should I start from number 1?
0
 
LVL 46

Expert Comment

by:Sjef Bosman
ID: 18069388
Rule numbers are relevant only to the sequencing in the view, and hence to the order in which they are processed.

Yes, by all means, add all the spam you can find! Adding duplicates is useless, by the way...
0
 

Author Comment

by:jahhan
ID: 18069632
I am not familiar with regex formulas.  I was able to access the site you referenced and add those formulas.  The original rules in my database were rules that identified email addresses.  Are there any other rules I can use that you did not specify?
0
 
LVL 46

Expert Comment

by:Sjef Bosman
ID: 18070059
More rules? That's why I created that question... :-))

Rules based on email addresses are of very little use. I know that even my personal address has been used to send spam, and I'd hate to be blocked for that reason. Best practice: some rules denying the most obvious spam (private parts, watches, pills, etc), and the rest is for the Bayesian filter. Very important: check the mailspam database regularly, at least once a day, even more often during the first few days or weeks (every hour!). Open the mailspam.nsf database, select ALL mails, then deselect good mails and click on Confirm as Spam. The rest is good, so select them and click Move to Good, so they will be delivered.
0
 

Author Comment

by:jahhan
ID: 18070878
Should I set up my rules in the same order as in the thread that contains the regular expressions?
0
 

Author Comment

by:jahhan
ID: 18071154
I now have 4000 documents in mailgood and 10000 in mailspam.
0
 
LVL 46

Expert Comment

by:Sjef Bosman
ID: 18075102
Ah, that must be enough for kSpam to find the characteristics of spam. After re-evaluation of the mails, things should be a lot better now. What kSpam does not offer unfortunately is a standard option for users to move accidentally delivered spam from their Inbox to mailspam. Maybe you could ask them to do that themselves, that they drop spam in the mailspam database?

You set kSpam to recalculate pretty often, I'd change that, because the re-evaluation of all mails is a very time-consuming process and it hogs the processor.

By the way, the problem for the user, does it still exist?
0
 

Author Comment

by:jahhan
ID: 18079234
Which setting needs to be adjusted to reduce the recalculation time?  The problem for the user still exists: I am still getting a few "yo sir" and "It's me" messages.
0
 

Author Comment

by:jahhan
ID: 18079275
I counted 20 spam messages in the mailgood database.
0
 

Author Comment

by:jahhan
ID: 18079910
Should the messages in the new spam mail view be confirmed?
0
 
LVL 46

Expert Comment

by:Sjef Bosman
ID: 18081006
To minimize the time, change 360 seconds to 3600.
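A quick sketch of what that change means in practice, assuming the token reload period (KS_BL_PERIOD) really is expressed in seconds as stated here; the helper name is made up for the illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60

def reloads_per_day(period_seconds):
    """Number of full recalculations per day for a given reload period.

    Assumes the period is in seconds; the helper itself is hypothetical.
    """
    return SECONDS_PER_DAY // period_seconds

print(reloads_per_day(360))   # 240
print(reloads_per_day(3600))  # 24
```

Going from 360 to 3600 cuts the number of full re-evaluations by a factor of ten, which matters once the databases hold thousands of documents.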

There are buttons at the top of the mailspam.nsf database. They operate on selected messages. You have to find the good ones, select them, and click on Move to good. The rest can be confirmed as spam. In the mailgood database, you have to do something similar. Select the spam messages and click on Move to mailspam.
0
 

Author Comment

by:jahhan
ID: 18085747
Made the change.
OK, so now I have updated an allow rule that contained an email address in the "From contains:" field.
The original line read janedoe@test.com
I have updated the field to read #REGEX#:(?i)([janedoe@test.com][_W\s])

Is this the correct syntax to use?
0
 

Author Comment

by:jahhan
ID: 18085960
Today I see only two spam messages in mailgood.  The subject line is "name" Check This.  Should I use the regex syntax described in my previous post?
0
 
LVL 46

Accepted Solution

by:
Sjef Bosman earned 500 total points
ID: 18086291
The syntax doesn't look correct to me. I suppose this one is better:
    #REGEX#:(?i)janedoe@test.com[_W\s]
but for fixed strings you don't need regular expressions...

You do not have to add rules every time you get a spam mail. Just click on one of the buttons in order to tell kSpam what's got to be done with the mail. The lower the number of rules, the better! You have to feed the Bayesian filter, to make it work for your situation.
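To see why the square brackets in the first attempt are a problem, here is a small sketch using Python's re module. It assumes kSpam's regex engine treats character classes and (?i) the same way, which the thread does not confirm; the "#REGEX#:" prefix is dropped, and the sample headers and the plain_contains helper are made up for the illustration.

```python
import re

# Pattern as posted by the asker. [janedoe@test.com] is a character class:
# it matches ONE character from that set, not the whole address.
asker_pattern = re.compile(r"(?i)([janedoe@test.com][_W\s])")

# Pattern from the accepted answer, copied as posted: without the brackets
# the address is matched as literal text again.
answer_pattern = re.compile(r"(?i)janedoe@test.com[_W\s]")

def plain_contains(from_header, address="janedoe@test.com"):
    """Fixed-string check: case-insensitive substring test, no regex needed."""
    return address.lower() in from_header.lower()

unrelated = "From: someone else <bob@example.org>"  # hypothetical header
wanted = "From: JaneDoe@test.com (Jane)"            # hypothetical header

print(bool(asker_pattern.search(unrelated)))   # True: the "e " in "someone else" is enough
print(bool(answer_pattern.search(unrelated)))  # False
print(bool(answer_pattern.search(wanted)))     # True
print(plain_contains(wanted))                  # True
```

Used in an allow rule, the bracketed version would let through almost any sender, which is probably not what was intended. The plain substring check shows why a fixed address does not need a regular expression at all.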
0
 
LVL 46

Expert Comment

by:Sjef Bosman
ID: 18086854
Thanks! By the way, version 1.6 of kSpam is now available.
0
 

Author Comment

by:jahhan
ID: 18087042
thanks
0
