Solved

KSpam not learning spam patterns for one user

Posted on 2006-11-30
Last Modified: 2013-12-18
I have kSpam on my Domino server, and up to this point no one had complained of receiving multiple spam messages. There is one user, the one who actually set up the kSpam configuration, who is receiving the majority of the spam. How should I configure the Bayesian configuration document to make this agent work effectively? In addition to this configuration, should I also create rules manually? If so, how should the rules be numbered?

Below is the current Bayesian configuration:
General Tab:
Instance ID
This is a string of six characters or less, should be the same for every server in your organisation (KS_IID).      Rescon
Default action
This is the default action to be taken when one of the hard coded rules is matched (KS_DEFAULTACTION).      0 - Accept
Default probability increase
The default probability increase, only used if increase probability is selected for the default action (KS_DEFAULT_PROB_INC).      0%
Mark messages with a reason field?
Add KS_REASON item to an email if a rule is matched (KS_MARK).      Yes
Reload configuration every hour?
(KS_RELOAD).      Yes
Show statistics?
Log statistics under smtp.kSpam.* (KS_STATS).      Yes
Minimum "From:" header length?
Minimum length of the From: header (KS_MIN_FROM_LENGTH).      0
Maximum numbers in sender's username?
Maximum number of integers in the sender's username (KS_MAX_FROM_INTS).      0
First character in "From:" header must not be a number?
(KS_FILTER_FROM_INT).      No
Other forms to scan?
Forms other than Memo and Reply delimited by commas (KS_INTERESTING_FORMS).      
Add recipients list to copied and denied messages?
Add KS_RECIPIENTS readers field to denied messages; the username in the email address must be included in the recipient's username field in their person document (KS_RECIPIENTS).      No
Copied mail database
Database to copy copied messages to, default is mailspam.nsf (KS_COPIED_DB).      
Turn on debugging?
Create log in ks_debug.txt (KS_DEBUG).      No


Bayesian Tab:
Bayesian filter enabled
Enable the Bayesian filter (KS_BAYESIAN_FILTER).      Yes
Token reload period
Period of time between recalculating probabilities (KS_BL_PERIOD).      360
Probability boundary
Boundary probability at which email is considered spam (KS_BAYESIAN_BOUNDARY).      90%
Mark messages with token list and probability?
Add KS_BL_PROB and KS_BL_TOKENS to incoming emails (KS_BAYESIAN_MARK).      Yes
Good message ratio
Ratio of good emails passing through the server before an email is copied to the good mail database (KS_BAYESIAN_RATIO).      5
Bayesian action
Action to take when an email is probably spam (KS_BAYESIAN_ACTION).      3 - Copy & Deny
 Mark with
Text to mark emails with if the default Bayesian action is mark (KS_BAYESIAN_ACTION_MARK_WITH).      
Tokens to ignore
Tokens to ignore when calculating probabilities (KS_BL_IGNORE).      
Dump token lists to file
Write token lists to files goodlist.txt and spamlist.txt (KS_BL_DUMPLISTS).      No
Preparation setting 1
All emails that pass all rules without being matched are placed in the mailgood.nsf database (KS_BAYESIAN_PREP).      No
Preparation setting 2
All emails with a probability greater than 90% are copied to mailspam.nsf, all emails with a probability of less than 10% are copied to mailgood.nsf (KS_BAYESIAN_PREP_2).      No
Turn on debugging?
Create bload.txt log file (KS_BL_DEBUG).      No
Question by:jahhan
24 Comments
 
LVL 46

Expert Comment

by:Sjef Bosman
How many good and bad mails have been collected in mailgood and mailspam?
 

Author Comment

by:jahhan
Yesterday mailgood collected 93 documents and mailspam 27.
 
LVL 46

Expert Comment

by:Sjef Bosman
Apparently, the totals in both databases should reach at least 1000 apiece before good predictions can be expected.
 

Author Comment

by:jahhan
This database has been in production since 6/2006.  Each database should have collected 1000 documents by now.
 

Author Comment

by:jahhan
I checked the number of documents for each database.  Mailgood has 898 while mailspam has 30.
 
LVL 46

Expert Comment

by:Sjef Bosman
Mailspam only 30? That's not even close to 1000, I'd say... I assume you looked in the Database Properties, 2nd tab, for the number of documents. I also assume that you activated the Bayesian filter only recently, and that the other rules you have work very well for you. Most rules, do they have Deny&Copy, or only Deny? I suppose the latter, otherwise you'd have had a large collection right now.

If you could post some of those rules in another EE-question, http:Q_22023216.html "kSpam standard rules", I'd be very grateful. The info there could also be interesting to you.
 

Author Comment

by:jahhan
SJEF,
The configuration document was created on 6/19.
The rules are set to deny&copy, Bayesian filter enabled.
 
LVL 46

Expert Comment

by:Sjef Bosman
Please verify the content of mailspam.nsf, in the database properties window. If there are only 30 mails in it, the Bayesian filter cannot function. You could manually add spam from your spam-folder in the mail database.
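
To illustrate why a corpus of only 30 spam messages gives the filter so little to work with, here is a minimal Python sketch of a per-token spam score in the style commonly used by Bayesian mail filters. It is not kSpam's actual code: the function name, the minimum-hit threshold of 5, and the clamping bounds are all assumptions chosen for illustration.

```python
def token_spam_probability(spam_hits, good_hits, n_spam, n_good, min_hits=5):
    """Estimate how strongly a token indicates spam.

    spam_hits / good_hits: number of spam / good mails containing the token.
    n_spam / n_good: size of the spam / good corpus.
    Returns None when the token has been seen too rarely to be trusted
    (min_hits is an assumed threshold, not a kSpam setting).
    """
    if spam_hits + good_hits < min_hits:
        return None
    spam_freq = spam_hits / n_spam
    good_freq = good_hits / n_good
    p = spam_freq / (spam_freq + good_freq)
    # Clamp so that a single token can never be absolutely decisive.
    return max(0.01, min(0.99, p))


# Corpus sizes from this thread: 30 spam mails, 898 good mails.
# A token present in 10% of the spam has only 3 hits: too few to score.
small_corpus = token_spam_probability(3, 0, n_spam=30, n_good=898)

# The same 10% rate in a 1000-mail spam corpus gives 100 hits.
large_corpus = token_spam_probability(100, 2, n_spam=1000, n_good=1000)

print(small_corpus, large_corpus)
```

With the small corpus most spam tokens never accumulate enough hits to be scored at all, which is one plausible reason the filter stays ineffective until mailspam grows.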
 

Author Comment

by:jahhan
I was actually thinking about adding spam to speed up the process. Does it matter if the rule priorities start at 200, or should I start from the number 1?
 
LVL 46

Expert Comment

by:Sjef Bosman
Rule numbers are relevant only to the sequencing in the view, and hence to the order in which they are processed.

Yes, by all means, add all the spam you can find! Adding duplicates is useless, by the way...
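
The point about rule numbers can be sketched in Python as follows. This is only an illustration under assumptions: the rule tuples, the substring matching, and the "first matching rule wins" behaviour are hypothetical and not taken from kSpam itself. It shows that renumbering from 1 instead of 200 changes nothing as long as the relative order is preserved.

```python
# Hypothetical rules: (number, text to look for, action).
rules = [
    (210, "pills", "deny"),
    (200, "janedoe@test.com", "accept"),
    (205, "watches", "deny"),
]


def first_matching_action(rules, text, default="accept"):
    """Check rules in ascending number order; return the first match's action."""
    for _number, needle, action in sorted(rules):
        if needle in text.lower():
            return action
    return default


def renumber(rules, start=1):
    """Renumber rules from `start`, preserving their relative order."""
    return [(start + i, needle, action)
            for i, (_n, needle, action) in enumerate(sorted(rules))]


samples = ["JaneDoe@test.com cheap pills", "luxury watches", "hello"]
for text in samples:
    print(text, "->", first_matching_action(rules, text))
```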
 

Author Comment

by:jahhan
I am not familiar with regex formulas. I was able to access the site you referenced and add those formulas. The original rules in my database were rules that identified email addresses. Are there any other rules I can use that you did not specify?
 
LVL 46

Expert Comment

by:Sjef Bosman
More rules? That's why I created that question... :-))

Rules based on email addresses are of very little use. I know that even my personal address has been used to send spam, and I'd hate to be blocked for that reason. Best practice: some rules denying the most obvious spam (private parts, watches, pills, etc), and the rest is for the Bayesian filter. Very important: check the mailspam database regularly, at least once a day, even more often during the first few days or weeks (every hour!). Open the mailspam.nsf database, select ALL mails, then deselect good mails and click on Confirm as Spam. The rest is good, so select them and click Move to Good, so they will be delivered.
 

Author Comment

by:jahhan
The thread that contains the regular expressions: should I set my rules up in that same order?
 

Author Comment

by:jahhan
I now have 4000 documents in mailgood and 10000 in mailspam.
 
LVL 46

Expert Comment

by:Sjef Bosman
Ah, that must be enough for kSpam to find the characteristics of spam. After re-evaluation of the mails, things should be a lot better now. What kSpam does not offer unfortunately is a standard option for users to move accidentally delivered spam from their Inbox to mailspam. Maybe you could ask them to do that themselves, that they drop spam in the mailspam database?

You set kSpam to recalculate pretty often, I'd change that, because the re-evaluation of all mails is a very time-consuming process and it hogs the processor.

By the way, the problem for the user, does it still exist?
 

Author Comment

by:jahhan
Which line needs to be adjusted to minimize the time spent recalculating? The problem for the user still exists: I am getting a few "yo sir" and "It's me" messages.
 

Author Comment

by:jahhan
I counted 20 spam messages in the mailgood database.
 

Author Comment

by:jahhan
Should the messages located in the new spam mail view be confirmed?
 
LVL 46

Expert Comment

by:Sjef Bosman
To minimize the time, change 360 seconds to 3600.

There are buttons at the top of the mailspam.nsf database. They operate on selected messages. You have to find the good ones, select them, and click on Move to good. The rest can be confirmed as spam. In the mailgood database, you have to do something similar. Select the spam messages and click on Move to mailspam.
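
As a rough check on the reload period change suggested above, this small Python sketch counts how often probabilities would be recalculated per day. It assumes, as stated in this comment, that the token reload period (KS_BL_PERIOD) is given in seconds; the function name is only illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def recalculations_per_day(period_seconds):
    """Number of full recalculation cycles that fit in one day."""
    return SECONDS_PER_DAY // period_seconds


for period in (360, 3600):
    print(period, "seconds ->", recalculations_per_day(period), "recalculations per day")
```

Under that assumption, moving from 360 to 3600 cuts the number of re-evaluations tenfold.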
 

Author Comment

by:jahhan
Made the change.
Ok, so now I have updated an allow rule that contained an email address in the "From contains:" field.  
The original line read janedoe@test.com
I have updated the field to read #REGEX#:(?i)([janedoe@test.com][_W\s])

Is this the correct syntax to use?
 

Author Comment

by:jahhan
Today I see only two spam messages residing in mailgood. The subject line is "name" Check This. Should I use the regex syntax described in my previous post?
 
LVL 46

Accepted Solution

by:
Sjef Bosman earned 500 total points
The syntax does not seem correct to me. I suppose this one is better:
    #REGEX#:(?i)janedoe@test.com[_W\s]
but for fixed strings you don't need regular expressions...

You do not have to add rules every time you get a spam mail. Just click on one of the buttons in order to tell kSpam what's got to be done with the mail. The lower the number of rules, the better! You have to feed the Bayesian filter, to make it work for your situation.
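
The difference between the bracketed pattern and a literal match can be checked with a short Python sketch. Python's `re` module is used here only as a stand-in; kSpam's own regex engine may differ in detail, and the sample strings are made up. Square brackets define a character class, so `[janedoe@test.com]` matches any single one of those characters rather than the whole address.

```python
import re

ADDRESS = "janedoe@test.com"


def matches(pattern, text):
    """True if the pattern is found anywhere in the text."""
    return re.search(pattern, text) is not None


# Bracketed form from the question: one character from the class,
# followed by "_", "W" or whitespace. It fires on unrelated text.
bracketed = r"(?i)([janedoe@test.com][_W\s])"
over_match = matches(bracketed, "bob@example.net wrote")

# Unescaped form: "." matches any character, not only a literal dot.
unescaped = "(?i)" + ADDRESS
loose_match = matches(unescaped, "janedoe@testXcom")

# Escaped form: matches the address literally, ignoring case.
escaped = "(?i)" + re.escape(ADDRESS)
exact_match = matches(escaped, "From: JaneDoe@Test.com")
no_false_hit = matches(escaped, "janedoe@testXcom")

# For a fixed string, a plain case-insensitive substring test is enough.
plain_match = ADDRESS in "From: JaneDoe@Test.com".lower()

print(over_match, loose_match, exact_match, no_false_hit, plain_match)
```

This supports the remark that a fixed address does not need a regular expression at all.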
 
LVL 46

Expert Comment

by:Sjef Bosman
Thanks! By the way, version 1.6 of kSpam is now available.
 

Author Comment

by:jahhan
thanks
