Mark


How to trap a UTF-8 encoded Subject in SpamAssassin

I'm trying to catch messages with UTF-8 encoded headers in SpamAssassin. I have the following in /etc/mail/spamassassin/local.cf:
header LOCAL_UTF_SUBJECT        Subject:raw =~ /^=?utf-8/i
score LOCAL_UTF_SUBJECT 2.0
describe LOCAL_UTF_SUBJECT Subject line is utf encoded


I'm sending a test message with the subject "=?utf-8?Q?hello?=" using `telnet <host> 25` to ensure the subject does not get escaped (it does not). Yet my rule does not match this subject, so my rule must be wrong. Any ideas why?
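One thing worth checking: `?` is a regex metacharacter (a zero-or-one quantifier), so in `/^=?utf-8/i` the `=?` means "an optional `=`", not the literal two characters `=?`. The minimal sketch below uses Python's `re` module, which treats `?`, `\?` and case-insensitive matching the same way Perl does for patterns this simple. It assumes the rule sees the raw value exactly as sent, `=?utf-8?Q?hello?=`, with no leading whitespace; that assumption is not confirmed against SpamAssassin itself.

```python
import re

# Raw Subject value from the test message (assumed: no leading whitespace).
raw_subject = "=?utf-8?Q?hello?="

# Original pattern: "=?" means "an optional '='", not the literal text "=?".
# After matching "=", the regex needs "utf-8" next, but the next char is "?".
original = re.compile(r"^=?utf-8", re.IGNORECASE)

# Escaped pattern: "\?" matches a literal question mark.
escaped = re.compile(r"^=\?utf-8", re.IGNORECASE)

print("original matches:", bool(original.search(raw_subject)))  # False
print("escaped matches: ", bool(escaped.search(raw_subject)))    # True
```

In local.cf the corresponding pattern would presumably be `/^=\?utf-8/i`; this sketch has not been run against a live SpamAssassin install.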
skullnobrains

You should remove the ^.

It is a bad idea to have it, because it is perfectly legit to encode only part of the subject as UTF-8.

Additionally, I'm not 100% sure (I haven't written an SA rule in years), but I recall that header rules are passed the whole header and not just the value, so /subject:\s*=?utf-8/i would work for your test email. It is simpler, though, to just remove the ^ if you don't need to match only subjects that start with utf-8.

BTW, I'm not sure that rule is such a good idea (and 2 is quite a high score).
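Dropping the `^` does make the pattern match the test subject, but partly for an incidental reason: because `=?` is an optional `=`, the unanchored `=?utf-8` is satisfied by the bare text `utf-8` anywhere in the value. The minimal Python sketch below compares it with an escaped pattern that requires the literal `=?utf-8?` prefix of an RFC 2047 encoded-word. The partly encoded and plain-text subjects are hypothetical examples, not taken from the thread.

```python
import re

# Unanchored version of the original pattern; "=?" still means "optional '='".
unanchored = re.compile(r"=?utf-8", re.IGNORECASE)
# Escaped, unanchored pattern requiring the literal "=?utf-8?" of an encoded-word.
encoded_word = re.compile(r"=\?utf-8\?", re.IGNORECASE)

subjects = {
    # The test subject from the question.
    "fully encoded": "=?utf-8?Q?hello?=",
    # Hypothetical subject where only one word is encoded (allowed by RFC 2047).
    "partly encoded": "Re: =?UTF-8?Q?caf=C3=A9?= order",
    # Hypothetical plain-text subject that merely mentions the charset name.
    "plain text": "Convert your files to utf-8",
}

for label, subject in subjects.items():
    print(label, bool(unanchored.search(subject)), bool(encoded_word.search(subject)))
```

Both patterns catch the partly encoded subject, but only the unanchored, unescaped one also fires on a plain-text subject that happens to contain "utf-8".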

ASKER

skullnobrains: > it is a bad idea to have it because it is perfectly legit to encode only part of the subject as utf8

This may be true, but we have *only* received UTF-8 encoded subjects from people trying to sell things. Besides, the rule only marks the message as spam, so users can check their spam folders for false positives.

I believe I have filtered these out with:

Subject:raw =~ /(utf-8|Cp1252|iso-8859|Windows-1252)/I

Without the "raw" modifier, the subject is decoded first. I've tried the same thing for From:

From:raw =~ /UTF-8/I

Any ideas?
ASKER CERTIFIED SOLUTION
skullnobrains

This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.

ASKER

This appears to be working with:

Subject:raw =~ /(utf-8|Cp1252|iso-8859|Windows-1252)/i

Thanks
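The difference between the `:raw` and default (decoded) forms of a header test can be approximated outside SpamAssassin. The rough sketch below uses Python's standard `email.header` module as a stand-in for SpamAssassin's own header decoding. That is an assumption: the two are different code and may differ on edge cases. It shows the charset pattern from the working rule matching the raw encoded-word but not the decoded text.

```python
import re
from email.header import decode_header, make_header

# Pattern from the working rule, with Perl's /i mapped to re.IGNORECASE.
charset_pattern = re.compile(r"(utf-8|Cp1252|iso-8859|Windows-1252)", re.IGNORECASE)


def decode_subject(raw: str) -> str:
    """Decode RFC 2047 encoded-words, roughly what a non-raw header test sees."""
    return str(make_header(decode_header(raw)))


# The test subject from the question, plus a hypothetical Latin-1 example.
for raw in ["=?utf-8?Q?hello?=", "=?iso-8859-1?Q?Caf=E9?="]:
    decoded = decode_subject(raw)
    print(repr(decoded),
          "raw match:", bool(charset_pattern.search(raw)),
          "decoded match:", bool(charset_pattern.search(decoded)))
```

Once the encoded-word is decoded, the charset label is gone, which is consistent with the rule only firing when `:raw` is used.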