
how to trap utf-8 encoded Subject in spamassassin

I'm trying to catch messages with utf-8 encoded headers in spamassassin. I have the following in /etc/mail/spamassassin/local.cf:

header LOCAL_UTF_SUBJECT        Subject:raw =~ /^=?utf-8/i
score LOCAL_UTF_SUBJECT 2.0
describe LOCAL_UTF_SUBJECT Subject line is utf encoded


I'm sending a test message with the subject "=?utf-8?Q?hello?=" using `telnet <host> 25` to ensure the subject does not get escaped (it does not). Yet my rule does not match this subject. My rule must be wrong. Any ideas why?

skullnobrains

you should remove the ^

it is a bad idea to have it because it is perfectly legit to encode only part of the subject as utf8

additionally, i'm not 100% sure (haven't written an SA rule in years), but i recollect header rules are passed the whole header and not just the value, so /subject:\s*=?utf-8/i would work for your test email. it is simpler to just remove the ^ if you don't need to match only subjects that start with utf-8
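A likely reason that removing the ^ helps: the `?` in the pattern is unescaped, so `=?` means "an optional =" rather than the literal characters "=?". The sketch below uses Python's `re` module as a stand-in for the Perl regex engine that SpamAssassin uses (an assumption, though the two agree on patterns this simple). The `escaped` patterns and all variable names are illustrative and not from the thread.

```python
import re

# Raw Subject value from the test message in the question
raw_subject = "=?utf-8?Q?hello?="
# Hypothetical subject where only part of the text is encoded
partial = "Re: =?utf-8?Q?caf=C3=A9?="

# Pattern as posted: "=?" is an optional "=", so "^" must be followed by "utf-8" or "=utf-8"
original = re.compile(r"^=?utf-8", re.IGNORECASE)
# Same pattern without the anchor: it can now find the bare "utf-8" mid-string
unanchored = re.compile(r"=?utf-8", re.IGNORECASE)
# Escaping "?" makes the pattern match the literal encoded-word prefix "=?utf-8?"
escaped = re.compile(r"^=\?utf-8\?", re.IGNORECASE)
escaped_anywhere = re.compile(r"=\?utf-8\?", re.IGNORECASE)

print(bool(original.search(raw_subject)))          # False
print(unanchored.search(raw_subject).group(0))     # utf-8
print(escaped.search(raw_subject).group(0))        # =?utf-8?
print(bool(escaped.search(partial)))               # False: the subject starts with "Re: "
print(bool(escaped_anywhere.search(partial)))      # True
```

If the same holds inside SpamAssassin, `/^=\?utf-8\?/i` would match subjects that start with a utf-8 encoded-word, and `/=\?utf-8\?/i` would also catch partially encoded subjects.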

btw i'm not sure that rule is such a good idea (and 2 is quite a high score)

Mark

ASKER

skullnobrains: > it is a bad idea to have it because it is perfectly legit to encode only part of the subject as utf8

This may be true, but we have *only* received utf8 encoded subjects from people trying to sell things. Besides, it just marks it as spam so the users can look in their spam folders for false positives.

I believe I have filtered these out with:

Subject:raw =~ /(utf-8|Cp1252|iso-8859|Windows-1252)/i

Without the "raw" modifier, the subject is decoded first. I've tried the same thing for From:

From:raw =~ /UTF-8/i

Any ideas?
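To illustrate the point about the "raw" modifier: once an RFC 2047 encoded-word is decoded, the charset label is no longer part of the text, so a pattern looking for "utf-8" has nothing to match. The sketch below uses Python's standard `email.header` module only as an analogy for the decoding SpamAssassin does on non-raw headers. It is not SpamAssassin's own code, and the variable names are illustrative.

```python
from email.header import decode_header, make_header

# Raw header value, as it appears on the wire
raw_subject = "=?utf-8?Q?hello?="

# Decode the encoded-word into plain text
decoded_subject = str(make_header(decode_header(raw_subject)))

print(decoded_subject)                         # hello
print("utf-8" in raw_subject.lower())          # True: charset label is visible in the raw form
print("utf-8" in decoded_subject.lower())      # False: the label is gone after decoding
```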

ASKER CERTIFIED SOLUTION

skullnobrains


Mark

ASKER

This appears to be working with:

Subject:raw =~ /(utf-8|Cp1252|iso-8859|Windows-1252)/i

Thanks
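One caveat with the final rule above: it matches the charset names anywhere in the raw subject, so an unencoded subject that merely mentions "utf-8" would also hit. The sketch below compares it with a tighter pattern that requires the full encoded-word prefix. Python's `re` is used as a stand-in for the Perl regex engine (an assumption), and the `tighter` pattern and sample subjects are hypothetical and not from the thread.

```python
import re

# Pattern from the final rule in the thread
final_rule = re.compile(r"(utf-8|Cp1252|iso-8859|Windows-1252)", re.IGNORECASE)

# Hypothetical tighter variant: "=?", charset, "?", encoding letter B or Q, "?"
tighter = re.compile(
    r"=\?(utf-8|cp1252|iso-8859-\d+|windows-1252)\?[bq]\?", re.IGNORECASE
)

# Sample raw subjects (made up for illustration)
encoded_utf8 = "=?utf-8?Q?hello?="
encoded_latin1 = "=?ISO-8859-1?B?aGVsbG8=?="
plain_mention = "How do I read a utf-8 file?"

for subject in (encoded_utf8, encoded_latin1, plain_mention):
    # Print whether each pattern matches the sample subject
    print(subject, bool(final_rule.search(subject)), bool(tighter.search(subject)))
```

Whether the extra strictness is worth it depends on how often plain subjects mention these charset names in your mail.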