Mark
asked on
How to trap a UTF-8 encoded Subject in SpamAssassin
I'm trying to catch messages with UTF-8 encoded headers in SpamAssassin. I have the following in /etc/mail/spamassassin/local.cf:
header LOCAL_UTF_SUBJECT Subject:raw =~ /^=?utf-8/i
score LOCAL_UTF_SUBJECT 2.0
describe LOCAL_UTF_SUBJECT Subject line is utf encoded
I'm sending a test message with the subject "=?utf-8?Q?hello?=" using `telnet <host> 25` to ensure the subject does not get escaped (it does not). Yet my rule does not match this subject, so the rule itself must be wrong. Any ideas why?
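One likely cause, going by standard Perl regex rules (which SpamAssassin uses for rule patterns): `?` is a quantifier, so `=?` in `/^=?utf-8/i` means "an optional `=`" rather than the literal characters `=?`. The pattern therefore expects `utf-8` at the very start (optionally preceded by a single `=`), and `=?utf-8?Q?hello?=` never matches. A minimal sketch of the same rule with the question marks escaped, keeping the original rule name and score:

```
# Match a Subject whose raw (undecoded) value starts with an RFC 2047
# encoded-word in UTF-8, e.g. "=?utf-8?Q?hello?=".
# "\?" matches a literal question mark; a bare "?" would instead make
# the preceding "=" optional.
header   LOCAL_UTF_SUBJECT  Subject:raw =~ /^=\?utf-8\?/i
score    LOCAL_UTF_SUBJECT  2.0
describe LOCAL_UTF_SUBJECT  Subject line is utf encoded
```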
ASKER
skullnobrains wrote:
> It is a bad idea to have this rule, because it is perfectly legitimate to encode only part of the subject as UTF-8.
This may be true, but we have *only* received UTF-8 encoded subjects from people trying to sell things. Besides, the rule only marks a message as spam, so users can still check their spam folders for false positives.
I believe I have filtered these out with:
Subject:raw =~ /(utf-8|Cp1252|iso-8859|Windows-1252)/i
Without the :raw modifier, the subject is decoded before matching. I've tried the same thing for From:
From:raw =~ /UTF-8/i
Any ideas?
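A sketch of both rules in complete form. The rule names LOCAL_ENC_SUBJECT and LOCAL_UTF_FROM are hypothetical, and the scores are illustrative rather than tuned. Two points worth checking: Perl's case-insensitive modifier is lowercase `/i` (there is no `/I`), and running `spamassassin --lint` after editing local.cf is a quick way to confirm the rules still parse. Wrapping the charset in `=\?...\?` limits matches to charset names inside an encoded-word, so a plain-text subject that merely mentions "utf-8" is not caught.

```
# Hypothetical rule names; scores are illustrative only.
# Raw Subject containing an encoded-word in one of the listed charsets,
# e.g. "=?windows-1252?Q?...?=" or "=?iso-8859-15?B?...?=".
header   LOCAL_ENC_SUBJECT  Subject:raw =~ /=\?(utf-8|cp1252|iso-8859-\d+|windows-1252)\?/i
score    LOCAL_ENC_SUBJECT  2.0
describe LOCAL_ENC_SUBJECT  Subject contains an encoded-word in a listed charset

# Same idea for the From header (e.g. a UTF-8 encoded display name).
header   LOCAL_UTF_FROM     From:raw =~ /=\?utf-8\?/i
score    LOCAL_UTF_FROM     1.0
describe LOCAL_UTF_FROM     From header contains a UTF-8 encoded-word
```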
ASKER
This appears to be working with:
Subject:raw =~ /(utf-8|Cp1252|iso-8859|Windows-1252)/i
Thanks
It is a bad idea to have this rule, because it is perfectly legitimate to encode only part of the subject as UTF-8.
Additionally, I'm not 100% sure (I haven't written an SA rule in years), but I recall that header rules are passed the whole header and not just the value. If so, /subject:\s*=?utf-8/i would work for your test email. It is simpler, though, to just remove the ^ if you don't need to match only subjects that start with utf-8.
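A sketch of the unanchored variant, which also catches subjects where only part of the text is encoded. It assumes, per the SpamAssassin `header` rule documentation as I understand it, that a rule on `Subject` sees only the header's value (the special name `ALL` is what matches against the full header text), so no `subject:` prefix is included; that is worth verifying on your version. The `?` characters still need escaping, and `[bq]` covers the two RFC 2047 encodings (B for base64, Q for quoted-printable).

```
# Unanchored: matches a UTF-8 encoded-word anywhere in the raw Subject,
# e.g. "Re: =?utf-8?B?...?=" as well as a fully encoded subject.
header   LOCAL_UTF_SUBJECT  Subject:raw =~ /=\?utf-8\?[bq]\?/i
score    LOCAL_UTF_SUBJECT  2.0
describe LOCAL_UTF_SUBJECT  Subject line is utf encoded
```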
By the way, I'm not sure that rule is such a good idea (and 2 is quite a high score).