How to trap a UTF-8-encoded Subject in SpamAssassin

I'm trying to catch messages with UTF-8-encoded headers in SpamAssassin. I have the following in /etc/mail/spamassassin/local.cf:
header LOCAL_UTF_SUBJECT        Subject:raw =~ /^=?utf-8/i
score LOCAL_UTF_SUBJECT 2.0
describe LOCAL_UTF_SUBJECT Subject line is utf encoded


I'm sending a test message with the subject "=?utf-8?Q?hello?=" using `telnet <host> 25` to ensure the subject does not get escaped (it does not). Yet my rule does not match this subject. My rule must be wrong. Any ideas why?
jmarkfoley asked:
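A likely culprit in the original rule, separate from the anchoring question: in a regex, `?` is a quantifier, so `=?` means "an optional `=`" rather than the literal `=?` that starts an RFC 2047 encoded word. The sketch below uses Python's `re` module as a stand-in for the Perl regexes SpamAssassin rules are written in (for these simple constructs the two behave the same) and tests three patterns against the raw test subject.

```python
import re

# The raw Subject value from the telnet test message.
subject = "=?utf-8?Q?hello?="

# Original pattern: "?" is a quantifier here, so "=?" means "an optional '='",
# not the literal two-character prefix "=?".
original = re.compile(r"^=?utf-8", re.IGNORECASE)

# Escaping the "?" makes it match the literal encoded-word prefix "=?".
escaped = re.compile(r"^=\?utf-8", re.IGNORECASE)

# Dropping the anchor also "works": "=?" can match nothing and "utf-8" can match anywhere.
unanchored = re.compile(r"=?utf-8", re.IGNORECASE)

for name, pattern in [("original", original), ("escaped", escaped), ("unanchored", unanchored)]:
    print(f"{name:10} -> {bool(pattern.search(subject))}")
```

If that reading is right, escaping the question mark, as in `Subject:raw =~ /^=\?utf-8/i`, should let an anchored rule match subjects that begin with a UTF-8 encoded word.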

skullnobrains commented:
You should remove the ^.

It is a bad idea to have it, because it is perfectly legitimate to encode only part of the subject as UTF-8.

Additionally, I'm not 100% sure (I haven't written an SA rule in years), but I recall that header rules are passed the whole header and not just the value, so /subject:\s*=\?utf-8/i would work for your test email. It is simpler to just remove the ^ if you don't need to match only subjects that start with UTF-8.

By the way, I'm not sure this rule is such a good idea (and 2 is quite a high score).
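To illustrate the point about partial encoding: an RFC 2047 encoded word has the form `=?charset?encoding?encoded-text?=` and may appear anywhere in the Subject, so an anchored pattern misses a subject such as a reply prefix followed by an encoded word. This Python sketch (the helper `encoded_charsets` and the sample subject are hypothetical) extracts the charset label of every encoded word in a raw Subject value. On the other uncertainty above, my understanding of the Mail::SpamAssassin::Conf documentation is that a `header` rule is tested against the header's value without the `Subject:` name, so a pattern that includes `subject:` would likely not match.

```python
import re

# RFC 2047 encoded word: =?charset?encoding?encoded-text?=
# where encoding is "B" (base64) or "Q" (quoted-printable style).
ENCODED_WORD = re.compile(r"=\?([^?]+)\?([BbQq])\?([^?]*)\?=")


def encoded_charsets(raw_subject: str) -> list[str]:
    """Return the lowercased charset of every encoded word in a raw Subject value."""
    return [m.group(1).lower() for m in ENCODED_WORD.finditer(raw_subject)]


# Only part of this (hypothetical) subject is encoded.
partial = "Re: =?utf-8?Q?caf=C3=A9?= menu"

print(encoded_charsets(partial))  # ['utf-8']
# An anchored test misses it because the subject starts with "Re: ".
print(bool(re.search(r"^=\?utf-8", partial, re.IGNORECASE)))  # False
```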
jmarkfoley (author) commented:
skullnobrains: > it is a bad idea to have it because it is perfectly legit to encode only part of the subject as utf8

This may be true, but we have *only* received UTF-8-encoded subjects from people trying to sell things. Besides, the rule just marks the message as spam, so users can check their spam folders for false positives.

I believe I have filtered these out with:

Subject:raw =~ /(utf-8|Cp1252|iso-8859|Windows-1252)/I

Without the "raw" modifier, the subject is decoded first. I've tried the same thing for From:

From:raw =~ /UTF-8/I

Any ideas?
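The `:raw` behaviour can be illustrated with Python's `email.header.decode_header`, used here only as a stand-in for SpamAssassin's own header decoding: once the encoded word is decoded, the charset label is gone, so a rule tested against the decoded Subject has nothing to match. The same idea applies to From, where the display name is typically the part that carries an encoded word.

```python
from email.header import decode_header

raw_subject = "=?utf-8?Q?hello?="

# decode_header splits the value into (bytes-or-str, charset) pairs.
parts = decode_header(raw_subject)

# Join the pieces into the decoded text a non-raw rule would roughly see.
decoded = "".join(
    text.decode(charset or "ascii") if isinstance(text, bytes) else text
    for text, charset in parts
)

print(parts)                           # [(b'hello', 'utf-8')]
print(decoded)                         # hello
print("utf-8" in decoded.lower())      # False: the label exists only in the raw form
print("utf-8" in raw_subject.lower())  # True
```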
skullnobrains commented:
As far as the rule is concerned, if you're trying to filter only UTF-8 subjects, no problem; you know what you are doing.

Your new rule does not have the ^ character, so it is not anchored, which is why it works.
I guess
/=\?(utf-8|Cp1252|iso-8859|Windows-1252)/i
would be a little better, but it is quite unlikely that a valid mail subject would contain those strings anyway.

Have you tried the syntax I gave?
"/^subject:\s*=\?utf-8/i" (with the caret I forgot earlier)

I'm not sure that filtering all those charsets is very meaningful, but you know what you are doing. For the record, Outlook users with locales that use accented characters are very likely to get filtered. Depending on your usual traffic, this may or may not be a good idea.
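A sketch of why requiring the `=?` prefix helps, and of the Outlook caveat, again using Python's `re` in place of Perl. Perl's case-insensitive modifier is the lowercase `i` (Python's `re.IGNORECASE`); Perl does not recognize an uppercase `/I`. The sample subjects are made up, and `email.header.Header` is used only to produce a typical ISO-8859-1 encoded subject like one a client with an accented locale might send.

```python
import re
from email.header import Header

# Charset labels from the rule in this thread (matching is case-insensitive).
CHARSETS = r"(utf-8|cp1252|iso-8859|windows-1252)"

# As posted: the charset name may appear anywhere in the raw Subject.
bare = re.compile(CHARSETS, re.IGNORECASE)

# Refinement: the charset must directly follow "=?", i.e. be the label of an
# RFC 2047 encoded word. The "?" is escaped so it is matched literally.
encoded_word = re.compile(r"=\?" + CHARSETS, re.IGNORECASE)

# Hypothetical sample subjects.
plain_mention = "Question about utf-8 export in Excel"
accented = Header("Réunion demain", "iso-8859-1").encode()  # an =?iso-8859-1?q?...?= word

for subject in (plain_mention, accented):
    print(f"{subject!r}: bare={bool(bare.search(subject))}, "
          f"encoded_word={bool(encoded_word.search(subject))}")
```

The plain mention of "utf-8" is caught only by the bare pattern, while the legitimate accented subject matches both, which is the false-positive risk described above.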
jmarkfoley (author) commented:
This appears to be working with

Subject:raw =~ /(utf-8|Cp1252|iso-8859|Windows-1252)/i

Thanks