How to Spam the Experts (and Others)

Do you hate spam? I do, and I am willing to bet you do as well. I often wonder, though: if people hate spam so much, why do they still post their email addresses on the web? I'm not talking about plain-text postings here. I am referring to the fancy obfuscations of those addresses that I see posted on profiles and message boards. Don't get me wrong: I think security through obscurity is a good approach to letting other people reach you at your established email address. What gets me is how simple the obfuscations tend to be. In this article I am going to demonstrate, using regular expressions (regex), how easy it is to de-obfuscate some of the simpler patterns I have seen. I will also offer some alternative obfuscation methods. Though I am going to focus on email, the ideas presented here could be applied to any online moniker that could be used for spamming purposes (e.g. Twitter or Facebook). Although some of the discussion will be technical, the audience of this article is anyone who actively posts their email address to public web sites.


Disclaimer

I am not advocating you go out and "screen-scrape" anyone's profile or any message boards. The intent of my article is to demonstrate the inherent vulnerability of certain obfuscation patterns, not to instigate some "script-kiddie" to go address fishing. If you are a script-kiddie, or just a genuine degenerate, please don't hold me responsible for your actions. You have been warned!


Prerequisites

If you are not already familiar with regular expressions, you may want to get acquainted with them before going any further. Regular expressions are a type of programming language (more like a meta-language) used to identify patterns within a target string. There are hundreds of articles and sites all over the web dedicated to the topic: BatuhanCetin has written an introductory article on them, and my site of preference is Regular-Expressions.info. Covering the ins and outs of regular expressions is outside the scope of this article.


Email Address Structure

Everyone reading this should be familiar with the structure of an email address. The one thing an email address cannot live without is the "at" symbol ( @ ). It is the character that separates the "who" from the "where". Generally speaking, an email address will have only one @ symbol. (Per RFCs 5321 and 5322, it is possible to have an @ within a quoted string in the local part. For the sake of simplicity, I will assume only one @ symbol occurs within any example address shown.) In addition to the @ symbol, both the "who" and the "where" are required. To extract a simple, non-obfuscated address, we might use the following regular expression:
\S+@\S+

The pattern above will search for one or more non-whitespace characters, followed by the @ symbol, and then one or more non-whitespace characters. It should be easy to see that if we had the address jsmith@flibberty-flabber-jab.com, the regular expression would have no trouble extracting that address.
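
Here is a minimal Python sketch of that extraction using the standard re module. The find_plain_addresses name and the sample page text are made up for illustration.

```python
import re

# The simple pattern from above: one or more non-whitespace characters,
# an @ symbol, then one or more non-whitespace characters.
SIMPLE_ADDRESS = re.compile(r"\S+@\S+")


def find_plain_addresses(text: str) -> list[str]:
    """Return every whitespace-delimited token that contains an @."""
    return SIMPLE_ADDRESS.findall(text)


page = "Contact me at jsmith@flibberty-flabber-jab.com for details."
print(find_plain_addresses(page))  # ['jsmith@flibberty-flabber-jab.com']
```

Because \S matches any non-whitespace character, an address that ends a sentence comes back with the trailing period attached. A harvester would simply trim such punctuation, so that is no protection.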


Enter Obfuscation

When one realizes how easy it is to extract one's email address from a web page's HTML code, one might then start to ponder how to disguise said address so that a colleague can still decipher the address, but the spammer cannot. One of the more popular ways to disguise an address is by simply obfuscating the dot and the @ symbol. Taking this approach against my previous address example, I could propose any of the following:
jsmith [at] flibberty-flabber-jab [dot] com
jsmith {at} flibberty-flabber-jab {dot} com
jsmith __AT__ flibberty-flabber-jab __DOT__ com

I could keep going, but hopefully you get the idea. So how could I create a regular expression to extract such variations in the address format? Quite simply, in fact.
\S+\s*[[{_-]*[aA][tT][]}_-]*\s*\S+\s*[[{_-]*[dD][oO][tT][]}_-]*\s*\S+

The pattern above will match any of the examples listed above, and I could just as easily add other wrapper characters to its character classes. What to do, what to do?
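
As a sketch of how little work de-obfuscation takes, the Python below wraps the three parts of that pattern in capturing groups and glues them back together with @ and a dot. The capture groups and the deobfuscate name are my additions, not part of the original pattern. I also escaped the literal [ and ] inside the character classes, because Python's re module warns about a bare [ there (a possible nested set). The sketch is only meant for the whitespace-separated examples above; with no spaces at all, the greedy \S+ groups would swallow some of the brackets.

```python
import re

# The obfuscation pattern from above, with capture groups added around the
# "who", the domain, and the top-level domain.
OBFUSCATED = re.compile(
    r"(\S+)\s*[\[{_-]*[aA][tT][\]}_-]*\s*"      # who, then "at" in any wrapper
    r"(\S+)\s*[\[{_-]*[dD][oO][tT][\]}_-]*\s*"  # domain, then "dot" in any wrapper
    r"(\S+)"                                    # top-level domain
)


def deobfuscate(text: str):
    """Rebuild a plain address from an [at]/[dot]-style disguise, or return None."""
    match = OBFUSCATED.search(text)
    if match is None:
        return None
    who, domain, tld = match.groups()
    return f"{who}@{domain}.{tld}"


for disguised in (
    "jsmith [at] flibberty-flabber-jab [dot] com",
    "jsmith {at} flibberty-flabber-jab {dot} com",
    "jsmith __AT__ flibberty-flabber-jab __DOT__ com",
):
    print(deobfuscate(disguised))  # jsmith@flibberty-flabber-jab.com each time
```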


Improving Obfuscation

I'm sure the above raises the question, "How can I improve my obfuscation techniques?" To accomplish that, we must think like a regex engine. So what does said engine do? Well, it examines each character in an input string to see whether it conforms to part of some pattern. "OK. How do I break it, then?" Excellent question. Simply changing characters is not enough, because a regex engine is very adept at inspecting characters. What you should be aiming for is semantic obfuscation. What do I mean by this? Instead of changing the value of a character, change what it means. Replace or insert some character that does not exist in your original email address, and provide a note explaining this quirk. For example, I might post the following as my address in some forum:

jzsmithz@flibbertyz-flabberz-jazb.cozm (remove all zees)

Above, I have used the letter "z" as filler for my address. I staggered the insertions of this letter to prevent a pattern from emerging in my obfuscation (e.g. every other letter being a "z"). I have also provided a note explaining to anyone reading my address how to decipher it. You might ask why I wrote the word "zees" instead of just the letter "z". This is, again, to make it harder for spammers to extract the address. I could easily write a regex pattern that extracts the letter to remove from the note, and then add logic that filters the address based on the letter found. Something like this:

remove(?: all)?(?: the)? ([a-z])\b

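To make both sides of this game concrete, here is a minimal Python sketch. The add_filler and harvest helpers are hypothetical names for illustration, not anything from a library, and harvest assumes the disguised address is the first word of the posting. The note pattern is written with a capturing group around the letter and a trailing word boundary (\b); without the boundary, [a-z] would happily match the first letter of "zees", and spelling the letter out would buy you nothing.

```python
import random
import re


def add_filler(address: str, filler: str, count: int, seed=None) -> str:
    """Hypothetical helper: insert `filler` at `count` random positions.

    Random positions keep the insertions staggered, so no regular pattern
    (such as "every other letter") emerges.
    """
    if filler in address:
        # A reader could not tell real letters from filler, so refuse.
        raise ValueError("filler letter must not already appear in the address")
    rng = random.Random(seed)
    chars = list(address)
    for _ in range(count):
        chars.insert(rng.randint(0, len(chars)), filler)
    return "".join(chars)


# Accepts a lone letter ("z") in the note, but not a spelled-out name ("zees").
NOTE = re.compile(r"remove(?: all)?(?: the)? ([a-z])\b")


def harvest(posting: str):
    """Spammer's view: read the filler letter from the note, then strip it.

    Assumes the disguised address is the first whitespace-separated token.
    Returns None when the note does not name a single letter.
    """
    match = NOTE.search(posting)
    if match is None:
        return None
    filler = match.group(1)
    disguised = posting.split()[0]
    return disguised.replace(filler, "")


print(harvest("jzsmithz@flibbertyz-flabberz-jazb.cozm (remove all z)"))     # jsmith@flibberty-flabber-jab.com
print(harvest("jzsmithz@flibbertyz-flabberz-jazb.cozm (remove all zees)"))  # None
```

Of course, a spammer who also maps "zees", "exes" and friends back to letters is right back in business; the goal is to raise the cost, not to eliminate the risk.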

Again, is this foolproof? No. Given a reason and time, a spammer can still grab your address. The point is to make it more difficult for them to do so. I once had a boss who said to me, "Does the security system make our store safer? No. But if Johnny Snatchalot sees that we have a security system, but the store next door doesn't, which target will be easier for him to score from?" I believe the same concept holds here.

Another technique, though perhaps less practical, is to encrypt your address with a weak cipher. For example, you could use a Caesar cipher with a shift of 1. Again, you could include a note saying that the displayed address is incorrect and should be deciphered using method "X". While this won't prevent the address from being extracted, per se, it would force a spammer to work through another layer to determine your real email address.
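
A Caesar cipher with a shift of 1 takes only a few lines. This Python sketch (the caesar function is illustrative, not from any library) shifts ASCII letters and leaves @, the dots and the hyphens alone, so the result still looks like an address.

```python
import string


def caesar(text: str, shift: int) -> str:
    """Shift ASCII letters by `shift` places, wrapping around; leave everything else alone."""
    def rotate(ch: str) -> str:
        if ch in string.ascii_lowercase:
            base = ord("a")
        elif ch in string.ascii_uppercase:
            base = ord("A")
        else:
            return ch  # keep @, ., -, digits and so on unchanged
        return chr(base + (ord(ch) - base + shift) % 26)

    return "".join(rotate(ch) for ch in text)


posted = caesar("jsmith@flibberty-flabber-jab.com", 1)
print(posted)              # ktnjui@gmjccfsuz-gmbccfs-kbc.dpn
print(caesar(posted, -1))  # jsmith@flibberty-flabber-jab.com
```

The same function with a shift of -1 undoes the encoding. Keep in mind that there are only 25 possible shifts, so a determined spammer can simply try them all; the cipher adds a layer of work, not real secrecy.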


Good-Intentioned Obfuscations

I have seen this one from time to time: posting an email address as an image. I'm sure most of you reading this are familiar with Captcha. If not, it's the picture with the swirled letters that you have to look at before typing what you see into the accompanying text box. If you're familiar with Captcha, then you may be familiar with why it came about. Long ago, some clever individual figured out that you could programmatically send web requests to certain pages in order to "work" a web site to your advantage (think pinging Ticketmaster to grab all the high-dollar seats for later scalping). Captcha was invented to prevent this. Subsequently, another set of clever individuals realized you could perform optical character recognition (OCR), a way of extracting text from an image, against a Captcha image to defeat it. This is the reason you now see swirly, sometimes indecipherable, letters inside Captcha windows. Why did I tell you this? Because all of the image-based email addresses I have seen to date have been simple, unobfuscated text rendered in an image. This is ripe for an OCR attack. While I am not suggesting you cannot use an image to post your email address, try to take the Captcha route and obfuscate the image a bit.


Summary

Communication across the globe has increased exponentially over the past few decades. With the advent of the internet and email, spam has grown notably as well. In an electronic world, we often desire instant communication, and email has granted us that. However, there is no reason for us to make it easy for spammers to turn our in-boxes into electronic dumpsters. I'm a firm believer in "if it can be made, it can be broken." While the methods above are not foolproof and won't prevent all attacks against your in-box, they should ward off the simpleton spammers. The most foolproof way to protect against spam is not to post your address in the first place, but if you do post it, it makes sense to use some form of obfuscation. Don't be afraid to have some fun with your obfuscation techniques; just make sure you don't go overboard and lose the meaning of what you set out to convey!
8 Comments

Expert Comment

by:lherrou
Good coverage of an interesting topic. I voted YES above.

Expert Comment

by:younghv
kaufmed,
Well done on a timely and useful topic.

"Yes" vote above.

Author Comment

by:käµfm³d 👽
Thank you gentlemen. I appreciate your support  = )

Expert Comment

by:Dr. Klahn
Well done.  There are too many people relying on simple text obfuscations that, as you have shown, are easily defeatable by regex.

There is a free reCaptcha service that can be installed as an HTML link to protect email addresses.  I have been using it for two years now, and it cut the spam traceable to the email address on my consulting referral page down to zero.

Author Comment

by:käµfm³d 👽
DrKlahn,

I hadn't heard of reCaptcha before. Thanks for mentioning it. It certainly can be added to the list of protection routes above.

Gracias  : )

Expert Comment

by:tfewster
Very good article. I use an additional method that humans can easily understand but that (depending on the page layout) is hard to write a general scraping technique for:

<my_screen_name>@myisp.com

My screen name is clearly visible on the page, but detached from the mail provider info. If you want to test a scrape routine to get my email address from my profile - Go ahead, I can always change my email address :-)

Author Comment

by:käµfm³d 👽
tfewster,

Interesting angle. Not foolproof, at least not to someone actively reviewing the addresses returned by their scraper, but a good approach. I just hope you don't have to change your address too often  ; )

Expert Comment

by:mbizup
Nice article!

Voted "yes" above.