Stopping Spam in WordPress

Introduction

WordPress currently holds more than 54% of the Content Management System (CMS) market and powers a staggering 73,000,000+ sites. This makes WordPress a tempting target for spammers.

If you are running a WordPress site that allows pingbacks, comments on posts/pages, or registrations so users can write their own posts, then it is likely that you have been spammed at some point and have had to spend a portion of your valuable time moderating and deleting the spammy content. Spam is not just an annoyance or a time suck; Google can also penalize you for having links to spam sites in your comments or posts, and this is even more true since the Penguin update. This article will go through some settings and strategies to help reduce spam and also cover two free/paid services that can prevent spam from being posted.

Summary (or tl;dr)

If you are in a hurry and just want the good stuff:

The best way to prevent spam is to not allow comments, pingbacks, or public registrations

Use Akismet to protect against comment and pingback spam

Use WangGuard to protect against registration spam

CAPTCHAs are not preferred

For the rest of you, keep reading and enjoy. Please share this article with anyone you think can benefit from it and if you are an Experts Exchange member, please vote this helpful if you find it so.

Three Different Types of Spam

WordPress spam comes in three flavors (yum): comment spam, pingback spam, and registration/post spam. Because pingbacks, comments, and author profiles each include a field for a URL, a successful spammer gets a high-quality link back to their site, which can help it rank in Google. Worse, the spam site gets traffic from any of your visitors who click on the link, and those visitors are potentially exposed to malware. In any situation, you lose time handling the spam and you lose the trust of visitors, which will result in decreasing traffic and potentially negative social signals.

Comment spam is exactly what it sounds like. If you enable comments on your site, spammers will attempt to post links to the pages they are being paid to promote.

Pingback spam is a form of comment spam, but instead of a user/robot posting a comment, an external web site publishes a link to your WordPress site and notifies your site via the XML-RPC engine. WordPress reads the notification via the same engine and attempts to publish a notice that someone is linking to that specific page or post.

Registration/Post spam is actually two different things but they are tightly connected via the WordPress registration system. When you allow anyone to register as a user on your site, you will be attacked by robots who will attempt to create users that get whatever default privileges you allow (typically Subscriber). If you are running a site where you want to accept posts from users as well as comments then you may have raised the default user role to Contributor and you are now seeing spam Post submissions.

Stopping All Spam The Easy Way

A strange game. The only winning move is not to play. -- WarGames, 1983

There is one easy and foolproof way to prevent spam from showing up on your WordPress site. Don't allow it. While WordPress accepts pingbacks and comments by default, and can be set to accept new user registrations, it is a simple matter to disable those settings immediately after running the install. If you have no need or desire to have random users log in to your site or comment on your content, then just remove the ability for them to do so. To remove the ability for new users to register, go to Settings | General from the admin Dashboard and make sure the "Anyone can register" box is unchecked.

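To disable the possibility of comments, go to Settings | Discussion and, in the "Default article settings" section, uncheck the options that allow link notifications from other blogs (pingbacks and trackbacks) and that allow people to post comments on new articles.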

Please note that the Settings | Discussion settings are a prospective change: they only affect new posts and pages created from this point forward. If you do this before any content is generated, then your WordPress site will be almost completely safe from spam. If you are reading this after dealing with spam and are in the process of closing the barn doors, then you also need to close the comments on existing, published posts and pages. I recommend you use the Extended Comment options plugin to do this quickly.

Reducing or Stopping Comment Spam

I understand that completely disabling the interactivity features of a WordPress site may be a non-starter for most people. If you do want the interactivity of comments but not the spam, there are two relatively effective ways to lower or eliminate the spam quotient.

1. Require users to register before they are allowed to comment.

This option is located in Settings | Discussion under "Other comment settings" and it does precisely what it says. When enabled, your site will display the comment form only to logged-in users, while all users, regardless of login status, will still see the posted comments. Since most spambot scripts are designed to leave a comment on an open form, enabling this option should eliminate the vast majority of bots, leaving you with a smaller, more sophisticated group of spammers whose scripts can create an account or log in. While this sounds like a drawback, it really isn't, because we can use the techniques in the Registration Spam section (q.v.) to handle this group. The real drawbacks of this strategy are that a) it creates a huge amount of friction for legitimate users who do want to comment on your site, and that will lower your participation rate; and b) it means that you have to allow users to create accounts, which potentially exposes you to a privilege escalation attack if one is created/discovered in the future. You need to balance your desire for security against those drawbacks.

2. Use Akismet to flag comments as spam

Akismet is a spam filtering system developed by Automattic (the company behind WordPress.com) and is available to all WordPress sites as a plugin. As an aside, it is not a WordPress-only service. Automattic makes the Akismet API open, libraries are available for many languages, and plugins have been developed to work on a wide range of CMSs and forums.

The use of Akismet is simple enough. Download the plugin to your site and activate it. You will be prompted to obtain an API key from the Akismet site. The keys are not necessarily free, though. If you are running a low-traffic (defined by Akismet as < 100,000 API calls a month), personal or not-for-profit site, you have the option to register for a free API key for that site. Akismet will suggest you make an annual donation via an amount slider but you are not forced to pay. If you are running a small business site (or a personal site that generates money for you via ads or sales) then you will be asked to pay $5/month per site that will use Akismet. If you have more than ten sites, you should instead pay for the Enterprise option which is $50/month but allows for unlimited sites to be protected. If you are running an extremely active site (> 100,000 API calls/month; very rare) then you are at a different and much more expensive pricing level. Akismet is still good at this level but there are other anti-spam solutions that should be investigated too.

Once you have the API key for Akismet and activate the plugin, your comments will be checked for spam characteristics against the Akismet database. Any suspicious comment will be flagged as spam and never show up to annoy you. If Akismet misses a comment, which is rare, then you will have to flag it as spam manually. By doing so with Akismet active, the comment details (name, email address, URL, IP address, content) are added to the Akismet database, and you train that database to recognize future instances of that particular spammer and prevent them. If Akismet registers a false positive (even more rare than a miss), all you need to do is mark that comment as “Not Spam” and you update the servers by white-listing the false positive.
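For readers curious about what happens behind the plugin, the check can be sketched in a few lines of Python. This is only an illustration, assuming the publicly documented Akismet comment-check call (a form-encoded POST to a key-specific host, answered with the text "true" for spam); the API key, site URL, sample comment, and the helper names build_comment_check and is_spam are all hypothetical, and the WordPress plugin does this work for you.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_comment_check(api_key, blog_url, comment):
    """Build (but do not send) a comment-check request for one comment."""
    fields = {
        "blog": blog_url,                     # front page of the protected site
        "user_ip": comment["ip"],             # commenter's IP address
        "user_agent": comment["user_agent"],  # commenter's browser string
        "comment_type": "comment",
        "comment_author": comment["name"],
        "comment_author_email": comment["email"],
        "comment_author_url": comment["url"],
        "comment_content": comment["content"],
    }
    # Assumption: the classic key-as-subdomain endpoint form.
    endpoint = f"https://{api_key}.rest.akismet.com/1.1/comment-check"
    return Request(endpoint, data=urlencode(fields).encode("utf-8"), method="POST")


def is_spam(request):
    """Send the request; the service is expected to answer "true" for spam."""
    with urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8").strip() == "true"


# Hypothetical key, site, and comment, used only to show the request shape.
sample = {
    "ip": "203.0.113.7",
    "user_agent": "Mozilla/5.0",
    "name": "Cheap Watches",
    "email": "buyer@spam.example",
    "url": "http://spam.example/",
    "content": "Great post! Visit my site.",
}
request = build_comment_check("abc123", "https://yoursite.example/", sample)
print(request.full_url)
```

Nothing is sent until is_spam(request) is called with a real key. The fields mirror the comment details listed above (name, email address, URL, IP address, content), which is why manually flagging a missed comment gives the service something useful to learn from.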

Reducing or Stopping Pingback Spam

Pingback spam occurs when a spam site (splog) that runs the XML-RPC engine adds a link to your site. The engine notifies your site of the link and your site attempts to publish it in the comments section. This automated reciprocal link-building used to figure prominently in the Google ranking algorithms (hence the appeal to spammers), but since the Panda and Penguin updates the viability of the strategy has decreased and, with it, the frequency of the attacks. Even so, one can reasonably expect some amount of pingback spam attempts.
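To make the mechanism concrete, here is a minimal Python sketch of the notification a linking site sends. It assumes the standard pingback.ping(sourceURI, targetURI) call from the Pingback specification, posted to the target site's XML-RPC endpoint (xmlrpc.php on WordPress); the URLs and the two helper names are hypothetical.

```python
import xmlrpc.client


def build_pingback_request(source_url, target_url):
    """Return the XML-RPC body a linking site would POST to the target site."""
    return xmlrpc.client.dumps((source_url, target_url), methodname="pingback.ping")


def parse_pingback_request(body):
    """Decode a pingback body back into (method, source_url, target_url)."""
    params, method = xmlrpc.client.loads(body)
    return method, params[0], params[1]


# Hypothetical splog post linking to a hypothetical post on your site.
body = build_pingback_request(
    "http://splog.example/cheap-watches/",
    "http://yoursite.example/hello-world/",
)
print(body)
print(parse_pingback_request(body))
```

The request carries nothing but two URLs, so it costs a splog almost nothing to send thousands of them. That is the reason pingbacks are attractive to spammers.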

To stop it, there are again two basic strategies.

1. Just don't allow it.

Even though we have already touched on this strategy above, it bears repeating here. There isn't any strong advantage for you in allowing pingbacks on your site. Yes, you will be notified when someone links to a page on your site and yes, it is polite to make a reciprocal link and yes, pingbacks can lead you to like-minded people and thus opportunities for collaboration or whatever. But all of this can also be detected via various web analytics platforms, and the time cost and potential SEO penalty of dealing with the inevitable spam is a huge negative. Just disable it per the above instructions and concentrate on your own site.

2. Akismet, again.

If you like the pingbacks and want to keep them, then definitely run Akismet per the comment spam section above. Akismet protects against pingback spam in exactly the same way as it protects against comment spam, only with a slightly lower rate of success. Splogs are easy to generate, and it is my experience that Akismet lags a little behind the spammers in this area, so a few more spam pingbacks seem to creep through than spam comments.

Reducing or Stopping Registration Spam

If you allow anyone to register as a subscriber, you will need a strategy to combat registration spam. Simply put, registration spam is when a bot or human creates an account on your site for the eventual purpose of using that account to post spam comments or posts. These users have become known as "sploggers" (short for "spam bloggers") and, without some level of protection, they can quickly bring a site administrator to their knees manually reviewing and cleaning up accounts. As mentioned at the beginning of the article, you are also risking your Google ranking by allowing spam links to be published on your site.

Stopping registration spam is a little more difficult than stopping comment or pingback spam because Akismet and its huge community and database do not work here. There are other plugins that perform a similar function to Akismet in that they check whether a user is coming from a known bad IP address or is registering using a pattern that has been previously reported as spam. Of these plugins, the one I have had the most success with is the unfortunately named (for those of you who get English colloquialisms) WangGuard.

WangGuard, like Akismet, requires you to register and obtain an API key. API keys are free to sites that do not make more than $200/month and will not make more than 500 API queries per month. If your site is commercial or has a huge number of monthly registrations then they will charge but at the time of this writing they have not finalized their fee structures yet.

Once you have WangGuard activated and the API key entered, it will immediately begin protecting your site from known (to WangGuard) sploggers and will also provide some basic pattern checks to see if a registration attempt is from a splogger.

In the WangGuard settings, you can set WangGuard to check for duplicated Gmail accounts, since name@gmail.com and n.ame@gmail.com are the same account on Gmail but will bypass a simple unique email check on your site. You also have the option of checking the email address against MX records to see if the domain is valid. This works extremely well but does slow down your registration page, so use with caution. Also, I strongly suggest enabling the ability to delete a user when they are reported as a splogger, as this will also remove all posts and comments from the bogus account.
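The duplicate Gmail check is easy to illustrate. The following is a minimal Python sketch of the idea, not WangGuard's actual implementation: it assumes that dots in the local part of a Gmail address are ignored, and it additionally treats "+tag" suffixes and the googlemail.com domain as equivalent, which goes beyond what is described above. The function names are hypothetical.

```python
GMAIL_DOMAINS = ("gmail.com", "googlemail.com")


def normalize_email(address):
    """Reduce an email address to a canonical form for duplicate checks."""
    local, at, domain = address.strip().lower().rpartition("@")
    if not at or not local:
        raise ValueError(f"not an email address: {address!r}")
    if domain in GMAIL_DOMAINS:
        # Assumption: Gmail ignores dots and anything after "+" in the local part.
        local = local.split("+", 1)[0].replace(".", "")
        domain = "gmail.com"
    return f"{local}@{domain}"


def is_duplicate(new_address, existing_addresses):
    """True if new_address matches any existing address after normalization."""
    known = {normalize_email(a) for a in existing_addresses}
    return normalize_email(new_address) in known


registered = ["name@gmail.com", "someone@example.org"]
print(is_duplicate("n.ame@gmail.com", registered))
print(is_duplicate("n.ame@example.org", registered))
```

Only Gmail-style domains are rewritten; for any other provider the dots are kept, because n.ame@example.org and name@example.org may well be two different people.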

The next step in cleaning/protecting your site is to begin training WangGuard on who you consider to be a spammer. The WangGuard | Wizard menu option is your first stop. Running the wizard will check all registered users against the WangGuard database via the API, and any known sploggers will be detected and flagged at this stage. You will then be able to mass-delete them and their spammy content in one step.

However, the Wizard check is unlikely to catch 100% of the splogger registrations initially. Sploggers simply change strategies, accounts, IP addresses, etc. too quickly to be fully caught by a single database check. WangGuard will also install a new column in your Users | All Users screen, and you will be able to manually report any missed sploggers there. Each time you report a splogger manually, you train WangGuard to recognize who is creating spam registrations on your site, and you help build the algorithm that WangGuard will use to protect you against future attempts. In my experience with the service, it takes about two weeks to fully train it, but after those two weeks you should be almost (if not exactly) 100% protected from splogger registrations, as I am now:

[Screenshot: WangGuard stats for live site requiring registration]

Why No Mention of CAPTCHAs?

By now, just about everyone has run into a CAPTCHA while filling out web forms and the developers among you have probably configured one to secure a form. So why do I not talk about them above as a strategy to reduce or stop spam? Two reasons:

1) Legitimate users hate them
2) They don't work well enough to be worth the hassle they present to legitimate users

My EE colleague, Ray Paseur, has a lot to say about making a friendlier CAPTCHA, but even if you use his excellent techniques to make an easier-to-read image, you still generate friction in any process that uses it. With the recent emphasis on social signals and interaction in SEO, I firmly believe that any friction in the processes of leaving a comment or signing up for a site is ultimately self-defeating. You want people to leave comments and sign up; you just want only good examples of each. Users are also coming to expect a smoother, faster experience when interacting with a site, especially when on a mobile device. Taken together, I believe this provides a strong argument to not use a CAPTCHA when there are alternatives available. This advice also covers the math/logic tests that some sites use in place of an image (What does 2+3 equal? Is water wet or dry?).

The more compelling reason is the existence of businesses that allow spammers to hire a human to solve the CAPTCHA and thus gain entry into whatever resource you were attempting to protect. Since CAPTCHAs are designed to foil only robots, while services like Akismet or WangGuard work equally well on robots and humans, it seems to me to be a logical choice to not annoy my real users and instead concentrate on banning the bad actors.

Skip the CAPTCHA. It's not worth it.

I hope this article proves useful to you and if so, please consider sharing it using the buttons on the left (or bottom) of the screen.