Actually, to work a little better, change the $search line to this:
$search = '@(\\[url\\]|\\[url=)?http
I'm a bit rusty with my regex (okay, really rusty) and I need to put together a small set of filters to weed out banned or unwanted links from a forum system. Basically, I want to be able to specify a domain, such as website.com, and then filter a message for any URLs to that domain (whether in raw form or in BBCode). If one is found, the entire URL is removed or replaced (preferably replaced with a message or warning of some kind). This would be done at the time of posting, not on display.
Example:
"Hey everybody, go to http://www.bannedwebsite.c
If I was filtering for "bannedwebsite.com", it would catch this and change the text to:
"Hey everybody, go to <spam link removed> for great deals on ..."
I think that in many cases the URLs will also be inside BBCode tags, so I need to check for these as well. Examples of the BBCode used:
[url]http://www.bannedwebs
and
[url=http://www.bannedwebs
I need some examples to achieve this, if anyone would be so kind as to help out. I'm sure it's pretty simple to do, but my head is killing me just thinking about it at the moment :) Thanks in advance!
This question has been solved and verified by the asker.
Wait....change it to this:
$search = '@(\\[url\\]|\\[url=)?http
I promise that's my last change. (Can you tell I do a lot of trial-and-error?) So the full context would be something like this:
<?php
function ban_domain($text, $array_of_domains)
{
$domains = str_replace('.', '\\.', implode('|', $array_of_domains));
$search = '@(\\[url\\]|\\[url=)?http
$replace = '<spam link removed>';
return preg_replace($search, $replace, $text);
}
$text1 = 'Hey everybody, go to http://www.bannedwebsite.c
$text2 = 'Hey everybody, go to [url]http://www.bannedwebs
$text3 = '[url=http://www.bannedweb
echo ban_domain($text1, array('bannedwebsite.com')
echo ban_domain($text2, array('bannedwebsite.com')
echo ban_domain($text3, array('bannedwebsite.com')
?>
Honestly I don't think it would be all that bad. It's not like I'm using a for-each loop or anything; the regex engine gets to do all the searching in one fell swoop. Your proposal of searching *only* for the domains would work fine, but it wouldn't look as pretty. You'd have extra bits lying around like "[url=" and "][/url]". True, you could go around for a second pass to get rid of those, but that's costly as far as efficiency is concerned as well. Plus I doubt that the cost would be all that great to begin with--probably a couple hundred nanoseconds. No big loss.
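To illustrate the "extra bits lying around" point, here is a minimal sketch of a domain-only replacement. The helper name `strip_domains_only` and the sample text are assumptions made up for this illustration; they are not code from this thread.

```php
<?php
// Hypothetical helper: replaces only the bare domain names and
// leaves everything around them (scheme, path, BBCode tags) alone.
function strip_domains_only($text, $array_of_domains)
{
    // str_replace accepts an array of search strings (case-sensitive).
    return str_replace($array_of_domains, '<spam link removed>', $text);
}

$text = 'Go to [url=http://www.bannedwebsite.com/deals]here[/url] now.';
echo strip_domains_only($text, array('bannedwebsite.com'));
// prints: Go to [url=http://www.<spam link removed>/deals]here[/url] now.
```

The "[url=", the path, and the "[/url]" all survive, which is why a second clean-up pass would be needed with this approach.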
What I've done is count matches against a list of spammy strings and URLs, and if there are two or more matches, block the post completely. If that's too restrictive, increase the threshold.
You can also do things like count the overall message size, find repetition in the message, and keywords like "viagra" or "online casino".
I know it seems draconian, but spam posts tend to have a lot of URLs, little content, and the same strings. This technique gets rid of half of them in one motion.
Also, you can test your algorithm by running it against the existing messages to see if it turns up false positives.
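The counting idea above could be sketched roughly like this. The function names `count_spam_matches` and `is_probable_spam`, the sample list, and the default threshold of 2 are assumptions for illustration only; the poster's actual code is not shown in this thread.

```php
<?php
// Hypothetical helper: counts how many entries of $spam_strings occur in $text.
function count_spam_matches($text, $spam_strings)
{
    $lower = strtolower($text);
    $hits = 0;
    foreach ($spam_strings as $needle) {
        // strpos returns false when nothing is found, so compare strictly.
        if (strpos($lower, strtolower($needle)) !== false) {
            $hits++;
        }
    }
    return $hits;
}

// Hypothetical helper: block the post once the hit count reaches the threshold.
function is_probable_spam($text, $spam_strings, $threshold = 2)
{
    return count_spam_matches($text, $spam_strings) >= $threshold;
}

$spam_strings = array('viagra', 'online casino', 'bannedwebsite.com');
$post = 'Cheap VIAGRA at http://www.bannedwebsite.com/ today!';
var_dump(is_probable_spam($post, $spam_strings));                  // bool(true), two entries match
var_dump(is_probable_spam('Nice photo, thanks!', $spam_strings));  // bool(false)
```

Running the same function over existing messages, as suggested above, is one way to see how many legitimate posts a given threshold would block.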
Exactly what I needed, soapergem. Thanks for the quick solution. It's implemented and works like a charm ;)
By the way, would there be a quick method of telling whether this filter picked up any matches? I'd like to trigger an action within the script in the event that a banned URL has been filtered. I believe preg_replace gained a "count" parameter in PHP 5+, but unfortunately I'm using version 4.x.
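One way this might be done without the PHP 5 count parameter is to run `preg_match_all` with the same pattern before replacing, since it returns the number of matches. This is only a sketch: the wrapper name `filter_and_count` is hypothetical, and the pattern below is a deliberately simplified stand-in, not the `$search` pattern from the answer.

```php
<?php
// Hypothetical wrapper: returns the filtered text together with the match count.
function filter_and_count($search, $replace, $text)
{
    // preg_match_all returns the number of full-pattern matches (0 if none).
    $count = preg_match_all($search, $text, $matches);
    $filtered = preg_replace($search, $replace, $text);
    return array($filtered, $count);
}

// Simplified demo pattern (an assumption, not the one built by ban_domain).
$search = '@https?://\\S*bannedwebsite\\.com\\S*@i';
$text = 'Go to http://www.bannedwebsite.com/ now';

list($filtered, $count) = filter_and_count($search, '<spam link removed>', $text);
if ($count > 0) {
    // Trigger whatever action is needed here, e.g. log the post or flag the user.
    echo "Filtered $count banned link(s)\n";
}
```

A cheaper check, if the exact count does not matter, is to compare the filtered text with the original and act when they differ.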
by: soapergem, posted on 2006-12-20 at 17:23:47, ID: 18177821
Here you go. I created a function and provided an example. All you need to do is feed it the text you want to filter and an array filled with the domains you wish to ban.
<?php
function ban_domain($text, $array_of_domains)
{
    // Escape the dots and join the domains into one alternation, e.g. bad\.ru|bad2\.com
    $domains = str_replace('.', '\\.', implode('|', $array_of_domains));
    // Matches a raw URL, [url]...[/url], or [url=...]...[/url] that contains a banned domain
    $search = '@(\\[url\\]|\\[url=)?https?://.*?(?:' . $domains . ').*?/?(\\[/url\\]|\\].*?\\[/url\\])?@si';
    $replace = '<spam link removed>';
    return preg_replace($search, $replace, $text);
}
$text = 'Example. http://www.bad.ru/ Example. [url]http://www.bad2.com/[/url] Example. [url=http://www.bad.ru/]Spam[/url] Example.';
echo ban_domain($text, array('bad.ru', 'bad2.com'));
?>
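One thing to watch with a lazy `.*?` combined with the `s` modifier is that a match can start at an unrelated URL and run forward until a banned domain is merely mentioned later in the text. The following is a sketch of a stricter variant, not the author's revised pattern (which is not fully visible in this thread). The function name `ban_domain_strict` is hypothetical, and the assumption is that a URL ends at whitespace or a square bracket and that the banned domain must appear in the host part.

```php
<?php
// Hypothetical stricter variant of ban_domain(); an illustration only.
function ban_domain_strict($text, $array_of_domains)
{
    // Quote every domain for use in a pattern delimited by '@'.
    $quoted = array();
    foreach ($array_of_domains as $domain) {
        $quoted[] = preg_quote($domain, '@');
    }
    $domains = implode('|', $quoted);

    // Host must be the banned domain itself or a subdomain of it;
    // the URL stops at whitespace or a square bracket.
    $url = 'https?://(?:[^\\s\\[\\]/]*\\.)?(?:' . $domains . ')[^\\s\\[\\]]*';

    // Try the BBCode forms first, then the raw URL.
    $search = '@\\[url\\]' . $url . '\\[/url\\]'
            . '|\\[url=' . $url . '\\].*?\\[/url\\]'
            . '|' . $url . '@is';

    return preg_replace($search, '<spam link removed>', $text);
}

$text = 'Example. http://www.bad.ru/ Example. [url]http://www.bad2.com/[/url] Example. [url=http://www.bad.ru/]Spam[/url] Example.';
echo ban_domain_strict($text, array('bad.ru', 'bad2.com'));
// prints: Example. <spam link removed> Example. <spam link removed> Example. <spam link removed> Example.
```

With this pattern, text such as "See http://good.com and then mention bad.ru here" is left alone, as is a different host like notbad.ru. The trade-off is that punctuation directly after a raw URL is swallowed along with it.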