Perl tarpit CGI for Apache?


I am running an Apache (1.13) httpd server on a Unix
box.  The system is constantly being probed by various
annoying robots that ignore robots.txt.  I would like a
Perl CGI script that tarpits those robots by limiting the
bandwidth of the response to some small value.

In other words, here's a little bit of data, Mr. Bot (not necessarily what you requested, either), stall, stall, here's
a little bit more, stall, stall ... and let's see how long we
can keep you teergrubed here with this tempting big file.

It would be entertaining, though certainly not necessary,
if the script tracked how long it was able to keep a bot
on the hook, and kept a Top 10 list of same.

I have looked through the cpan archives, and can't find
anything that quite fits, and I don't trust my own Perl
skill enough to handle exception conditions such as an
unexpected remote disconnect during transfer.
LVL 34
Dr. KlahnPrincipal Software EngineerAsked:
Who is Participating?
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

kanduraCommented:
I tried the following script on Apache. If I press stop in my browser, the script is terminated by Apache, and the message to STDERR never gets written to the error_log.
I think it's safe to assume that Apache will handle remote disconnects for you.

How to keep track of how long you stall each bot can be done in any manner of ways: use a database, use a log file, use a pipe and a daemon etc...
You would just write an id for this run, the bot, (the request it originally made etc) and the time it's currently running.
Log file is probably the least recommendable as it might grow too hard.

Now, just drop in an Apache redirect file at the root and you're set :-)

---8<---
#!/usr/bin/perl
use CGI;

$|++;
$cgi = new CGI;
$t = time;

print $cgi->header();
print "<html><body>";

while(1) {
      last unless print "bla<p>";
      sleep(2);
}

print STDERR "fallen out of the loop, time taken ", (time() - $t), " sec.";
0

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
jmcgOwnerCommented:
What method do you use to decide that a request comes from an annoying bot and should be handled by your CGI?

On Apache 2, there's a mod_ext_filter example for "slowing down the server", but it applies to all callers.
0
Dr. KlahnPrincipal Software EngineerAuthor Commented:
> What method do you use to decide that a request comes from
> an annoying bot and should be handled by your CGI?

Examination of the browser-ident field (HTTP_USER_AGENT),
and for bots that spoof the browser-ident, examination and
filtering on the originating IP address.

See URL  http://www.leekillough.com/robots.html  for a more
in-depth example.

0
jmcgOwnerCommented:
That page about defeating bad robots is quite comprehensive.

The script Kandura gave you would certainly keep the bots entertained for as long as they could stand it. You'd still need to use the rewrite rules and module listed in the document to redirect the bot's request to this script (and why would that be better than just blocking it?).
0
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
Perl

From novice to tech pro — start learning today.

Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.