Perl tarpit CGI for Apache?


I am running an Apache (1.13) httpd server on a Unix
box.  The system is constantly being probed by various
annoying robots that ignore robots.txt.  I would like a
Perl CGI script that tarpits those robots by limiting the
bandwidth of the response to some small value.

In other words, here's a little bit of data, Mr. Bot (not necessarily what you requested, either), stall, stall, here's
a little bit more, stall, stall ... and let's see how long we
can keep you teergrubed here with this tempting big file.

It would be entertaining, though certainly not necessary,
if the script tracked how long it was able to keep a bot
on the hook, and kept a Top 10 list of same.

I have looked through the CPAN archives, and can't find
anything that quite fits, and I don't trust my own Perl
skill enough to handle exception conditions such as an
unexpected remote disconnect during transfer.

Asked by: Dr. Klahn
 
kandura Commented:
I tried the following script on Apache. If I press stop in my browser, the script is terminated by Apache, and the message to STDERR never gets written to the error_log.
I think it's safe to assume that Apache will handle remote disconnects for you.

You can keep track of how long you stall each bot in any number of ways: use a database, a log file, a pipe and a daemon, etc.
You would just write an id for this run, the bot (the request it originally made, etc.) and how long it has been running so far.
A log file is probably the least advisable option, as it might grow too large.

Now, just drop in an Apache redirect file at the root and you're set :-)

---8<---
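#!/usr/bin/perl
use strict;
use warnings;
use CGI;

$| = 1;                       # unbuffer STDOUT so each chunk goes out at once
my $cgi   = CGI->new;
my $start = time;

print $cgi->header();
print "<html><body>";

# Dribble out a few bytes every two seconds for as long as the client stays.
while (1) {
      last unless print "bla<p>";   # print returns false if the write fails
      sleep(2);
}

# Reached only if print fails; in the test above Apache killed the script first.
print STDERR "fallen out of the loop, time taken ", (time() - $start), " sec.\n";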
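
The bookkeeping described above (the bot and how long it was held) could be sketched in Perl roughly as follows. This is only a minimal sketch under assumptions: the file path, the tab-separated record layout and the names record_stall and top_stalls are made up for illustration, and it uses a flat file purely to stay self-contained, even though a database would avoid the growth problem.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Fcntl qw(:flock);

# Append one "seconds<TAB>bot" record. flock keeps concurrent CGI runs
# from interleaving their writes. File name and layout are hypothetical.
sub record_stall {
    my ($file, $bot, $seconds) = @_;
    $bot =~ s/[\t\r\n]+/ /g;               # keep each record on one line
    open(my $fh, '>>', $file) or die "cannot append to $file: $!";
    flock($fh, LOCK_EX)       or die "cannot lock $file: $!";
    print {$fh} "$seconds\t$bot\n";
    close($fh)                or die "cannot close $file: $!";
    return;
}

# Return up to $n [seconds, bot] pairs, longest stall first.
sub top_stalls {
    my ($file, $n) = @_;
    open(my $fh, '<', $file) or return ();
    flock($fh, LOCK_SH)      or die "cannot lock $file: $!";
    my @rows;
    while (my $line = <$fh>) {
        chomp $line;
        my ($seconds, $bot) = split /\t/, $line, 2;
        next unless defined $bot && $seconds =~ /^\d+$/;   # skip damaged lines
        push @rows, [$seconds, $bot];
    }
    close($fh);
    my @sorted = sort { $b->[0] <=> $a->[0] } @rows;
    splice(@sorted, $n) if @sorted > $n;
    return @sorted;
}
```

Since the test above suggests the script is killed on disconnect rather than falling out of the loop, the call would probably have to sit in a signal handler installed before the loop, something like `$SIG{PIPE} = $SIG{TERM} = sub { record_stall('/tmp/tarpit.log', $ENV{HTTP_USER_AGENT} || 'unknown', time - $^T); exit };`. Whether Apache ends the script with a catchable signal is an assumption that has not been verified here. A Top 10 page would then just print the result of `top_stalls('/tmp/tarpit.log', 10)`.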
 
jmcg (Owner) Commented:
What method do you use to decide that a request comes from an annoying bot and should be handled by your CGI?

On Apache 2, there's a mod_ext_filter example for "slowing down the server", but it applies to all callers.
 
Dr. Klahn (Principal Software Engineer, Author) Commented:
> What method do you use to decide that a request comes from
> an annoying bot and should be handled by your CGI?

Examination of the browser-ident field (HTTP_USER_AGENT),
and for bots that spoof the browser-ident, examination and
filtering on the originating IP address.

See URL  http://www.leekillough.com/robots.html  for a more
in-depth example.
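
As a rough illustration of the two checks just described, a Perl test on the CGI environment might look like the sketch below. The agent patterns, the addresses (taken from the reserved documentation ranges) and the name is_bad_bot are all hypothetical; a real list would come from the server's own logs.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical lists, for illustration only.
my @BAD_AGENT_PATTERNS = (qr/ExampleHarvester/i, qr/BadBot/i);
my @BAD_NET_PREFIXES   = ('192.0.2.');              # whole networks, by prefix
my %BAD_ADDRESSES      = ('198.51.100.7' => 1);     # single hosts, exact match

# Return 1 if the browser-ident or the originating address is on a list.
sub is_bad_bot {
    my ($agent, $ip) = @_;
    $agent = '' unless defined $agent;
    $ip    = '' unless defined $ip;

    # First line of defence: the browser-ident field.
    for my $re (@BAD_AGENT_PATTERNS) {
        return 1 if $agent =~ $re;
    }

    # Bots that spoof the browser-ident are caught by address instead.
    return 1 if $BAD_ADDRESSES{$ip};
    for my $prefix (@BAD_NET_PREFIXES) {
        return 1 if index($ip, $prefix) == 0;
    }
    return 0;
}
```

In a CGI script this would be called as `is_bad_bot($ENV{HTTP_USER_AGENT}, $ENV{REMOTE_ADDR})`. The page linked above does the equivalent filtering in the server configuration, which keeps good visitors from ever reaching the script.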

 
jmcg (Owner) Commented:
That page about defeating bad robots is quite comprehensive.

The script Kandura gave you would certainly keep the bots entertained for as long as they could stand it. You'd still need to use the rewrite rules and module listed in the document to redirect the bot's request to this script (and why would that be better than just blocking it?).