Solved

Perl tarpit CGI for Apache?

Posted on 2003-11-03
Last Modified: 2012-05-04

I am running an Apache (1.3) httpd server on a Unix
box.  The system is constantly being probed by various
annoying robots that ignore robots.txt.  I would like a
Perl CGI script that tarpits those robots by limiting the
bandwidth of the response to some small value.

In other words: here's a little bit of data, Mr. Bot (not
necessarily what you requested, either), stall, stall, here's
a little bit more, stall, stall ... and let's see how long we
can keep you teergrubed here with this tempting big file.

It would be entertaining, though certainly not necessary,
if the script tracked how long it was able to keep a bot
on the hook, and kept a Top 10 list of same.

I have looked through the CPAN archives, and can't find
anything that quite fits, and I don't trust my own Perl
skill enough to handle exception conditions such as an
unexpected remote disconnect during transfer.
Question by:Dr. Klahn
LVL 18

Accepted Solution

by:
kandura earned 200 total points
ID: 9677609
I tried the following script on Apache. If I press Stop in my browser, Apache terminates the script, and the message to STDERR never gets written to the error_log.
I think it's safe to assume that Apache will handle remote disconnects for you.

You can keep track of how long you stall each bot in any number of ways: use a database, a log file, a pipe and a daemon, etc.
You would just write an ID for this run, the bot (plus the request it originally made, etc.), and how long it has been running so far.
A log file is probably the least advisable option, as it might grow too large.

Now, just drop in an Apache redirect file at the root and you're set :-)

---8<---
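#!/usr/bin/perl
use strict;
use warnings;
use CGI;

$| = 1;                 # unbuffered output, so each chunk is sent immediately
my $cgi   = CGI->new;
my $start = time;

print $cgi->header();
print "<html><body>";

# Drip a few bytes every two seconds for as long as the client stays connected.
while (1) {
    last unless print "bla<p>";
    sleep(2);
}

# Only reached if print fails without the process being killed first. In the
# test described above, Apache terminated the script before this line ran.
print STDERR "fallen out of the loop, time taken ", (time() - $start), " sec.\n";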
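
For the "Top 10" idea, here is a minimal sketch of the log-file variant, assuming one tab-separated record per stalled bot. The names record_stall and top_stalls and the record layout are my own invention for illustration, not part of any module. It uses a flat file only because that is the shortest thing to show; the growth caveat above still applies, and a database would avoid it.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);

# Append one record (seconds held, client address, user agent) to $file.
# Hypothetical helper: the tab-separated layout is an assumption of this sketch.
sub record_stall {
    my ($file, $seconds, $ip, $agent) = @_;
    $agent = '' unless defined $agent;
    $agent =~ s/[\t\r\n]+/ /g;      # keep each record on a single line
    open(my $fh, '>>', $file) or return 0;
    flock($fh, LOCK_EX);            # serialise concurrent CGI processes
    print {$fh} join("\t", $seconds, $ip, $agent), "\n";
    close($fh);
    return 1;
}

# Return up to $n records as [seconds, ip, agent], longest stall first.
sub top_stalls {
    my ($file, $n) = @_;
    open(my $fh, '<', $file) or return ();
    my @rows;
    while (my $line = <$fh>) {
        chomp $line;
        my ($seconds, $ip, $agent) = split /\t/, $line, 3;
        next unless defined $agent && $seconds =~ /^\d+$/;   # skip damaged lines
        push @rows, [$seconds, $ip, $agent];
    }
    close($fh);
    my @sorted = sort { $b->[0] <=> $a->[0] } @rows;
    $#sorted = $n - 1 if @sorted > $n;   # truncate to the top $n
    return @sorted;
}
```

Since the script above gets terminated on disconnect before its last line runs, record_stall would have to be called from somewhere that still executes, for example a $SIG{PIPE} or $SIG{TERM} handler that records the time and exits. Whether such a handler fires depends on how your server ends the process, so treat that part as untested. A separate page could then call top_stalls($file, 10) to print the list.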
 
LVL 20

Expert Comment

by:jmcg
ID: 9684893
What method do you use to decide that a request comes from an annoying bot and should be handled by your CGI?

On Apache 2, there's a mod_ext_filter example for "slowing down the server", but it applies to all callers.
 
LVL 27

Author Comment

by:Dr. Klahn
ID: 9691503
> What method do you use to decide that a request comes from
> an annoying bot and should be handled by your CGI?

Examination of the browser-ident field (HTTP_USER_AGENT),
and for bots that spoof the browser-ident, examination and
filtering on the originating IP address.

See URL  http://www.leekillough.com/robots.html  for a more
in-depth example.
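
A rough sketch of that two-step check, as it might look inside a CGI script. The agent patterns and address prefixes below are placeholders chosen for illustration (the prefixes are documentation-only ranges), and is_bad_bot is a hypothetical name; substitute whatever shows up in your own logs.

```perl
use strict;
use warnings;

# Placeholder lists for illustration only; replace with entries from your logs.
my @BAD_AGENTS   = (qr/EmailSiphon/i, qr/WebZIP/i);
my @BAD_PREFIXES = ('192.0.2.', '198.51.100.');   # documentation-only ranges

# Return 1 if the user agent matches a known pattern, or if the address
# starts with a listed prefix (for bots that spoof the agent); else 0.
sub is_bad_bot {
    my ($agent, $ip) = @_;
    $agent = '' unless defined $agent;
    $ip    = '' unless defined $ip;
    for my $re (@BAD_AGENTS) {
        return 1 if $agent =~ $re;
    }
    for my $prefix (@BAD_PREFIXES) {
        return 1 if index($ip, $prefix) == 0;
    }
    return 0;
}
```

In a CGI context the inputs would come from the environment, e.g. is_bad_bot($ENV{HTTP_USER_AGENT}, $ENV{REMOTE_ADDR}). Plain string prefixes are a simplification; proper netmask matching would need more code or a module.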

 
LVL 20

Expert Comment

by:jmcg
ID: 9691696
That page about defeating bad robots is quite comprehensive.

The script Kandura gave you would certainly keep the bots entertained for as long as they could stand it. You'd still need to use the rewrite rules and module listed in the document to redirect the bot's request to this script (and why would that be better than just blocking it?).