Solved

Perl tarpit CGI for Apache?

Posted on 2003-11-03
Last Modified: 2012-05-04

I am running an Apache (1.13) httpd server on a Unix
box.  The system is constantly being probed by various
annoying robots that ignore robots.txt.  I would like a
Perl CGI script that tarpits those robots by limiting the
bandwidth of the response to some small value.

In other words: here's a little bit of data, Mr. Bot (not
necessarily what you requested, either), stall, stall, here's
a little bit more, stall, stall ... and let's see how long we
can keep you teergrubed here with this tempting big file.

It would be entertaining, though certainly not necessary,
if the script tracked how long it was able to keep a bot
on the hook, and kept a Top 10 list of same.

I have looked through the CPAN archives and can't find
anything that quite fits, and I don't trust my own Perl
skills enough to handle exception conditions such as an
unexpected remote disconnect during transfer.
Question by:Dr. Klahn
LVL 18

Accepted Solution

by:
kandura earned 200 total points
ID: 9677609
I tried the following script on Apache. If I press Stop in my browser, Apache terminates the script, and the message to STDERR never gets written to the error_log.
I think it's safe to assume that Apache will handle remote disconnects for you.

You can keep track of how long you stall each bot in any number of ways: use a database, a log file, a pipe and a daemon, etc.
You would just write an id for this run, the bot (the request it originally made, etc.) and how long it has been running so far.
A log file is probably the least advisable option, as it might grow too large.

Now, just drop in an Apache redirect file at the root and you're set :-)

---8<---
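#!/usr/bin/perl
use strict;
use warnings;
use CGI;

$| = 1;                  # unbuffered output, so each chunk is sent immediately
my $cgi = CGI->new;
my $t   = time;          # start time, used for the duration report below

print $cgi->header();
print "<html><body>";

# Drip a few bytes every two seconds for as long as print keeps succeeding.
while (1) {
      last unless print "bla<p>";
      sleep(2);
}

# Only reached if print fails. In the test described above, Apache
# terminated the script before this line ever ran.
print STDERR "fallen out of the loop, time taken ", (time() - $t), " sec.\n";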
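
Since Apache may kill the script before anything after the loop runs, one way to follow the "write an id for this run, the bot, and the time it's currently running" idea is to update a small score file from inside the loop. The following is only a sketch under assumptions: the helper names `merge_run` and `record_run`, the tab-separated file layout, and the run-id scheme are all made up for illustration and are not part of the script above. Because the file is rewritten with at most ten entries each time, it does not grow the way a plain log would.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);

# Merge one run into a list of [seconds, run_id, bot] entries.
# An earlier entry with the same run_id is replaced, so a run that
# reports repeatedly only ever occupies one slot. Returns the ten
# longest runs, longest first.
sub merge_run {
    my ($scores, $run) = @_;
    my @others = grep { $_->[1] ne $run->[1] } @$scores;
    my @sorted = sort { $b->[0] <=> $a->[0] } (@others, $run);
    splice(@sorted, 10) if @sorted > 10;
    return \@sorted;
}

# Read the score file, merge in the current run, and write the result
# back, all under an exclusive lock so concurrent tarpit processes do
# not clobber each other. File layout (an assumption of this sketch):
#   seconds <TAB> run_id <TAB> bot
sub record_run {
    my ($file, $seconds, $run_id, $bot) = @_;

    # Keep the fields on one line and free of the tab separator.
    for ($run_id, $bot) {
        $_ = '' unless defined $_;
        s/[\t\r\n]+/ /g;
    }

    open(my $fh, '+>>', $file) or die "cannot open $file: $!";
    flock($fh, LOCK_EX)        or die "cannot lock $file: $!";
    seek($fh, 0, 0);

    my @scores;
    while (my $line = <$fh>) {
        chomp $line;
        push @scores, [$1, $2, $3] if $line =~ /^(\d+)\t([^\t]*)\t(.*)$/;
    }

    my $top = merge_run(\@scores, [$seconds, $run_id, $bot]);

    seek($fh, 0, 0);
    truncate($fh, 0) or die "cannot truncate $file: $!";
    print {$fh} join("\t", @$_), "\n" for @$top;
    close($fh) or die "cannot close $file: $!";
    return $top;
}
```

Inside the `while` loop one might then call something like `record_run($path, time() - $t, "$$-$t", $ENV{HTTP_USER_AGENT})`, where `$path` is a placeholder for any file the web server user can write and the process id plus start time stands in for a run id.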
LVL 20

Expert Comment

by:jmcg
ID: 9684893
What method do you use to decide that a request comes from an annoying bot and should be handled by your CGI?

On Apache 2, there's a mod_ext_filter example for "slowing down the server", but it applies to all callers.
LVL 26

Author Comment

by:Dr. Klahn
ID: 9691503
> What method do you use to decide that a request comes from
> an annoying bot and should be handled by your CGI?

Examination of the browser-ident field (HTTP_USER_AGENT),
and for bots that spoof the browser-ident, examination and
filtering on the originating IP address.

See URL  http://www.leekillough.com/robots.html  for a more
in-depth example.
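
As a rough illustration of that two-step check (browser-ident first, then originating address), here is a minimal Perl sketch. The function name `is_bad_bot`, the agent pattern, and the address prefixes are hypothetical placeholders, with the prefixes taken from the reserved documentation ranges; real lists would come from your own logs.

```perl
use strict;
use warnings;

# Hypothetical examples only: substitute patterns and prefixes
# gathered from your own access logs.
my @BAD_AGENTS   = (qr/ExampleBadBot/i, qr/^\s*$/);   # named bot, or empty ident
my @BAD_PREFIXES = ('192.0.2.', '203.0.113.');        # dotted prefixes, trailing dot

# Return 1 if the request looks like an unwanted robot, else 0.
# The User-Agent is checked first; the IP prefix check catches
# bots that spoof an ordinary browser ident.
sub is_bad_bot {
    my ($agent, $ip, $agents, $prefixes) = @_;
    $agent = '' unless defined $agent;
    $ip    = '' unless defined $ip;

    for my $re (@$agents) {
        return 1 if $agent =~ $re;
    }
    for my $prefix (@$prefixes) {
        return 1 if index($ip, $prefix) == 0;   # address starts with prefix
    }
    return 0;
}
```

In a CGI script the arguments would be `$ENV{HTTP_USER_AGENT}` and `$ENV{REMOTE_ADDR}`. Treating an empty ident as hostile is a design choice of this sketch, not something the thread prescribes.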

LVL 20

Expert Comment

by:jmcg
ID: 9691696
That page about defeating bad robots is quite comprehensive.

The script Kandura gave you would certainly keep the bots entertained for as long as they could stand it. You'd still need to use the rewrite rules and module listed in the document to redirect the bot's request to this script (and why would that be better than just blocking it?).

Question has a verified solution.