Solved

Perl tarpit CGI for Apache?

Posted on 2003-11-03
Last Modified: 2012-05-04

I am running an Apache (1.13) httpd server on a Unix
box.  The system is constantly being probed by various
annoying robots that ignore robots.txt.  I would like a
Perl CGI script that tarpits those robots by limiting the
bandwidth of the response to some small value.

In other words, here's a little bit of data, Mr. Bot (not
necessarily what you requested, either), stall, stall, here's
a little bit more, stall, stall ... and let's see how long we
can keep you teergrubed here with this tempting big file.

It would be entertaining, though certainly not necessary,
if the script tracked how long it was able to keep a bot
on the hook, and kept a Top 10 list of same.

I have looked through the CPAN archives, and can't find
anything that quite fits, and I don't trust my own Perl
skill enough to handle exception conditions such as an
unexpected remote disconnect during transfer.
Question by:Dr. Klahn
LVL 18

Accepted Solution

by:kandura (earned 200 total points)
ID: 9677609
I tried the following script on Apache. If I press stop in my browser, the script is terminated by Apache, and the message to STDERR never gets written to the error_log.
I think it's safe to assume that Apache will handle remote disconnects for you.

Keeping track of how long you stall each bot can be done in any number of ways: a database, a log file, a pipe and a daemon, etc.
You would just write an ID for this run, the bot, the request it originally made, and how long the run has lasted so far.
A log file is probably the least advisable option, as it might grow too large.
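
As a rough illustration of the bookkeeping, here is a minimal sketch of a bounded score file that keeps only the ten longest runs, so it cannot grow without limit. The sub names `merge_top` and `record_run`, the tab-separated record format, and the example path are all assumptions made for this sketch, not part of the script in this answer.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Fcntl qw(:flock);

# Merge one finished run into the stored records and keep the $keep longest.
# Each record is an array ref: [ seconds, bot id, requested URI ].
sub merge_top {
    my ($records, $new, $keep) = @_;
    my @sorted = sort { $b->[0] <=> $a->[0] } @$records, $new;
    splice(@sorted, $keep) if @sorted > $keep;
    return \@sorted;
}

# Read, update and rewrite the score file under an exclusive lock,
# so that two tarpit processes finishing together do not clobber it.
sub record_run {
    my ($file, $seconds, $bot, $uri) = @_;

    # The bot id and URI are supplied by the client: strip the separators.
    for ($bot, $uri) { $_ = '' unless defined; s/[\t\r\n]+/ /g }

    open(my $fh, '+>>', $file) or die "cannot open $file: $!";
    flock($fh, LOCK_EX)        or die "cannot lock $file: $!";

    seek($fh, 0, 0);
    my @records;
    while (my $line = <$fh>) {
        chomp $line;
        push @records, [ split /\t/, $line, 3 ];
    }

    my $top = merge_top(\@records, [ $seconds, $bot, $uri ], 10);

    seek($fh, 0, 0);
    truncate($fh, 0) or die "cannot truncate $file: $!";
    print {$fh} join("\t", @$_), "\n" for @$top;
    close($fh) or die "cannot close $file: $!";
    return $top;
}

# Possible call once the tarpit loop has ended (path is hypothetical):
# record_run('/tmp/tarpit_top10.txt', time() - $t,
#            $ENV{HTTP_USER_AGENT}, $ENV{REQUEST_URI});
```

The file doubles as the Top 10 list: it is already sorted, longest stall first. Note that this only helps if the script survives long enough to call it after the bot disconnects.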

Now, just drop in an Apache redirect file at the root and you're set :-)

---8<---
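#!/usr/bin/perl
use strict;
use warnings;
use CGI;

$| = 1;                     # unbuffer STDOUT so every chunk goes out at once
my $cgi   = CGI->new;
my $start = time;

print $cgi->header();
print "<html><body>";

# Dribble out a few bytes every two seconds until a write fails.
while (1) {
      last unless print "bla<p>";
      sleep(2);
}

# Only reached if print fails before Apache terminates the script.
print STDERR "fallen out of the loop, time taken ", (time() - $start), " sec.\n";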
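
Since the script above was killed before it could report anything, here is a hedged variant that tries to survive the disconnect long enough to log the elapsed time. It is a sketch under assumptions: that the script is ended either by SIGPIPE on a failed write or by a SIGTERM from the server, which I have not verified for every Apache version. The sub names `report` and `drip` are made up for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IO::Handle;

my $start = time;

# Log the elapsed time to STDERR, which Apache normally routes to error_log.
sub report {
    my ($why) = @_;
    print STDERR "tarpit ended ($why) after ", time() - $start, " sec.\n";
}

# Print one chunk every $delay seconds until a write fails or
# $max_chunks have been sent (0 means no limit). Returns the chunk count.
sub drip {
    my ($out, $delay, $max_chunks) = @_;
    my $sent = 0;
    while (!$max_chunks || $sent < $max_chunks) {
        print {$out} "bla<p>" or last;
        $sent++;
        sleep $delay if $delay;
    }
    return $sent;
}

# With SIGPIPE ignored, a write to a closed connection fails with EPIPE
# instead of killing the script, so the loop can end normally.
$SIG{PIPE} = 'IGNORE';

# Assumption: the server may instead stop the script with SIGTERM.
$SIG{TERM} = sub { report('SIGTERM'); exit 0 };

# Only run the tarpit when started as a CGI; the server sets GATEWAY_INTERFACE.
if ($ENV{GATEWAY_INTERFACE}) {
    STDOUT->autoflush(1);
    print "Content-Type: text/html\r\n\r\n<html><body>";
    drip(\*STDOUT, 2, 0);
    report('write failed');
}
```

If the server follows up with a SIGKILL, nothing can be logged after that point, so any bookkeeping in the handler should be kept short.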
 
LVL 20

Expert Comment

by:jmcg
ID: 9684893
What method do you use to decide that a request comes from an annoying bot and should be handled by your CGI?

On Apache 2, there's a mod_ext_filter example for "slowing down the server", but it applies to all callers.
 
LVL 24

Author Comment

by:Dr. Klahn
ID: 9691503
> What method do you use to decide that a request comes from
> an annoying bot and should be handled by your CGI?

Examination of the browser-ident field (HTTP_USER_AGENT),
and for bots that spoof the browser-ident, examination and
filtering on the originating IP address.

See URL  http://www.leekillough.com/robots.html  for a more
in-depth example.
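
A minimal sketch of that two-step check as it might look inside a CGI script. The agent patterns and addresses are placeholders (the addresses come from the ranges reserved for documentation), and the sub name `is_bad_bot` is invented for this example; a real list would be built from your own logs.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Placeholder entries for illustration only.
my @BAD_AGENTS = (qr/EmailSiphon/i, qr/WebZIP/i);
my @BAD_NETS   = ('192.0.2.', '198.51.100.17');

# Return 1 if the request looks like an unwanted robot, otherwise 0.
sub is_bad_bot {
    my ($agent, $addr) = @_;
    $agent = '' unless defined $agent;
    $addr  = '' unless defined $addr;

    # First the browser-ident field.
    return 1 if grep { $agent =~ $_ } @BAD_AGENTS;

    # Then the originating address, for bots that spoof their ident.
    # An entry ending in '.' matches a whole range, anything else one host.
    for my $net (@BAD_NETS) {
        return 1 if $net =~ /\.$/ ? index($addr, $net) == 0 : $addr eq $net;
    }
    return 0;
}

# Typical call from a CGI script:
# my $tarpit = is_bad_bot($ENV{HTTP_USER_AGENT}, $ENV{REMOTE_ADDR});
```

In practice the same test would more likely live in the server's rewrite rules, so that only flagged requests ever reach the tarpit script.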

 
LVL 20

Expert Comment

by:jmcg
ID: 9691696
That page about defeating bad robots is quite comprehensive.

The script Kandura gave you would certainly keep the bots entertained for as long as they could stand it. You'd still need to use the rewrite rules and module listed in the document to redirect the bot's request to this script (and why would that be better than just blocking it?).
