Solved

Ban or limit site crawlers by ip

Posted on 2010-09-02
Last Modified: 2012-05-10
Hi

I have a problem with people crawling my site through proxies - they seem to be stealing content, and at times it amounts to a minor DoS problem. I have blocked lots of user agents and have mod_evasive in place and working. Helpfully, some of the crawlers request a malformed URL and get a 404 - so I can see them in my logs.

What I am looking for is some kind of logic to ban the IP address that requests a particular URL; a temporary ban would be fine. Bandwidth throttling would also be fine, but ideally it should target the IP addresses of the culprits. The ideal solution would sit in the virtual host - <Location /honeytrap>

I have mod_security installed and think it is possible to use it as described by B1vr halfway down this page: http://www.linuxquestions.org/questions/linux-security-4/apache_mod_security-setup-help-607846/

I can't get it to work, though. The logic could be: if URL x is requested, ban all requests from that IP for ten minutes. One slight complexity is that the server is behind a proxy, so I use X-Forwarded-For in the logs - I don't want to ban the downstream proxy!
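
To make the intended logic concrete, here is a minimal Python sketch of it. It is only an illustration of the behaviour being asked for, not a mod_security configuration. The names `HoneytrapBan`, `client_ip` and `TRUSTED_PROXIES`, and the proxy address `10.0.0.1`, are hypothetical. It assumes X-Forwarded-For is only believed when the connection comes from the known proxy, since the header can otherwise be forged by the client.

```python
import time

BAN_SECONDS = 600               # ten minutes, as described above
HONEYTRAP_PATH = "/honeytrap"   # the trap location from the question
TRUSTED_PROXIES = {"10.0.0.1"}  # hypothetical address of the front-end proxy


def client_ip(remote_addr, forwarded_for):
    """Pick the address to ban.

    Only trust X-Forwarded-For when the peer is a known proxy, then take
    the right-most entry that is not itself a trusted proxy.
    """
    if remote_addr not in TRUSTED_PROXIES or not forwarded_for:
        return remote_addr
    hops = [h.strip() for h in forwarded_for.split(",") if h.strip()]
    for hop in reversed(hops):
        if hop not in TRUSTED_PROXIES:
            return hop
    return remote_addr


class HoneytrapBan:
    """Temporary per-IP ban triggered by a request for the trap URL."""

    def __init__(self, ban_seconds=BAN_SECONDS, clock=time.monotonic):
        self.ban_seconds = ban_seconds
        self.clock = clock
        self.banned_until = {}  # ip -> time at which the ban expires

    def allow(self, ip, path):
        """Return True if the request may proceed, False if it is blocked."""
        now = self.clock()
        expiry = self.banned_until.get(ip)
        if expiry is not None and expiry <= now:
            # The ban has run out, so forget about this address.
            del self.banned_until[ip]
            expiry = None
        if path.startswith(HONEYTRAP_PATH):
            # Touching the trap starts (or restarts) the ban.
            self.banned_until[ip] = now + self.ban_seconds
            return False
        return expiry is None
```

With this sketch, a request would be checked with something like `bans.allow(client_ip(peer_address, xff_header), request_path)`; the proxy itself is never banned because its address is skipped when choosing the client IP.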

I know how to block access to the single URL, but the crawlers are then grabbing lots of other pages as well - I can see who they are only because of the malformed POST that I assume is designed to hit the server resources.

Thanks for any help
Question by:richp10
4 Comments
 
LVL 3

Expert Comment

by:simoesp
ID: 33588397
 

Author Comment

by:richp10
ID: 33588507
No, it's not images it's the main pages of the site (images are all on a cdn anyway).

I wonder whether I could use the solution at the end of this: http://www.experts-exchange.com/Software/Server_Software/Web_Servers/Apache/Q_23722587.html?sfQueryTermInfo=1+10+30+block+ip+mod+secur

Any thoughts on how I could call this blocking programme using X-Forwarded-For and without PHP??!
 
LVL 3

Accepted Solution

by:
simoesp earned 500 total points
ID: 33588658
 

Author Closing Comment

by:richp10
ID: 33607404
Very good advice - I'm not quite sure yet whether it will work correctly for X-Forwarded-For, though this does seem to answer the main part of my question.


Question has a verified solution.
