tarrigo asked:

Protecting Web Site Content

Any suggestions on how to protect a web site from scrapers and other site-capturing tools? I have reviewed a few products and methods, but wanted to get other thoughts.
ASKER CERTIFIED SOLUTION
killbrad (United States of America)
mcse2007:
If you are referring to encrypting the traffic as it travels from the public network to the web server, an SSL certificate will do that for you. To protect the server itself from scrapers, if you are hosting it yourself, block all ports at the firewall except the ones you need open: port 80 for HTTP and port 443 for SSL.
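A minimal sketch of that firewall rule set, assuming a Linux host running iptables (the question does not say which OS or firewall is in use):

iptables -P INPUT DROP                                             # default policy: drop all inbound traffic
iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT   # keep replies to existing connections
iptables -A INPUT -p tcp --dport 80 -j ACCEPT                      # HTTP
iptables -A INPUT -p tcp --dport 443 -j ACCEPT                     # HTTPS/SSL

Note that this only restricts which services are reachable; scrapers request pages over port 80 like any browser, which is the limitation killbrad raises below.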
Hube:
Check out this site: http://browsers.garykeith.com/

We have plans to put something similar in place. It requires adding a script at the beginning of every page (more likely an included file) that checks the browser requesting the page against the data file from the site above and sends no content to browsers listed as blocked in that file.

However, we are planning to do this in PHP, not ASP. There are several other tools available for PHP as well.
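A minimal sketch of that per-page check in PHP, assuming the browscap directive in php.ini points at a browscap.ini data file downloaded from the site above:

<?php
// Deny content to user agents that the browscap data flags as crawlers.
// Assumes php.ini's "browscap" setting points at a browscap.ini file
// from browsers.garykeith.com; the 'crawler' key comes from that file.
$caps = get_browser(null, true);   // capability array for the requesting user agent

if (!empty($caps['crawler'])) {
    header('HTTP/1.1 403 Forbidden');
    exit('Access denied.');
}
// Otherwise fall through and serve the page as normal.
?>

As killbrad notes below, this only filters clients that report an honest User-Agent header.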
killbrad:
mcse2007: Blocking other ports is not going to stop someone from leeching his website with spider software; scrapers fetch pages over port 80 just like a browser does.
Hube: It's a nice idea, but not realistic. Many web-leech programs give you the option to spoof the headers they send. Also, trying to access the page you referenced above from Linux gives me this:

ACCESS DENIED
You do not appear to be using this form in accordance with my Terms of Use.
Continued abuse will eventually result in you losing access to this server!
It's also possible you are using security software that modifies the HTTP_REFERER header.
If I can't confirm the referrer is valid then you can't have access to my forms.

hmmm  :-/
Hube:
killbrad: In your case I would say you are behind a firewall or a proxy server that blocks or modifies the HTTP_REFERER header. (That isn't my site, by the way.) It is, however, a security measure I also implement on forms for the sites I build: every form submits to itself, and if the page receives a POST request from anywhere other than itself, it refuses to process it. This effectively stops 90% of spam bots from submitting forms, but it also blocks visitors whose firewalls or proxy servers are set up like yours.
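A minimal sketch of that self-submitting form check in PHP; the host name and field names are hypothetical placeholders:

<?php
// Reject POSTs whose referrer is not this site. Some firewalls and
// proxies strip or rewrite HTTP_REFERER, which causes exactly the
// false positives described above.
$self_host = 'www.example.com';   // hypothetical: replace with your own host

if ($_SERVER['REQUEST_METHOD'] === 'POST') {
    $referer = isset($_SERVER['HTTP_REFERER']) ? $_SERVER['HTTP_REFERER'] : '';
    if (parse_url($referer, PHP_URL_HOST) !== $self_host) {
        header('HTTP/1.1 403 Forbidden');
        exit('This form only accepts submissions from its own page.');
    }
    // ...process the legitimate submission here...
}
?>
<form method="post" action="<?php echo htmlspecialchars($_SERVER['PHP_SELF']); ?>">
  <input type="text" name="comment">
  <input type="submit" value="Send">
</form>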
killbrad:
Nope, it just won't let me access it with elinks. Tried it from several places.
Forced accept.

Computer101
EE Admin