

Preventing robots scrolling through search results

Posted on 2012-03-14
Medium Priority
Last Modified: 2012-03-17
I am working on a Real Estate website, and I do not want people to be able to scrape the details of all the properties available on the site.

Therefore, I thought of keying each property on an additional 128-bit MD5 value in the database rather than using an incremental ID. That way, it would be virtually impossible to walk through all the permutations and capture the lot.

However, the problem I am currently facing is: what would prevent someone from harvesting the whole list of MD5 keys from the search-results page? A robot can crawl through all the pages returned by a broad search, and all the keys would then be visible in the links to the detail pages.

Does that constitute a risk ? How can this be avoided ?

Question by:davidbayonchen
Expert Comment

ID: 37719604
Well-behaved robots can be instructed to index only parts of your website by using a robots.txt file.
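For example, a minimal robots.txt at the site root could ask compliant crawlers to stay out of the search pages (the /search/ path is illustrative, not from the question):

```
# robots.txt -- honored only by well-behaved robots
User-agent: *
Disallow: /search/
```

Note this is purely advisory: a malicious scraper will ignore it, and may even use it as a map of what you consider sensitive.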

To thwart malicious robots, you could use human verification such as a CAPTCHA, or hide/encrypt the search-result URLs with JavaScript.
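As a sketch of the hide/encrypt idea (all names here are illustrative, not from this thread): keep the property key out of the plain href, emit it base64-encoded in a data attribute, and decode it only when JavaScript runs:

```javascript
// Sketch: the server emits <a href="#" data-key="..."> with an encoded key;
// only a JavaScript-capable client ever sees the decoded value.
function encodeKey(key) {
  return typeof btoa === 'function'
    ? btoa(key)                                      // browsers
    : Buffer.from(key, 'utf8').toString('base64');   // Node.js
}

function decodeKey(encoded) {
  return typeof atob === 'function'
    ? atob(encoded)
    : Buffer.from(encoded, 'base64').toString('utf8');
}

// Click handler: navigate using the decoded key (URL path is illustrative).
function onPropertyClick(el) {
  window.location = '/property.php?uid=' + decodeKey(el.dataset.key);
}
```

A dumb scraper reading raw HTML sees only the encoded string; a scraper that runs JavaScript, of course, is not fooled.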

Author Comment

ID: 37719617
Thanks for that.
How would you hide/encrypt the search results with JavaScript or jQuery?

Author Comment

ID: 37719633
Also, is it a good idea to use a 32-character key instead of an ID?

LVL 15

Assisted Solution

by:Ess Kay
Ess Kay earned 450 total points
ID: 37720157
I like your idea of a key instead of an ID.

You can stop robots with .htaccess, meta tags, and robots.txt.

Blocking bad bots and site rippers (aka offline browsers)
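As an illustration of the .htaccess route, a few RewriteCond rules can refuse known site rippers by User-Agent. The list below names some common offline browsers and is only a starting point; build yours from your own server logs:

```apacheconf
# .htaccess sketch -- return 403 Forbidden to known offline browsers
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} (HTTrack|WebZIP|WebCopier|Teleport) [NC]
RewriteRule .* - [F,L]
```

This only stops rippers that announce themselves; anything spoofing a browser User-Agent walks straight through.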
LVL 15

Expert Comment

by:Ess Kay
ID: 37720293

More on hiding

Other content/link hiding techniques

Here I'll present some quick examples of hiding techniques that didn't fit into previous chapters.

1x1 pixel images/transparent images

This technique is pretty simple: a webmaster adds a very small image to a web page and uses the image's alt text in place of the usual anchor text. Since the image is tiny or transparent, it is almost impossible for humans to spot such a link. This method was very popular some time ago, but search engine robots have become smarter: since it is easy to check an image's dimensions, a very small (1x1 pixel) image raises a suspicion flag.
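A minimal sketch of the 1x1-image link described above (paths are illustrative):

```html
<!-- the alt text stands in for anchor text; the image itself is invisible -->
<a href="/hidden-page.html">
  <img src="/img/pixel.gif" width="1" height="1" alt="anchor text here">
</a>
```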


Iframes

This technique is more effective when combined with CSS rules that make the frame appear to be a consistent part of the web page. robots.txt rules can also be used to tell robots not to index the page loaded in the iframe.
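Sketched in markup, assuming the framed page is also blocked in robots.txt (all paths illustrative):

```html
<!-- styled so the frame blends into the surrounding page -->
<iframe src="/framed-content.html" style="border:none; width:100%;"></iframe>

<!-- and in robots.txt:
User-agent: *
Disallow: /framed-content.html
-->
```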

Noscript tag

This tag is designed to hold alternative content that is shown to users who have JavaScript turned off. Since almost every user has JavaScript on, the contents of the noscript tag remain invisible to them.


Cloaking

Cloaking is almost always considered a black-hat SEO technique. It is a method where different versions of the content are presented to human visitors and to search engine robots.

Usually it works this way: a server-side script tries to determine whether the client requesting the page is a robot or a human, by checking the client's IP address, the HTTP User-Agent header, or both. User agents in HTTP headers are identified in much the same way as in robots.txt rules.

A tip: I have seen the following PHP code suggested as a reliable robot-detection algorithm:

$is_robot = (strpos($_SERVER['HTTP_USER_AGENT'], '+http') !== false);

It is based on the presumption that all search engine robots include their home URL in their User-Agent string. This is simply false: many robots do not specify a home URL. To learn more about User-Agent identifiers, look at your own server logs; also see the robots.txt chapter if you skipped it.

More advanced client detection techniques can also be based on the client's behavior analysis after a few page requests.
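A hedged sketch of User-Agent-based detection in JavaScript (the pattern list is illustrative; as noted above, real robots are best identified from your own logs, and none of this catches a robot that fakes its User-Agent):

```javascript
// Very rough robot check based on the User-Agent string only.
const BOT_PATTERNS = [
  /googlebot/i,
  /bingbot/i,
  /slurp/i,                   // Yahoo's crawler
  /crawler|spider|scraper/i,  // generic self-identification
];

function looksLikeRobot(userAgent) {
  if (!userAgent) return true;  // many scrapers send no User-Agent at all
  return BOT_PATTERNS.some((re) => re.test(userAgent));
}
```

Behavior-based checks (request rate, page order, whether CSS/images are fetched) are more robust than any fixed pattern list.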

Flash/Java applets

Flash was considered terra incognita for search engine spiders for quite a long time. But Google has announced that it already indexes some parts of Flash content (and I believe its Flash-reading skills will improve in the future). So this technique cannot be considered very reliable.

A similar alternative is Java applets. Search engines do not index applet content yet, though I cannot be sure about the future. However, it is very easy to decompile Java and extract information from it, so Java applets should be used with care.

Robots-nocontent class

In May 2007 Yahoo introduced the robots-nocontent HTML class, meant to hide any part of a page from the Yahoo robot. It can be applied to any HTML element, like:

<div class="robots-nocontent">unimportant content</div>

The Yahoo robot (Slurp) reads this as: the content of the element marked with this class is unimportant. The biggest downside of this technique is that only Yahoo supports it, so it is not very popular among SEO webmasters.

Final words

I have tried to sketch the most popular content/link hiding techniques. Many other methods can be found online, and you can also invent your own. The techniques described above are most powerful when used together (e.g., JavaScript + CSS + robots.txt).

In any case, I do not advocate content/link hiding when it is used to manipulate search engine rankings unethically. Still, if your motives are good, I hope this article gives you a clue on how to hide your secrets properly so you do not get penalized accidentally. It should also help you spot unethical operators and steer clear of them.

Expert Comment

ID: 37724120
Just remember that this is an attempt to outsmart malicious robots and data miners while maintaining good usability. Robots get smarter and may defeat your defenses, or you may have to give up because the cure becomes more damaging than the disease.

One assumption is that robots cannot evaluate JavaScript and therefore will not find URLs that are encrypted and inserted that way.

A good example is the Hivelogic Enkoder; source code is available for it, and there are other (PHP) adaptations.

Another option is to use a navigation form, update its parameters with JavaScript in your links, and post the form. Example:

<form name="navFrm" action="" method="post">
  <input type="hidden" name="uid" value="" />

  <a href="#" onClick="document.navFrm.uid.value='ABCD123'; document.navFrm.submit();">Item ABCD123</a>
</form>

You can add encryption and other nifty tricks, but in the end all the information is in your HTML, and you are using smoke and mirrors to fool the robot.
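As a sketch of the smoke-and-mirrors idea (a simple character shift, purely illustrative and not the Hivelogic scheme): the page carries only an obfuscated ID, and a decoder rebuilds the real value at click time:

```javascript
// Shift every character code by a fixed offset; reverse it on click.
function obfuscate(id, shift) {
  return id.split('')
           .map((c) => String.fromCharCode(c.charCodeAt(0) + shift))
           .join('');
}

function deobfuscate(s, shift) {
  return obfuscate(s, -shift);
}

// The emitted link would then look like:
//   <a href="#" onClick="document.navFrm.uid.value=deobfuscate('DEFG456', 3);
//                        document.navFrm.submit();">Item</a>
```

Trivial to break for anyone who reads the script, but enough to defeat a scraper that only pattern-matches the raw HTML.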

Author Comment

ID: 37724385
@Ironhoofs: A robot cannot see the onClick="document.navFrm.uid.value='ABCD123';", but a page scraper can pick it up. Am I right?

I am still wondering whether I should go ahead with the 32-character keys...

Accepted Solution

Ironhoofs earned 1050 total points
ID: 37724525
A robot or page scraper will pick up any human-readable URL or email address, like the URL you put in the <FORM> tag.

The robot will also parse the onClick event, but because most robots cannot execute the script, they discard it. However, a smarter robot could make an educated guess about the parameter and value from the JavaScript. That is why the Hivelogic Enkoder obscures the data.

I have witnessed malicious robots submitting data after parsing a page for forms. Using large, non-sequential IDs is therefore another small step towards hiding your content from unwanted eyes.

But in the end, you have to decide how much trouble you are willing to go through, and whether the solution is chasing your visitors away...

Question has a verified solution.