Wildcard Disallow for robots.txt file?

I have a number of pages on a site with a path component that I would like robots to ignore when they find it.

Example:

/bananas/frog/apple/foobar.raw
/mary/jack/jill/foobar.raw
/tony/orlando/dawn/foobar.raw

Can I write one Disallow rule to cover anything that ends in /foobar.raw?

Specifically, can I put the following in the robots.txt file:

disallow /foobar

...and have it take care of all instances where /foobar.raw is the last item in a URL?

Thanks

Rowby
Rowby Goren asked:
grahamnonweiler commented:
First, you are assuming that all spiders will comply with your robots.txt instructions, and this is not actually the case. Apart from perhaps Google, Inktomi, and MS, the majority of spiders that hit your site will ignore the Disallow instructions; worse still, anything "disallowed" is like a big welcome sign for anyone with malicious intent.

With that said, robots.txt is directory-oriented rather than file-oriented: you can disallow an entire directory (effectively wildcarding its contents), but individual files must be stated explicitly.

In your post, "disallow /foobar" would be interpreted as disallowing access to the directory "foobar" and all of its contents.

Thus, you will have to explicitly disallow each and every file you do not want spiders to access.

Example (note that the directive is "Disallow:" with a colon, and it must appear under a "User-agent" line):

User-agent: *
Disallow: /bananas/frog/apple/foobar.raw
Disallow: /mary/jack/jill/foobar.raw
Disallow: /tony/orlando/dawn/foobar.raw
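As a sanity check, here is how a compliant crawler would interpret those rules, simulated with Python's standard-library robots.txt parser (the example.com host is just a placeholder for illustration):

```python
from urllib.robotparser import RobotFileParser

# The rules from the example above, as a compliant crawler would read them.
rules = """User-agent: *
Disallow: /bananas/frog/apple/foobar.raw
Disallow: /mary/jack/jill/foobar.raw
Disallow: /tony/orlando/dawn/foobar.raw
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

# The explicitly listed paths are blocked...
print(rp.can_fetch("*", "http://example.com/bananas/frog/apple/foobar.raw"))  # False
# ...but anything not listed remains fetchable.
print(rp.can_fetch("*", "http://example.com/bananas/frog/apple/other.html"))  # True
```

This only models well-behaved crawlers, of course; as noted above, a malicious spider simply never consults the file.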
 
Rowby Goren (author) commented:
Hi grahamnonweiler,

Thanks for that info and the observations on spider behavior.

Sorry for the delay in awarding you your points.

I know now what to do!

Rowby
Question has a verified solution.