ISA Scheduled Cache content download ignoring remote robots.txt
Posted on 2005-04-08
We have set up a scheduled content download for a local news site that people here access often (while they should be working).
Last week the admin of that web site set up some form of security to ban IPs "attacking" their site. He explained that the software is detecting a connection called "Fetch API Request", which is downloading content and ignoring the instructions given in robots.txt.
I traced the problem to the ISA scheduled content download job. First I disabled the setting that tells it to cache content even when the response is not HTTP status code 200, but we still got banned.
Next, I found a setting in the scheduled caching task that keeps the content download within the URL's domain. I enabled it, but we still got banned.
I tried looking for any reference to ISA supporting the robots.txt standard, but didn't find anything.
The usual "Screw the standards, let's do it our own way", I guess...
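For anyone not familiar with what the site admin expects, here is a minimal Python sketch of how a well-behaved downloader checks robots.txt before fetching a URL. It only illustrates the standard and is not how ISA works internally. The robots.txt content, the `news.example` host and the paths are made up for the example, and I am assuming the "Fetch API Request" name the admin reported is what the job sends as its user agent.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; the real site's file is not shown in this post.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /server-status
"""

# Assumption: the name reported by the site admin is the user-agent string.
USER_AGENT = "Fetch API Request"


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def allowed_urls(parser: RobotFileParser, user_agent: str, urls: list) -> list:
    """Keep only the URLs that robots.txt permits for this user agent."""
    return [url for url in urls if parser.can_fetch(user_agent, url)]


parser = build_parser(SAMPLE_ROBOTS_TXT)

# Hypothetical URLs that a scheduled download job might queue.
candidates = [
    "http://news.example/index.html",
    "http://news.example/admin/login",
    "http://news.example/server-status",
]

to_fetch = allowed_urls(parser, USER_AGENT, candidates)
print(to_fetch)
```

With this sample file only the index page survives the filter. A downloader that skips this check requests the disallowed paths as well, which is presumably what the site's security software is reacting to.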
I also found an error in the event log for this particular schedule, where it was trying to access an Apache administrative page.
I could disable the schedule, but this may happen again in the future with other sites.
(Sorry if this category isn't exactly related, but it is the only one that refers to ISA.)