ISA Scheduled Cache content download ignoring remote robots.txt

Posted on 2005-04-08
Last Modified: 2010-04-09
We have set up a scheduled content download for a local news site, which people access often (while they should be working)..

Last week the admin of this web site set up some form of security to ban IPs "attacking" their site... He explained that the software is detecting a connection called "Fetch API Request", which is downloading content while ignoring the instructions given by robots.txt.
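For context, robots.txt is just a plain-text file served from the site root. A hypothetical one for a news site like this (the paths are made up for illustration) might look like:

```
# http://example.com/robots.txt (hypothetical)
User-agent: *
Disallow: /admin/
Disallow: /archive/
```

A compliant crawler would fetch this file first and skip the disallowed paths; ISA's scheduled download apparently never does.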

I have identified the problem as coming from the scheduled ISA caching... and I disabled the setting that tells it to cache content even when it doesn't get HTTP status code 200... but we still got banned..

Next, I found a setting in the scheduled caching task instructing the service to keep the content download within the URL domain... I enabled it, but we still got banned..

I tried looking for any references to the robots.txt standard for ISA, but didn't find anything..
The usual "screw the standards, let's do it our own way", I guess...

I also found an event log error for this particular schedule: it was trying to access an administrative page of Apache.
I could disable the schedule, but this may happen again in the future with other sites.

(Sorry if this category isn't exactly related, but it's the only one with a reference to ISA)

Question by:miklesw
    LVL 12

    Expert Comment

Hmm... I don't think robots.txt is a standard... You probably want to have a look into it and comply with the site's "terms & conditions"...
    LVL 1

    Author Comment

Let's say robots.txt instructs me not to fetch a certain path, so I tell ISA myself not to download it. I'll still have a problem if something else is disallowed next month, for example...

PS: this robots thing is documented on the W3C site
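This is exactly why a well-behaved client re-checks robots.txt instead of hard-coding the rules. A minimal sketch of that check, using Python's standard `urllib.robotparser` (the rules and paths here are hypothetical):

```python
# Sketch of the check a robots.txt-compliant client performs before each
# fetch. ISA's scheduled download skips this step, which is why the site's
# ban script flags it. The rules and paths below are invented examples.
import urllib.robotparser

ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /archive/
"""

parser = urllib.robotparser.RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A compliant crawler asks before every request:
print(parser.can_fetch("*", "/news/today.html"))  # allowed
print(parser.can_fetch("*", "/admin/config"))     # disallowed
```

Because the file is re-parsed on each run, a path that becomes disallowed next month is picked up automatically, with no manual reconfiguration.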
    LVL 35

    Accepted Solution

Not quite sure why you want to use the scheduled download for that; I think this functionality is better suited to syncing static sites. robots.txt was mainly introduced for web-spider software, to keep them from indexing unrelated or stale content. Like some of those spiders, I would not be surprised if ISA ignores it.

As you said, this is a news site, which usually has a very short TTL and may change often, so I think the normal cache functionality of ISA would do what you want. Whenever a page is viewed, ISA stores the content in its cache, and it is not downloaded again during the first 50% of the TTL if the page itself has not changed. Also, entries in robots.txt do not affect the status code as long as the page is still available. robots.txt has nothing to do with whether a page is available or not; it simply says that robots (spiders) should no longer index it.

What the content provider should do is enable a redirect for the pages which should no longer be accessed. This delivers a different status code, which can be recognized by ISA.
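Since the event log mentioned Apache, a sketch of what that could look like on the provider's side, using Apache's mod_alias `Redirect` directive (paths and host are hypothetical):

```
# Hypothetical Apache directives: retired pages return a 301 or 410
# instead of 200, a status change that ISA's cache logic can act on.
Redirect permanent /archive/old-story.html http://example.com/news/
Redirect gone /admin/
```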

On your side, you may have to clear the ISA cache after making changes, to avoid ISA requesting updates for the cached content.

