Solved

Blocking a spider

Posted on 2004-10-26
Last Modified: 2010-05-18
A spider from www.picsearch.com (spider4.picsearch.com, I presume, going by the IP address 62.119.133.14 and various ranges) is constantly indexing my site. It visits every day, goes through hundreds of pages, and I'm even tracking multiple visits from it at the same time. One day there were 6 simultaneous sessions with over 4000 pages visited.
How can I block just this one spider?
0
Question by:Gary
    11 Comments
     
    LVL 32

    Accepted Solution

    by:
    Hi GaryC123,

    Use robots.txt to stop the spider from indexing your site, as explained here:
    http://www.picsearch.com/menu.cgi?item=FAQ#q5

    The spider obeys the rules in that file.

    Greetings,

    LucF
    0
     
    LVL 58

    Author Comment

    by:Gary
    Maybe I should've explored the site a bit more...
    Never used robots.txt.  Do I literally just put a txt file on the server with
    User-agent: psbot
    Disallow: /

    in it?
    0
     
    LVL 32

    Expert Comment

    by:Luc Franken
    Yes, robots.txt is pretty straightforward.
    Just put the robots.txt file in the root of your website, like Experts Exchange does:
    http://www.experts-exchange.com/robots.txt
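
If you want to check the rules before uploading the file, here is a minimal sketch using Python's standard `urllib.robotparser` module. It parses the two-line record from the question and asks whether a given user agent may fetch a path. The helper name `is_allowed` and the agent `SomeOtherBot` are made up for illustration; a real crawler may interpret robots.txt slightly differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# The two-line record from the question: block psbot from the whole site.
ROBOTS_TXT = """\
User-agent: psbot
Disallow: /
"""


def is_allowed(robots_txt, user_agent, path="/"):
    """Return True if user_agent may fetch path under the given rules.

    Hypothetical helper for illustration; it only reflects how Python's
    parser reads the file, not how any particular spider behaves.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    # "SomeOtherBot" is a placeholder name for any crawler not listed.
    for agent in ("psbot", "SomeOtherBot"):
        print(agent, "allowed:", is_allowed(ROBOTS_TXT, agent))
```

With this record only psbot is refused; agents that are not named stay allowed because there is no `User-agent: *` record.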

    Greetings,

    LucF
    0
     
    LVL 32

    Expert Comment

    by:Luc Franken
    By the way, if you want to know more about the robots.txt standard, take a look at http://www.robotstxt.org/

    LucF
    0
     
    LVL 58

    Author Comment

    by:Gary
    Thanks LucF
    0
     
    LVL 32

    Expert Comment

    by:Luc Franken
    You're very welcome Gary,

    LucF
    0
     
    LVL 24

    Expert Comment

    by:duz
    GaryC123 -

    You may want to make a proper job of it :)

    User-agent: Wget
    User-agent: vsecrawler
    User-agent: TutorGig
    User-agent: Teleport Pro
    User-agent: Steeler
    User-agent: semanticdiscovery
    User-agent: ScoutAbout
    User-agent: RPT-HTTPClient
    User-agent: Reaper
    User-agent: rabaz
    User-agent: QuepasaCreep
    User-agent: puf
    User-agent: psbot
    User-agent: PhpDig
    User-agent: OWR_Crawler
    User-agent: obot
    User-agent: NPBot
    User-agent: NexaBot
    User-agent: NaverRobot
    User-agent: MSIECrawler
    User-agent: Larbin
    User-agent: Jyxobot
    User-agent: InfoNaviRobot
    User-agent: http://www.almaden.ibm.com/cs/crawler
    User-agent: grub-client
    User-agent: Generic
    User-agent: Gaisbot
    User-agent: EgotoBot
    User-agent: Dumbot
    User-agent: dloader(NaverRobot)
    User-agent: BravoBrian
    User-agent: baiduspider
    User-agent: asterias
    User-agent: ASPSeek
    Disallow: /
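
The list above is a single record: several User-agent lines sharing one Disallow rule. As a rough sketch (assuming you keep the bot names in a Python list), the record can be generated and sanity-checked with the standard `urllib.robotparser` module. The function name `build_robots_txt`, the shortened list, and the agent `SomeOtherBot` are hypothetical.

```python
from urllib.robotparser import RobotFileParser


def build_robots_txt(blocked_agents, disallow="/"):
    """Build one robots.txt record applying `disallow` to every listed agent.

    Hypothetical helper: one User-agent line per name, then a shared rule.
    """
    lines = [f"User-agent: {name}" for name in blocked_agents]
    lines.append(f"Disallow: {disallow}")
    return "\n".join(lines) + "\n"


def can_crawl(robots_txt, user_agent, path="/"):
    """Ask Python's parser whether user_agent may fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    # A shortened list for illustration; extend it with whichever names you need.
    text = build_robots_txt(["Wget", "psbot", "Larbin"])
    print(text)
    print("psbot allowed:", can_crawl(text, "psbot"))
    print("SomeOtherBot allowed:", can_crawl(text, "SomeOtherBot"))
```

Only the named agents are refused, so search engines that are not on the list can still index the site.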

    - duz
    0
     
    LVL 58

    Author Comment

    by:Gary
    Ermm who are all them?  I don't want to block everyone, as I do want the site indexed. It's just that this one particular robot was eating up bandwidth for nothing, and I'm not really interested in having all the images on my site indexed.
    0
     
    LVL 32

    Expert Comment

    by:Luc Franken
    duz has a point there: you don't want these spiders crawling your site :o)
    Those are undesired, but at least they obey the robots.txt standard; there are also others that don't obey the standard.
    0
     
    LVL 24

    Expert Comment

    by:duz
    GaryC123 -

    >Ermm who are all them?

    Useless bandwidth-eating bots (that obey robots.txt).

    - duz

    LucF -

    >there are also others that don't obey the standard

    Well over 150 that I see regularly. If you are interested in stopping them, create a 'spider trap' like this one, for example: http://www.kloth.net/internet/bottrap.php
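
The general idea of a spider trap, as a rough sketch and not the linked script: list a directory under Disallow in robots.txt, link to it invisibly from a page, and treat any client that requests it as a bot that ignores the standard. The Python below only finds such clients in an access log. The trap path `/bot-trap/`, the Common Log Format layout, and the sample IP addresses are all assumptions for illustration; actually blocking the addresses is left to your server configuration.

```python
import re

# Hypothetical trap path: disallowed in robots.txt and linked invisibly,
# so only clients that ignore robots.txt should ever request it.
TRAP_PATH = "/bot-trap/"

# Start of a Common Log Format line: client IP, two fields, timestamp,
# then the quoted request beginning with the method and the path.
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "(?:GET|HEAD|POST) (\S+)')


def trapped_ips(log_lines, trap_path=TRAP_PATH):
    """Return the set of client IPs that requested anything under trap_path."""
    offenders = set()
    for line in log_lines:
        match = LOG_LINE.match(line)
        if match and match.group(2).startswith(trap_path):
            offenders.add(match.group(1))
    return offenders


# Made-up sample lines using documentation-only IP ranges.
SAMPLE_LOG = [
    '192.0.2.10 - - [26/Oct/2004:10:00:00 +0000] "GET /bot-trap/index.html HTTP/1.0" 200 512',
    '198.51.100.7 - - [26/Oct/2004:10:00:05 +0000] "GET /index.html HTTP/1.0" 200 2048',
]

if __name__ == "__main__":
    for ip in sorted(trapped_ips(SAMPLE_LOG)):
        print("ignored robots.txt:", ip)
```

Well-behaved spiders never show up in the result, because they read robots.txt and stay out of the trap directory.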

    - duz
    0
     
    LVL 32

    Expert Comment

    by:Luc Franken
    Thanks for that link duz, I'm trying it now.

    LucF
    0
