Solved

Blocking a spider

Posted on 2004-10-26
Last Modified: 2010-05-18
A spider from www.picsearch.com (spider4.picsearch.com, I presume, going by the IP address 62.119.133.14 and various ranges) is constantly indexing my site.  It seems to visit every day and goes through hundreds of pages, and I'm even tracking multiple visits from it at the same time. One day there were 6 simultaneous sessions, with over 4000 pages visited.
How can I block just this one spider?
Question by:Gary
11 Comments
 
LVL 32

Accepted Solution

by:
LucF earned 1000 total points
ID: 12409115
Hi GaryC123,

Use robots.txt to stop the spider from indexing your site, as explained here:
http://www.picsearch.com/menu.cgi?item=FAQ#q5

The spider obeys the rules in that file.

Greetings,

LucF
 
LVL 58

Author Comment

by:Gary
ID: 12409132
Maybe I should've explored the site a bit more...
Never used robots.txt.  Do I literally just put a txt file on the server with
User-agent: psbot
Disallow: /

in it?
 
LVL 32

Expert Comment

by:LucF
ID: 12409162
Yes, robots.txt is pretty straightforward.
Just put the robots.txt file in the root of your website, like Experts Exchange does:
http://www.experts-exchange.com/robots.txt
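
If you want to check the rule before uploading it, here is a minimal sketch using Python's standard-library `urllib.robotparser`. It parses the two lines from the question and asks whether a given agent may fetch a page. The example URL, the `is_allowed` helper and the "Googlebot" comparison agent are my own assumptions for illustration, not part of the picsearch FAQ.

```python
from urllib.robotparser import RobotFileParser

# The two-line rule from the question, as it would appear in robots.txt.
ROBOTS_TXT = """\
User-agent: psbot
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if user_agent may fetch url under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Hypothetical page on the site; any path gives the same answer here.
    url = "http://www.example.com/gallery/page1.html"
    for agent in ("psbot", "Googlebot"):
        print(agent, "allowed" if is_allowed(ROBOTS_TXT, agent, url) else "blocked")
```

Only the agent named in the file is turned away. Agents that are not listed fall through and stay allowed, so the rest of the site can still be indexed by other crawlers.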

Greetings,

LucF
 
LVL 32

Expert Comment

by:LucF
ID: 12409296
btw, you might want to know some more about the robots.txt standard; please take a look at http://www.robotstxt.org/

LucF
 
LVL 58

Author Comment

by:Gary
ID: 12409340
Thanks LucF
 
LVL 32

Expert Comment

by:LucF
ID: 12409395
You're very welcome Gary,

LucF
 
LVL 24

Expert Comment

by:duz
ID: 12409406
GaryC123 -

You may want to make a proper job of it :)

User-agent: Wget
User-agent: vsecrawler
User-agent: TutorGig
User-agent: Teleport Pro
User-agent: Steeler
User-agent: semanticdiscovery
User-agent: ScoutAbout
User-agent: RPT-HTTPClient
User-agent: Reaper
User-agent: rabaz
User-agent: QuepasaCreep
User-agent: puf
User-agent: psbot
User-agent: PhpDig
User-agent: OWR_Crawler
User-agent: obot
User-agent: NPBot
User-agent: NexaBot
User-agent: NaverRobot
User-agent: MSIECrawler
User-agent: Larbin
User-agent: Jyxobot
User-agent: InfoNaviRobot
User-agent: http://www.almaden.ibm.com/cs/crawler
User-agent: grub-client
User-agent: Generic
User-agent: Gaisbot
User-agent: EgotoBot
User-agent: Dumbot
User-agent: dloader(NaverRobot)
User-agent: BravoBrian
User-agent: baiduspider
User-agent: asterias
User-agent: ASPSeek
Disallow: /
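
A list like this works because several User-agent lines followed by a single Disallow form one group, and the rule applies to each named agent and to nobody else. A rough way to confirm that with Python's standard-library `urllib.robotparser`, using a shortened copy of the list; the `blocked_agents` helper and the "Googlebot" control agent are assumptions added for illustration:

```python
from urllib.robotparser import RobotFileParser

# Shortened copy of the grouped list above: three agents, one shared rule.
GROUPED = """\
User-agent: Wget
User-agent: psbot
User-agent: baiduspider
Disallow: /
"""


def blocked_agents(robots_txt: str, agents: list[str], url: str = "/") -> list[str]:
    """Return the agents from `agents` that may NOT fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, url)]


if __name__ == "__main__":
    print(blocked_agents(GROUPED, ["Wget", "psbot", "baiduspider", "Googlebot"]))
```

Every agent in the group is reported as blocked, while an agent that is not in the list is left alone.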

- duz
 
LVL 58

Author Comment

by:Gary
ID: 12409427
Ermm who are all them?  I don't want to block everyone, as I do want the site indexed. It's just that this one particular robot was eating up bandwidth for nothing, and I'm not really interested in having all the images on my site indexed.
 
LVL 32

Expert Comment

by:LucF
ID: 12409496
duz has a point there, you don't want these spiders crawling your page :o)
Those are undesired, but at least they obey the robots.txt standard. There are also others that don't obey the standard.
 
LVL 24

Expert Comment

by:duz
ID: 12409708
GaryC123 -

>Ermm who are all them?

Useless bandwidth eating bots (that obey robots.txt)

- duz

LucF -

>there are also others that don't obey the standard

Well over 150 that I see regularly. If you are interested in stopping them, create a 'spider trap', like this one for example: http://www.kloth.net/internet/bottrap.php
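
The general idea of a spider trap, as I understand it, is: disallow a trap URL in robots.txt, link to it where no human visitor would click, and ban any client that requests it anyway. The sketch below shows only that logic and is not the linked script. `TRAP_PATH`, the `SpiderTrap` class and the in-memory ban set are all hypothetical; a real setup would persist the bans and enforce them in the web server.

```python
# Hypothetical trap path; it would also be listed under "Disallow:" in robots.txt,
# so well-behaved robots never request it.
TRAP_PATH = "/bot-trap/"


class SpiderTrap:
    """Ban any client IP that requests the trap path (in-memory sketch only)."""

    def __init__(self, trap_path: str = TRAP_PATH) -> None:
        self.trap_path = trap_path
        self.banned: set[str] = set()

    def handle(self, ip: str, path: str) -> int:
        """Return the HTTP status code this request should receive."""
        if ip in self.banned:
            return 403  # already caught earlier
        if path.startswith(self.trap_path):
            self.banned.add(ip)  # ignored robots.txt, so ban from now on
            return 403
        return 200


if __name__ == "__main__":
    trap = SpiderTrap()
    print(trap.handle("203.0.113.5", "/index.html"))  # normal page
    print(trap.handle("203.0.113.5", "/bot-trap/"))   # walks into the trap
    print(trap.handle("203.0.113.5", "/index.html"))  # now refused
```

Robots that honour robots.txt never see the trap, so only the ones that ignore the standard end up banned.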

- duz
 
LVL 32

Expert Comment

by:LucF
ID: 12409736
Thanks for that link duz, I'm trying it now.

LucF