Solved

Building a bot

Posted on 2008-10-08
Last Modified: 2013-12-08
Suppose I want to build a web bot that scans only the main pages of all websites in some country.
Where do I start?
Question by:Sasha-N
 
LVL 13

Accepted Solution

by:
Onthrax earned 500 total points
ID: 22673735
It depends on the language you want to write this in.

A good search term to Google for this is:
'creating web spider'

For example a tutorial in .NET:
http://www.beansoftware.com/NET-Tutorials/Creating-Web-Spider.aspx

Or in Java:
http://www.javaworld.com/javaworld/jw-11-2004/jw-1101-spider.html
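
Whatever the language, the core of such a spider is the same: fetch a page, pull the links out of it, and queue the ones you have not seen yet. Below is a minimal sketch of that loop in Python, using only the standard library. It is not taken from either tutorial; the names `LinkCollector`, `main_pages`, `fetch` and `crawl`, the `example-bot/0.1` user agent and the `max_pages` limit are all made up for illustration. Since the question is only about main pages, every link is reduced to scheme plus host before it is queued.

```python
from collections import deque
from html.parser import HTMLParser
from http.client import HTTPException
from urllib.parse import urljoin, urlsplit
from urllib.request import Request, urlopen


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def main_pages(html, base_url, tld=".ru"):
    """Return the main-page URLs (scheme + host + "/") linked from html,
    keeping only hosts that end with the given TLD."""
    collector = LinkCollector()
    collector.feed(html)
    found = set()
    for href in collector.hrefs:
        parts = urlsplit(urljoin(base_url, href))
        host = (parts.hostname or "").lower()
        if parts.scheme in ("http", "https") and host.endswith(tld):
            found.add(f"{parts.scheme}://{host}/")
    return found


def fetch(url, timeout=10):
    """Download one page and return its HTML as text ("" on any error)."""
    # The user agent string is a placeholder, not a registered bot name.
    request = Request(url, headers={"User-Agent": "example-bot/0.1"})
    try:
        with urlopen(request, timeout=timeout) as response:
            return response.read().decode("utf-8", errors="replace")
    except (OSError, ValueError, HTTPException):
        return ""


def crawl(seed_urls, max_pages=100, tld=".ru"):
    """Breadth-first crawl that visits only main pages, up to max_pages."""
    queue = deque(seed_urls)
    seen = set(seed_urls)
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        visited.append(url)
        # Queue every main page we have not seen before.
        for link in main_pages(html, url, tld) - seen:
            seen.add(link)
            queue.append(link)
    return visited
```

Calling `crawl(["http://example.ru/"])` with a real seed site would start the loop. A bot meant for real use should also respect each site's robots.txt (Python ships `urllib.robotparser` for that) and pause between requests.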

Hope this helps
 

Author Comment

by:Sasha-N
ID: 22673946
So I don't need special hardware and can just run it from a home server?
How long do you think it would take to scan all the domains on the net?
 
LVL 13

Expert Comment

by:Onthrax
ID: 22676379
Correct. Although I imagine it will take a very long time to spider ALL the domains on the net with only your own machine.

Google, for example, has many server farms, each consisting of hundreds of computers, in many countries.
 

Author Comment

by:Sasha-N
ID: 22677001
Suppose I want to scan all the sites in Russia, i.e. those with a .ru domain.
Would it take about a month, or about a year?
 
LVL 13

Expert Comment

by:Onthrax
ID: 22677091
I really couldn't make an estimate, m8. It depends on a lot of factors.

For example:
- The method you use to find domains, e.g. looping through all possible names (a.ru, b.ru, ab.ru, abc.ru, etc.) or spidering a few .ru sites and following the links found on them.
- The capacity of your machine. A Pentium 1 would take longer than a powerful quad core.
- The available bandwidth and its speed.
- The size of the web pages you will be spidering. A single page with only a few words will be done faster than a site with a huge page and a lot of inner pages.
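
To get a feel for the first method in the list, here is a rough Python sketch of looping through all possible names. The function names `candidate_domains` and `candidate_count` are invented for this example, and the alphabet is simplified to lowercase letters and digits (real domain names may also contain hyphens, which would only make the numbers larger).

```python
from itertools import product
import string

# Simplifying assumption: names use only a-z and 0-9 (36 symbols).
ALPHABET = string.ascii_lowercase + string.digits


def candidate_domains(max_length, tld=".ru"):
    """Yield every possible name of 1..max_length characters: a.ru, b.ru, ..."""
    for length in range(1, max_length + 1):
        for letters in product(ALPHABET, repeat=length):
            yield "".join(letters) + tld


def candidate_count(max_length):
    """Number of names candidate_domains(max_length) would produce."""
    return sum(len(ALPHABET) ** n for n in range(1, max_length + 1))


print(candidate_count(2))   # names of length 1 or 2
print(candidate_count(10))  # names of up to 10 characters
```

The count grows by a factor of 36 with every extra character, so names of up to 10 characters already give more than 3 × 10^15 candidates, almost none of which are registered. That is why following links from a few known sites is usually the more practical of the two methods.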

Consider that even Google, with all its machines, has not indexed the entire internet yet. It's a huge job.
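
Once you know roughly how many domains there are and how fast your setup can fetch pages, a back-of-the-envelope estimate is simple division. The Python sketch below shows the arithmetic only; the function name `crawl_time_days` is made up, and both numbers in the example call are hypothetical placeholders, not measurements of the .ru zone or of any real machine.

```python
SECONDS_PER_DAY = 86_400


def crawl_time_days(domain_count, pages_per_second):
    """Days needed to fetch one main page per domain at a steady rate."""
    seconds = domain_count / pages_per_second
    return seconds / SECONDS_PER_DAY


# Hypothetical numbers: 1,000,000 domains at 5 main pages per second.
print(round(crawl_time_days(1_000_000, 5), 1))
```

With those placeholder numbers the result is about 2.3 days. Swap in your own domain count and the fetch rate you actually measure, since bandwidth, slow servers and timeouts all lower the real rate.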
