Building a bot

Suppose I want to build a web bot that will scan only the main pages of all websites in some country.
Where do I start?
Sasha-N Asked:
 
Onthrax Commented:
It depends on the language you want to write this in.

The search term to google for this is:
'creating web spider'

For example, a tutorial in .NET:
http://www.beansoftware.com/NET-Tutorials/Creating-Web-Spider.aspx

Or in Java:
http://www.javaworld.com/javaworld/jw-11-2004/jw-1101-spider.html
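
To give a rough idea of what such a spider does with one main page, here is a minimal sketch in Python using only the standard library. It is not taken from either tutorial; the names LinkCollector, extract_links and fetch_main_page and the User-Agent string are made up for illustration, and a real bot would also need error handling, robots.txt checks and rate limiting.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import Request, urlopen


class LinkCollector(HTMLParser):
    """Collects absolute URLs from the href attributes of <a> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the URL of the page itself.
                self.links.append(urljoin(self.base_url, value))


def extract_links(base_url, html):
    """Return every link found in the given HTML, in document order."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    return parser.links


def fetch_main_page(url, timeout=10):
    """Download one page. The User-Agent value is a placeholder."""
    request = Request(url, headers={"User-Agent": "example-spider/0.1"})
    with urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

Calling extract_links(url, fetch_main_page(url)) on a main page gives the list of links to look at next.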

Hope this helps
 
Sasha-N (Author) Commented:
So I don't need special hardware and I can just run it from a home server?
How much time do you think it will take to scan all the domains on the net?
 
Onthrax Commented:
Correct. Although I imagine it will take a long, long time to spider ALL the net domains with only your own machine.

For example, Google runs many server parks around the world, each with a large number of computers.
 
Sasha-N (Author) Commented:
Suppose I want to scan all the sites in Russia, that is, those with a .ru domain.
Will it take about a month? Or about a year?
 
Onthrax Commented:
I really couldn't make an estimate m8. It all depends on a lot of factors.

For example:
- The method you will be using to find domains to spider, e.g. looping through all possible domains like a.ru, b.ru, ab.ru, abc.ru etc., or spidering a few .ru sites and following the links found on those.
- The capacity of your machine. A Pentium 1 would take longer than a powerful quad core.
- The bandwidth available and its speed.
- The size of the webpages you will be spidering. A single page with only a few words will be done faster than a site with a huge main page and a lot of subpages.
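
The two discovery methods from the first point can be sketched roughly as below, in Python. The function names are invented for this example, and the alphabet is simplified to lowercase letters and digits (hyphens and non-Latin names are ignored).

```python
from itertools import islice, product
from string import ascii_lowercase, digits
from urllib.parse import urlsplit

# Simplifying assumption: only a-z and 0-9, no hyphens.
ALPHABET = ascii_lowercase + digits


def candidate_domains(max_length, tld="ru"):
    """Method 1: yield every possible name, shortest first (a.ru, b.ru, ...)."""
    for length in range(1, max_length + 1):
        for chars in product(ALPHABET, repeat=length):
            yield "".join(chars) + "." + tld


def count_candidates(max_length):
    """How many names method 1 has to try for names up to max_length chars."""
    return sum(len(ALPHABET) ** n for n in range(1, max_length + 1))


def is_in_tld(url, tld="ru"):
    """Method 2 helper: keep only links whose host name ends in the TLD."""
    host = urlsplit(url).hostname or ""
    return host.endswith("." + tld)


# First few candidates that method 1 would try.
first_three = list(islice(candidate_domains(2), 3))
```

With this alphabet, count_candidates(8) is already roughly 2.9 trillion names, almost all of which will not exist, so looping through every possible domain is impractical for anything but very short names. Following links and filtering them with something like is_in_tld only visits sites that are actually linked from somewhere, at the cost of missing sites nobody links to.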

Consider that even Google, with all its machines, has not indexed the entire internet yet. It's a huge job.
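
If you want a ballpark figure anyway, the arithmetic is simple once you plug in your own numbers. The sketch below is in Python; crawl_days is a made-up helper, and the site count and seconds per page in the example are placeholders, not real figures for .ru.

```python
def crawl_days(num_sites, seconds_per_page, parallel_fetches=1):
    """Days needed to fetch one main page per site.

    Assumes every fetch takes the same time and that parallel fetches
    scale perfectly, which real bandwidth and hardware will not allow.
    """
    total_seconds = num_sites * seconds_per_page / parallel_fetches
    return total_seconds / 86_400  # 86,400 seconds in a day


# Placeholder inputs: 1,000,000 sites at 1 second per main page.
one_at_a_time = crawl_days(1_000_000, 1.0)
ten_in_parallel = crawl_days(1_000_000, 1.0, parallel_fetches=10)
```

With those placeholder inputs, fetching one page at a time comes to about 11.6 days and ten fetches in parallel to a bit over one day. Swap in the real number of sites and the page times you measure on your own connection.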

Question has a verified solution.
