10.13.2008 at 06:09AM PDT, ID: 23809333 | Points: 125

Creating a web bot/crawler/spider for multiple websites

Asked by kishorealla in Search Engines, Programming Languages, Java Programming Language

Hello

I need to create a web bot/crawler/spider that visits different websites, collects data for us, and stores it in a database. The crawler needs to 'READ' the options on a website (from drop-downs, radio buttons, or check boxes) and either generate input from them itself OR use some generic predefined words that we provide.
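
A rough sketch of the 'READ the options' step, assuming the form HTML has already been downloaded into a string and uses double-quoted attributes. The class and method names (FormOptionReader, selectOptions, choiceValues) and the field names in main are hypothetical. It is regex-based to stay dependency-free, which is fragile on messy real-world HTML; a proper HTML parser would be the more robust choice.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FormOptionReader {

    private static final Pattern OPTION_VALUE =
            Pattern.compile("<option\\b[^>]*\\bvalue=\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);
    private static final Pattern INPUT_TAG =
            Pattern.compile("<input\\b[^>]*>", Pattern.CASE_INSENSITIVE);
    private static final Pattern TYPE_CHOICE =
            Pattern.compile("\\btype=\"(radio|checkbox)\"", Pattern.CASE_INSENSITIVE);
    private static final Pattern VALUE_ATTR =
            Pattern.compile("\\bvalue=\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    // Returns the value attribute of each <option> inside <select name="selectName">.
    // Options without a value attribute are skipped in this sketch.
    public static List<String> selectOptions(String html, String selectName) {
        List<String> values = new ArrayList<>();
        Pattern selectPattern = Pattern.compile(
                "<select[^>]*\\bname=\"" + Pattern.quote(selectName) + "\"[^>]*>(.*?)</select>",
                Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
        Matcher select = selectPattern.matcher(html);
        if (select.find()) {
            Matcher option = OPTION_VALUE.matcher(select.group(1));
            while (option.find()) {
                values.add(option.group(1));
            }
        }
        return values;
    }

    // Returns the value attribute of every radio button or check box with the given name.
    public static List<String> choiceValues(String html, String inputName) {
        List<String> values = new ArrayList<>();
        Pattern namePattern = Pattern.compile("\\bname=\"" + Pattern.quote(inputName) + "\"");
        Matcher input = INPUT_TAG.matcher(html);
        while (input.find()) {
            String tag = input.group();
            if (TYPE_CHOICE.matcher(tag).find() && namePattern.matcher(tag).find()) {
                Matcher value = VALUE_ATTR.matcher(tag);
                if (value.find()) {
                    values.add(value.group(1));
                }
            }
        }
        return values;
    }

    public static void main(String[] args) {
        // Hypothetical court-search form with a text field, a drop-down and radio buttons.
        String html = "<form action=\"/search\">"
                + "<input type=\"text\" name=\"caseNumber\">"
                + "<select name=\"caseType\">"
                + "<option value=\"permit\">Industry Permits</option>"
                + "<option value=\"civil\">Civil Cases</option>"
                + "<option value=\"criminal\">Criminal Cases</option>"
                + "</select>"
                + "<input type=\"radio\" name=\"court\" value=\"district\">"
                + "<input type=\"radio\" name=\"court\" value=\"appeals\">"
                + "</form>";
        System.out.println(selectOptions(html, "caseType")); // [permit, civil, criminal]
        System.out.println(choiceValues(html, "court"));     // [district, appeals]
    }
}
```

The harvested values are what the bot would then feed back into the form; when a field offers no options (a plain text field), the predefined word list takes their place.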

For example, a web page might consist of a text field and some drop-downs. Typically, if the user enters the case number of a court case, the website displays its status, and there may also be different legal documents that can be retrieved through drop-down options such as 'Industry Permits', 'Civil Cases', 'Criminal Cases', etc. The crawler should be able to read these options, generate a list of suitable ones itself, and use them to get the data. We want a bot/crawler/spider that automatically enters the information for multiple cases, i.e. case numbers (text field) and case types (from drop-downs), and retrieves the data about the relevant cases available on the website.
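
Entering "multiple cases" this way boils down to trying every combination of the candidate values: each case number from our list paired with each case type read from the drop-down. A minimal sketch, assuming the form accepts ordinary URL-encoded parameters (usable as a GET query string or an application/x-www-form-urlencoded POST body) and Java 10 or later. The field names caseNumber and caseType and the sample case numbers are hypothetical placeholders, not taken from any real site.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

public class FormInputGenerator {

    // Builds one URL-encoded parameter string per combination (Cartesian product)
    // of candidate values. A field with no candidate values yields no combinations.
    public static List<String> combinations(Map<String, List<String>> fields) {
        List<Map<String, String>> partial = new ArrayList<>();
        partial.add(new LinkedHashMap<>());
        for (Map.Entry<String, List<String>> field : fields.entrySet()) {
            List<Map<String, String>> next = new ArrayList<>();
            for (Map<String, String> combo : partial) {
                for (String value : field.getValue()) {
                    Map<String, String> extended = new LinkedHashMap<>(combo);
                    extended.put(field.getKey(), value);
                    next.add(extended);
                }
            }
            partial = next;
        }
        List<String> queries = new ArrayList<>();
        for (Map<String, String> combo : partial) {
            StringJoiner query = new StringJoiner("&");
            for (Map.Entry<String, String> e : combo.entrySet()) {
                query.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                        + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
            }
            queries.add(query.toString());
        }
        return queries;
    }

    public static void main(String[] args) {
        // Case numbers come from our own list; case types would come from the drop-down.
        Map<String, List<String>> fields = new LinkedHashMap<>();
        fields.put("caseNumber", List.of("2008-CV-001", "2008-CR-042"));
        fields.put("caseType", List.of("civil", "criminal"));
        combinations(fields).forEach(System.out::println);
    }
}
```

Running main prints four parameter strings, from caseNumber=2008-CV-001&caseType=civil to caseNumber=2008-CR-042&caseType=criminal. Since the number of requests grows multiplicatively with each field, a real crawler would probably want to throttle them.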

What is the best approach to achieve this? We could write individual bots for each website, but we are trying to come up with a more intelligent bot or crawler that can be used to crawl multiple websites. Please advise on how we can achieve this.

We are not doing anything illegal; everything is perfectly legal.

Regards
Kishore
 
 
10.13.2008 at 08:01PM PDT, ID: 22708311

10.20.2008 at 11:17AM PDT, ID: 22760581
