
06.02.2005 at 05:34PM PDT, ID: 21445368
web archive

Asked by lomidien in Java Programming Language

This might be as much a general architecture question as one specifically about Java.

I'm taking a crack at writing a Java app that will run on a server and perform the following functions.

1) read in a list of keywords from a db table
2) perform Google searches via their API
3) store the first 10,000 hits as links in another table
4) use a producer/consumer thread model to read these links from the table and retrieve each page
5) save the text from each page in a db table that identifies its source; this is the table that an "archive" search will query
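
For illustration, steps 4 and 5 can be sketched with a bounded BlockingQueue shared between one producer and a small pool of consumers. This is only a minimal sketch under stated assumptions, not a definitive implementation: the class name ArchivePipeline, the run method, and the fetcher function are hypothetical, the in-memory map stands in for the db table of page text, and the code assumes a Java version with lambdas (Java 8 or later).

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

// Hypothetical sketch: one producer feeds links to several consumer threads.
public class ArchivePipeline {
    // Sentinel value telling a consumer that no more links will arrive.
    private static final String POISON = "\u0000END";

    // links:     stands in for the rows read from the links table (assumption)
    // fetcher:   stands in for "retrieve the page and extract its text" (assumption)
    // consumers: number of consumer threads
    public static Map<String, String> run(List<String> links,
                                          Function<String, String> fetcher,
                                          int consumers) throws InterruptedException {
        // Bounded queue: the producer blocks when consumers fall behind.
        BlockingQueue<String> queue = new LinkedBlockingQueue<>(100);
        // Stands in for the db table of (source URL, page text).
        Map<String, String> archive = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(consumers);

        for (int i = 0; i < consumers; i++) {
            pool.execute(() -> {
                try {
                    while (true) {
                        String url = queue.take();
                        if (url.equals(POISON)) {
                            return; // no more work for this consumer
                        }
                        try {
                            archive.put(url, fetcher.apply(url));
                        } catch (RuntimeException e) {
                            // One failed page should not stop the whole run; skip it.
                        }
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }

        // Producer: enqueue every link, then one sentinel per consumer.
        for (String link : links) {
            queue.put(link);
        }
        for (int i = 0; i < consumers; i++) {
            queue.put(POISON);
        }

        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return archive;
    }
}
```

Calling ArchivePipeline.run(links, url -> downloadText(url), 4) would process the links with four consumer threads, where downloadText is a placeholder for whatever page-retrieval method is eventually chosen. The bounded queue keeps memory use flat even with 10,000 links per keyword.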

This is essentially my first thought on how to approach the problem.  Before I ask which method you would recommend for retrieving the pages, is there something inherently wrong with the process above?  I think I would really rather save the pages and images in a directory structure of some sort, but I'm not sure how that would be searchable from a webpage.
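
On the directory-structure idea, one possible approach (a sketch only, under assumptions) is to derive each file's location from a hash of its source URL and keep that path in the db table next to the extracted text. The search page would then query the table and link to the saved file. The class name ArchivePaths, the method pathFor, and the two-level directory layout are all hypothetical choices made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: maps a source URL to a stable file location.
public class ArchivePaths {
    // Returns root/<first 2 hex chars>/<next 2 hex chars>/<full hash>.html
    // The two directory levels keep any single directory from growing too large.
    public static Path pathFor(Path root, String url) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(url.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            String h = hex.toString();
            return root.resolve(h.substring(0, 2))
                       .resolve(h.substring(2, 4))
                       .resolve(h + ".html");
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform, so this is not expected.
            throw new IllegalStateException("SHA-256 unavailable", e);
        }
    }
}
```

Because the path depends only on the URL, the same page always lands in the same place, and the db row needs just the URL, the path, and the text used for searching.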

Any advice?  Sorry it's a somewhat broad question, but I think you can see my aim from the above.  The exact purpose is to automate the retrieval of static information from the internet for review by staff members of the U.N.

Thanks,
David
About this solution

Zone: Java Programming Language
Solution Provided By: lhankins
Participating Experts: 1
Solution Grade: A
 
 
 