How can I configure a Java web crawler to pull RSS data into a database?

I would like to configure a Java web crawler to read news all day from specific news sites and put the news titles and bodies into a database for analysis by another application. What would be the easiest way to do this?

Ideally, I would create a web crawler object with a constructor that takes the address of an RSS feed, and then call methods on the object that return the contents of the feed as strings, streams, or anything else that lets me parse the data and load it into a database or use it some other way. I have done basic searches on the topic, and most of what I found was for JSP and contained code to render HTML pages. I don't really need that; I just need the information from news sites in a Java environment where I can run Java code on it.

thanks!
-md

ASKER CERTIFIED SOLUTION

CEHJ

Mick Barry

Try Informa:

http://informa.sourceforge.net/index.html

or FeedParser:

http://commons.apache.org/sandbox/feedparser/
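
If a full library is more than you need, here is a minimal sketch of the kind of object described in the question, using only the XML parser that ships with the JDK. The class name `RssFeedReader`, its `Item` type, and the `fetch`/`parse` methods are made up for illustration and are not part of Informa or FeedParser. It assumes a plain RSS 2.0 feed where each `<item>` has a `<title>` and a `<description>`; Atom feeds and namespaced extensions would need more work.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Hypothetical helper class for illustration only.
final class RssFeedReader {

    /** One news entry: the title and description text of an RSS item. */
    static final class Item {
        final String title;
        final String body;

        Item(String title, String body) {
            this.title = title;
            this.body = body;
        }
    }

    private final String feedUrl;

    RssFeedReader(String feedUrl) {
        this.feedUrl = feedUrl;
    }

    /** Downloads the feed from the URL given to the constructor and parses it. */
    List<Item> fetch() throws IOException {
        try (InputStream in = URI.create(feedUrl).toURL().openStream()) {
            return parse(in);
        }
    }

    /** Parses RSS XML from any stream, so it can be used without a network. */
    static List<Item> parse(InputStream in) throws IOException {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Reject DOCTYPE declarations to avoid XML external entity attacks.
            // Feeds that declare a DOCTYPE will fail to parse with this setting.
            factory.setFeature(
                "http://apache.org/xml/features/disallow-doctype-decl", true);
            Document doc = factory.newDocumentBuilder().parse(in);

            NodeList nodes = doc.getElementsByTagName("item");
            List<Item> items = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element item = (Element) nodes.item(i);
                items.add(new Item(childText(item, "title"),
                                   childText(item, "description")));
            }
            return items;
        } catch (ParserConfigurationException | SAXException e) {
            throw new IOException("Could not parse feed", e);
        }
    }

    /** Returns the trimmed text of the first matching descendant, or "". */
    private static String childText(Element parent, String tag) {
        NodeList list = parent.getElementsByTagName(tag);
        return list.getLength() == 0 ? "" : list.item(0).getTextContent().trim();
    }
}
```

From there, each item's `title` and `body` could be written to the database with a JDBC `PreparedStatement`, and `fetch()` could be called on a timer (for example with a `ScheduledExecutorService`) to poll the feed during the day.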

ysnky

Have a look at these articles:
http://today.java.net/pub/a/today/2003/08/08/rss.html
http://java.sun.com/developer/technicalArticles/javaserverpages/rss_utilities/

CEHJ

>>
try informa
...
or feedparser
>>

(both already mentioned)

CEHJ

:-)

Mick Barry

I thought you would have liked a little more insight than someone just googling an answer for you. Next time, just type "java rss api" into Google and save yourself some time :)