How can I configure a Java web crawler to pull RSS data into a database?
Posted on 2007-11-30
I would like to configure a Java web crawler to read news all day from specific news sites and put the news titles and bodies into a database for analysis by another application. What would be the easiest way to do this?
Ideally, I would create a web crawler object with a constructor that takes the address of an RSS feed, and then call methods on the object that return the contents of the feed as strings, streams, or anything else that lets me parse the data and load it into a database or use it for some other purpose. I have done the basic searches on the topic, and most of the information I found was for JSP and contained code to reproduce HTML pages. I don't really need that. I just need the information from the news sites in a Java environment where I can run Java code on it.
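To make the idea concrete, here is a minimal sketch of the kind of object I have in mind, using only classes that ship with the JDK. The class name RssFeedReader, the Item holder, and the news table with title and body columns are all made up for illustration. It also assumes the feeds are plain RSS 2.0, where each entry is an <item> with a <title> and a <description>, and that a JDBC driver for the target database is on the classpath.

```java
import java.io.InputStream;
import java.net.URI;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical helper: reads one RSS 2.0 feed and returns its items as strings. */
final class RssFeedReader {

    /** One news entry: the <title> and <description> text of an RSS <item>. */
    static final class Item {
        final String title;
        final String body;

        Item(String title, String body) {
            this.title = title;
            this.body = body;
        }
    }

    private final String feedUrl;

    RssFeedReader(String feedUrl) {
        this.feedUrl = feedUrl;
    }

    /** Downloads the feed from the address given to the constructor and parses it. */
    List<Item> fetch() throws Exception {
        try (InputStream in = URI.create(feedUrl).toURL().openStream()) {
            return parse(in);
        }
    }

    /** Parses RSS XML from any stream (network, file, or test data). */
    static List<Item> parse(InputStream in) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Reject DOCTYPE declarations so untrusted feeds cannot pull in external entities.
        // Feeds that legitimately carry a DOCTYPE will fail to parse with this setting.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        Document doc = factory.newDocumentBuilder().parse(in);

        List<Item> items = new ArrayList<>();
        NodeList nodes = doc.getElementsByTagName("item");
        for (int i = 0; i < nodes.getLength(); i++) {
            Element item = (Element) nodes.item(i);
            items.add(new Item(childText(item, "title"), childText(item, "description")));
        }
        return items;
    }

    /** Returns the trimmed text of the first matching child tag, or "" if absent. */
    private static String childText(Element parent, String tag) {
        NodeList list = parent.getElementsByTagName(tag);
        return list.getLength() == 0 ? "" : list.item(0).getTextContent().trim();
    }

    /** Inserts the items into a hypothetical table news(title, body) over JDBC. */
    static void save(Connection db, List<Item> items) throws SQLException {
        String sql = "INSERT INTO news (title, body) VALUES (?, ?)";
        try (PreparedStatement ps = db.prepareStatement(sql)) {
            for (Item item : items) {
                ps.setString(1, item.title);
                ps.setString(2, item.body);
                ps.addBatch();
            }
            ps.executeBatch();
        }
    }
}
```

Usage would be something like new RssFeedReader(address).fetch() on a timer, followed by RssFeedReader.save(connection, items). Note that the sketch does nothing to skip items that were already stored on an earlier poll, and an RSS description is often only a summary rather than the full article body.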