shore-support

asked on

How to get more RSS feed items from Reuters

We are trying to display relevant news items on our ASP.NET site. We download these feed items from Reuters; for example, we use the link http://feeds.reuters.com/reuters/topNews?format=xml

The problem is that the Reuters feed returns only the latest 10 items, and older ones expire quickly. We would like to get all the news items from the last month or the last 7 days, depending on preferences. Is there some way to do this?
jones1618

Two answers spring to mind:

1. Locally cache Reuters news items yourself. If yours is a high traffic site, that's just the courteous thing to do.

Pseudo-code: Check the cache age. If it is older than X hours, fetch the latest RSS from Reuters and add any new items to the cache. Then expire any items older than X days/hours.

2. Reuters Labs provides a richer set of feed options, including a "count" parameter (which, unfortunately, doesn't work in the FeedBurner feed). Note: you have to register and declare your intended use of the data.

Reuters Labs - Spotlight
http://spotlight.reuters.com/page/2007/07/10/feeds
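A minimal Python sketch of the caching pseudo-code from option 1, for illustration only. It assumes each item has a unique id (for example its <guid>) and a publication time in epoch seconds; `fetch` is a hypothetical stand-in for whatever downloads and parses the Reuters feed, and `FeedCache`, `refresh_after` and `keep_for` are names made up for this example.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple

# Assumed item shape: (unique id such as <guid>, published time in epoch seconds, raw item XML)
Item = Tuple[str, float, str]


class FeedCache:
    """Local cache of feed items, following the check/fetch/add/expire pseudo-code."""

    def __init__(self, fetch: Callable[[], List[Item]],
                 refresh_after: float, keep_for: float) -> None:
        self.fetch = fetch                  # hypothetical: returns the feed's current items
        self.refresh_after = refresh_after  # "X hours", expressed in seconds
        self.keep_for = keep_for            # "X days/hours", expressed in seconds
        self.items: Dict[str, Item] = {}
        self.last_fetch = float("-inf")     # forces a fetch on the first call

    def get_items(self, now: Optional[float] = None) -> List[Item]:
        now = time.time() if now is None else now
        # Check cache age; only hit the remote feed when the cache is stale.
        if now - self.last_fetch >= self.refresh_after:
            for item in self.fetch():
                # Add new items; an id that is already cached is left untouched.
                self.items.setdefault(item[0], item)
            self.last_fetch = now
        # Expire items published before the retention window.
        cutoff = now - self.keep_for
        self.items = {key: it for key, it in self.items.items() if it[1] >= cutoff}
        # Newest first, ready to render on the page.
        return sorted(self.items.values(), key=lambda it: it[1], reverse=True)
```

Because the remote feed is contacted at most once per `refresh_after` interval no matter how many page views arrive, this also covers the "courteous" point for high-traffic sites. On an ASP.NET site the same logic would live in C#; Python is used here only to keep the sketch short.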
Here is how I might approach it...

Use a cron job to read the Reuters feed. You can tinker with the interval; perhaps run it every few minutes, depending on how often the Reuters feed changes.

Pull the items out of the feed. Compute a hash code from the entire text of each item, from the opening <item> to the closing </item>. Insert a timestamp, the hash code and the item text into your database. I would mark the hash code column UNIQUE to force an error when you try to insert the same item twice.

If you're getting a lot of duplicate hash code errors, you might lengthen the time between runs of the cron job.
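The capture step described above can be sketched in Python with the standard `sqlite3` and `hashlib` modules. This is an illustration under assumptions: the table name `feed_items`, its column names, the choice of SHA-256 and the function `store_items` are all hypothetical, and items are cut out with a simple regular expression (raw text from <item> to </item>, as described) rather than a full XML parser.

```python
import hashlib
import re
import sqlite3
import time
from typing import Optional, Tuple

SCHEMA = """
CREATE TABLE IF NOT EXISTS feed_items (
    fetched_at REAL NOT NULL,          -- timestamp of the run that saw the item
    item_hash  TEXT NOT NULL UNIQUE,   -- UNIQUE makes a repeated item fail to insert
    item_xml   TEXT NOT NULL           -- raw text from <item> to </item>
)"""

# Raw text from an opening <item> tag to the next </item>.
ITEM_RE = re.compile(r"<item\b.*?</item>", re.DOTALL)


def store_items(conn: sqlite3.Connection, feed_xml: str,
                now: Optional[float] = None) -> Tuple[int, int]:
    """Insert each <item> of feed_xml; return (inserted, duplicates)."""
    now = time.time() if now is None else now
    conn.execute(SCHEMA)
    inserted = duplicates = 0
    for item_xml in ITEM_RE.findall(feed_xml):
        digest = hashlib.sha256(item_xml.encode("utf-8")).hexdigest()
        try:
            with conn:  # commits on success, rolls back on error
                conn.execute("INSERT INTO feed_items VALUES (?, ?, ?)",
                             (now, digest, item_xml))
            inserted += 1
        except sqlite3.IntegrityError:
            # Same hash already stored: the item was captured on an earlier run.
            duplicates += 1
    return inserted, duplicates
```

The returned duplicate count is one way to decide whether to lengthen the cron interval: if nearly every run reports only duplicates, the job is polling more often than the feed changes.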

Then use your database to serve the items in your RSS feed.
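Serving from the database could look like the following Python sketch, which rebuilds an RSS 2.0 document from stored items newer than a chosen number of days (7 or about 30, matching the "last 7 days or last month" preference in the question). It assumes a hypothetical `feed_items` table with `fetched_at` (epoch seconds) and `item_xml` (raw item text) columns, and uses the capture time as a stand-in for the publication date; the function name and the channel title/link values are placeholders.

```python
import sqlite3
import time
from typing import Optional
from xml.sax.saxutils import escape


def build_rss(conn: sqlite3.Connection, days: float, title: str, link: str,
              now: Optional[float] = None) -> str:
    """Return an RSS 2.0 document containing items captured in the last `days` days."""
    now = time.time() if now is None else now
    cutoff = now - days * 86400  # seconds per day
    rows = conn.execute(
        "SELECT item_xml FROM feed_items WHERE fetched_at >= ? "
        "ORDER BY fetched_at DESC",
        (cutoff,),
    ).fetchall()
    # Stored items are already XML, so they are inserted verbatim.
    items = "".join(row[0] for row in rows)
    # RSS 2.0 requires <title>, <link> and <description> on the channel.
    return ('<?xml version="1.0" encoding="utf-8"?>'
            '<rss version="2.0"><channel>'
            f"<title>{escape(title)}</title>"
            f"<link>{escape(link)}</link>"
            f"<description>{escape(title)}</description>"
            f"{items}</channel></rss>")
```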

HTH, ~Ray

ASKER

We want to use the feed items for commercial purposes, but "Reuters Labs - Spotlight" prohibits use of its feeds for commercial purposes.

I think the best approach is to keep running our scheduled job at some interval and capture the feed items.

Google Reader keeps all the items in its cache, but to access them one needs to authenticate with Google's services through the Google API. I think this API requires the user to enter a CAPTCHA. If we could somehow authenticate to Google Reader, we could read the feed items from there, and we would not need to cache them ourselves.

Please let me know your comments.
ASKER CERTIFIED SOLUTION
Ray Paseur