finnstone asked:
Is it possible to normalize an RSS feed to remove duplicate links?

I have an RSS feed from a Twitter list.

It often contains duplicates: tweets that point to the same link.

How can I remove the duplicate tweets from the RSS feed?
Scott Fell replied:
I don't know Perl, but I think I would do this client-side anyway. Create an array of links. For each post, find its links and check whether they are already in the array. If not, add the link to the array; if the link is already there, skip that post. If you are not checking many tweets, there may be little difference between doing this client-side or server-side.
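
A minimal client-side sketch of that approach in TypeScript, assuming the feed has already been parsed into items that expose a link; the FeedItem shape and the dedupeByLink name are illustrative, not part of any particular library.

```typescript
// Illustrative item shape: one entry per tweet in the parsed feed.
interface FeedItem {
  title: string;
  link: string; // the URL the tweet points at
}

// Keep only the first item seen for each link; later duplicates are skipped.
function dedupeByLink(items: FeedItem[]): FeedItem[] {
  const seen = new Set<string>();      // links already used
  const result: FeedItem[] = [];
  for (const item of items) {
    if (seen.has(item.link)) continue; // duplicate link: drop this tweet
    seen.add(item.link);
    result.push(item);
  }
  return result;
}

// Usage: dedupeByLink(parsedItems) returns the feed items minus duplicate links.
```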
finnstone (Asker):
It does not have to be Perl; I didn't know where to post this.
Scott Fell:

What language do you work in? I'm sure there will be multiple ways to do this in any language.

You would want to look at each post, then find every string starting with "<a " and ending with "</a>". Then, from the complete string <a href="http://mypage.com/link">Link</a>, take just mypage.com/link and add that to your array.

Then, for each post, look at its links; if "mypage.com/link" matches what is already in the array, do not use the link or the post. A sketch of this is below.

I think you will find this harder, though, because the same link may be shortened with different URL shorteners. If that is not a problem, this should work.
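
A rough sketch of that extraction and comparison step, assuming each post's HTML is available as a string; the regex and the keepPost helper are illustrative. A regex is fragile against arbitrary HTML but adequate for the simple anchors in a tweet, and as noted above it will not catch the same target hidden behind different shorteners.

```typescript
// Pull every href out of the anchor tags in a post's HTML.
function extractLinks(html: string): string[] {
  const links: string[] = [];
  const anchor = /<a\s[^>]*href="([^"]+)"[^>]*>.*?<\/a>/gi;
  let match: RegExpExecArray | null;
  while ((match = anchor.exec(html)) !== null) {
    // Strip the protocol so http:// and https:// versions compare equal.
    links.push(match[1].replace(/^https?:\/\//, ""));
  }
  return links;
}

// Decide whether to keep a post: skip it if any of its links was already seen.
function keepPost(html: string, seen: Set<string>): boolean {
  const links = extractLinks(html);
  if (links.some((l) => seen.has(l))) return false; // duplicate link: drop the post
  links.forEach((l) => seen.add(l));                // remember this post's links
  return true;
}
```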
finnstone (Asker): I do not. I am going to hire someone, and budget is not a concern.
Scott Fell: Yes, but I think there is a service that can return the true (expanded) URL.
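
One way to get at the "true" URL behind a shortener is to follow its redirects and read the final location. A minimal sketch, assuming a runtime with the Fetch API (Node 18+ or a browser; in a browser, cross-origin rules may force this through a server-side proxy). The expandUrl name is illustrative.

```typescript
// Follow a short link's redirects and return the final URL it resolves to.
async function expandUrl(shortUrl: string): Promise<string> {
  const response = await fetch(shortUrl, {
    method: "HEAD",     // only the redirect chain is needed, not the body
    redirect: "follow", // let fetch follow 301/302 hops automatically
  });
  // Some shorteners reject HEAD requests; a plain GET works the same way.
  return response.url;  // the URL after all redirects
}

// Usage: await expandUrl("https://t.co/abc123") might return the original article URL,
// so tweets using different shorteners can be compared on the expanded link.
```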
ASKER CERTIFIED SOLUTION: Scott Fell (the accepted solution is available only to Experts Exchange members).