superfly18

asked on

Comparing Documents by Lucene Indexes

I have an issue with duplicate documents. My application automatically fetches documents from RSS feeds. However, in some cases the same document is available from multiple RSS 'channels', so the same document ends up with different URLs. Is there a way to test two documents for equality by comparing their Lucene indexes? Any thoughts?
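For illustration, one common way to catch the same document arriving under different URLs is to compare a fingerprint of the content rather than the URL. The following is a minimal sketch in plain Java and is not necessarily what the accepted solution proposes. The class names `ContentFingerprint` and `DuplicateFilter` are hypothetical, and the normalization (lowercasing and collapsing whitespace) is an assumption that only catches exact or near-exact copies.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper: derives a stable fingerprint from document text.
final class ContentFingerprint {
    private ContentFingerprint() {}

    static String of(String text) {
        // Assumed normalization: ignore case and whitespace differences.
        String normalized = text.toLowerCase(Locale.ROOT)
                                .replaceAll("\\s+", " ")
                                .trim();
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(normalized.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every standard Java platform.
            throw new IllegalStateException(e);
        }
    }
}

// Hypothetical helper: remembers fingerprints of documents already seen.
final class DuplicateFilter {
    private final Set<String> seen = new HashSet<>();

    // Returns true the first time a given content is offered, false afterwards.
    boolean isNew(String text) {
        return seen.add(ContentFingerprint.of(text));
    }
}
```

A fetcher could call `isNew(body)` before indexing each item and skip it when the result is false. The fingerprint could also be stored alongside each indexed document so the check survives restarts. How to do that depends on the Lucene version in use, so it is left out here.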
ASKER CERTIFIED SOLUTION
CEHJ
This solution is only available to members.
superfly18

ASKER

OK, I will test this, but are there any other options, e.g. using native Lucene methods?
I would guess that that guy had looked into that before he wrote it...