Lucene - Indexing & Reindexing

We would like to implement a Lucene-based full-text search for an 80 GB database. We are new to Lucene, and the current plan is to achieve this by storing the index terms in a flat file.

1) For indexing, is it wise to choose Zend Lucene (in terms of performance, stability and usage)? Our research suggests that Zend/PHP Lucene is much slower. So, would it be better to use Java Lucene (or Solr) for the indexing and Zend Framework for querying the search results?

2) When records are inserted, updated or deleted, how should the re-indexing be handled?

Thanks!
ldbkutty Asked:
 
hernst42 Commented:
Hi Jaggy,

Have a look at http://framework.zend.com/manual/en/zend.search.lucene.best-practice.html#zend.search.lucene.best-practice.unique-id. We use the third method, but haven't compared the speed of the second and third methods. The problem we encountered with this method was that the __id__ field was UnIndexed and not treated as a keyword field, and thus the query did not find anything.
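// Look up the internal document number for our unique id
// (excerpt from a function body, hence the return).
$term   = new Zend_Search_Lucene_Index_Term($id, '__id__');
$docIds = $index->termDocs($term);
foreach ($docIds as $docId) {
    // Skip documents that are already marked as deleted.
    if (!$index->isDeleted($docId)) {
        return $docId;
    }
}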
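
The lookup above only matches if `__id__` was indexed as a keyword field. A minimal sketch of building such a document, assuming Zend Framework 1 is on the include path. The helper name `buildDocument` and the `body` field are hypothetical and not part of the original post.

```php
<?php
require_once 'Zend/Search/Lucene.php';

// Hypothetical helper: builds a document whose unique id can be found by term lookup.
function buildDocument($id, $text)
{
    $doc = new Zend_Search_Lucene_Document();
    // Keyword fields are stored and indexed as one untokenized term,
    // so a term lookup on '__id__' can match the exact value.
    $doc->addField(Zend_Search_Lucene_Field::Keyword('__id__', (string) $id));
    // UnStored fields are tokenized and indexed but not stored in the index.
    // 'body' is an assumed field name for the searchable text.
    $doc->addField(Zend_Search_Lucene_Field::UnStored('body', $text));
    return $doc;
}
```

An UnIndexed field, by contrast, is only stored and never indexed, which would explain why a term lookup on it finds nothing.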

 
hernst42 Commented:
Hi Jaggy,

1) We also use Zend Lucene in our product for full-text searches. The searches are fast and reliable, and you can customize the search the way you need it.

2) Inserts are also fast and easy to do.
Updates are not possible: you have to delete the old entry from the search index (so you need a unique, bijective id to identify the record in the Lucene index) and then re-add the new/updated content.
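
The delete-then-re-add update described above could look roughly like this. It is a sketch only, assuming Zend Framework 1 is on the include path and that every document carries its id in a keyword field named `__id__`. The helper name `reindexRecord` is hypothetical.

```php
<?php
require_once 'Zend/Search/Lucene.php';

// Hypothetical helper: replaces the indexed copy of one database record.
// $index is an opened index (e.g. from Zend_Search_Lucene::open()),
// $doc must carry the same id in a keyword field named '__id__'.
function reindexRecord($index, $id, Zend_Search_Lucene_Document $doc)
{
    $term = new Zend_Search_Lucene_Index_Term((string) $id, '__id__');
    // Delete every live document that carries this id.
    foreach ($index->termDocs($term) as $docId) {
        if (!$index->isDeleted($docId)) {
            $index->delete($docId);
        }
    }
    // Add the fresh version and flush the change to disk.
    $index->addDocument($doc);
    $index->commit();
}
```

For a deleted database record, only the delete loop is needed. For a newly inserted record, only the `addDocument()` call is needed.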

Deletes can be very time-consuming, as there seems to be a bug in the Zend Lucene implementation. The reliable method is very slow but works on a consistent database. The method preferred by the developers is fast but does not work in all cases. We will submit a ticket after we have made a test case for that.

The advantage of PHP/Lucene is that you have more control over your searches and elements, but you might also use the Java version, depending on the interface you need from it.
 
ldbkutty (Author) Commented:
Thanks Jurgen!

Could you please point me to a reference for the method preferred by the developers (the one that deletes the entry from the search index and inserts the new content)?
 
ldbkutty (Author) Commented:
Thanks again :-)