Lucene - Indexing & Reindexing

We would like to implement Lucene-based full-text search for an 80 GB database. We are new to Lucene, and the current plan is to store the index terms in a flat file.

1) For indexing, is it wise to choose Zend Lucene (in terms of performance, stability and usage)? From what we have read, Zend/PHP Lucene is much slower than the Java version. So would it be better to use Java Lucene (or Solr) for indexing and Zend Framework for querying the search results?

2) When records are inserted, updated or deleted, how should we handle re-indexing?

Thanks!
Asked by ldbkutty

hernst42 commented:
Hi Jaggy,

1) We also use Zend Lucene in our product for full-text search. The searches are fast and reliable, and you can customize them the way you need.

2) Inserts are also fast and easy to do.
Updates are not possible in place: you have to delete the old entry from the search index and then re-add the new/updated content. For that you need a unique, bijective (one-to-one) id that identifies each record in the Lucene index.
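
To make the delete-and-re-add pattern concrete, here is a minimal in-memory sketch (PHP 7.4+). It does not use Zend_Search_Lucene: the class ToyIndex and its methods are hypothetical, named only to mirror the idea that a Lucene delete merely marks a document as deleted, and that an "update" means looking up the live document by its unique __id__, deleting it, and adding a fresh copy.

```php
<?php
// Toy in-memory index illustrating "update = delete + re-add".
// Hypothetical class for illustration only: NOT the Zend_Search_Lucene API.
final class ToyIndex
{
    private array $docs = [];    // internal doc number => stored fields
    private array $deleted = []; // internal doc number => true (deletion mark)
    private array $idTerms = []; // unique record id => list of doc numbers

    public function addDocument(string $id, array $fields): int
    {
        $docNum = count($this->docs); // docs are never physically removed
        $this->docs[$docNum] = $fields + ['__id__' => $id];
        $this->idTerms[$id][] = $docNum;
        return $docNum;
    }

    public function delete(int $docNum): void
    {
        // Lucene-style delete: mark the document instead of removing it
        $this->deleted[$docNum] = true;
    }

    public function isDeleted(int $docNum): bool
    {
        return isset($this->deleted[$docNum]);
    }

    // Return the live doc number for a unique record id, or null
    public function findDocId(string $id): ?int
    {
        foreach ($this->idTerms[$id] ?? [] as $docNum) {
            if (!$this->isDeleted($docNum)) {
                return $docNum;
            }
        }
        return null;
    }

    // "Update": delete the old entry found by unique id, then re-add it
    public function update(string $id, array $fields): int
    {
        $old = $this->findDocId($id);
        if ($old !== null) {
            $this->delete($old);
        }
        return $this->addDocument($id, $fields);
    }

    public function getDocument(int $docNum): array
    {
        return $this->docs[$docNum];
    }
}

$index = new ToyIndex();
$index->addDocument('42', ['title' => 'old title']);
$newDoc = $index->update('42', ['title' => 'new title']);
echo $index->getDocument($index->findDocId('42'))['title'], "\n"; // new title
```

The old document keeps its slot and is only flagged; in a real Lucene index such flagged documents are physically dropped later, when segments are merged or the index is optimized.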

Deletes can be very time-consuming, as there seems to be a bug in the Zend Lucene implementation. The reliable method is very slow but works on a consistent database; the method preferred by the developers is fast but does not work in all cases. We will submit a ticket once we have made a test case for it.

The advantage of the PHP Lucene port is that you have more control over your searches and elements, but you might also use the Java version, depending on which interface you need from it.
ldbkutty (author) commented:
Thanks Jurgen!

Could you please point me to the method preferred by the developers (the one that deletes the entry from the search index and inserts the new content)?
hernst42 commented:
Hi Jaggy,

Have a look at http://framework.zend.com/manual/en/zend.search.lucene.best-practice.html#zend.search.lucene.best-practice.unique-id. We use the third method, but haven't compared the speed of the second and third methods. The problem we encountered with this method was that the __id__ field was UnIndexed rather than treated as a Keyword field, and thus the query did not find anything.
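// Look up the internal document number for a record's unique id.
// __id__ must be indexed as a Keyword field, otherwise termDocs() finds nothing.
function findDocId($index, $id)
{
    $term   = new Zend_Search_Lucene_Index_Term($id, '__id__');
    $docIds = $index->termDocs($term);
    foreach ($docIds as $docId) {
        // Skip entries already marked as deleted
        if (!$index->isDeleted($docId)) {
            return $docId;
        }
    }
    return null; // no live document for this id
}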
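
The Keyword vs. UnIndexed problem described above can be illustrated with another toy model (PHP 7.4+, hypothetical code, not the Zend API). In Zend_Search_Lucene, a field created with Zend_Search_Lucene_Field::Keyword() is indexed as a single untokenized term, while one created with Zend_Search_Lucene_Field::UnIndexed() is only stored, so a term lookup on it cannot match. The sketch assumes exactly that behaviour and models only the term dictionary.

```php
<?php
// Toy model of why an UnIndexed __id__ field cannot be found by a term lookup.
// Hypothetical names for illustration only: NOT the Zend_Search_Lucene API.
const FIELD_KEYWORD   = 'keyword';   // indexed as one untokenized term
const FIELD_UNINDEXED = 'unindexed'; // stored only, never put in the term dictionary

final class ToyTermIndex
{
    private array $termDocs = []; // "field:value" => list of doc numbers
    private int $next = 0;

    // $fields: field name => [value, field type]
    public function addDocument(array $fields): int
    {
        $docNum = $this->next++;
        foreach ($fields as $name => [$value, $type]) {
            if ($type === FIELD_KEYWORD) {
                // Keyword: the whole value becomes a single term
                $this->termDocs[$name . ':' . $value][] = $docNum;
            }
            // UnIndexed values get no term entry (this toy does not model storage)
        }
        return $docNum;
    }

    public function termDocs(string $field, string $value): array
    {
        return $this->termDocs[$field . ':' . $value] ?? [];
    }
}

$index = new ToyTermIndex();
$index->addDocument(['__id__' => ['a-1', FIELD_UNINDEXED]]); // doc 0
$index->addDocument(['__id__' => ['b-2', FIELD_KEYWORD]]);   // doc 1

var_dump($index->termDocs('__id__', 'a-1')); // empty array: never indexed
var_dump($index->termDocs('__id__', 'b-2')); // array containing doc 1
```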

ldbkutty (author) commented:
Thanks again :-)