Full-text indexing problem with PDF IFilter
Posted on 2005-03-11
I am having a problem indexing large PDF files. I have two large PDFs, each around 23 MB and around 2000 pages. When the gatherer tries to index them, it fails and retries. It appears to be a 30-second timeout failure, as CPU usage drops after 30 seconds and then ramps up again.
It retries repeatedly without moving on to other documents, so it effectively gets stuck. It does not log an error in the Windows Event Log or the SQL Server log. It logs the following in the gatherer log:
09/03/2005 14:38:24 Add The gatherer has started
09/03/2005 14:38:26 Add The initialization has completed
09/03/2005 16:10:36 Add The gatherer has started
09/03/2005 16:10:40 Add The recovery has completed
09/03/2005 16:45:06 MSSQL75://SQLServer/76cba758/F87750AC4AACBF4BA9F2816993FBE5EA Add Error fetching URL, (80041201 - The object was not found. )
However, this error is only logged after the PDF files are deleted from the document library. Nothing is logged before that.
Other documents in the database are indexed properly (if they were indexed before these PDFs) and the full-text catalogs are searchable. If the two large PDFs are removed, the indexing completes successfully. Other PDFs in the database are indexable and searchable.
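In case it helps anyone looking at a similar gatherer log, here is a rough Python sketch for pulling the timestamp, URL and error code out of lines like the ones above. It is only an illustration: the function name parse_gatherer_line and the regular expressions are my own, and it assumes the fields are separated by single spaces as pasted here and that the dates are in dd/mm/yyyy order. The real log file may use tabs or a different layout.

```python
import re
from datetime import datetime

# Assumed layout (from the pasted lines): date, time, optional URL, action, message.
LINE_RE = re.compile(
    r"^(?P<date>\d{2}/\d{2}/\d{4}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?:(?P<url>\S+://\S+) )?(?P<action>\w+) (?P<message>.*)$"
)
# Error details appear as "(80041201 - The object was not found. )".
ERROR_RE = re.compile(r"\((?P<code>[0-9A-Fa-f]{8}) - (?P<text>[^)]*)\)")


def parse_gatherer_line(line):
    """Return a dict for one log line, or None if the line does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    entry = m.groupdict()
    # Assumption: dates are day/month/year.
    stamp = entry.pop("date") + " " + entry.pop("time")
    entry["timestamp"] = datetime.strptime(stamp, "%d/%m/%Y %H:%M:%S")
    err = ERROR_RE.search(entry["message"])
    entry["error_code"] = err.group("code") if err else None
    return entry


if __name__ == "__main__":
    sample = (
        "09/03/2005 16:45:06 "
        "MSSQL75://SQLServer/76cba758/F87750AC4AACBF4BA9F2816993FBE5EA "
        "Add Error fetching URL, (80041201 - The object was not found. )"
    )
    print(parse_gatherer_line(sample))
```

Filtering the parsed entries for a non-empty error_code would list only the failing URLs, although in my case nothing is written to the log while the gatherer is stuck.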
I am using SQL Server 2000 SP3, Adobe IFilter 6.0 and Windows Server 2003. The database is a Windows SharePoint Services content database.