Full-Text Indexing Problem with PDF IFilter

Posted on 2005-03-11
Medium Priority
Last Modified: 2012-06-21
I am having a problem indexing large PDF files. I have two large PDFs, each around 23 MB and around 2,000 pages. When the gatherer tries to index them, it fails and retries. It appears to be a 30-second timeout failure, as CPU usage drops after 30 seconds and then ramps up again.

It retries repeatedly without moving on to other documents, effectively getting stuck. It does not log an error in the Windows Event Log or the SQL Server log. It logs the following in the gatherer log:

09/03/2005 14:38:24 Add The gatherer has started
09/03/2005 14:38:26 Add The initialization has completed
09/03/2005 16:10:36 Add The gatherer has started
09/03/2005 16:10:40 Add The recovery has completed
09/03/2005 16:45:06 MSSQL75://SQLServer/76cba758/F87750AC4AACBF4BA9F2816993FBE5EA Add Error fetching URL, (80041201 - The object was not found. )

However, this entry is only logged after the PDF files are deleted from the document library. Nothing is logged before that.
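If you want to watch for this entry rather than read the log by eye, a small parser helps. This is a sketch under assumptions: the line layout (date, time, optional URL, the word `Add`, then a message) is inferred only from the log lines above, and `parse_gatherer_log` is a hypothetical helper, not a Microsoft tool. It pulls out the eight-digit hex code from messages such as `(80041201 - The object was not found. )`.

```python
import re
from typing import Iterable, List, NamedTuple, Optional

# Layout inferred from the sample log above (hypothetical, not a documented format):
# "DD/MM/YYYY HH:MM:SS [URL ]Add <message>"
LINE_RE = re.compile(
    r"^(?P<date>\d{2}/\d{2}/\d{4}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?:(?P<url>\S+://\S+) )?Add (?P<message>.*)$"
)
# Error messages carry an 8-digit hex code, e.g. "(80041201 - The object was not found. )"
ERROR_RE = re.compile(r"\((?P<code>[0-9A-Fa-f]{8}) - (?P<text>.*?)\s*\)")


class LogEntry(NamedTuple):
    date: str
    time: str
    url: Optional[str]
    message: str
    error_code: Optional[str]


def parse_gatherer_log(lines: Iterable[str]) -> List[LogEntry]:
    """Parse gatherer log lines, skipping any that do not fit the assumed layout."""
    entries = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m is None:
            continue
        err = ERROR_RE.search(m.group("message"))
        entries.append(LogEntry(
            date=m.group("date"),
            time=m.group("time"),
            url=m.group("url"),
            message=m.group("message"),
            error_code=err.group("code") if err else None,
        ))
    return entries
```

Filtering with `[e for e in parse_gatherer_log(lines) if e.error_code == "80041201"]` would then pick out the "object was not found" entries.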

Other documents in the database are indexed properly (if they were indexed before these PDFs), and the full-text catalogs are searchable. If the two large PDFs are removed, indexing completes successfully. Other PDFs in the database are indexable and searchable.

I am using SQL Server 2000 SP3, Adobe IFilter 6.0 and Windows Server 2003. The database is a Windows SharePoint Services content database.

Any ideas?
Question by:noelkennedy

Author Comment

ID: 13555438
I have fixed this myself. The fix is SharePoint-specific, so much of what follows will be irrelevant to general full-text users, although there is plenty of general material too (if you don't care about SharePoint, skip the next paragraph). That said, I don't even know whether my problem exists outside of SharePoint.

On further investigation I discovered that the problem both does and doesn't exist in SharePoint Portal Server (SPS). When a small web farm is created (one back-end SQL Server and one front-end web server acting as the search and index server), the front-end index server fails to index the document. It reports the document as partially indexed, but I couldn't get it to show up in any searches. More importantly, the indexing process actually finishes, with errors. This is significant because in Windows SharePoint Services (WSS) the indexing process never finishes; it keeps retrying. The reason I say it also fails in SPS is that full-text indexing is turned on in the SPS site database, and that fails in the same way as WSS. In other words, the document is being indexed in two places: on the front-end index/search server AND in the back-end SQL Server database. If you open WSS Central Administration in the farm and turn off searching at the WSS level, the full-text catalogs for the SPS site database are deleted in SQL Server. You can still search WSS sites from the Portal, but not from within WSS. This means that documents stored in Portal areas are indexed both by full-text indexing in the back-end SQL Server database and in the index catalogs on the front-end web servers.

It is going to be difficult to prevent this problem or to detect automatically when it has occurred. Possible ways to prevent it are to limit the size of files that users can upload, or not to index PDFs at all.

To spot it happening 'in the wild', you can monitor the CPU usage of the mssdmn.exe process, which performs the filtering through the IFilters. If its CPU usage is high all the time, or repeatedly ramping up and down, you have likely hit this problem. Another way is to check the status of the full-text catalogs in Enterprise Manager or Query Analyzer: if a catalog stays in 'notifications processing' or 'change tracking' for a significant length of time, you have likely hit this problem. Another way to check is to look in the temp directory used by the indexer, usually:

C:\Program Files\Microsoft SQL Server\MSSQL\FTDATA

If this directory contains large PDF files with recent creation dates (within the last couple of minutes), you are likely to be experiencing the problem. Another way to check is to use PerfMon:
1.      Select the performance object Microsoft Gatherer Projects.
2.      Select the Retries counter.
3.      Select all the instances (if you have more than one). These instances can be matched back to SQL Server databases by the number at the end of the instance name: for example, in SQLServ~1c SQL00009~1c the 00009 matches database_ID 9, which you can confirm in Query Analyzer (SELECT DB_ID() returns the ID of the current database).
4.      These counters should normally be at 0. If they are incrementing at a rate of 1 or 2 per minute, you are probably experiencing the problem.
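The directory check and the PerfMon check can both be scripted. The sketch below is a set of hypothetical helpers, not part of SQL Server or SharePoint. `recent_large_pdfs` scans a directory tree (for example the FTDATA folder) for `.pdf` files above a size threshold created within the last few minutes; it assumes the temporary copies keep the `.pdf` extension. `database_id_from_instance` assumes, from the single example above, that the database ID is the run of digits directly after "SQL" in the instance name. `retries_per_minute` assumes you have recorded the Retries counter yourself at known times. The 10 MB and three-minute defaults are arbitrary.

```python
import os
import re
import time
from typing import List, Optional, Sequence, Tuple


def recent_large_pdfs(directory: str,
                      min_bytes: int = 10 * 1024 * 1024,   # arbitrary "large" threshold
                      max_age_seconds: float = 180.0,       # "last couple of minutes"
                      now: Optional[float] = None) -> List[Tuple[str, int]]:
    """Return (path, size) for recently created, large .pdf files under directory."""
    if now is None:
        now = time.time()
    hits = []
    for root, _dirs, files in os.walk(directory):
        for name in files:
            if not name.lower().endswith(".pdf"):  # assumption: temp copies keep .pdf
                continue
            path = os.path.join(root, name)
            try:
                st = os.stat(path)
            except FileNotFoundError:
                continue  # temp files can vanish between listing and stat
            # On Windows st_ctime is the creation time; on Unix it is the metadata-change time.
            if st.st_size >= min_bytes and now - st.st_ctime <= max_age_seconds:
                hits.append((path, st.st_size))
    return hits


def database_id_from_instance(instance: str) -> Optional[int]:
    """Hypothetical parser: digits directly after 'SQL', e.g. 'SQL00009~1c' -> 9."""
    m = re.search(r"SQL(\d+)", instance)
    return int(m.group(1)) if m else None


def retries_per_minute(samples: Sequence[Tuple[float, int]]) -> float:
    """Average growth of the Retries counter from (minutes, value) samples in time order."""
    if len(samples) < 2:
        return 0.0
    (t0, r0), (t1, r1) = samples[0], samples[-1]
    if t1 <= t0:
        return 0.0
    return (r1 - r0) / (t1 - t0)
```

For example, `database_id_from_instance("SQLServ~1c SQL00009~1c")` returns 9, which you would compare with `SELECT DB_ID()` in Query Analyzer, and a `retries_per_minute` result of 1 or more corresponds to the rate described in step 4.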

Obviously none of these are satisfactory!
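The first two checks in the detection paragraph above, filter-process CPU and catalog status, can also be turned into simple heuristics. This is a hedged sketch only: it assumes you already collect CPU percentages for the filtering process at a fixed interval (for instance from PerfMon's Process object, % Processor Time counter) and record catalog status strings as shown in Enterprise Manager, each with a timestamp in minutes. `classify_cpu` and `status_stuck` are hypothetical helpers, and the 80%/20% bands, the three-cycle minimum and the 60-minute limit are invented defaults, not values from the original post.

```python
from typing import Sequence, Tuple

SUSPECT_STATUSES = ("notifications processing", "change tracking")


def classify_cpu(samples: Sequence[float], high: float = 80.0, low: float = 20.0,
                 sustained_fraction: float = 0.9, min_cycles: int = 3) -> str:
    """Classify CPU % samples (fixed interval) as 'sustained', 'oscillating' or 'normal'."""
    if not samples:
        return "no data"
    if sum(1 for s in samples if s >= high) / len(samples) >= sustained_fraction:
        return "sustained"  # pegged nearly all the time
    cycles, state = 0, None
    for s in samples:
        if s >= high:
            state = "high"
        elif s <= low:
            if state == "high":
                cycles += 1  # one ramp up followed by a drop
            state = "low"
        # samples between the two bands leave the state unchanged
    return "oscillating" if cycles >= min_cycles else "normal"


def status_stuck(observations: Sequence[Tuple[float, str]], max_minutes: float = 60.0) -> bool:
    """True if a suspect status persists continuously for longer than max_minutes.

    observations: (minutes, status string) pairs in time order.
    """
    run_start = None
    for minutes, status in observations:
        if status.strip().lower() in SUSPECT_STATUSES:
            if run_start is None:
                run_start = minutes
            if minutes - run_start > max_minutes:
                return True
        else:
            run_start = None
    return False
```

Counting full high-to-low cycles, rather than single samples, is meant to separate the repeated 30-second ramp-and-drop pattern from an occasional busy spike.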

Resolution to large PDF problem:
1.      Open WSS Central Administration
2.      Under Component Configuration click Configure Data Retrieval service Settings
3.      Under Data Source Time Out, set the Request time-out to a number larger than 30 (e.g. 120).
4.      Sometimes this fixes the problem straight away. Other times you have to rebuild the catalog by going into WSS Central Administration, clicking Configure full-text search, then clicking OK.
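Why raising the time-out can help is illustrated by a toy model of the behaviour described in the question: a document whose filtering takes longer than the time-out is abandoned and retried from the start, and the gatherer never moves past it. This is not how the Microsoft Search gatherer is actually implemented; `simulate_gatherer`, the document names and the filtering times are all made up, and `max_attempts` exists only so the simulation terminates.

```python
from collections import deque
from typing import Dict, List


def simulate_gatherer(filter_seconds: Dict[str, float], timeout: float,
                      max_attempts: int = 1000) -> List[str]:
    """Toy model: return the documents that get indexed, in order.

    filter_seconds maps a (made-up) document name to how long filtering it takes.
    """
    queue = deque(filter_seconds)  # documents in insertion order
    indexed: List[str] = []
    attempts = 0
    while queue and attempts < max_attempts:
        attempts += 1
        doc = queue[0]
        if filter_seconds[doc] <= timeout:
            indexed.append(queue.popleft())
        # otherwise the attempt times out and the same document is retried,
        # so nothing behind it in the queue is ever reached


    return indexed


docs = {"small.doc": 2.0, "big1.pdf": 75.0, "other.pdf": 5.0}
print(simulate_gatherer(docs, timeout=30))   # ['small.doc']
print(simulate_gatherer(docs, timeout=120))  # ['small.doc', 'big1.pdf', 'other.pdf']
```

With a 30-second time-out only `small.doc` is indexed before the queue stalls on `big1.pdf`; with 120 seconds all three documents go through, which mirrors the effect of step 3.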

I was unable to locate the registry setting that this changes. I used software to compare the registry before and after the change on both the front-end and back-end servers, so it might not be a registry setting at all. General full-text users are therefore on their own from here, although, as I said earlier, I don't even know whether my problem exists outside of SharePoint.


Accepted Solution

ee_ai_construct earned 0 total points
ID: 13582602
Question answered by asker or dialog deemed valuable.
Closed, 500 points refunded.
ee_ai_construct (replacement part #xm34)
Community Support Admin
