jjtimken

Indexing Service crawling slowly through PDFs

I have a problem with Indexing Service under Windows 2003 Standard working very, very slowly. Queries of the catalog are made with a browser interface using ASP scripted pages.

The files being indexed are all Acrobat (PDF) files, and the Indexing Service has PDF iFilter 6.0 installed. For quite some time, the system worked fine. There were separate catalogs for each of a growing number of directories (each of which holds between 2,500 and 15,000 or so documents, in various subdirectories). Each of the main directories is usually "completed" as far as new documents go within a fixed period of time, then it remains static, so I didn't notice a problem with speed until recently, with the most recently created catalog. Normally, the Indexing Service could read and process a few hundred documents in a minute or less.

I had posted a handful of new documents to the most recent catalog and ran a search later that day, and the document I was searching for, which I knew contained the matching term, was not listed. I checked Indexing Service and it showed a few thousand files left to index; watching it off and on for half an hour, it only processed about ten of those.

Research showed that having multiple catalogs can slow down the service, especially on older versions (but not Server 2003); nonetheless, I redesigned the indexing to create one catalog for the parent directory which holds the document directories (each of which previously had its own catalog). I stopped Indexing Service, deleted the individual catalogs, restarted the server, and made sure Indexing Service restarted. It did, but it's just as slow slogging through the files to index.

On a whim, I started over, deleting the catalog again, and tried creating a catalog on just one of the oldest subdirectories, one which I knew had processed easily originally and which, with about 10,000 documents, should have taken no more than an hour to catalog. It's been over an hour and it's processed only about 125 of those documents. So it's not a question of one document choking it.
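To put numbers on the slowdown, here is a rough back-of-the-envelope sketch in Python. It only extrapolates from the figures quoted above; the function name `projected_hours` is hypothetical, and 200 documents per minute is an assumed stand-in for "a few hundred documents in a minute".

```python
def projected_hours(total_docs: int, docs_done: int, minutes_elapsed: float) -> float:
    """Estimate the hours needed to index total_docs at the observed rate."""
    if docs_done <= 0 or minutes_elapsed <= 0:
        raise ValueError("need a positive sample to estimate a rate")
    docs_per_minute = docs_done / minutes_elapsed
    return total_docs / docs_per_minute / 60


# Figures from the post: about 125 of about 10,000 documents in roughly an hour.
observed = projected_hours(total_docs=10_000, docs_done=125, minutes_elapsed=60)

# Earlier behaviour: 200 docs/minute is an assumption, not a measured value.
expected = projected_hours(total_docs=10_000, docs_done=200, minutes_elapsed=1)

print(f"current rate: about {observed:.0f} hours for the directory")
print(f"earlier rate: about {expected * 60:.0f} minutes for the directory")
```

At the current rate the 10,000-document directory would take on the order of days rather than the hour or less it used to need, which is why a single bad document seems an unlikely explanation.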

I've even tried adjusting the "tuning" settings to "Instant Indexing" and "Low Load" querying; neither seems to affect the speed.

Can anyone suggest a process, etc. that might be causing the Indexing Service to slow down so dramatically? No other services on the server seem to be a problem.
ASKER CERTIFIED SOLUTION
gheist

jjtimken

ASKER

I'm not opposed to upgrading to a different ifilter. However, I don't think that would explain why the service has suddenly gotten so slow when it was speedy before - and if it's the service that's slowed, then a faster filter may not be of much help.
Have you got a disk benchmark you can repeat for comparison?
Last night I decided to test whether the problem was with the Indexing Service itself or just with PDF-filtered documents (since that's all we had in the index). I stripped about 1,500 PDFs down to plain text and created a new index just for them. Indexing Service processed them in a matter of seconds, so I figured the filter may well have been at fault after all.

I removed Adobe's PDF iFilter and installed a trial version of the Foxit iFilter instead. The first 10,000+ directory I threw at it was indexed in a couple of minutes, so it appears gheist (above) indeed made a good recommendation. Many thanks!
Remember they want to eat too, and they cannot do that on plain generosity ;)
Oh, having tried the filter and seen that it works, I fully intend to purchase the full version. I'm already putting in the budget adjustment request.
You have the full version already.
Some more filters, to get metadata from less common formats: http://www.microsoft.com/downloads/en/results.aspx?pocId=&freetext=ifilter&DisplayLang=en