In our SharePoint 2010 farm, we have a lot of Excel files with data related to commissions. These involve a lot of numbers, multiple sheets within each file, and lookup fields between the tabs. For the most part, search works properly, but in a few cases it appears that the crawl is not indexing the entire file. One file in particular has 3 numbers in the J column, in sequential rows 1, 2 and 3. They are all 8 digits long, and each of them has the same cell formatting in Excel, General. Search finds the numbers in rows 1 and 2, but not the one in row 3; I get no search results for the 3rd number. This particular number is used in other documents within SharePoint, and search finds those OK.
So far, I have worked through this article on increasing the chunk buffer size on the server, but it did not fix the issue: http://support.microsoft.com/kb/970776
I also tried deleting the search index and re-running a full crawl, with no effect. I can confirm that it is successfully crawling the document: if I search for other numbers in the file, results come back, so it is clearly being crawled. I also tried setting up a separate search page on the same site collection to see whether any different results came through, but none did.
The latest full crawl resulted in 185,317 successes, 2 warnings, and 0 errors. The SharePoint Farm consists of 1 Web Front End server, which also runs search, and a separate server for SQL.
Any advice or other suggestions would be appreciated. This is a production environment, but I can make changes/try things after hours.
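One thing that could be checked offline, on a copy of the file, is whether the value in J3 is stored differently from J1 and J2 even though all three cells show the General format (for example as a formula result or a shared string, which seems possible given the lookups between tabs). The following is only a rough sketch, and it assumes the workbook is saved as .xlsx (Open XML); it would not work on a legacy .xls file. The function names `load_shared_strings` and `describe_cells` are made up for illustration, and the script says nothing about what the SharePoint crawler actually extracts; it only shows how the cells are stored in the file.

```python
import re
import zipfile
import xml.etree.ElementTree as ET

# SpreadsheetML main namespace used by .xlsx worksheet parts.
MAIN = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"
NS = {"m": MAIN}


def load_shared_strings(zf):
    """Return the shared-string table as a list (empty if the part is absent)."""
    if "xl/sharedStrings.xml" not in zf.namelist():
        return []
    root = ET.fromstring(zf.read("xl/sharedStrings.xml"))
    # One <si> entry may be split into several <t> runs, so join them.
    return ["".join(t.text or "" for t in si.iter("{%s}t" % MAIN))
            for si in root.findall("m:si", NS)]


def describe_cells(xlsx, column="J"):
    """List (sheet_part, cell_ref, storage, value) for each cell in `column`.

    `xlsx` may be a path or a file-like object. Assumes every <c> element
    carries an `r` reference attribute, which Excel normally writes.
    """
    rows = []
    with zipfile.ZipFile(xlsx) as zf:
        strings = load_shared_strings(zf)
        for name in sorted(zf.namelist()):
            if not re.match(r"xl/worksheets/[^/]+\.xml$", name):
                continue
            root = ET.fromstring(zf.read(name))
            for c in root.iter("{%s}c" % MAIN):
                ref = c.get("r", "")
                if not re.fullmatch(column + r"\d+", ref):
                    continue
                kind = c.get("t", "n")  # a missing t attribute means numeric
                v = c.find("m:v", NS)
                raw = v.text if v is not None else None
                if kind == "s" and raw is not None:
                    storage, value = "shared string", strings[int(raw)]
                elif kind == "inlineStr":
                    storage = "inline string"
                    value = "".join(t.text or "" for t in c.iter("{%s}t" % MAIN))
                else:
                    storage, value = ("number" if kind == "n" else kind), raw
                if c.find("m:f", NS) is not None:
                    storage += " (formula result)"
                rows.append((name, ref, storage, value))
    return rows
```

Calling `describe_cells("copy-of-the-file.xlsx")` (a placeholder path) and comparing the storage reported for J1, J2 and J3 would show whether the third cell differs from the first two in how it is saved.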
A response on a TechNet forum suggested looking at this article, http://technet.microsoft.com/en-us/library/cc262787%28v=office.15%29.aspx, which refers to "item size limits". This part was quoted:
"The item size limits safeguard crawling performance and the size of the index. Here are some examples of how the limits can affect searching:
If you can't get results when you search for an item, the item could be too large. A warning will show up in the Crawl Log, stating that the file exceeded the maximum size that the crawler can download.
If you search for text in an item and only get results from the first part of the text, the content processing component may have truncated the item because it exceeded some of item size limits. When the content processing component truncates an item, it indicates this by setting the managed property IsPartiallyProcessed to True. A warning will also show up in the Crawl Log, stating why the item was truncated.
If you tune item size limits, we recommend that you work with them in the order they appear in this table."
To which my response was:
"In that article after the part you quoted, it talks about "Document size crawl component can download" and that the Maximum value for Excel documents is 3 MB. The file in question is only 387 KB in size, so I don't see how it could be reaching this threshold. It also specifically says in the notes, "You can change the limit for the maximum document size, but it does not affect Excel documents.""