Solved

SharePoint 2010 search not indexing entire document

Posted on 2014-11-20
Last Modified: 2014-11-24
In our SharePoint 2010 farm, we have a lot of Excel files with data related to commissions.  These involve a lot of numbers, multiple pages within each file, and lookup fields between the tabs.  For the most part, search works properly, but in a few cases the crawl appears not to index the entire file.  One file in particular has 3 sets of numbers in the J column, and those numbers are in sequential rows: 1, 2 and 3.  They are all 8 digits long, and each has the same cell formatting in Excel, General.  Search finds the numbers in rows 1 and 2, but I get no results for the number in row 3.  That same number is used in other documents within SharePoint, and search finds those OK.

So far, I have worked through this KB article on increasing the chunk buffer size on the server, but it did not fix the issue:  http://support.microsoft.com/kb/970776

I also tried deleting the search index and re-running a full crawl, with no effect.  I can confirm that it is successfully crawling the document: if I search for other numbers in the file, results come back, so obviously it's being crawled.  I also tried setting up a separate search page on the same site collection to see if any different results came through, but they did not.

The latest full crawl resulted in 185,317 successes, 2 warnings, and 0 errors.  The SharePoint Farm consists of 1 Web Front End server, which also runs search, and a separate server for SQL.

Any advice or other suggestions would be appreciated.  This is a production environment, but I can make changes/try things after hours.


Edit:  A TechNet forum response suggested looking at this article, http://technet.microsoft.com/en-us/library/cc262787%28v=office.15%29.aspx, which refers to "item size limits".  This part was quoted:

"The item size limits safeguard crawling performance and the size of the index. Here are some examples of how the limits can affect searching:

If you can't get results when you search for an item, the item could be too large. A warning will show up in the Crawl Log, stating that the file exceeded the maximum size that the crawler can download.

If you search for text in an item and only get results from the first part of the text, the content processing component may have truncated the item because it exceeded some of item size limits. When the content processing component truncates an item, it indicates this by setting the managed property IsPartiallyProcessed to True. A warning will also show up in the Crawl Log, stating why the item was truncated.

If you tune item size limits, we recommend that you work with them in the order they appear in this table."


To which my response was:

"In that article after the part you quoted, it talks about "Document size crawl component can download" and that the Maximum value for Excel documents is 3 MB.  The file in question is only 387 KB in size, so I don't see how it could be reaching this threshold.  It also specifically says in the notes, "You can change the limit for the maximum document size, but it does not affect Excel documents.""
Question by:titannj
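
The size comparison made in the question can be written out as simple arithmetic. This is only a sketch: the 3 MB figure is the Excel limit quoted from the linked article and is assumed here to apply to this farm, and `fraction_of_limit` is a hypothetical helper name.

```python
KB = 1024
MB = 1024 * KB

# Assumption: 3 MB is the Excel download limit quoted from the linked article.
EXCEL_DOWNLOAD_LIMIT = 3 * MB


def fraction_of_limit(size_bytes, limit_bytes=EXCEL_DOWNLOAD_LIMIT):
    """Return the file size as a fraction of the download limit."""
    return size_bytes / limit_bytes


reported_size = 387 * KB  # size of the problem file as given in the question
print(f"{fraction_of_limit(reported_size):.1%} of the limit")  # prints 12.6% of the limit
```

For a file on disk, `os.path.getsize(path)` could be passed in place of `reported_size`. At roughly an eighth of the quoted limit, the download size alone does not look like the cause.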
9 Comments
 
LVL 15

Expert Comment

by:colly92002
Check your patch level and ensure your farm is fully patched (and has all WSUS updates applied across all servers in the farm).
http://blogs.technet.com/b/sharepointjoe/archive/2011/02/01/sp2010-sharepoint-2010-build-level-and-version-numbers.aspx
 

Author Comment

by:titannj
My farm is currently running on the Dec 2013 CU, and there is only 1 SharePoint Server.  I applied both the Server and Foundation Updates.

Do you know if any of the CUs after Dec 2013 specifically address this kind of issue?  Also, I usually run through and apply Windows updates once a month at the beginning of the month.  Again, do you know of a specific update that may address this?
 
LVL 15

Expert Comment

by:colly92002
I'm afraid I don't.  
However, whenever I've experienced peculiar behaviour with SP, I've found it was waiting on a WSUS update.  This was in a full farm environment rather than a single server.
It's also a good idea to keep SharePoint patched.

You may also find using this tool helps you analyse your crawl rules in case it is configured incorrectly somehow:
http://blogs.technet.com/b/speschka/archive/2010/08/15/free-developer-search-tool-for-sharepoint-2010-search-and-fast-search-for-sharepoint.aspx
 
LVL 15

Expert Comment

by:colly92002
It just occurred to me that it might be something as simple as the type of field in Excel not being one of the crawled property types defined in your search management.

The easiest way to check is to reformat the 3rd column to use the same type as the first (if this is practical), resave it to SP, wait for the index to update, and try it.

If this is the case, you can possibly add the type to your managed columns in Central Admin->Search Service Application: Crawled Properties rather than updating files.
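
One way to see how individual cells are actually stored, independent of the display format shown in Excel, is to read the sheet XML directly. The following is a minimal sketch, assuming the workbook is saved as .xlsx (Open XML) rather than the binary .xls format, and that the sheet of interest lives at `xl/worksheets/sheet1.xml` (the usual name, but an assumption here). `describe_cells` is a hypothetical helper, not part of SharePoint or Office.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by SpreadsheetML worksheet parts.
NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}

# Values of the cell "t" attribute and what they mean.
KINDS = {
    "n": "number",
    "s": "shared string index",
    "str": "formula string",
    "inlineStr": "inline string",
    "b": "boolean",
    "e": "error",
}


def describe_cells(xlsx, sheet_part, refs):
    """Return {cell_ref: (storage_kind, raw_value)} for the requested cells."""
    with zipfile.ZipFile(xlsx) as zf:
        root = ET.fromstring(zf.read(sheet_part))
    wanted = set(refs)
    found = {}
    for cell in root.iterfind(".//m:c", NS):
        ref = cell.get("r")
        if ref not in wanted:
            continue
        cell_type = cell.get("t", "n")  # a missing "t" attribute means number
        kind = KINDS.get(cell_type, cell_type)
        if cell.find("m:f", NS) is not None:
            # The cell holds a formula; <v> is only the last cached result.
            kind = f"formula (cached {kind})"
        value = cell.find("m:v", NS)
        found[ref] = (kind, value.text if value is not None else None)
    return found
```

Calling `describe_cells("problem.xlsx", "xl/worksheets/sheet1.xml", ["J1", "J2", "J3"])` (the file name is a placeholder) would show whether the third cell is stored differently from the first two, for example as a formula or a shared string rather than a plain number. Whether such a difference affects what the crawler indexes is not established in this thread.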
 

Author Comment

by:titannj
Is it really possible to have a different property type within the same document?  I guess I never knew that.

I did what you suggested: I opened the file, used Format Painter to copy the formatting from row 1 to row 3, ran an incremental crawl, confirmed in the crawl log that the file was indeed crawled, and searched for the number.  Unfortunately, the behavior did not change.

I can certainly plan on upgrading the farm to the latest CU, but I really don't think that is going to fix the issue, honestly.
 
LVL 15

Accepted Solution

by:colly92002 (earned 500 total points)
As already mentioned, have a look in
Central Admin->Search Service Application: Crawled Properties

The "Office:..." types are types it "inherits" from Office.  If it decides that the data in the .xls is not in this list of types, it will not index it.  I'm guessing this is the problem.  What you just tried should work, but this stuff is pretty complicated, so it's worth pursuing.

Can you copy the contents of one of the bad .xls files into a plain, unformatted .xls workbook (single workbook), put that in SP, and see if you can find it?

A further suggestion - do you have any dev/test instances you could use?  If not, I would recommend you get one and try it out on it.  You can download VMs from MS provided you have the correct licences if you don't fancy configuring one yourself.
As an aside, I find it amazing how often applying patches/WSUS fixes odd things like this.
 

Author Comment

by:titannj
It could very well be something weird in the Excel document; I know it was created from a template.

I did attempt the 2nd thing you mentioned, copying the contents of the problem file into an unformatted xls, but I ran into a problem.  If you have a Word document with bad formatting, you can use Paste Special and choose Text Only to remove all formatting.  I'm running Office 2013, and I couldn't find a similar option in Excel; I'm not sure if it's a 2013 thing or not.  I couldn't find a way to copy and paste with zero formatting carried over, but I'll dig into that a little deeper.
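
An alternative to pasting inside Excel is to pull only the stored values out of the file and write them to CSV, which carries no formatting at all. This is a rough sketch under the same kind of assumptions as any offline tool would need: the workbook must be saved as .xlsx (it will not read binary .xls), the sheet is assumed to be at `xl/worksheets/sheet1.xml`, and every cell is assumed to carry an `r` reference as Excel normally writes. The function names are hypothetical. Formula cells come out as their last cached value.

```python
import csv
import re
import zipfile
import xml.etree.ElementTree as ET

NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}


def column_index(letters):
    """Convert a column name such as 'J' or 'AA' to a zero-based index."""
    index = 0
    for ch in letters:
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index - 1


def shared_strings(zf):
    """Return the workbook's shared string table, or [] if it has none."""
    if "xl/sharedStrings.xml" not in zf.namelist():
        return []
    root = ET.fromstring(zf.read("xl/sharedStrings.xml"))
    # An <si> entry may be split into several <t> runs; join them.
    return ["".join(t.text or "" for t in si.iterfind(".//m:t", NS))
            for si in root.iterfind("m:si", NS)]


def values_only(xlsx, sheet_part="xl/worksheets/sheet1.xml"):
    """Return the sheet as a list of rows holding only stored values."""
    with zipfile.ZipFile(xlsx) as zf:
        strings = shared_strings(zf)
        root = ET.fromstring(zf.read(sheet_part))
    grid = []
    for row in root.iterfind(".//m:row", NS):
        row_number = int(row.get("r"))
        while len(grid) < row_number:  # pad rows that are absent from the XML
            grid.append([])
        line = grid[row_number - 1]
        for cell in row.iterfind("m:c", NS):
            col = column_index(re.match(r"[A-Z]+", cell.get("r")).group())
            value = cell.find("m:v", NS)
            text = value.text if value is not None and value.text else ""
            if cell.get("t") == "s" and text:  # index into the string table
                text = strings[int(text)]
            line.extend([""] * (col + 1 - len(line)))
            line[col] = text
    return grid


def write_csv(grid, out):
    """Write the value grid to an open text file as plain CSV."""
    csv.writer(out, lineterminator="\n").writerows(grid)
```

The resulting CSV could be opened in Excel, saved as a new workbook, uploaded, and crawled to test whether the number in row 3 becomes searchable once all formatting and template residue is gone.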

I wish I had a test farm, I've been asking for one, but can't seem to get it approved.

If I can't get anything to work, I'll try upgrading the farm to the latest CU and Windows patches.  Thanks for the suggestions.
 

Author Closing Comment

by:titannj
Thanks for the suggestions, will take time for me to try to figure it out and try everything.  Nothing with SharePoint is ever simple, but you hopefully pointed me in the right direction.  Appreciate it.
 
LVL 15

Expert Comment

by:colly92002
Best of luck :)

The more I think about the format of the xls file, the more I think it must be that, so it is worth pursuing.  A CU could definitely help if you are using 2013, since that has probably been patched since the last time you applied a CU.