I am implementing a Lucene search for documents on a site. I have used PDFBox to extract text from PDFs, and an XML parser to extract text from MS Word 2007 files. However, I still cannot read the older .doc format. I have tried NPOI, POI.NET, etc. without much luck.
I can use File.OpenText(path), but it also returns some cryptic markup that messes up my search results.
Does anyone have samples for POI, or know how to read .doc files without needing an Office installation or Interop (which MS doesn't recommend)?