Extracting text from an MS Word (pre-2007 .doc) file in C#

I am implementing a Lucene search over the documents on a site. I use PDFBox to extract text from PDFs and an XML parser to extract text from MS Word 2007 files. However, I still cannot read the older .doc format. I have tried NPOI, POI.NET, etc. without much luck.

I can use File.OpenText(path), but it also returns cryptic markup that messes up my search results.
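
The cryptic output is expected: a pre-2007 .doc is an OLE2 compound binary file rather than plain text, so reading it as text exposes the container's internal bytes. Before choosing an extractor, an indexing pipeline can check the first bytes of each file to tell the legacy binary format apart from the ZIP-based .docx format. The following is a minimal sketch of that check only; the class name `WordFormatSniffer` and the returned labels are hypothetical, and it does not extract any text by itself.

```csharp
using System;
using System.IO;

public static class WordFormatSniffer
{
    // Signature at the start of every OLE2 compound file (legacy .doc, .xls, .ppt).
    private static readonly byte[] Ole2Magic = { 0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1 };

    // Local-file-header signature at the start of a ZIP archive (.docx and other Office 2007+ files).
    private static readonly byte[] ZipMagic = { 0x50, 0x4B, 0x03, 0x04 };

    // Returns "ole2", "zip", or "unknown" (hypothetical labels) based on the leading bytes.
    public static string Detect(Stream stream)
    {
        var header = new byte[8];
        int read = 0;

        // Stream.Read may return fewer bytes than requested, so loop until full or end of stream.
        while (read < header.Length)
        {
            int n = stream.Read(header, read, header.Length - read);
            if (n == 0) break;
            read += n;
        }

        if (StartsWith(header, read, Ole2Magic)) return "ole2";
        if (StartsWith(header, read, ZipMagic)) return "zip";
        return "unknown";
    }

    // Convenience overload for a file on disk.
    public static string Detect(string path)
    {
        using (var fs = File.OpenRead(path))
        {
            return Detect(fs);
        }
    }

    private static bool StartsWith(byte[] buffer, int length, byte[] magic)
    {
        if (length < magic.Length) return false;
        for (int i = 0; i < magic.Length; i++)
        {
            if (buffer[i] != magic[i]) return false;
        }
        return true;
    }
}
```

A file reported as "ole2" would then be routed to a binary .doc extractor, and one reported as "zip" to the existing XML-based .docx path. Note that the OLE2 signature is shared by other legacy Office formats, so the file extension is still needed to confirm that the file is a Word document.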

Does anyone have samples for POI, or know how to read .doc files without requiring an Office installation or Interop (which MS doesn't recommend)?
robasta asked:

Éric Moreau (Senior .Net Consultant) commented:
_Katka_ commented:
Hi, I guess you'll have to go with the Office interop assemblies.

Here's a good tutorial on how to accomplish that:

http://eggheadcafe.com/tutorials/aspnet/b6f75379-840c-4745-a76c-04d43694333b/read-a-word-document-do.aspx

regards,
Kate
robasta (author) commented:
I did not want to use Interop because MS discourages it (http://support.microsoft.com/kb/257757) and because of licensing issues.

Aspose does the job, but it's not free.

Free solutions:
1. Use this MS DLL to get document properties: http://blogs.msdn.com/erikaehrli/archive/2005/11/30/dsofileproperties.aspx
2. Use IFilters (http://www.codeproject.com/KB/cs/IFilter.aspx)