
Convert a Word document to text using C# in ASP.NET

I have a web application built using C# and ASP.NET.  I need to convert an uploaded Word document to a text string.  I have looked at Microsoft.Office.Interop.Word (which works locally), but when I run it on a server (where Word is not installed) it doesn't work because the DLL is not registered.  Is there another method of accomplishing this task?  OpenXML?

Here is code I have worked with thus far:

        public string convertWordToText(string fileName)
        {
            try
            {
                object missing = Type.Missing;
                object readOnly = true;
                object saveChanges = Microsoft.Office.Interop.Word.WdSaveOptions.wdDoNotSaveChanges;

                Microsoft.Office.Interop.Word.Application application = new Microsoft.Office.Interop.Word.Application();
                Microsoft.Office.Interop.Word.Document document = application.Documents.Open(fileName, ref missing, ref readOnly, ref missing, ref missing, ref missing, ref missing, ref missing, ref missing, ref missing, ref missing, ref missing, ref missing, ref missing, ref missing, ref missing);

                string text = document.Content.Text;
                // Close the document first, then quit Word once, discarding any changes.
                ((Microsoft.Office.Interop.Word._Document)document).Close(ref saveChanges, ref missing, ref missing); //cast as _Document because there's ambiguity
                ((Microsoft.Office.Interop.Word._Application)application).Quit(ref saveChanges, ref missing, ref missing); //cast as _Application because there's ambiguity
                return text;
            }
            catch (Exception ex)
            {
                Log.error("Error converting document to text.");
                return string.Format("An error occurred when converting this document to text.  The error returned is '{0}'.  \n\nYou can try copy and paste instead.", ex.Message);
            }
        }

You definitely want to use OpenXML. Using Microsoft.Office.Interop.Word on the server is unsupported and may violate your licensing. Microsoft recommends using OpenXML via System.IO.Packaging. See http://support.microsoft.com/en-us/kb/257757 for more details, specifically the section entitled "Alternatives to server-side Automation".
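
To illustrate what reading a .docx through System.IO.Packaging might look like, here is a minimal sketch. The class and method names (DocxPackageReader, ExtractText) are made up for this example, and it makes several assumptions: the upload is a .docx (not a legacy .doc), the stream is seekable, the file uses the standard transitional WordprocessingML namespace, and one line of output per paragraph is acceptable. Text boxes nested inside a paragraph may show up twice with this simple traversal.

```csharp
using System;
using System.IO;
using System.IO.Packaging; // WindowsBase.dll on .NET Framework; System.IO.Packaging package on newer .NET
using System.Linq;
using System.Text;
using System.Xml.Linq;

public static class DocxPackageReader // hypothetical helper, not part of any library
{
    // Main WordprocessingML namespace (the "w:" prefix inside word/document.xml).
    private static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Relationship type that links the package root to the main document part.
    private const string OfficeDocumentRelType =
        "http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument";

    // Returns the document text, one line per w:p (paragraph) element.
    public static string ExtractText(Stream docxStream)
    {
        using (Package package = Package.Open(docxStream, FileMode.Open, FileAccess.Read))
        {
            // Locate the main document part through the package-level relationship.
            PackageRelationship rel =
                package.GetRelationshipsByType(OfficeDocumentRelType).First();
            Uri partUri = PackUriHelper.ResolvePartUri(
                new Uri("/", UriKind.Relative), rel.TargetUri);
            PackagePart mainPart = package.GetPart(partUri);

            XDocument xml;
            using (Stream partStream = mainPart.GetStream(FileMode.Open, FileAccess.Read))
            {
                xml = XDocument.Load(partStream);
            }

            // Concatenate every w:t (text run) under each paragraph.
            var sb = new StringBuilder();
            foreach (XElement paragraph in xml.Descendants(W + "p"))
            {
                foreach (XElement textRun in paragraph.Descendants(W + "t"))
                {
                    sb.Append(textRun.Value);
                }
                sb.AppendLine();
            }
            return sb.ToString();
        }
    }
}
```

In a Web Forms page this could be called as DocxPackageReader.ExtractText(FileUpload1.PostedFile.InputStream), where FileUpload1 stands in for whatever upload control the page actually uses. Nothing here requires Word to be installed on the server.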
Please download the Office runtime from the following location and install it on your server.

More on OpenXML: it looks like Microsoft has an SDK for OpenXML, built on top of System.IO.Packaging, at http://www.microsoft.com/en-us/search/Results.aspx?q=Open%20xml%20sdk&form=DLC (there are a few different versions and samples).

Then you should be able to use:

using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;
// Open a WordprocessingDocument for editing using the filepath.
WordprocessingDocument wordprocessingDocument = 
    WordprocessingDocument.Open(filepath, true);
// Assign a reference to the existing document body.
Body body = wordprocessingDocument.MainDocumentPart.Document.Body;


Code taken from https://msdn.microsoft.com/en-us/library/office/ff478255.aspx?cs-save-lang=1&cs-lang=csharp#code-snippet-1
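
The snippet above stops at the Body element, so here is a rough sketch of how the text itself might be pulled out with the Open XML SDK. DocxTextExtractor and ExtractText are hypothetical names chosen for this example. It assumes the upload is a .docx file (the SDK does not read the older binary .doc format), that a seekable stream is available, and that joining paragraphs with line breaks is good enough for your purposes. The document is opened read-only, since nothing is being edited.

```csharp
using System;
using System.IO;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

public static class DocxTextExtractor // hypothetical helper, not part of the SDK
{
    // Returns the text of every paragraph in the body, separated by line breaks.
    public static string ExtractText(Stream docxStream)
    {
        // Second argument false = open read-only.
        using (WordprocessingDocument doc = WordprocessingDocument.Open(docxStream, false))
        {
            Body body = doc.MainDocumentPart.Document.Body;

            // InnerText concatenates the text of all runs inside each paragraph.
            return string.Join(
                Environment.NewLine,
                body.Descendants<Paragraph>().Select(p => p.InnerText));
        }
    }
}
```

With an ASP.NET upload, something like DocxTextExtractor.ExtractText(FileUpload1.PostedFile.InputStream) should work, where FileUpload1 is a placeholder for your own upload control. If the raw concatenation is all you need, body.InnerText is shorter, but it runs the paragraphs together without separators.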
Éric Moreau, Senior .Net Consultant (Top Expert 2016)

There are also some third-party components that can help you. The one I have used is http://www.aspose.com/.net/word-component.aspx