Use C# to extract text from a Word document without Interop

Posted on 2007-08-06
Last Modified: 2013-12-17

Does anyone have any experience with opening Word documents and saving them as text files, or just extracting the words (not the formatting, etc.)?

I don't want to install Word on my server and use the Interop libraries. Is there another way to do this? I don't mind paying for some kind of DLL if it's not too expensive (less than $99-ish!).

Many thanks in advance...
Question by: benwilliamson

    Accepted Solution

    I got this from another forum. It works for a small Word doc, but I don't know how it will fare against a Word document with heavy formatting.

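    using System;
    using System.Collections;
    using System.IO;

    public static class WordIndexer
    {
        // Builds an alphabetical index of every word in inputPath together with
        // the (zero-based) numbers of the lines it appears on, and writes it to logFile.
        // The original snippet read the input path from textBox1.Text.
        // Note: StreamReader treats the file as plain text; it does not parse the
        // Word file format, so any letter/digit runs in binary content also count as words.
        public static void IndexWords(string inputPath, string logFile)
        {
            StreamReader reader = null;
            StreamWriter writer = null;

            SortedList table = new SortedList();   // keeps the words in sorted order
            //Hashtable table = new Hashtable();   // unsorted alternative

            try
            {
                reader = new StreamReader(inputPath);   // opens the file
                writer = new StreamWriter(logFile, false);

                int h = 0;   // current line number

                // Iterate one word at a time. Each word's entry gets the current
                // line number appended every time the word is encountered.
                for (string line = reader.ReadLine(); line != null; line = reader.ReadLine())
                {
                    string[] words = GetWords(line);

                    foreach (string word in words)
                    {
                        string iword = word.ToLower();

                        if (table.ContainsKey(iword))
                            table[iword] = table[iword] + "," + "'" + h + "'";
                        else
                            table[iword] = "'" + h + "'";
                    }

                    h++;   // move on to the next line number
                }

                foreach (DictionaryEntry entry in table)
                    writer.WriteLine("{0} ({1})", entry.Key, entry.Value);
            }
            catch (Exception c)
            {
                // Report the problem (e.g. file not found, access denied)
                Console.Error.WriteLine(c.Message);
            }
            finally
            {
                if (reader != null)
                    reader.Close();
                if (writer != null)
                    writer.Close();
            }
        }

        static string[] GetWords(string line)
        {
            ArrayList al = new ArrayList(); //for intermediate results

            int i = 0;
            string word;
            char[] characters = line.ToCharArray();

            while ((word = GetNextWord(line, characters, ref i)) != null)
                al.Add(word);

            string[] words = new string[al.Count];
            al.CopyTo(words);
            return words;
        }

        static string GetNextWord(string line, char[] characters, ref int i)
        {
            // skip separators (anything that is not a letter or digit)
            while (i < characters.Length && !Char.IsLetterOrDigit(characters[i]))
                i++;

            if (i == characters.Length)
                return null;

            int start = i;

            //find the end of the word
            while (i < characters.Length && Char.IsLetterOrDigit(characters[i]))
                i++;

            //return the word
            return line.Substring(start, i - start);
        }
    }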
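    For comparison, here is a sketch of the same line-number index written with generic collections (SortedDictionary<string, List<int>>) instead of SortedList and ArrayList. It reads from any TextReader so it can be tried on an in-memory string; the LineIndex class and its Build/Write methods are made-up names for this example. Like the snippet above, it treats its input as plain text and does not understand the Word file format.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class LineIndex
{
    // Maps each lower-cased word to the zero-based numbers of the lines it appears on.
    // A "word" is a run of letters or digits, as in GetNextWord above.
    // A word that occurs twice on one line records that line twice, like the original.
    public static SortedDictionary<string, List<int>> Build(TextReader reader)
    {
        var table = new SortedDictionary<string, List<int>>(StringComparer.Ordinal);
        int lineNo = 0;

        for (string line = reader.ReadLine(); line != null; line = reader.ReadLine())
        {
            int i = 0;
            while (i < line.Length)
            {
                // skip separators
                while (i < line.Length && !char.IsLetterOrDigit(line[i])) i++;
                int start = i;

                // consume the word
                while (i < line.Length && char.IsLetterOrDigit(line[i])) i++;

                if (i > start)
                {
                    string word = line.Substring(start, i - start).ToLowerInvariant();
                    if (!table.TryGetValue(word, out List<int> lines))
                    {
                        lines = new List<int>();
                        table[word] = lines;
                    }
                    lines.Add(lineNo);
                }
            }
            lineNo++;
        }
        return table;
    }

    // Writes one entry per line as "word (0,2)"; the original quoted each number instead.
    public static void Write(SortedDictionary<string, List<int>> table, TextWriter writer)
    {
        foreach (KeyValuePair<string, List<int>> entry in table)
            writer.WriteLine("{0} ({1})", entry.Key, string.Join(",", entry.Value));
    }
}
```

    For example, LineIndex.Build(new StringReader("The cat\nthe dog")) maps "the" to lines 0 and 1, and "cat" and "dog" to lines 0 and 1 respectively.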

    Assisted Solution

    This product should do the job, but it costs 399 euros.
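    If the files are Word 2007 .docx files (not the older binary .doc format, which this sketch does NOT handle), there is a no-cost route without Interop: a .docx is a ZIP package whose body text is stored as XML. The sketch below assumes System.IO.Compression.ZipArchive and System.Xml.Linq are available (.NET Framework 4.5+ or .NET Core, both newer than this thread) and that the main part sits at "word/document.xml", which is the usual location; a robust version would look the part up in _rels/.rels. The DocxText class and Extract method are hypothetical names. It ignores headers, footers, and footnotes, and text in text boxes (paragraphs nested inside paragraphs) may come out twice.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;
using System.Xml.Linq;

public static class DocxText
{
    // WordprocessingML main namespace (Office Open XML).
    static readonly XNamespace W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns the plain text of a .docx package, one line per paragraph.
    // Assumption: the main document part is "word/document.xml".
    public static string Extract(Stream docx)
    {
        using (var zip = new ZipArchive(docx, ZipArchiveMode.Read))
        {
            ZipArchiveEntry entry = zip.GetEntry("word/document.xml")
                ?? throw new InvalidDataException("word/document.xml not found");

            XDocument doc;
            using (Stream s = entry.Open())
                doc = XDocument.Load(s);

            var sb = new StringBuilder();
            foreach (XElement p in doc.Descendants(W + "p"))
            {
                // Visible text lives in <w:t>; <w:tab/> and <w:br/> mark tabs and line breaks.
                foreach (XElement e in p.Descendants())
                {
                    if (e.Name == W + "t") sb.Append(e.Value);
                    else if (e.Name == W + "tab") sb.Append('\t');
                    else if (e.Name == W + "br") sb.Append('\n');
                }
                sb.AppendLine();   // end of paragraph
            }
            return sb.ToString();
        }
    }
}
```

    Saving the result as a text file is then just File.WriteAllText(path, DocxText.Extract(File.OpenRead(docxPath))).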

