Solved

Use C# to extract text from a Word document without Interop

Posted on 2007-08-06
Last Modified: 2013-12-17
Hi

Does anyone have any experience of opening Word documents and saving them as text files? Or just extracting the words (not the formatting etc.)?

I don't want to install Word on my server and use the Interop libraries. Is there another way to do this? I don't mind paying for some kind of DLL if it's not too expensive (less than about $99).

Many thanks in advance...
Ben
Question by:benwilliamson

Accepted Solution

by: Gautham Janardhan (earned 1000 total points)
ID: 19637807
I got this from another forum. It works for a small Word doc, but I don't know how it will fare against Word documents with heavy formatting.

// Requires: using System; using System.Collections; using System.IO;

StreamReader reader = null;
StreamWriter writer = null;

// Maps each lower-cased word to a comma-separated list of its positions.
SortedList table = new SortedList();
//Hashtable table = new Hashtable();

string logFile = "logfile.txt";

try
{
    // Iterate one word at a time. Each word's position list is updated for every occurrence.

    reader = new StreamReader(textBox1.Text); // opens the file

    writer = new StreamWriter(logFile, false);

    int h = 0; // running word position

    for (string line = reader.ReadLine(); line != null; line = reader.ReadLine())
    {
        string[] words = GetWords(line);

        foreach (string word in words)
        {
            string iword = word.ToLower();

            h++;
            if (table.ContainsKey(iword))
            {
                table[iword] = table[iword] + "," + "'" + h + "'";
            }
            else
            {
                table[iword] = "'" + h + "'";
            }
        }
    }

    foreach (DictionaryEntry entry in table)
    {
        writer.WriteLine("{0} ({1})", entry.Key, entry.Value);
    }
}
catch (Exception c)
{
    // writer is still null if opening either file failed
    if (writer != null)
        writer.WriteLine(c.Message);
}
finally
{
    if (reader != null)
        reader.Close();
    if (writer != null)
        writer.Close();
}

// Splits a line into words (runs of letters and digits).
static string[] GetWords(string line)
{
    ArrayList al = new ArrayList(); // for intermediate results

    int i = 0;
    string word;
    char[] characters = line.ToCharArray();

    while ((word = GetNextWord(line, characters, ref i)) != null)
        al.Add(word);

    string[] words = new string[al.Count];
    al.CopyTo(words);
    return words;
}


// Returns the next word at or after index i, or null when the line is exhausted.
static string GetNextWord(string line, char[] characters, ref int i)
{
    // skip anything that is not part of a word
    while (i < characters.Length && !Char.IsLetterOrDigit(characters[i]))
        i++;

    if (i == characters.Length)
        return null;

    int start = i;

    // find the end of the word
    while (i < characters.Length && Char.IsLetterOrDigit(characters[i]))
        i++;

    // return the word
    return line.Substring(start, i - start);
}
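
To see what the code above produces without a Word file at hand, here is a minimal self-contained sketch of the same word-splitting and position-indexing logic run on an in-memory string. The class name WordIndexDemo, the BuildIndex helper, and the sample text are illustrative assumptions, not part of the original answer. The sketch uses generic collections instead of SortedList and ArrayList, and leaves out the quotes around each position.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class WordIndexDemo
{
    // Splits a line into runs of letters and digits, like GetWords above.
    public static List<string> GetWords(string line)
    {
        var words = new List<string>();
        int i = 0;
        while (i < line.Length)
        {
            // Skip separators (anything that is not a letter or digit).
            while (i < line.Length && !char.IsLetterOrDigit(line[i])) i++;
            int start = i;
            // Advance to the end of the word.
            while (i < line.Length && char.IsLetterOrDigit(line[i])) i++;
            if (i > start) words.Add(line.Substring(start, i - start));
        }
        return words;
    }

    // Builds a sorted map from lower-cased word to its 1-based positions.
    public static SortedDictionary<string, List<int>> BuildIndex(TextReader reader)
    {
        var index = new SortedDictionary<string, List<int>>(StringComparer.Ordinal);
        int position = 0;
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            foreach (string word in GetWords(line))
            {
                string key = word.ToLowerInvariant();
                position++;
                if (!index.TryGetValue(key, out List<int> positions))
                {
                    positions = new List<int>();
                    index[key] = positions;
                }
                positions.Add(position);
            }
        }
        return index;
    }

    public static void Main()
    {
        // Hypothetical sample text standing in for the contents of a file.
        using (var reader = new StringReader("The cat saw the dog.\nThe dog ran."))
        {
            foreach (var entry in BuildIndex(reader))
                Console.WriteLine("{0} ({1})", entry.Key, string.Join(",", entry.Value));
        }
    }
}
```

For the sample text this prints one line per distinct word, for example "the (1,4,6)". Note that the result is a word index rather than the document text in reading order, and that the code treats its input as plain text: it does not parse the Word file format.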

Assisted Solution

by: sjturner2 (earned 1000 total points)
ID: 19803416
This product should do the job, but it costs 399 euros.

http://www.independentsoft.de/word/index.html
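
If the documents are in the newer .docx format (an assumption; the question does not say which format is used), a free route may be possible, because a .docx file is a ZIP archive whose main text normally lives in the part word/document.xml. The following is only a rough sketch under that assumption: the class name DocxText is made up for illustration, it needs .NET 4.5 or later for ZipArchive, and it does not handle legacy binary .doc files, headers, footers, footnotes, tabs, or text boxes (text in nested paragraphs may be repeated).

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;
using System.Xml.Linq;

public static class DocxText
{
    // WordprocessingML main namespace used by elements such as w:p and w:t.
    static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns the text of the main document part, one line per paragraph.
    public static string Extract(Stream docx)
    {
        using (var zip = new ZipArchive(docx, ZipArchiveMode.Read, true))
        {
            // Assumption: the main part is stored at the conventional path.
            ZipArchiveEntry entry = zip.GetEntry("word/document.xml");
            if (entry == null)
                throw new InvalidDataException("word/document.xml not found; not a .docx file?");

            using (Stream part = entry.Open())
            {
                // Preserve whitespace so that space-only text runs are kept.
                XDocument xml = XDocument.Load(part, LoadOptions.PreserveWhitespace);
                var text = new StringBuilder();
                foreach (XElement paragraph in xml.Descendants(W + "p"))
                {
                    // Concatenate every text run (w:t) inside the paragraph.
                    foreach (XElement run in paragraph.Descendants(W + "t"))
                        text.Append(run.Value);
                    text.AppendLine();
                }
                return text.ToString();
            }
        }
    }

    public static void Main(string[] args)
    {
        if (args.Length != 1)
        {
            Console.Error.WriteLine("usage: DocxText <file.docx>");
            return;
        }
        using (FileStream file = File.OpenRead(args[0]))
            Console.Write(Extract(file));
    }
}
```

Nothing here requires Word or the Interop libraries on the server. Whether it is good enough depends on how much of the document structure matters for the extracted text.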
