Why is reading an MS Word document through automation very slow?

Posted on 2008-06-15
Last Modified: 2013-11-20
I am trying to read the data in an MS Word document through automation, but it is very slow. How can I speed up the reading? I have attached the code; in it, OnGetTextFromWord() is an event handler that is invoked whenever the user presses the GetText button in my app.

MsWord.GetLine(i); is the call that reads the document line by line. Each line is then displayed in a rich edit control using the AppendToLog(szFirstLine, RGB(0, 0, 0)); function.

CString CWordAutomation::GetLine(int nLine)
{
	CString szLine = _T("");
	if(NULL  == m_pdispWordApp)
		return szLine;
	VARIANTARG varg1, varg2;
	int wdGoToLine = 3;		//MsWord constant
	int wdGoToAbsolute = 1;	//MsWord constant
	int wdLine = 5;			//MsWord constant
	int wdExtend = 1;		//MsWord constant
	//Go to line
	if (!WordInvoke(m_pdispWordApp, L"Selection", &varg1, DISPATCH_PROPERTYGET, 0))
		return szLine;
	AddArgumentInt2(L"What", 0, wdGoToLine);
	AddArgumentInt2(L"Which", 0, wdGoToAbsolute);
	AddArgumentInt2(L"Count", 0, nLine);
	if (!WordInvoke(varg1.pdispVal, L"GoTo", NULL, DISPATCH_METHOD, 0))
		return szLine;
	//Selection.HomeKey Unit:=wdLine
	AddArgumentInt2(L"Unit", 0, wdLine);
	if (!WordInvoke(varg1.pdispVal, L"HomeKey", NULL, DISPATCH_METHOD, 0))
		return szLine;
	//Selection.EndKey Unit:=wdLine, Extend:=wdExtend
	AddArgumentInt2(L"Unit", 0, wdLine);
	AddArgumentInt2(L"Extend", 0, wdExtend);
	if (!WordInvoke(varg1.pdispVal, L"EndKey", &varg2, DISPATCH_METHOD, 0))
		return szLine;
	if (!WordInvoke(varg1.pdispVal, L"Text", &varg2, DISPATCH_PROPERTYGET, 0))
		return szLine;
	//Get text from varg2
	//Each case has its own scope and ends with break so it cannot fall through
	VARTYPE Type = varg2.vt;
	switch (Type) 
	{
		case VT_UI1:
			{
				unsigned char nChr = varg2.bVal;
				szLine.Format("%c", nChr);
			}
			break;
		case VT_I4:
			{
				long nVal = varg2.lVal;
				szLine.Format("%i", nVal);
			}
			break;
		case VT_R4:
			{
				float fVal = varg2.fltVal;
				szLine.Format("%f", fVal);
			}
			break;
		case VT_R8:
			{
				double dVal = varg2.dblVal;
				szLine.Format("%f", dVal);
			}
			break;
		case VT_BSTR:
			{
				BSTR b = varg2.bstrVal;
				szLine = b;
			}
			break;
		case VT_BYREF|VT_UI1:
			{
				//Not tested
				unsigned char* pChr = varg2.pbVal;
				szLine.Format("%c", *pChr);
			}
			break;
		case VT_BYREF|VT_BSTR:
			{
				//Not tested
				BSTR* pb = varg2.pbstrVal;
				szLine = *pb;
			}
			break;
		case 0:
			szLine = _T("");
			break;
	}
	return szLine;
}

void CMSWordDemoDlg::OnGetTextFromWord() 
{
	//Use Windows file dialog to obtain FileName 
	char szFilter[] =
      "Word Files (*.doc)|*.doc|Text Files (*.txt)|*.txt|All Files (*.*)|*.*||";
	CFileDialog	DataRead(TRUE, // TRUE for FileOpen, FALSE for FileSaveAs
		/* ... */);	// remaining constructor arguments are not shown in the post
	int nFileRead = DataRead.DoModal();
	if(IDOK == nFileRead)
	{
		//Get file name for opening Word file
		CString szFileName = DataRead.GetPathName();
		//Do not make Word visible
		CEzWordAutomation MsWord(FALSE);	
		int nLineCount = MsWord.GetLineCount();
		CString szFirstLine, szLastLine;
		//Read every line and show it in the rich edit control
		for(int i=1;i<=nLineCount;i++)
		{
			szFirstLine = MsWord.GetLine(i);
			AppendToLog(szFirstLine, RGB(0, 0, 0));
			AppendToLog("\n",RGB(0, 0x99, 0));
		}
		CString szMessage;
		szMessage.Format("Found %i line(s) in this file. \n First and Last Lines in this file are: \n", nLineCount);
		szMessage = szMessage + szFirstLine+_T("\n ... \n") + szLastLine;
	}
}


Question by:Rajeshm8484

Expert Comment

ID: 21788719
Is it slow reading the document or starting Word? In my experience, starting Word has always been the slow part (15-20 seconds). Once it has started, it just zips through. Excel is the same: it takes about 20 s on a 3 GHz machine running XP.

Also, does AppendToLog work in memory, i.e. is the log stored in memory? Memory reallocation is pretty expensive if you're reallocating large chunks in small steps.
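If the log does turn out to reallocate on every call, one way around it is to collect the lines first and hand the control a single string. This is only a minimal sketch: `JoinLines` is a hypothetical helper that is not part of the posted code, and it uses `std::string` rather than `CString` so that it stands on its own.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not from the posted code): joins lines that have
// already been read into one buffer, so the display control can be
// updated with a single append instead of one append per line.
std::string JoinLines(const std::vector<std::string>& lines) {
    // Work out the final size first, counting one '\n' per line.
    std::size_t total = 0;
    for (const std::string& line : lines) {
        total += line.size() + 1;
    }

    std::string out;
    out.reserve(total);  // one allocation up front instead of many small ones
    for (const std::string& line : lines) {
        out += line;
        out += '\n';
    }
    return out;
}
```

The joined string could then be passed to AppendToLog once, after the loop, instead of twice per line.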

Author Comment

ID: 21790844
Reading the document from my application is the slow part.

AppendToLog is a function that displays the content of the document in a rich edit control.

Accepted Solution

chip3d earned 500 total points
ID: 21791717
Hi Rajeshm8484,

Using your GetLine function to read a Word document line by line can be quite time consuming. You are calling GoTo with wdGoToAbsolute and the number of the line to read. Every time you do that, Word moves the range from the beginning of the document to the line you have specified. If you do this once, even for a line at the end of a big document, it is fast enough for most circumstances. But doing it for every line in a big document can be very slow, because the total work has a complexity of O(n²), where n is the number of lines you read. Instead, you could call GoTo with wdGoToRelative to step through the document line by line, which gives a complexity that is linear instead of quadratic.
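The complexity argument above can be checked with a small counting model. This is only a sketch and does not call Word: `LineCursorModel` is a hypothetical stand-in, and it assumes that an absolute jump costs as many steps as the target line number while a relative move of one line costs a single step.

```cpp
#include <cstddef>

// Hypothetical stand-in for the Word selection. It only counts how many
// lines the cursor walks; it does not talk to Word at all.
class LineCursorModel {
public:
    explicit LineCursorModel(std::size_t lineCount) : m_lineCount(lineCount) {}

    // Models GoTo with wdGoToAbsolute: assumed to walk from the start of
    // the document to the target line.
    void GoToAbsolute(std::size_t line) {
        m_steps += line;
        m_current = line;
    }

    // Models GoTo with wdGoToRelative and a count of 1: assumed to walk
    // one line forward from the current position.
    void GoToNext() {
        if (m_current < m_lineCount) {
            ++m_current;
            ++m_steps;
        }
    }

    std::size_t Steps() const { return m_steps; }

private:
    std::size_t m_lineCount;
    std::size_t m_current = 1;
    std::size_t m_steps = 0;
};

// Visit every line by jumping to its absolute line number each time.
std::size_t CostAbsolute(std::size_t n) {
    LineCursorModel doc(n);
    for (std::size_t i = 1; i <= n; ++i) {
        doc.GoToAbsolute(i);
    }
    return doc.Steps();
}

// Visit every line by positioning once, then stepping forward.
std::size_t CostRelative(std::size_t n) {
    LineCursorModel doc(n);
    doc.GoToAbsolute(1);
    for (std::size_t i = 2; i <= n; ++i) {
        doc.GoToNext();
    }
    return doc.Steps();
}
```

For n = 1000 the model counts 500500 steps for the absolute loop and 1000 steps for the relative one. In the posted GetLine, the corresponding change would be to position the selection once and then pass wdGoToRelative with a count of 1 on each later call.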


Author Comment

ID: 21793427
Hi chip3d,

Thanks for your suggestion. Can you modify my GetLine function to use wdGoToRelative and attach the code snippet? I am new to automation.

Author Closing Comment

ID: 31467307
Thanks for your solution.

Expert Comment

ID: 22648632
Hi,
After opening the Word document, you can get its text content with the following code:

IDispatch* pDispRange = oDocument.GetContent();
Range objRange(pDispRange);

How can I get the entire Word document content (text, images and tables) as a byte array?
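On the single-call approach in this comment: once the whole text has been fetched from the range, it can be split locally instead of asking Word for each piece. The sketch below assumes the text is already in a `std::wstring` (how it is read from the `Range` wrapper is not shown here), and `SplitParagraphs` is a hypothetical helper. It relies on Word ending each paragraph with a carriage return, so it yields paragraphs, which are not the same as the on-screen lines that GetLine returns.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: splits text of the kind returned by a Word range
// into paragraphs. Word terminates each paragraph with '\r'.
std::vector<std::wstring> SplitParagraphs(const std::wstring& text) {
    std::vector<std::wstring> paragraphs;
    std::wstring::size_type start = 0;
    while (start < text.size()) {
        // Find the end of the current paragraph, or the end of the text.
        std::wstring::size_type end = text.find(L'\r', start);
        if (end == std::wstring::npos) {
            end = text.size();
        }
        paragraphs.push_back(text.substr(start, end - start));
        start = end + 1;  // skip the paragraph mark
    }
    return paragraphs;
}
```

This needs one automation call for the whole document, so the per-line round trips to Word disappear.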

