Solved

Reading an MS Word document through Automation is very slow

Posted on 2008-06-15
Last Modified: 2013-11-20
I am trying to read the data in an MS Word document through Automation, but it is very slow. How can I speed up the reading? I have attached the code. In it, OnGetTextFromWord() is an event handler that is invoked whenever the user presses the GetText button (in my app).

MsWord.GetLine(i) is the call that gets the data from the Word document line by line. Each line is then displayed in a RichEditControl using the AppendToLog() function.



CString CWordAutomation::GetLine(int nLine)
{
	CString szLine = _T("");
	if (NULL == m_pdispWordApp)
		return szLine;

	VARIANTARG varg1, varg2;
	int wdGoToLine = 3;		// MS Word constant
	int wdGoToAbsolute = 1;	// MS Word constant
	int wdLine = 5;			// MS Word constant
	int wdExtend = 1;		// MS Word constant

	// Go to line nLine, counted from the start of the document
	ClearAllArgs();
	if (!WordInvoke(m_pdispWordApp, L"Selection", &varg1, DISPATCH_PROPERTYGET, 0))
		return szLine;
	ClearAllArgs();
	AddArgumentInt2(L"What", 0, wdGoToLine);
	AddArgumentInt2(L"Which", 0, wdGoToAbsolute);
	AddArgumentInt2(L"Count", 0, nLine);
	if (!WordInvoke(varg1.pdispVal, L"GoTo", NULL, DISPATCH_METHOD, 0))
		return szLine;

	// Selection.HomeKey Unit:=wdLine
	ClearAllArgs();
	AddArgumentInt2(L"Unit", 0, wdLine);
	if (!WordInvoke(varg1.pdispVal, L"HomeKey", NULL, DISPATCH_METHOD, 0))
		return szLine;

	// Selection.EndKey Unit:=wdLine, Extend:=wdExtend
	ClearAllArgs();
	AddArgumentInt2(L"Unit", 0, wdLine);
	AddArgumentInt2(L"Extend", 0, wdExtend);
	if (!WordInvoke(varg1.pdispVal, L"EndKey", &varg2, DISPATCH_METHOD, 0))
		return szLine;

	// Selection.Text: the text of the line that is now selected
	ClearAllArgs();
	if (!WordInvoke(varg1.pdispVal, L"Text", &varg2, DISPATCH_PROPERTYGET, 0))
		return szLine;

	// Convert the returned VARIANT (varg2) to a CString
	switch (varg2.vt)
	{
	case VT_UI1:
		szLine.Format("%c", varg2.bVal);
		break;
	case VT_I4:
		szLine.Format("%ld", varg2.lVal);
		break;
	case VT_R4:
		szLine.Format("%f", varg2.fltVal);
		break;
	case VT_R8:
		szLine.Format("%f", varg2.dblVal);
		break;
	case VT_BSTR:
		szLine = varg2.bstrVal;
		break;
	case VT_BYREF|VT_UI1:
		// Not tested
		szLine.Format("%c", *varg2.pbVal);
		break;
	case VT_BYREF|VT_BSTR:
		// Not tested
		szLine = *varg2.pbstrVal;
		break;	// this break was missing, so the result fell through and was cleared
	case VT_EMPTY:
		// Empty
		szLine = _T("");
		break;
	}

	return szLine;
}

/****************************************************************************************/

 

void CMSWordDemoDlg::OnGetTextFromWord()
{
	// Use the Windows file dialog to obtain the file name
	char szFilter[] =
		"Word Files (*.doc)|*.doc|Text Files (*.txt)|*.txt|All Files (*.*)|*.*||";

	CFileDialog DataRead(TRUE,	// TRUE for FileOpen, FALSE for FileSaveAs
		NULL, NULL,
		OFN_PATHMUSTEXIST | OFN_FILEMUSTEXIST,
		szFilter,
		NULL);

	int nFileRead = DataRead.DoModal();
	if (IDOK == nFileRead)
	{
		// Get the name of the Word file to open
		CString szFileName = DataRead.GetPathName();
		if (szFileName.IsEmpty())
			return;

		// Do not make Word visible
		CEzWordAutomation MsWord(FALSE);
		MsWord.OpenWordFile(szFileName);

		int nLineCount = MsWord.GetLineCount();
		CString szLine, szFirstLine, szLastLine;
		for (int i = 1; i <= nLineCount; i++)
		{
			szLine = MsWord.GetLine(i);
			if (1 == i)
				szFirstLine = szLine;	// remember the first line for the summary
			szLastLine = szLine;		// after the loop this holds the last line
			AppendToLog(szLine, RGB(0, 0, 0));
			AppendToLog("\n", RGB(0, 0x99, 0));
		}
		MsWord.CloseDocument(FALSE);
		MsWord.ReleaseWord();

		CString szMessage;
		szMessage.Format("Found %i line(s) in this file. \n First and Last Lines in this file are: \n", nLineCount);
		szMessage = szMessage + szFirstLine + _T("\n ... \n") + szLastLine;
		MessageBox(szMessage);
	}
}

Question by:Rajeshm8484
8 Comments
 
LVL 11

Expert Comment

by:cup
ID: 21788719
Is it slow reading the document or starting Word? In my experience, starting Word has always been the slow part (15-20 seconds). Once it has started, it just zips through. Excel is the same: it takes about 20 s on a 3 GHz machine running XP.

Also, does AppendToLog keep the log in memory? Memory reallocation is pretty expensive if you are reallocating large chunks in small steps.
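To illustrate the reallocation point, here is a minimal sketch that collects all lines into one pre-sized buffer so the display control only needs to be updated once. JoinLines is a hypothetical helper and not part of the posted code; it uses std::string rather than CString so that it stands alone, and how AppendToLog grows its own storage is not known from the post.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not from the original post): join all lines into a
// single buffer so the UI control is updated once instead of once per line.
std::string JoinLines(const std::vector<std::string>& lines)
{
    // First pass: compute the final size, counting one newline per line.
    std::size_t total = 0;
    for (const std::string& line : lines)
        total += line.size() + 1;

    std::string buffer;
    buffer.reserve(total);          // single allocation up front

    // Second pass: append without triggering further reallocations.
    for (const std::string& line : lines) {
        buffer += line;
        buffer += '\n';
    }
    return buffer;
}
```

The result could then be handed to the log function in a single call. Whether that helps depends on where the time really goes, which only a measurement can tell.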
 

Author Comment

by:Rajeshm8484
ID: 21790844
It is reading the document from my application that is slow.

AppendToLog is a function that displays the content of the document in the RichEditControl.
 
LVL 4

Accepted Solution

by:
chip3d earned 500 total points
ID: 21791717
Hi Rajeshm8484,

Using your GetLine function to read a Word document line by line can be quite time consuming. You call GoTo with wdGoToAbsolute and the number of the line to read. Every time you do that, Word moves the range from the beginning of the document to the line you specified. Doing this once, even for a line at the end of a big document, is fast enough for most circumstances. Doing it for every line of a big document is very slow, because the total work grows as O(n²), where n is the number of lines you read: reaching line k costs about k steps, so reading all n lines costs about 1 + 2 + ... + n = n(n+1)/2 steps. Instead, you could use GoTo with wdGoToRelative to step through the document line by line, which brings the complexity down from quadratic to linear.
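The difference can be sketched with a toy cost model. MockSelection below is a stand-in written for this illustration, not Word's Selection object. It assumes, as described above, that an absolute GoTo walks from the top of the document while a relative GoTo walks from the current line. The two constants carry the values Word defines for them.

```cpp
#include <cstdlib>

// Values of Word's WdGoToDirection constants.
const int wdGoToAbsolute = 1;
const int wdGoToRelative = 2;

// Toy stand-in for Word's Selection (hypothetical, for illustration only).
// ASSUMPTION: absolute jumps cost one step per line from the top of the
// document; relative jumps cost one step per line moved.
struct MockSelection {
    long line = 1;    // current line, 1-based
    long steps = 0;   // total lines walked so far

    void GoTo(int which, long count) {
        if (which == wdGoToAbsolute) {
            steps += count;              // walk from the top to line `count`
            line = count;
        } else {
            steps += std::labs(count);   // walk from the current position
            line += count;
        }
    }
};

// Visit every line with absolute jumps, as the posted GetLine(i) loop does.
long CostAbsolute(long nLines) {
    MockSelection sel;
    for (long i = 1; i <= nLines; ++i)
        sel.GoTo(wdGoToAbsolute, i);
    return sel.steps;
}

// Jump to line 1 once, then advance one line at a time.
long CostRelative(long nLines) {
    MockSelection sel;
    if (nLines > 0)
        sel.GoTo(wdGoToAbsolute, 1);
    for (long i = 2; i <= nLines; ++i)
        sel.GoTo(wdGoToRelative, 1);
    return sel.steps;
}
```

Under this model a 1000-line document costs 500500 steps with absolute jumps and 1000 steps with relative ones. In the posted GetLine this would presumably mean passing wdGoToRelative as "Which" and 1 as "Count" for every line after the first. That change is untested against Word here.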

 

Author Comment

by:Rajeshm8484
ID: 21793427
Hi chip3d,

Thanks for your suggestion. Can you modify my GetLine function to use wdGoToRelative and attach the code snippet? I am new to Automation.
 

Author Closing Comment

by:Rajeshm8484
ID: 31467307
Thanks for your solution.
 

Expert Comment

by:ILGDRM
ID: 22648632
Hi,
After opening the Word document, you can get its text content with the following code:

IDispatch* pDispRange = oDocument.GetContent();
Range objRange(pDispRange);
AfxMessageBox(objRange.GetText());
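The snippet above fetches the whole text in one call, so any splitting has to happen on the client side. A minimal sketch of that step follows, assuming the text has already been copied into a std::string. SplitParagraphs is a hypothetical helper. It splits on '\r', the character Word uses for paragraph marks, so it yields paragraphs rather than the on-screen lines that GetLine returns.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not from the original post): split document text on
// '\r', the paragraph mark Word uses in Range.Text.
std::vector<std::string> SplitParagraphs(const std::string& text)
{
    std::vector<std::string> parts;
    std::string::size_type start = 0;
    while (start < text.size()) {
        std::string::size_type end = text.find('\r', start);
        if (end == std::string::npos)
            end = text.size();      // last piece has no trailing mark
        parts.push_back(text.substr(start, end - start));
        start = end + 1;            // skip past the paragraph mark
    }
    return parts;
}
```

Each element can then be appended to the display control. Empty paragraphs are kept as empty strings.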

How can I get the entire content of the Word document (text, images and tables) as a byte array?
