Solved

Reading an MS Word document through automation is very slow

Posted on 2008-06-15
Last Modified: 2013-11-20
I am trying to read the data in an MS Word document through automation, but it is very slow. How can I speed up the reading? I have attached the code; in it, OnGetTextFromWord() is an event handler that is invoked whenever the user presses the GetText button in my app.

MsWord.GetLine(i) is the call that reads the document line by line; each line is then displayed in a rich edit control using the AppendToLog(szFirstLine, RGB(0, 0, 0)) function.



CString CWordAutomation::GetLine(int nLine)
{
	CString szLine = _T("");
	if (NULL == m_pdispWordApp)
		return szLine;

	VARIANTARG varg1, varg2;
	int wdGoToLine = 3;		// MsWord constant: go to a line
	int wdGoToAbsolute = 1;	// MsWord constant: count from the start of the document
	int wdLine = 5;			// MsWord constant: unit is one line
	int wdExtend = 1;		// MsWord constant: extend the selection while moving

	// Go to line nLine:
	// Selection.GoTo What:=wdGoToLine, Which:=wdGoToAbsolute, Count:=nLine
	ClearAllArgs();
	if (!WordInvoke(m_pdispWordApp, L"Selection", &varg1, DISPATCH_PROPERTYGET, 0))
		return szLine;
	ClearAllArgs();
	AddArgumentInt2(L"What", 0, wdGoToLine);
	AddArgumentInt2(L"Which", 0, wdGoToAbsolute);
	AddArgumentInt2(L"Count", 0, nLine);
	if (!WordInvoke(varg1.pdispVal, L"GoTo", NULL, DISPATCH_METHOD, 0))
		return szLine;

	// Selection.HomeKey Unit:=wdLine
	ClearAllArgs();
	AddArgumentInt2(L"Unit", 0, wdLine);
	if (!WordInvoke(varg1.pdispVal, L"HomeKey", NULL, DISPATCH_METHOD, 0))
		return szLine;

	// Selection.EndKey Unit:=wdLine, Extend:=wdExtend
	ClearAllArgs();
	AddArgumentInt2(L"Unit", 0, wdLine);
	AddArgumentInt2(L"Extend", 0, wdExtend);
	if (!WordInvoke(varg1.pdispVal, L"EndKey", &varg2, DISPATCH_METHOD, 0))
		return szLine;

	// Selection.Text now holds the selected line
	ClearAllArgs();
	if (!WordInvoke(varg1.pdispVal, L"Text", &varg2, DISPATCH_PROPERTYGET, 0))
		return szLine;

	// Convert the returned VARIANT in varg2 to a string
	VARTYPE Type = varg2.vt;
	switch (Type)
	{
		case VT_UI1:
			{
				unsigned char nChr = varg2.bVal;
				szLine.Format(_T("%c"), nChr);
			}
			break;
		case VT_I4:
			{
				long nVal = varg2.lVal;
				szLine.Format(_T("%ld"), nVal);
			}
			break;
		case VT_R4:
			{
				float fVal = varg2.fltVal;
				szLine.Format(_T("%f"), fVal);
			}
			break;
		case VT_R8:
			{
				double dVal = varg2.dblVal;
				szLine.Format(_T("%f"), dVal);
			}
			break;
		case VT_BSTR:
			{
				BSTR b = varg2.bstrVal;
				szLine = b;
			}
			break;
		case VT_BYREF|VT_UI1:
			{
				// Not tested
				unsigned char* pChr = varg2.pbVal;
				szLine.Format(_T("%c"), *pChr);
			}
			break;
		case VT_BYREF|VT_BSTR:
			{
				// Not tested
				BSTR* pb = varg2.pbstrVal;
				szLine = *pb;
			}
			break;	// was missing: fell through and cleared szLine
		case VT_EMPTY:
			{
				// Empty
				szLine = _T("");
			}
			break;
	}

	return szLine;
}

/****************************************************************************************/

 

void CMSWordDemoDlg::OnGetTextFromWord()
{
	// Use the Windows file dialog to obtain the file name
	char szFilter[] =
		"Word Files (*.doc)|*.doc|Text Files (*.txt)|*.txt|All Files (*.*)|*.*||";

	CFileDialog DataRead(TRUE,	// TRUE for FileOpen, FALSE for FileSaveAs
		NULL, NULL,
		OFN_PATHMUSTEXIST|OFN_FILEMUSTEXIST,	// overwrite prompt only applies to Save As
		szFilter,
		NULL);

	int nFileRead = DataRead.DoModal();
	if (IDOK == nFileRead)
	{
		// Get the name of the Word file to open
		CString szFileName = DataRead.GetPathName();
		if (szFileName.IsEmpty())
			return;

		// Do not make Word visible
		CEzWordAutomation MsWord(FALSE);
		MsWord.OpenWordFile(szFileName);

		int nLineCount = MsWord.GetLineCount();
		CString szLine, szFirstLine, szLastLine;
		for (int i = 1; i <= nLineCount; i++)
		{
			// Each call jumps to absolute line i (this is the slow part)
			szLine = MsWord.GetLine(i);
			if (1 == i)
				szFirstLine = szLine;	// remember the first line for the summary
			szLastLine = szLine;		// holds the last line once the loop ends
			AppendToLog(szLine, RGB(0, 0, 0));
			AppendToLog("\n", RGB(0, 0x99, 0));
		}
		MsWord.CloseDocument(FALSE);
		MsWord.ReleaseWord();

		CString szMessage;
		szMessage.Format("Found %i line(s) in this file. \n First and Last Lines in this file are: \n", nLineCount);
		szMessage = szMessage + szFirstLine + _T("\n ... \n") + szLastLine;
		MessageBox(szMessage);
	}
}

Question by:Rajeshm8484
LVL 11

Expert Comment

by:cup
ID: 21788719
Is it slow reading the document or starting Word? From past experience, it has always been slow starting Word (15-20 seconds). Once that has started, it just zips through. Excel is the same: it takes about 20s on a 3GHz machine running XP.

Also, is AppendToLog in memory, i.e. is the log stored in memory? Memory reallocation is pretty expensive if you're reallocating large chunks in small steps.
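
To illustrate the reallocation point, here is a minimal sketch in standard C++ that sizes one buffer up front and fills it, so the result can be handed to the display in a single call instead of once per line. JoinLines is a hypothetical helper name, and std::string stands in for whatever string type the application uses; how AppendToLog stores its text is not shown in the question, so whether this helps there is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: joins all lines into one buffer, each followed by '\n'.
std::string JoinLines(const std::vector<std::string>& lines)
{
    // Compute the final size first so the buffer is allocated only once.
    std::size_t total = 0;
    for (const std::string& line : lines)
        total += line.size() + 1;   // +1 for the '\n' after each line

    std::string joined;
    joined.reserve(total);
    for (const std::string& line : lines) {
        joined += line;
        joined += '\n';
    }
    assert(joined.size() == total);  // the size computed up front was exact
    return joined;
}
```

With something like this, the loop would collect the lines first and the log would be updated once with the joined text.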
 

Author Comment

by:Rajeshm8484
ID: 21790844
Reading the document from my application is slow.

Also, AppendToLog is a function that displays the content of the document in a rich edit control.
 
LVL 4

Accepted Solution

by:chip3d (earned 500 total points)
ID: 21791717
Hi Rajeshm8484,

Using your GetLine function to read a Word document line by line can be quite time consuming. You call GoTo with wdGoToAbsolute and the number of the line to read. Every time you do that, Word moves the range from the beginning of the document to the line you specified. Doing this once, even for a line at the end of a big document, is fast enough for most circumstances. But doing it for every line in a big document can be very slow, because the total work has a complexity of O(n²), where n is the number of lines you read. Instead, you could use GoTo with wdGoToRelative to step through the document line by line, which gives a complexity that is linear instead of quadratic.
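
To make the difference concrete, here is a small sketch of a toy cost model in standard C++. It assumes, as described above, that an absolute jump to line k costs about k steps and a relative move to the next line costs one step; Word's real internal cost is not documented here, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Toy model (assumption): reading every line with absolute jumps restarts
// from the top each time, so line k costs k steps.
std::uint64_t AbsoluteGoToCost(std::uint64_t lineCount)
{
    std::uint64_t steps = 0;
    for (std::uint64_t line = 1; line <= lineCount; ++line)
        steps += line;              // 1 + 2 + ... + n = n(n+1)/2
    assert(steps == lineCount * (lineCount + 1) / 2);
    return steps;
}

// Toy model (assumption): each relative move advances one line from the
// current position, so n lines cost n steps.
std::uint64_t RelativeGoToCost(std::uint64_t lineCount)
{
    return lineCount;
}
```

For a 1,000-line document this model gives 500,500 steps for the absolute approach against 1,000 for the relative one. In terms of the helpers from the question, the relative variant would presumably pass wdGoToRelative (2 in Word's WdGoToDirection enumeration) as Which and 1 as Count, after positioning the selection at the start of the document once; that mapping is untested here.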

Author Comment

by:Rajeshm8484
ID: 21793427
Hi chip3d,

Thanks for your suggestion. Can you modify my GetLine function to use wdGoToRelative and attach the code snippet? I am new to automation.
 

Author Closing Comment

by:Rajeshm8484
ID: 31467307
Thanks for your solution.
 

Expert Comment

by:ILGDRM
ID: 22648632
Hi,
After opening the Word document, you can get its text content with the following code:

IDispatch* pDispRange = oDocument.GetContent();
Range objRange(pDispRange);
AfxMessageBox(objRange.GetText());
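
Once the whole text has been fetched in one call like this, it can be split in memory rather than asking Word for each piece. The sketch below is standard C++ and assumes the text has already been copied into a std::wstring; SplitParagraphs is a hypothetical helper name. Word marks the end of a paragraph with a carriage return, so this yields paragraphs, not the display lines that GetLine returns.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: splits document text on Word's paragraph mark ('\r').
// A trailing '\r' does not produce an extra empty entry.
std::vector<std::wstring> SplitParagraphs(const std::wstring& text)
{
    std::vector<std::wstring> paragraphs;
    std::wstring::size_type start = 0;
    while (start < text.size()) {
        std::wstring::size_type end = text.find(L'\r', start);
        if (end == std::wstring::npos)
            end = text.size();      // last paragraph has no trailing mark
        assert(end >= start);
        paragraphs.push_back(text.substr(start, end - start));
        start = end + 1;            // skip past the paragraph mark
    }
    return paragraphs;
}
```

Empty paragraphs in the middle of the text are kept as empty strings, so paragraph numbering stays aligned with the document.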

How can I get the entire Word document content (text, images and tables) as a byte array?
