?
Solved

VB.Net - Speed up search for multiple strings in MS Word

Posted on 2011-10-22
19
Medium Priority
?
368 Views
Last Modified: 2012-05-12
Hey guys,

I have been working on an application that searches a document for 3 different strings (based off an excel list). Initially I was given two example that worked (thank you Graham http://rdsrc.us/byxEui); however, one was slower than the other due to the search range.

After recieving this solution I realised that what I asked for was flawed and what I needed was the range for the first string to be the first word in the paragraph and the other two strings to be the found in the paragraph that the first string was found.

Naturally I tried to modify the faster version of the script. Unfortunately I could only get the slower version working; when I say slow I mean 45mins to search 200 items. This is the code for what I require to be refined:
Sub TwoStrings(ByVal objSourceDoc As Word.Document, ByVal objDestDoc As Word.Document, _
                   ByVal intRow As Integer, ByVal objWorksheet As Excel.Worksheet)

        Dim strExcelString1 As String
        Dim strExcelString2 As String
        Dim strExcelString3 As String
        Dim para As Word.Paragraph

        strExcelString1 = objWorksheet.Cells(intRow, 1).Value
        strExcelString2 = objWorksheet.Cells(intRow, 2).Value
        strExcelString3 = objWorksheet.Cells(intRow, 3).Value

        For Each para In objSourceDoc.Paragraphs
            If para.Range.Words(1).Find.Execute(strExcelString1) Then
                If para.Range.Find.Execute(strExcelString2) Then
                    If para.Range.Find.Execute(strExcelString3) Then
                        para.Range.Copy()
                        objDestDoc.Bookmarks("\EndOfDoc").Range.Paste()
                        objWorksheet.Range("F" & intRow).Value = "Reference Found"
                        Exit Sub
                    End If
                End If
            End If
        Next para
    End Sub

Open in new window

The above code is essentially triggered by this:
intRow = 2
        Do Until objWorksheet.Cells(intRow, 1).Value = ""
            TwoStrings(objReferenceIndex, objFinalOutputDocument, intRow, objWorksheet)
            intRow = intRow + 1
            objFinalOutputDocument.Save()
            objWorkbook.Save()
        Loop

Open in new window

This is the example of the faster version that searches for the first string in the entire paragraph.
Sub TwoStringsFast(ByVal objSourceDoc As Word.Document, ByVal objDestDoc As Word.Document, _
                       ByVal strText1 As String, ByVal strText2 As String) 'Not in use...
        Dim para As Word.Paragraph
        Dim rng As Word.Range

        rng = objSourceDoc.Range
        Do While rng.Find.Execute(strText1)
            para = rng.Paragraphs.First
            If para.Range.Find.Execute(strText2) Then
                para.Range.Copy()
                objDestDoc.Bookmarks("\EndOfDoc").Range.Paste()
            End If
        Loop
    End Sub

Open in new window


Attached also is some sample test data including the excel list and the word document I'm searching.
 samplelist.xlsx
 input.docx
Hope someone can help.
Cheers
G_M

0
Comment
Question by:G_M
  • 12
  • 3
  • 2
17 Comments
 
LVL 40
ID: 37011107
If you want to speed up code in any .NET to Office application, move as much of the code as you want in a VBA macro in the Office document or template. Then call that macro from you .NET application by calling the Office Application object Run method. You then have only one call between .NET and the Office application, and that usually speeds things up a lot.

0
 

Author Comment

by:G_M
ID: 37011157
Wouldn't that method involve having to modify the "document to be searched" to a .docm and creating the macro within that?

I am trying to keep everything within the application itself so the user can select the documents themselves.
0
 
LVL 40
ID: 37011593
If you are working with a lone document, yes, that would mean adding code to the document.

But Word documents are built on templates (.dot or .dotx files). Unless told otherwise, they use a default template.

If you create your own template however, and put the code in the template, all documents built with that template will have access to the code. This is a standard practice when working with Word (or Excel, although the mechanism is a little different) when you have many documents that should share the same code, formatting, styles, you name it. This is what is used when you are in Work and go File...New and ask for a template. You can easily create your own. Coding in the template is even easier because a lot of the things you have to take care of in .NET, such as the Application and Document object are seen as default in VBA, and you do not have to think about them.

In such a situation, creating a template for your application is akind to having a dll or a configuration file. The template could be seen as being part of your application. And the gain in performance is usually quite important. You can easily expect 10x to 20x speed improvement.
0
What does it mean to be "Always On"?

Is your cloud always on? With an Always On cloud you won't have to worry about downtime for maintenance or software application code updates, ensuring that your bottom line isn't affected.

 

Author Comment

by:G_M
ID: 37011696
So in practice if/when I deployed this application, if I go down this track, it would replace the existing default word template so my VBA code is embedded within it...?

I believe I saw a run command in VB.Net somewhere... I'm guessing this is how I would run said macro from my application.

It's an interesting concept, but dangerous for a novice. Although backing up the original template is a given. I may leave this one for the moment as a last resort. Thank you for the idea though.
0
 
LVL 76

Assisted Solution

by:GrahamSkan
GrahamSkan earned 2000 total points
ID: 37012077
You specify that string 1 must be the first word in the document, so I wrote this code

Do While rng.Find.Execute(strText2)
            para = rng.Paragraphs.First
            Debug.Print("Word1 " & para.Range.Words.First.Text)
            If StrComp(strText1, Trim(para.Range.Words.First.Text), CompareMethod.Text) Then

Open in new window


but several of the items in the first column of the sample comprise more than one word, so no matches are found
0
 

Author Comment

by:G_M
ID: 37012418
Sorry, I uploaded the wrong list... here is the proper sample list

 samplelist.xlsx
0
 

Author Comment

by:G_M
ID: 37012588
strComp returned an integer, so I changed the code a little to suit. It seems to worked well. I threw a message box in there to compare the matches and it all seems pretty good.  This is the code edit I did:

 
Do While rng.Find.Execute(strExcelString2)
            para = rng.Paragraphs.First
            Debug.Print(para.Range.Words.First.Text)
            MessageBox.Show(strExcelString1 & " " & Trim(para.Range.Words.First.Text))

            If StrComp(strExcelString1, Trim(para.Range.Words.First.Text), CompareMethod.Text) = 0 Then
                para.Range.Copy()
                objDestDoc.Bookmarks("\EndOfDoc").Range.Paste()
            End If
        Loop

Open in new window


Unfortunately though for this one I need to add a third string... Is there an easy way to encorporate that?
0
 
LVL 76

Expert Comment

by:GrahamSkan
ID: 37012602
Sorry, I'm flagging a bit tonight. Perhaps I shouldn't have drunk the third bottle of wine (:>).

I'll try to pick this up in the morning. Prompt me with a spurious comment if I seem to have forgotten.
0
 

Author Comment

by:G_M
ID: 37013360
That's fine Graham... thank you for giving it a good crack anyway :o)

I'll be monitoring this through the day anyway, so whenever you get over that third bottle, I look forward to hearing from you.

Cheers
G_M
0
 

Author Comment

by:G_M
ID: 37013835
Ok I worked it out... really not that hard after all. Thanks for the leg up Graham. If there is something blaringly wrong with my code (apart from the unnecessary blnStringFound variable), please let me know. I’ll leave this one open for you reply and give you the points after that.

Thanks again Graham
Cheers
G_M

 
Sub TwoStringsFast(ByVal objSourceDoc As Word.Document, ByVal objDestDoc As Word.Document, _
                       ByVal intRow As Integer, ByVal objWorksheet As Excel.Worksheet)
        Dim para As Word.Paragraph
        Dim rng As Word.Range
        Dim strExcelString1 As String
        Dim strExcelString2 As String
        Dim strExcelString3 As String
        Dim blnStringFound As Boolean

        rng = objSourceDoc.Range
        strExcelString1 = objWorksheet.Cells(intRow, 1).Value
        strExcelString2 = objWorksheet.Cells(intRow, 2).Value
        strExcelString3 = objWorksheet.Cells(intRow, 3).Value

        Do While rng.Find.Execute(strExcelString3)
            para = rng.Paragraphs.First
            blnStringFound = para.Range.Find.Execute(strExcelString2)
            If blnStringFound = True Then
                Debug.Print(para.Range.Words.First.Text)
                If StrComp(strExcelString1, Trim(para.Range.Words.First.Text), CompareMethod.Text) = 0 Then
                    para.Range.Copy()
                    objDestDoc.Bookmarks("\EndOfDoc").Range.Paste()
                    Exit Sub
                End If
            End If
        Loop
    End Sub

Open in new window

0
 
LVL 76

Expert Comment

by:GrahamSkan
ID: 37013964
That looks OK to me.

For increased performance, because the paragraphs and the spreadsheet are both in alphabetical order,  I was working on method of stepping through each paragraph, adjusting the spreadsheet row as necessary. Let us know if your code is still not fast enough & I'll complete it.


0
 

Author Comment

by:G_M
ID: 37014413
It's fantastic Graham... It's gone from 45min to 3mins for 200 items. I can't complain. Thanks again for your help.

Cheers
G_M
0
 

Author Comment

by:G_M
ID: 37015628
I've requested that this question be closed as follows:

Accepted answer: 500 points for GrahamSkan's comment http:/Q_27410078.html#37012077
Assisted answer: 0 points for G_M's comment http:/Q_27410078.html#37013835

for the following reason:

Graham definately put me on the right track, so I happily award him the points
0
 

Accepted Solution

by:
G_M earned 0 total points
ID: 37014434
For anyone looking later, this is all the code from start to finish for this function.

Cheers
G_M

 
Public Sub SearchReferences()

        Dim objWorksheet As Excel.Worksheet
        Dim objReferenceIndex As Word.Document
        Dim objFinalOutputDocument As Word.Document
        Dim strReferenceIndex As String
        Dim misValue As Object = System.Reflection.Missing.Value
        Dim objParagraph As Word.Paragraph
        Dim objWordRange As Word.Range
        Dim intRow As Integer
	Dim objWord As Word.Application
	Dim objExcel As Excel.Application
	Dim objWorkbook As Excel.Workbook
	Dim strWorkingFile As String
	
	strWorkingFile = "C:\Working File.xlsx"
        strReferenceIndex = "C:\ReferenceIndex.docx"
        objWord = New Word.Application
	objExcel = New Excel.Application
	objWorkbook = objExcel.Workbooks.Open(strWorkingFile)
	objWorksheet = objWorkbook.Worksheets(1)
        objReferenceIndex = objWord.Documents.Open(strReferenceIndex)
        objFinalOutputDocument = objWord.Documents.Add(misValue)
        objFinalOutputDocument.SaveAs("C:\Output.docx")
        objWordRange = objReferenceIndex.Range
        objParagraph = objWordRange.Paragraphs.First

        intRow = 2
        Do Until objWorksheet.Cells(intRow, 1).Value = ""
            TwoStringsFast(objReferenceIndex, objFinalOutputDocument, intRow, objWorksheet)
            intRow = intRow + 1
            objFinalOutputDocument.Save()
        Loop

        objFinalOutputDocument.Save()
        objReferenceIndex.Close()
        objFinalOutputDocument.Close()
        
    End Sub

Sub TwoStringsFast(ByVal objReferenceIndex As Word.Document, ByVal objFinalOutputDocument As Word.Document, _
                       ByVal intRow As Integer, ByVal objWorksheet As Excel.Worksheet)

        Dim objParagraph As Word.Paragraph
        Dim objWordRange As Word.Range
        Dim strExcelString1 As String
        Dim strExcelString2 As String
        Dim strExcelString3 As String
        Dim blnStringFound As Boolean

        objWordRange = objReferenceIndex.Range
        strExcelString1 = objWorksheet.Cells(intRow, 1).Value
        strExcelString2 = objWorksheet.Cells(intRow, 2).Value
        strExcelString3 = objWorksheet.Cells(intRow, 3).Value

        Do While objWordRange.Find.Execute(strExcelString3)
            objParagraph = objWordRange.Paragraphs.First
            blnStringFound = objParagraph.Range.Find.Execute(strExcelString2)
            If blnStringFound = True Then
                Debug.Print(objParagraph.Range.Words.First.Text)
                If StrComp(strExcelString1, Trim(objParagraph.Range.Words.First.Text), CompareMethod.Text) = 0 Then
                    objParagraph.Range.Copy()
                    objFinalOutputDocument.Bookmarks("\EndOfDoc").Range.Paste()
                    objWorksheet.Cells(intRow, 6).Value = "Reference Found"
                    Exit Sub
                End If
            End If
        Loop
    End Sub

Open in new window

0
 

Author Comment

by:G_M
ID: 37016327
I've requested that this question be closed as follows:

Accepted answer: 0 points for G_M's comment http:/Q_27410078.html#37014434

for the following reason:

Graham definately put me on the right track, so I happily award him the points
0
 

Author Comment

by:G_M
ID: 37016328
Still trying to close this question properly :oP
0
 

Author Closing Comment

by:G_M
ID: 37043532
Graham definately put me on the right track, so I happily award him the points
0

Featured Post

Independent Software Vendors: We Want Your Opinion

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

The Selection object is designed for user interaction. It has a Range property, so it can be used in most places that a Range object can. Recorded macros must use the Selection because they are simply copying what the user is doing. A Range prope…
Microsoft Word is a program we have all encountered at some point, but very few of us have dug deep into its full scope of features, let alone customized it to suit our needs. Luckily making the ribbon (aka toolbar, first introduced in Word 2007) wo…
Learn how to make your own table of contents in Microsoft Word using paragraph styles and the automatic table of contents tool. We'll be using the paragraph styles in Word’s Home toolbar to help you create a table of contents. Type out your initial …
This Experts Exchange video Micro Tutorial shows how to tell Microsoft Office that a word is NOT spelled correctly. Microsoft Office has a built-in, main dictionary that is shared by Office apps, including Excel, Outlook, PowerPoint, and Word. When …
Suggested Courses

864 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question