Link to home
Start Free TrialLog in
Avatar of G_M
G_MFlag for Australia

asked on

VB.Net - Speed up search for multiple strings in MS Word

Hey guys,

I have been working on an application that searches a document for 3 different strings (based off an excel list). Initially I was given two example that worked (thank you Graham http://rdsrc.us/byxEui); however, one was slower than the other due to the search range.

After recieving this solution I realised that what I asked for was flawed and what I needed was the range for the first string to be the first word in the paragraph and the other two strings to be the found in the paragraph that the first string was found.

Naturally I tried to modify the faster version of the script. Unfortunately I could only get the slower version working; when I say slow I mean 45mins to search 200 items. This is the code for what I require to be refined:
Sub TwoStrings(ByVal objSourceDoc As Word.Document, ByVal objDestDoc As Word.Document, _
                   ByVal intRow As Integer, ByVal objWorksheet As Excel.Worksheet)

        Dim strExcelString1 As String
        Dim strExcelString2 As String
        Dim strExcelString3 As String
        Dim para As Word.Paragraph

        strExcelString1 = objWorksheet.Cells(intRow, 1).Value
        strExcelString2 = objWorksheet.Cells(intRow, 2).Value
        strExcelString3 = objWorksheet.Cells(intRow, 3).Value

        For Each para In objSourceDoc.Paragraphs
            If para.Range.Words(1).Find.Execute(strExcelString1) Then
                If para.Range.Find.Execute(strExcelString2) Then
                    If para.Range.Find.Execute(strExcelString3) Then
                        para.Range.Copy()
                        objDestDoc.Bookmarks("\EndOfDoc").Range.Paste()
                        objWorksheet.Range("F" & intRow).Value = "Reference Found"
                        Exit Sub
                    End If
                End If
            End If
        Next para
    End Sub

Open in new window

The above code is essentially triggered by this:
intRow = 2
        Do Until objWorksheet.Cells(intRow, 1).Value = ""
            TwoStrings(objReferenceIndex, objFinalOutputDocument, intRow, objWorksheet)
            intRow = intRow + 1
            objFinalOutputDocument.Save()
            objWorkbook.Save()
        Loop

Open in new window

This is the example of the faster version that searches for the first string in the entire paragraph.
Sub TwoStringsFast(ByVal objSourceDoc As Word.Document, ByVal objDestDoc As Word.Document, _
                       ByVal strText1 As String, ByVal strText2 As String) 'Not in use...
        Dim para As Word.Paragraph
        Dim rng As Word.Range

        rng = objSourceDoc.Range
        Do While rng.Find.Execute(strText1)
            para = rng.Paragraphs.First
            If para.Range.Find.Execute(strText2) Then
                para.Range.Copy()
                objDestDoc.Bookmarks("\EndOfDoc").Range.Paste()
            End If
        Loop
    End Sub

Open in new window


Attached also is some sample test data including the excel list and the word document I'm searching.
 samplelist.xlsx
 input.docx
Hope someone can help.
Cheers
G_M

Avatar of Jacques Bourgeois (James Burger)
Jacques Bourgeois (James Burger)
Flag of Canada image

If you want to speed up code in any .NET to Office application, move as much of the code as you want in a VBA macro in the Office document or template. Then call that macro from you .NET application by calling the Office Application object Run method. You then have only one call between .NET and the Office application, and that usually speeds things up a lot.

Avatar of G_M

ASKER

Wouldn't that method involve having to modify the "document to be searched" to a .docm and creating the macro within that?

I am trying to keep everything within the application itself so the user can select the documents themselves.
If you are working with a lone document, yes, that would mean adding code to the document.

But Word documents are built on templates (.dot or .dotx files). Unless told otherwise, they use a default template.

If you create your own template however, and put the code in the template, all documents built with that template will have access to the code. This is a standard practice when working with Word (or Excel, although the mechanism is a little different) when you have many documents that should share the same code, formatting, styles, you name it. This is what is used when you are in Work and go File...New and ask for a template. You can easily create your own. Coding in the template is even easier because a lot of the things you have to take care of in .NET, such as the Application and Document object are seen as default in VBA, and you do not have to think about them.

In such a situation, creating a template for your application is akind to having a dll or a configuration file. The template could be seen as being part of your application. And the gain in performance is usually quite important. You can easily expect 10x to 20x speed improvement.
Avatar of G_M

ASKER

So in practice if/when I deployed this application, if I go down this track, it would replace the existing default word template so my VBA code is embedded within it...?

I believe I saw a run command in VB.Net somewhere... I'm guessing this is how I would run said macro from my application.

It's an interesting concept, but dangerous for a novice. Although backing up the original template is a given. I may leave this one for the moment as a last resort. Thank you for the idea though.
SOLUTION
Avatar of GrahamSkan
GrahamSkan
Flag of United Kingdom of Great Britain and Northern Ireland image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of G_M

ASKER

Sorry, I uploaded the wrong list... here is the proper sample list

 samplelist.xlsx
Avatar of G_M

ASKER

strComp returned an integer, so I changed the code a little to suit. It seems to worked well. I threw a message box in there to compare the matches and it all seems pretty good.  This is the code edit I did:

 
Do While rng.Find.Execute(strExcelString2)
            para = rng.Paragraphs.First
            Debug.Print(para.Range.Words.First.Text)
            MessageBox.Show(strExcelString1 & " " & Trim(para.Range.Words.First.Text))

            If StrComp(strExcelString1, Trim(para.Range.Words.First.Text), CompareMethod.Text) = 0 Then
                para.Range.Copy()
                objDestDoc.Bookmarks("\EndOfDoc").Range.Paste()
            End If
        Loop

Open in new window


Unfortunately though for this one I need to add a third string... Is there an easy way to encorporate that?
Sorry, I'm flagging a bit tonight. Perhaps I shouldn't have drunk the third bottle of wine (:>).

I'll try to pick this up in the morning. Prompt me with a spurious comment if I seem to have forgotten.
Avatar of G_M

ASKER

That's fine Graham... thank you for giving it a good crack anyway :o)

I'll be monitoring this through the day anyway, so whenever you get over that third bottle, I look forward to hearing from you.

Cheers
G_M
Avatar of G_M

ASKER

Ok I worked it out... really not that hard after all. Thanks for the leg up Graham. If there is something blaringly wrong with my code (apart from the unnecessary blnStringFound variable), please let me know. I’ll leave this one open for you reply and give you the points after that.

Thanks again Graham
Cheers
G_M

 
Sub TwoStringsFast(ByVal objSourceDoc As Word.Document, ByVal objDestDoc As Word.Document, _
                       ByVal intRow As Integer, ByVal objWorksheet As Excel.Worksheet)
        Dim para As Word.Paragraph
        Dim rng As Word.Range
        Dim strExcelString1 As String
        Dim strExcelString2 As String
        Dim strExcelString3 As String
        Dim blnStringFound As Boolean

        rng = objSourceDoc.Range
        strExcelString1 = objWorksheet.Cells(intRow, 1).Value
        strExcelString2 = objWorksheet.Cells(intRow, 2).Value
        strExcelString3 = objWorksheet.Cells(intRow, 3).Value

        Do While rng.Find.Execute(strExcelString3)
            para = rng.Paragraphs.First
            blnStringFound = para.Range.Find.Execute(strExcelString2)
            If blnStringFound = True Then
                Debug.Print(para.Range.Words.First.Text)
                If StrComp(strExcelString1, Trim(para.Range.Words.First.Text), CompareMethod.Text) = 0 Then
                    para.Range.Copy()
                    objDestDoc.Bookmarks("\EndOfDoc").Range.Paste()
                    Exit Sub
                End If
            End If
        Loop
    End Sub

Open in new window

That looks OK to me.

For increased performance, because the paragraphs and the spreadsheet are both in alphabetical order,  I was working on method of stepping through each paragraph, adjusting the spreadsheet row as necessary. Let us know if your code is still not fast enough & I'll complete it.


Avatar of G_M

ASKER

It's fantastic Graham... It's gone from 45min to 3mins for 200 items. I can't complain. Thanks again for your help.

Cheers
G_M
Avatar of G_M

ASKER

I've requested that this question be closed as follows:

Accepted answer: 500 points for GrahamSkan's comment http:/Q_27410078.html#37012077
Assisted answer: 0 points for G_M's comment http:/Q_27410078.html#37013835

for the following reason:

Graham definately put me on the right track, so I happily award him the points
ASKER CERTIFIED SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of G_M

ASKER

I've requested that this question be closed as follows:

Accepted answer: 0 points for G_M's comment http:/Q_27410078.html#37014434

for the following reason:

Graham definately put me on the right track, so I happily award him the points
Avatar of G_M

ASKER

Still trying to close this question properly :oP
Avatar of G_M

ASKER

Graham definately put me on the right track, so I happily award him the points