G_M
asked on
VB.Net - Speed up search for multiple strings in MS Word
Hey guys,
I have been working on an application that searches a document for 3 different strings (based off an excel list). Initially I was given two example that worked (thank you Graham http://rdsrc.us/byxEui); however, one was slower than the other due to the search range.
After recieving this solution I realised that what I asked for was flawed and what I needed was the range for the first string to be the first word in the paragraph and the other two strings to be the found in the paragraph that the first string was found.
Naturally I tried to modify the faster version of the script. Unfortunately I could only get the slower version working; when I say slow I mean 45mins to search 200 items. This is the code for what I require to be refined:
Attached also is some sample test data including the excel list and the word document I'm searching.
samplelist.xlsx
input.docx
Hope someone can help.
Cheers
G_M
I have been working on an application that searches a document for 3 different strings (based off an excel list). Initially I was given two example that worked (thank you Graham http://rdsrc.us/byxEui); however, one was slower than the other due to the search range.
After recieving this solution I realised that what I asked for was flawed and what I needed was the range for the first string to be the first word in the paragraph and the other two strings to be the found in the paragraph that the first string was found.
Naturally I tried to modify the faster version of the script. Unfortunately I could only get the slower version working; when I say slow I mean 45mins to search 200 items. This is the code for what I require to be refined:
Sub TwoStrings(ByVal objSourceDoc As Word.Document, ByVal objDestDoc As Word.Document, _
ByVal intRow As Integer, ByVal objWorksheet As Excel.Worksheet)
Dim strExcelString1 As String
Dim strExcelString2 As String
Dim strExcelString3 As String
Dim para As Word.Paragraph
strExcelString1 = objWorksheet.Cells(intRow, 1).Value
strExcelString2 = objWorksheet.Cells(intRow, 2).Value
strExcelString3 = objWorksheet.Cells(intRow, 3).Value
For Each para In objSourceDoc.Paragraphs
If para.Range.Words(1).Find.Execute(strExcelString1) Then
If para.Range.Find.Execute(strExcelString2) Then
If para.Range.Find.Execute(strExcelString3) Then
para.Range.Copy()
objDestDoc.Bookmarks("\EndOfDoc").Range.Paste()
objWorksheet.Range("F" & intRow).Value = "Reference Found"
Exit Sub
End If
End If
End If
Next para
End Sub
The above code is essentially triggered by this:intRow = 2
Do Until objWorksheet.Cells(intRow, 1).Value = ""
TwoStrings(objReferenceIndex, objFinalOutputDocument, intRow, objWorksheet)
intRow = intRow + 1
objFinalOutputDocument.Save()
objWorkbook.Save()
Loop
This is the example of the faster version that searches for the first string in the entire paragraph.Sub TwoStringsFast(ByVal objSourceDoc As Word.Document, ByVal objDestDoc As Word.Document, _
ByVal strText1 As String, ByVal strText2 As String) 'Not in use...
Dim para As Word.Paragraph
Dim rng As Word.Range
rng = objSourceDoc.Range
Do While rng.Find.Execute(strText1)
para = rng.Paragraphs.First
If para.Range.Find.Execute(strText2) Then
para.Range.Copy()
objDestDoc.Bookmarks("\EndOfDoc").Range.Paste()
End If
Loop
End Sub
Attached also is some sample test data including the excel list and the word document I'm searching.
samplelist.xlsx
input.docx
Hope someone can help.
Cheers
G_M
If you want to speed up code in any .NET to Office application, move as much of the code as you want in a VBA macro in the Office document or template. Then call that macro from you .NET application by calling the Office Application object Run method. You then have only one call between .NET and the Office application, and that usually speeds things up a lot.
ASKER
Wouldn't that method involve having to modify the "document to be searched" to a .docm and creating the macro within that?
I am trying to keep everything within the application itself so the user can select the documents themselves.
I am trying to keep everything within the application itself so the user can select the documents themselves.
If you are working with a lone document, yes, that would mean adding code to the document.
But Word documents are built on templates (.dot or .dotx files). Unless told otherwise, they use a default template.
If you create your own template however, and put the code in the template, all documents built with that template will have access to the code. This is a standard practice when working with Word (or Excel, although the mechanism is a little different) when you have many documents that should share the same code, formatting, styles, you name it. This is what is used when you are in Work and go File...New and ask for a template. You can easily create your own. Coding in the template is even easier because a lot of the things you have to take care of in .NET, such as the Application and Document object are seen as default in VBA, and you do not have to think about them.
In such a situation, creating a template for your application is akind to having a dll or a configuration file. The template could be seen as being part of your application. And the gain in performance is usually quite important. You can easily expect 10x to 20x speed improvement.
But Word documents are built on templates (.dot or .dotx files). Unless told otherwise, they use a default template.
If you create your own template however, and put the code in the template, all documents built with that template will have access to the code. This is a standard practice when working with Word (or Excel, although the mechanism is a little different) when you have many documents that should share the same code, formatting, styles, you name it. This is what is used when you are in Work and go File...New and ask for a template. You can easily create your own. Coding in the template is even easier because a lot of the things you have to take care of in .NET, such as the Application and Document object are seen as default in VBA, and you do not have to think about them.
In such a situation, creating a template for your application is akind to having a dll or a configuration file. The template could be seen as being part of your application. And the gain in performance is usually quite important. You can easily expect 10x to 20x speed improvement.
ASKER
So in practice if/when I deployed this application, if I go down this track, it would replace the existing default word template so my VBA code is embedded within it...?
I believe I saw a run command in VB.Net somewhere... I'm guessing this is how I would run said macro from my application.
It's an interesting concept, but dangerous for a novice. Although backing up the original template is a given. I may leave this one for the moment as a last resort. Thank you for the idea though.
I believe I saw a run command in VB.Net somewhere... I'm guessing this is how I would run said macro from my application.
It's an interesting concept, but dangerous for a novice. Although backing up the original template is a given. I may leave this one for the moment as a last resort. Thank you for the idea though.
SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER
ASKER
strComp returned an integer, so I changed the code a little to suit. It seems to worked well. I threw a message box in there to compare the matches and it all seems pretty good. This is the code edit I did:
Unfortunately though for this one I need to add a third string... Is there an easy way to encorporate that?
Do While rng.Find.Execute(strExcelString2)
para = rng.Paragraphs.First
Debug.Print(para.Range.Words.First.Text)
MessageBox.Show(strExcelString1 & " " & Trim(para.Range.Words.First.Text))
If StrComp(strExcelString1, Trim(para.Range.Words.First.Text), CompareMethod.Text) = 0 Then
para.Range.Copy()
objDestDoc.Bookmarks("\EndOfDoc").Range.Paste()
End If
Loop
Unfortunately though for this one I need to add a third string... Is there an easy way to encorporate that?
Sorry, I'm flagging a bit tonight. Perhaps I shouldn't have drunk the third bottle of wine (:>).
I'll try to pick this up in the morning. Prompt me with a spurious comment if I seem to have forgotten.
I'll try to pick this up in the morning. Prompt me with a spurious comment if I seem to have forgotten.
ASKER
That's fine Graham... thank you for giving it a good crack anyway :o)
I'll be monitoring this through the day anyway, so whenever you get over that third bottle, I look forward to hearing from you.
Cheers
G_M
I'll be monitoring this through the day anyway, so whenever you get over that third bottle, I look forward to hearing from you.
Cheers
G_M
ASKER
Ok I worked it out... really not that hard after all. Thanks for the leg up Graham. If there is something blaringly wrong with my code (apart from the unnecessary blnStringFound variable), please let me know. I’ll leave this one open for you reply and give you the points after that.
Thanks again Graham
Cheers
G_M
Thanks again Graham
Cheers
G_M
Sub TwoStringsFast(ByVal objSourceDoc As Word.Document, ByVal objDestDoc As Word.Document, _
ByVal intRow As Integer, ByVal objWorksheet As Excel.Worksheet)
Dim para As Word.Paragraph
Dim rng As Word.Range
Dim strExcelString1 As String
Dim strExcelString2 As String
Dim strExcelString3 As String
Dim blnStringFound As Boolean
rng = objSourceDoc.Range
strExcelString1 = objWorksheet.Cells(intRow, 1).Value
strExcelString2 = objWorksheet.Cells(intRow, 2).Value
strExcelString3 = objWorksheet.Cells(intRow, 3).Value
Do While rng.Find.Execute(strExcelString3)
para = rng.Paragraphs.First
blnStringFound = para.Range.Find.Execute(strExcelString2)
If blnStringFound = True Then
Debug.Print(para.Range.Words.First.Text)
If StrComp(strExcelString1, Trim(para.Range.Words.First.Text), CompareMethod.Text) = 0 Then
para.Range.Copy()
objDestDoc.Bookmarks("\EndOfDoc").Range.Paste()
Exit Sub
End If
End If
Loop
End Sub
That looks OK to me.
For increased performance, because the paragraphs and the spreadsheet are both in alphabetical order, I was working on method of stepping through each paragraph, adjusting the spreadsheet row as necessary. Let us know if your code is still not fast enough & I'll complete it.
For increased performance, because the paragraphs and the spreadsheet are both in alphabetical order, I was working on method of stepping through each paragraph, adjusting the spreadsheet row as necessary. Let us know if your code is still not fast enough & I'll complete it.
ASKER
It's fantastic Graham... It's gone from 45min to 3mins for 200 items. I can't complain. Thanks again for your help.
Cheers
G_M
Cheers
G_M
ASKER
I've requested that this question be closed as follows:
Accepted answer: 500 points for GrahamSkan's comment http:/Q_27410078.html#37012077
Assisted answer: 0 points for G_M's comment http:/Q_27410078.html#37013835
for the following reason:
Graham definately put me on the right track, so I happily award him the points
Accepted answer: 500 points for GrahamSkan's comment http:/Q_27410078.html#37012077
Assisted answer: 0 points for G_M's comment http:/Q_27410078.html#37013835
for the following reason:
Graham definately put me on the right track, so I happily award him the points
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER
I've requested that this question be closed as follows:
Accepted answer: 0 points for G_M's comment http:/Q_27410078.html#37014434
for the following reason:
Graham definately put me on the right track, so I happily award him the points
Accepted answer: 0 points for G_M's comment http:/Q_27410078.html#37014434
for the following reason:
Graham definately put me on the right track, so I happily award him the points
ASKER
Still trying to close this question properly :oP
ASKER
Graham definately put me on the right track, so I happily award him the points