VB.Net - Search for two strings in the same paragraph MS Word

Hey guys,

I have a list of pairs of strings contained in Excel that I am wanting to use to loop a search through a Word document that will detect if both strings appear before the next vbCr (carriage return). If both words are detected, then copy the paragraph to a seperate word document. I want to encorporate this into a windows app that i have been developing using VB.Net.

The general process is:
 
Open "Reference.xlsx"
Open "Input.docx"
Create "Output.docx"

In Input.docx Find "stringA1"
If "stringA1" found = True Then
	
	Find "stringB1" before next vbCr
		If "stringB1" found = True Then
			Copy paragraph to "Output.docx"
		Else 
			continue search
		End If
Else
	Excel.Range("D1").Value = "Reference Not Found"
End If

Open in new window


If only scripting was that easy :o) I hope atleast that you get the general idea of what I am trying to achieve. I have attached some test data to clarify further.

Hope someone can help.
Cheers
G_M

 references.xlsx
 input.docx
G_MAsked:
Who is Participating?
 
GrahamSkanConnect With a Mentor RetiredCommented:
And just as I clicked Submit, I realised that I had left out the vital r = r+1 line
Imports System
Imports Microsoft.Office.Interop
Imports Microsoft.Office.Interop.Word
Imports Microsoft.Office.Interop.Excel
Public Class Form1

    Public Sub SearchReferences()

        Dim objExcel As Excel.Application
        Dim objWorkbook As Excel.Workbook
        Dim objWorksheet As Excel.Worksheet
        Dim objWord As Word.Application
        Dim objWordDocumentInput As Word.Document
        Dim objWordDocumentOutput As Word.Document
        Dim strInputFileExcel As String
        Dim strInputFileWord As String
        Dim misValue As Object = System.Reflection.Missing.Value
        Dim strExcelString1 As String
        Dim strExcelString2 As String
        Dim objExcelRange1 As Excel.Range
        Dim objExcelRange2 As Excel.Range
        Dim r As Integer

        strInputFileWord = "D:\References.docx"
        strInputFileExcel = frmGetReferences.lblOutputFolder.Text & "wf_" & frmGetReferences.txtOutputFilename.Text & ".xlsx"
        objExcel = New Excel.Application
        objWorkbook = objExcel.Workbooks.Open(strInputFileExcel)
        objWorksheet = objWorkbook.Worksheets(1)
        objWord = New Word.Application
        objWordDocumentInput = objWord.Documents.Open(strInputFileWord)
        objWordDocumentOutput = objWord.Documents.Add(misValue)
        r = 2
        Do Until objWorksheet.Cells(r, 1).Value = ""
            strExcelString1 = objWorksheet.Cells(r, 1).Value
            strExcelString2 = objWorksheet.Cells(r, 2).Value
            TwoStrings(objWordDocumentInput, objWordDocumentOutput, strExcelString1, strExcelString2)
            r = r + 1
        Loop
        'objWorkbook.Save()
        objWordDocumentOutput.SaveAs(frmGetReferences.lblOutputFolder.Text & "" & frmGetReferences.txtOutputFilename.Text & ".docx")
        'objWordDocumentOutput.Save()
        objWordDocumentInput.Close()
        objWordDocumentOutput.Close()
        objWord.Quit()
        objExcel.Quit()

        'I believe that all this setting to Nothing is unnesessary. VB does it automatically.
        'objWord = Nothing
        'objWordDocumentInput = Nothing
        'objWordDocumentOutput = Nothing
        'objExcel = Nothing
        'objWorkbook = Nothing
        'objWorksheet = Nothing


    End Sub

    Sub TwoStrings(ByVal objSourceDoc As Word.Document, ByVal objDestDoc As Word.Document, ByVal strText1 As String, ByVal strText2 As String)
        Dim para As Word.Paragraph

        For Each para In objSourceDoc.Paragraphs
            If para.Range.Find.Execute(strText1) Then
                If para.Range.Find.Execute(strText2) Then
                    para.Range.Copy()
                    objDestDoc.Bookmarks("\EndOfDoc").Range.Paste()
                End If
            End If
        Next para
    End Sub


End Class

Open in new window

0
 
NorieVBA ExpertCommented:
Why use Excel and Word?

Aren't the references you want to search for coming from a Word document?

1. Open document with original references.

2. Open document to search.

3. Search for reference(s) in first document, then search for them in the second document.

4. If found copy paragraph to new document, or even Excel.

This can probably be refined/improved a lot but kind of seems like a good idea.

PS You haven't you mentioned Access yet, you could store all this data in a database and then have code to generate the Excel/text files when needed.

PPS Or did you mention Access?
0
 
GrahamSkanRetiredCommented:
It isn't clear which bit you are stuck on; working in VB.Net,  opening Word and/or Excel documents, reading from Excel sheets,  searching in Word documents or all of the above. If it is all of them, then it is more like a project than a question.

Meanwhile, I'll try to show how to do the Word searching bit
0
Introducing Cloud Class® training courses

Tech changes fast. You can learn faster. That’s why we’re bringing professional training courses to Experts Exchange. With a subscription, you can access all the Cloud Class® courses to expand your education, prep for certifications, and get top-notch instructions.

 
G_MAuthor Commented:
imnorie, I am already working in too many applications to take Access into account (although i know you have experience there too).

I am just working with what I have at the moment. All my data is in the Excel document so I'll use it if I can I guess.

As for your question Graham, I just need a hand writing up the vb.net script required to achieve said task.
0
 
G_MAuthor Commented:
All the string references are in the excel document... the word document contains the paragraph of text those references are associated with.
0
 
G_MAuthor Commented:
Not that this will help, though you may atleast laugh... This is what I have so far ;o)

 
Public Sub SearchReferences()

        Dim objExcel As Excel.Application
        Dim objWorkbook As Excel.Workbook
        Dim objWorksheet As Excel.Worksheet
        Dim objWord As Word.Application
        Dim objWordDocumentInput As Word.Document
        Dim objWordDocumentOutput As Word.Document
        Dim strInputFileExcel As String
        Dim strInputFileWord As String
        Dim misValue As Object = System.Reflection.Missing.Value

        strInputFileWord = "D:\References.docx"
        strInputFileExcel = frmGetReferences.lblOutputFolder.Text & "wf_" & frmGetReferences.txtOutputFilename.Text & ".xlsx"
        objExcel = New Excel.Application
        objWorkbook = objExcel.Workbooks.Open(strInputFileExcel)
        objWorksheet = objWorkbook.Worksheets(1)
        objWord = New Word.Application
        objWordDocumentInput = objWord.Documents.Open(strInputFileWord)
        objWordDocumentOutput = objWord.Documents.Add(misValue)

        'Required script for search....

        objWorkbook.Save()
        objWordDocumentOutput.Save()
        objWordDocumentInput.Close()
        objWordDocumentOutput.Close()
        objWord.Quit()
        objExcel.Quit()

        objWord = Nothing
        objWordDocumentInput = Nothing
        objWordDocumentOutput = Nothing
        objExcel = Nothing
        objWorkbook = Nothing
        objWorksheet = Nothing


    End Sub

Open in new window

0
 
G_MAuthor Commented:
missed a line... :o)


objWordDocumentOutput.SaveAs(frmGetReferences.lblOutputFolder.Text & "" & frmGetReferences.txtOutputFilename.Text & ".docx")

Open in new window

0
 
GrahamSkanRetiredCommented:
Here is the bit to open and the Word app and docs and to call the sub.

I'll try to fit it with you code
Dim objWordApp As Word.Application
        Dim objSrceDoc As Word.Document
        Dim objDstDoc As Word.Document
        Dim objSelection As Word.Selection
        
        objWordApp = CreateObject("Word.Application")
        objSrceDoc = objWord.Documents.Open("C:\MyFolder\input.DocX")
        objDstDoc = objWord.Documents.Add()
        '...
        TwoStrings(objSrceDoc, objSrceDoc, "septation", "block")
        '...

Open in new window

0
 
G_MAuthor Commented:
Ok I am now trying to put these values into a range so they can be offset before the loop

 
Public Sub SearchReferences()

        Dim objExcel As Excel.Application
        Dim objWorkbook As Excel.Workbook
        Dim objWorksheet As Excel.Worksheet
        Dim objWord As Word.Application
        Dim objWordDocumentInput As Word.Document
        Dim objWordDocumentOutput As Word.Document
        Dim strInputFileExcel As String
        Dim strInputFileWord As String
        Dim misValue As Object = System.Reflection.Missing.Value
        Dim strExcelString1 As String
        Dim strExcelString2 As String
        Dim objExcelRange1 As Excel.Range
        Dim objExcelRange2 As Excel.Range

        strInputFileWord = "D:\References.docx"
        strInputFileExcel = frmGetReferences.lblOutputFolder.Text & "wf_" & frmGetReferences.txtOutputFilename.Text & ".xlsx"
        objExcel = New Excel.Application
        objWorkbook = objExcel.Workbooks.Open(strInputFileExcel)
        objWorksheet = objWorkbook.Worksheets(1)
        objWord = New Word.Application
        objWordDocumentInput = objWord.Documents.Open(strInputFileWord)
        objWordDocumentOutput = objWord.Documents.Add(misValue)
        objWordDocumentOutput.SaveAs(frmGetReferences.lblOutputFolder.Text & "" & frmGetReferences.txtOutputFilename.Text & ".docx")
        objExcelRange1 = objExcel.Range("A2")
        objExcelRange2 = objExcel.Range("B2")

        strExcelString1 = objExcel.Range(objExcelRange1).Value
        strExcelString2 = objExcel.Range(objExcelRange2).Value


        'Required script for search....

        objWorkbook.Save()
        objWordDocumentOutput.Save()
        objWordDocumentInput.Close()
        objWordDocumentOutput.Close()
        objWord.Quit()
        objExcel.Quit()

        objWord = Nothing
        objWordDocumentInput = Nothing
        objWordDocumentOutput = Nothing
        objExcel = Nothing
        objWorkbook = Nothing
        objWorksheet = Nothing


    End Sub

Open in new window

0
 
GrahamSkanRetiredCommented:
Hmm. I've just switched from VBA (where it is much faster to develop) to .Net, opened an earlier project and found some very similar code with the object 'frmGetReferences' flagged as unknown, so I guess that I've tried to work on this before.
0
 
GrahamSkanRetiredCommented:
I don't understand where the pair of strings is coming from. Your sheet has only one text column.
0
 
G_MAuthor Commented:
That would be this one... http://rdsrc.us/SJ6yR2

I am looking at the same code
0
 
G_MAuthor Commented:
The second string is the "Year" column B2:B
0
 
GrahamSkanRetiredCommented:
I haven't set up all the externals yet, so this code is untested
Imports System
Imports Microsoft.Office.Interop
Imports Microsoft.Office.Interop.Word
Imports Microsoft.Office.Interop.Excel
Public Class Form1

    Public Sub SearchReferences()

        Dim objExcel As Excel.Application
        Dim objWorkbook As Excel.Workbook
        Dim objWorksheet As Excel.Worksheet
        Dim objWord As Word.Application
        Dim objWordDocumentInput As Word.Document
        Dim objWordDocumentOutput As Word.Document
        Dim strInputFileExcel As String
        Dim strInputFileWord As String
        Dim misValue As Object = System.Reflection.Missing.Value
        Dim strExcelString1 As String
        Dim strExcelString2 As String
        Dim objExcelRange1 As Excel.Range
        Dim objExcelRange2 As Excel.Range
        Dim r As Integer

        strInputFileWord = "D:\References.docx"
        strInputFileExcel = frmGetReferences.lblOutputFolder.Text & "wf_" & frmGetReferences.txtOutputFilename.Text & ".xlsx"
        objExcel = New Excel.Application
        objWorkbook = objExcel.Workbooks.Open(strInputFileExcel)
        objWorksheet = objWorkbook.Worksheets(1)
        objWord = New Word.Application
        objWordDocumentInput = objWord.Documents.Open(strInputFileWord)
        objWordDocumentOutput = objWord.Documents.Add(misValue)
        r = 2
        Do Until objWorksheet.Cells(r, 1).Value = ""
            strExcelString1 = objWorksheet.Cells(r, 1).Value
            strExcelString2 = objWorksheet.Cells(r, 2).Value
            TwoStrings(objWordDocumentInput, objWordDocumentOutput, strExcelString1, strExcelString2)
        Loop
        'objWorkbook.Save()
        objWordDocumentOutput.SaveAs(frmGetReferences.lblOutputFolder.Text & "" & frmGetReferences.txtOutputFilename.Text & ".docx")
        'objWordDocumentOutput.Save()
        objWordDocumentInput.Close()
        objWordDocumentOutput.Close()
        objWord.Quit()
        objExcel.Quit()

        'I believe that all this setting to Nothing is unnesessary. VB does it automatically.
        'objWord = Nothing
        'objWordDocumentInput = Nothing
        'objWordDocumentOutput = Nothing
        'objExcel = Nothing
        'objWorkbook = Nothing
        'objWorksheet = Nothing


    End Sub

    Sub TwoStrings(ByVal objSourceDoc As Word.Document, ByVal objDestDoc As Word.Document, ByVal strText1 As String, ByVal strText2 As String)
        Dim para As Word.Paragraph

        For Each para In objSourceDoc.Paragraphs
            If para.Range.Find.Execute(strText1) Then
                If para.Range.Find.Execute(strText2) Then
                    para.Range.Copy()
                    objDestDoc.Bookmarks("\EndOfDoc").Range.Paste()
                End If
            End If
        Next para
    End Sub


End Class

Open in new window

0
 
GrahamSkanConnect With a Mentor RetiredCommented:
I have now tested it, and I think that it was too slow. Here is a faster version of the TwoStrings procedure. It searches the whole of the document for the first string.
Sub TwoStrings(ByVal objSourceDoc As Word.Document, ByVal objDestDoc As Word.Document, ByVal strText1 As String, ByVal strText2 As String)
        Dim para As Word.Paragraph
        Dim rng As Word.Range

        rng = objSourceDoc.Range
        Do While rng.Find.Execute(strText1)
            para = rng.Paragraphs.First
            If para.Range.Find.Execute(strText2) Then
                para.Range.Copy()
                objDestDoc.Bookmarks("\EndOfDoc").Range.Paste()
            End If
        Loop
    End Sub

Open in new window

0
 
G_MAuthor Commented:
It wasnt that it was too slow. you were passing the wrong value names to the TwoStrings Sub. I'm just running a test with those changes.

If you can post your code again with those values changed, I can give it the tick
0
 
GrahamSkanRetiredCommented:
I require some convincing about the wrong values being passed. The first worked slowly because it was searching each paragraph separately. The second version seems to work without changing the passed parameters.
0
 
G_MAuthor Commented:
hmmm... sorry my misunderstanding; no offense meant.

That second one was a lot faster.

there does seem to be some inconsistancy though. I have shortened my list so I got some hits and it does something strange with how it detects the first hit but then doesn't repeat the strangeness on the last entry.

You'll need to look at the files attached to get what I'm saying :o)

ch1.docx
 input.docx
 references.xlsx
0
 
G_MAuthor Commented:
Disregard that last one... it's 1am here I missed the second string not matching... Thank you for your help.

Thanks again Graham, I really appreciate you help.
0
 
GrahamSkanRetiredCommented:
Thanks. It's 1am here (Guildford), too
0
 
G_MAuthor Commented:
I cannot believe you bothered to help me out with this one at 1am... thank you again Graham :o)
0
Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.

All Courses

From novice to tech pro — start learning today.