WonHop asked:

Find The Correct Page Number In A MS Word Document Using Visual Basic Studio

Hello All.
I am working with a Visual Basic Studio Application to Find text in a Microsoft Word Document.  It was written by a guy who is long gone from the company.  
Since I did some Excel VBA, I was given the task of trying to work on this Application.  I have not done actual Visual Basic since the year 2000.

Please be patient and give me time to work on any submitted attempts.  I am rusty in VB and I will need time to check your attempt with his code without messing the whole thing up.

What Should be happening:
It should be looping thru an MS Access Query to find Text (Acronyms) in the MS Word Document.  When it finds the text, it captures the Page Number from the footer.
The text is in the Document several times.  It should be capturing the text from the very first page it appears on.
Example:   The Term "ACBM" is in the document 10 times.  It is on Pages 7-31, 7-33,  7-35 and A-9.

The Problem:
It will loop thru the Document and find all 10 and capture the page number.  The problem is that the PAGE NUMBER captured on the VERY FIRST FIND is wrong. Once it starts looping thru the document, it finds all of them correctly.
It is very random.  It will show the very first find being on Pages 7-30 or 7-38.  It never shows 7-31 as being the first page.  I wrote code to compare the lowest page numbers as it looped.  But when the first page number it finds is before the actual first page, the code is useless.

Is there a way to search the MS Word Document and get accurate first page numbers?
I would also like to know how many results were found from each search.
Once the text is found, is there a way to capture that text in a variable so that I can verify that it has found the correct text when I am stepping thru the code?

Maybe something like what is listed below:
strTerm = "ACBM" -  The current Term to be searched
FindAll strTerm
intSearchResults = 10 Found
FindFirst strTerm
strTerm.Select - Select the term in the different locations in the document as it loops thru the document.
strTermSelected = Selection  -  I would like this code just as a verification that the correct Term is found and selected.
strPageNumber = PageNumber From the Footer.

Thanks
GrahamSkan

It is difficult to guess what is going wrong. However, be aware that pagination can vary between environments, so that the requested text might be a page or two different from what is expected.

Ideally we would like to see the code and the document.
WonHop (ASKER)

I can send the code if you think you might be able to maybe work with that.  I checked with my company and I am not allowed to send the documents.
Sorry.  Let me know if you still want to see the code.

Nate

GrahamSkan

Yes, it will be a help to see the actual code.
WonHop (ASKER)

Here is the code.  There is a lot of other stuff that happens before this.  This is the code that gathers the data.  A report is created later.

Private Sub FirstOccurrencesReportToolStripMenuItem_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles FirstOccurrencesReportToolStripMenuItem.Click
        If Me.DataGridView2.RowCount = 0 Then
            MessageBox.Show("There are no items to find First Occurrences for.", "No Items For First Occurrences", MessageBoxButtons.OK, MessageBoxIcon.Exclamation)
            Exit Sub
        End If

        If IsNothing(gSourceDocument) = True Then
            MessageBox.Show("Internal error: target document not found.", "Internal error. No Target Document Found", MessageBoxButtons.OK, MessageBoxIcon.Exclamation)
            Exit Sub
        End If

        SetPanelDimensions()
        Panel1.Show()
        WaitDialogStatuslbl.Text = "Creating First Occurrences report"

        LoadApplictionProgressBar.Minimum = 0
        LoadApplictionProgressBar.Maximum = DataGridView2.Rows.Count + 1  '+1 is for bug in progressbar
        LoadApplictionProgressBar.Value = 0

        Dim combinedWordList As String = vbNullString
        Dim frontMatterMaterial As String
        Dim bodyMaterial As String

        'Create copy of the source document
        Dim copyDocument As Word.Document = gAppWord.Documents.Add(gSourceDocument.FullName)

        'Go through each row...
        For Each item As DataGridViewRow In Me.DataGridView2.Rows
            LoadApplictionProgressBar.Value += 2 : LoadApplictionProgressBar.Value -= 1 'bug in progressbar
            System.Windows.Forms.Application.DoEvents()

            Dim acroString As String = Trim(item.Cells(0).Value)
            Dim definationString As String = Trim(item.Cells(1).Value)

            Dim intdefinationString As Integer  '05082018
            intdefinationString = CInt(definationString.Length)  '05082018

            If definationString.Length = 0 Then Continue For

            Dim findRange As Word.Range = copyDocument.Range

            '0 = total count/1=number in tables/2= not in tables/3 = in front material/4 = in body
            Dim CountAndBodyOrTable(IN_BODY) As Long

            Dim definationRange As Word.Range = copyDocument.Range
            findRange.Find.Text = Trim(item.Cells(0).Value)
            findRange.Find.MatchWholeWord = True
            findRange.Find.MatchCase = True

            'Default filler
            'update ACRONYM_LINE_COUNT if the number of vbCrLf changes
            combinedWordList &= Trim(item.Cells(0).Value) & " - " & Trim(item.Cells(1).Value) & vbCrLf
            combinedWordList &= "Page #" & vbCrLf & "Where?" & vbCrLf & "Context" & vbCrLf

            frontMatterMaterial = vbNullString & vbCrLf & "Front Matter" & vbCrLf & "NA" & vbCrLf
            bodyMaterial = vbNullString & vbCrLf & "Body" & vbCrLf & "NA" & vbCrLf

            If findRange.Find.Text = "CTBE" Then
                Dim test As String = "test"
            End If

            Dim sectionCount As Integer = findRange.Sections.Count

            Dim strpageNumberTextFirstCheck As String  '05/08/2018
            Dim strpageNumberTextSecondCheck As String  '05/08/2018
            Dim strLowerPageNumber As String  '05/08/2018
            Dim strpageNumberText As String  '05/08/2018
            strpageNumberTextFirstCheck = ""
            strpageNumberTextSecondCheck = ""
            strLowerPageNumber = ""
            strpageNumberText = ""

            'Go through each term
            Do While findRange.Find.Execute() = True
                'findRange.Select()

                'Dim pageNumberText As String = findRange.Sections(sectionNumber).Footers(Word.WdHeaderFooterIndex.wdHeaderFooterPrimary).Range.Text
                'Dim pageNumberText As String = findRange.Sections(1).Footers(Word.WdHeaderFooterIndex.wdHeaderFooterPrimary).Range.Text
                Dim pageNumberText As String = findRange.Sections(1).Footers(WdHeaderFooterIndex.wdHeaderFooterPrimary).Range.Text
                pageNumberText = Trim(pageNumberText.Replace(Chr(13), vbNullString))

                'strpageNumberText = Mid(pageNumberText, 1, 4)  '05/08/2018
                strpageNumberText = pageNumberText  '05/08/2018

                If strpageNumberTextFirstCheck = "" Then  '05/08/2018
                    strpageNumberTextFirstCheck = strpageNumberText  '05/08/2018
                    strLowerPageNumber = pageNumberText
                Else  '05/08/2018
                    strpageNumberTextSecondCheck = strpageNumberText  '05/08/2018
                    'strLowerPageNumber = pageNumberText
                End If  '05/08/2018

                If strpageNumberTextSecondCheck <> "" Then  '05/09/2018
                    If strpageNumberTextFirstCheck < strpageNumberTextSecondCheck Then  '05/09/2018
                        strpageNumberText = strpageNumberTextFirstCheck  '05/09/2018
                        strLowerPageNumber = pageNumberText
                    ElseIf strpageNumberTextSecondCheck < strpageNumberTextFirstCheck Then  '05/09/2018
                        strpageNumberTextFirstCheck = pageNumberText
                        strpageNumberText = strpageNumberTextSecondCheck  '05/09/2018
                        strLowerPageNumber = pageNumberText
                    End If  '05/09/2018
                End If  '05/09/2018
                pageNumberText = strLowerPageNumber  '05/09/2018

                'If it doesn't have a page number, then skip it
                If pageNumberText.Length = 0 Then Continue Do

                'If it's in a table, skip it
                If findRange.Information(Word.WdInformation.wdWithInTable) = True Then
                    'In table
                    CountAndBodyOrTable(IN_TABLE) += 1
                    Continue Do
                Else
                    'Not in table
                    CountAndBodyOrTable(NOT_IN_TABLE) += 1  'OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO
                End If

                CountAndBodyOrTable(TOTAL_COUNT) += 1   'OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO

                WaitDialogStatuslbl.Text = "Creating First Occurrences report. Finding (" & findRange.Find.Text & ") " & CountAndBodyOrTable(0).ToString

                Dim term As String = findRange.Find.Text

                definationRange.Start = findRange.Start - (definationString.Length * 3)
                If definationRange.Start < 0 Then definationRange.Start = 0

                definationRange.End = findRange.End + (definationString.Length * 3)
                If definationRange.End > copyDocument.Range.End Then definationRange.End = copyDocument.Range.End

                'Is in Front Matter?
                If findRange.Sections(1).Footers(Word.WdHeaderFooterIndex.wdHeaderFooterPrimary).PageNumbers.NumberStyle = Word.WdPageNumberStyle.wdPageNumberStyleLowercaseRoman Then
                    CountAndBodyOrTable(IN_FRONT_MATERIAL) += 1
                    If CountAndBodyOrTable(IN_FRONT_MATERIAL) > 1 Then Continue Do

                    frontMatterMaterial = pageNumberText & vbCrLf & "Front Matter" & vbCrLf & ReplaceCertainSymbols(definationRange.Text) & vbCrLf
                End If

                'Is in Body?
                Dim blnFoundInBody As Boolean
                If findRange.Sections(1).Footers(Word.WdHeaderFooterIndex.wdHeaderFooterPrimary).PageNumbers.NumberStyle = Word.WdPageNumberStyle.wdPageNumberStyleArabic Then

                    '***************************************************************************************************************
                    '''THIS IS THE OLD CODE.  IT WAS ADDING PAGE NUMBERS THAT WAS INCORRECT BECAUSE IT WAS ADDING THE FIRST PAGE
                    '''THE CODE FOUND.  EVEN IF THE DATA WAS NOT ON THAT PAGE.  I ADDED THE CODE BELOW TO SEE IF THAT WORKS BETTER.
                    'CountAndBodyOrTable(IN_BODY) += 1
                    'If CountAndBodyOrTable(IN_BODY) > 1 Then Continue Do
                    'Goes to Private Function ReplaceCertainSymbols(ByVal definationRangeString As String) As String
                    'bodyMaterial = pageNumberText & vbCrLf & "Body" & vbCrLf & ReplaceCertainSymbols(definationRange.Text) & vbCrLf
                    '******************************************************************************************************************
                    blnFoundInBody = True
                    'pageNumberText = strLowerPageNumber
                    bodyMaterial = pageNumberText & vbCrLf & "Body" & vbCrLf & ReplaceCertainSymbols(definationRange.Text) & vbCrLf
                    pageNumberText = ""
                    Continue Do
                End If
            Loop

            combinedWordList &= frontMatterMaterial
            combinedWordList &= bodyMaterial
            combinedWordList &= vbCrLf

            System.Windows.Forms.Application.DoEvents()
        Next

        copyDocument.Close(Word.WdSaveOptions.wdDoNotSaveChanges)
        copyDocument = Nothing

        Panel1.Show()

        SaverFirstOccurenceWordList("First Occurrences Report", combinedWordList)
    End Sub
**********************************************************************************************
    Private Function ReplaceCertainSymbols(ByVal definationRangeString As String) As String
        definationRangeString = definationRangeString.Replace(Chr(9), ChrW(&H25BA)) 'Add the tab symbol
        definationRangeString = definationRangeString.Replace(Chr(11), ChrW(&HAC)) 'Add the tab symbol
        definationRangeString = definationRangeString.Replace(Chr(12), ChrW(&H2557)) 'Add the tab symbol
        definationRangeString = definationRangeString.Replace(Chr(13), ChrW(&HB6)) 'Add the pilcrow symbol

        Return definationRangeString
    End Function


GrahamSkan

Here is an attempt to do in VBA what you describe:
Sub FindPageNumbers()
    Dim strFindText As String
    Dim strPageNumbers() As String
    Dim rng As Range
    Dim p As Integer
    Dim i As Integer
    
    strFindText = "ABCM"
    Set rng = ActiveDocument.Range
    With rng.Find
        .Text = strFindText
        Do While .Execute()
            ReDim Preserve strPageNumbers(p)
            strPageNumbers(p) = rng.Information(wdActiveEndSectionNumber) & "-" & rng.Information(wdActiveEndPageNumber)
            p = p + 1
        Loop
    End With
    For i = 0 To p - 1
        MsgBox strPageNumbers(i)
    Next i
End Sub


Sorry, cross-posted. You can ignore my snippet for now; I will put your code in a snippet box.
I am not sure what the code expects to happen.

It finds the text string in the document body, then looks at the text in the primary footer of the section. Assuming that the document only has primary footers and that the only text in the footer is the page number, then you will get the page number display from one of the pages in the section or one of the previous sections. One primary footer per section is the upper limit because a section's footer might be linked to that of a previous section.

The page number as printed or displayed on the screen is calculated there and then. It is not held in any accessible text.

Can I suggest that you use the logic in my VBA macro? It finds the section and the page number of the found range and assembles them into what I guess is the desired format.  Try the macro in the VBA IDE first to see if it lists the right pages.
WonHop (ASKER)

I will give your snippet a try.  Below is a little explanation of what the code is doing at certain spots.

The code is searching a query in a List Box and getting the Terms, such as "ACBM":
        For Each item As DataGridViewRow In Me.DataGridView2.Rows
            Dim acroString As String = Trim(item.Cells(0).Value)

This is where it starts searching the Word Document.
            Dim findRange As Word.Range = copyDocument.Range
            '0 = total count/1=number in tables/2= not in tables/3 = in front material/4 = in body
            Dim CountAndBodyOrTable(IN_BODY) As Long
            Dim definationRange As Word.Range = copyDocument.Range
            findRange.Find.Text = Trim(item.Cells(0).Value)
            findRange.Find.MatchWholeWord = True
            findRange.Find.MatchCase = True

This is where it captures the Page Number.
            Do While findRange.Find.Execute() = True

                Dim pageNumberText As String = findRange.Sections(1).Footers(WdHeaderFooterIndex.wdHeaderFooterPrimary).Range.Text
                pageNumberText = Trim(pageNumberText.Replace(Chr(13), vbNullString))

That is where the problem is.  The VERY FIRST find is usually incorrect.  After that, it will actually find all of the ones in the document and collect ALL the correct page numbers.  
The code was written to use the very first page number found, because that is what is needed.  That is why I added some code to try to find and keep the lowest page found as it looped thru the document.

When the incorrect page number is lower than the first correct page number, the code still adds the incorrect page number to the final report.

Thanks
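
One thing to watch when keeping the "lowest" page: comparing footer labels as strings is not reliable, because "7-9" < "7-31" is False in a character-by-character comparison ("9" sorts after "3"). A minimal sketch of an alternative, assuming each match records its absolute page position in the document alongside the footer label. The Occurrence structure, the FirstOccurrenceLabel function and the sample page values are hypothetical, not part of the original application:

```vbnet
Imports System
Imports System.Collections.Generic

Module FirstOccurrenceSketch
    ' One match: absolute page position in the document plus the label printed in the footer.
    Public Structure Occurrence
        Public AbsolutePage As Integer
        Public FooterLabel As String
    End Structure

    ' Returns the footer label of the match with the lowest absolute page,
    ' or an empty string when there are no matches.
    Public Function FirstOccurrenceLabel(occurrences As IEnumerable(Of Occurrence)) As String
        Dim best As Occurrence = Nothing
        Dim found As Boolean = False
        For Each item As Occurrence In occurrences
            If Not found OrElse item.AbsolutePage < best.AbsolutePage Then
                best = item
                found = True
            End If
        Next
        Return If(found, best.FooterLabel, String.Empty)
    End Function

    Sub Main()
        ' Hypothetical sample data: the labels come from the question, the absolute pages are made up.
        Dim items As New List(Of Occurrence) From {
            New Occurrence With {.AbsolutePage = 78, .FooterLabel = "7-33"},
            New Occurrence With {.AbsolutePage = 76, .FooterLabel = "7-31"},
            New Occurrence With {.AbsolutePage = 87, .FooterLabel = "A-9"}
        }
        Console.WriteLine(FirstOccurrenceLabel(items))
    End Sub
End Module
```

Because the comparison is done on integers, the order in which the matches are found no longer matters, and labels such as "A-9" never have to be compared with "7-31".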
WonHop (ASKER)

When I add this code to the Visual Basic Editor, I get the Error Messages below

Error      BC30807      'Let' and 'Set' assignment statements are no longer supported.      
Error      BC30451      'wdActiveEndSectionNumber' is not declared. It may be inaccessible due to its protection level.
Error      BC30451      'wdActiveEndSectionNumber' is not declared. It may be inaccessible due to its protection level.

The path to the Word Document is C:\Users\nneal\Documents\Acrofinder_TestFiles\05022018.docx

GrahamSkan

The code is VBA, not VB.Net. The idea is to test it to see if it works properly on your document, before converting it to .Net.

Manually open the target document in Word. Open the VBA editor using Alt + F11. Paste the macro into the code pane. Make sure the hard-coded acronym is correct. Put the text cursor somewhere in the code and press F5. You should get a series of message boxes (one for each finding). If that is satisfactory, we can try the conversion.
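
For reference, the two kinds of error above come from VBA syntax that VB.Net does not accept: VB.Net has no Set keyword, and Word constants such as wdActiveEndPageNumber have to be qualified with their enumeration (WdInformation). A rough sketch of how the macro might look after conversion, assuming a project reference to Microsoft.Office.Interop.Word and a document that is already open. The module and function names are hypothetical:

```vbnet
Imports System.Collections.Generic
Imports Word = Microsoft.Office.Interop.Word

Module PageNumberSketch
    ' Returns one "section-page" string for every whole-word, case-sensitive match.
    Public Function FindPageNumbers(doc As Word.Document, findText As String) As List(Of String)
        Dim results As New List(Of String)
        Dim rng As Word.Range = doc.Content   ' plain assignment, no Set

        With rng.Find
            .ClearFormatting()
            .Text = findText
            .MatchWholeWord = True
            .MatchCase = True
            .Forward = True
            .Wrap = Word.WdFindWrap.wdFindStop   ' stop at the end instead of wrapping around
        End With

        Do While rng.Find.Execute()
            ' Information() returns an Object, so convert explicitly.
            Dim sectionNumber As Integer = CInt(rng.Information(Word.WdInformation.wdActiveEndSectionNumber))
            Dim pageNumber As Integer = CInt(rng.Information(Word.WdInformation.wdActiveEndPageNumber))
            results.Add(sectionNumber & "-" & pageNumber)
        Loop

        Return results
    End Function
End Module
```

The count of matches for a term is then simply the Count of the returned list, and the first entry corresponds to the first occurrence in document order.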
WonHop (ASKER)

OOhhh.  OK great.  I will try that.  It will be tomorrow before I can get back with you with a response.
Thank you very much.
WonHop (ASKER)

Hello GrahamSkanRetired

I tried the code in MS Word.  It did not work.
It goes from:
Do While .Execute()
To:
End With.

Then it goes from:
 For i = 0 To p - 1
To:
Exit Sub

In the Watch Window, the code below did  get the data from the file.
Set rng = ActiveDocument.Range

GrahamSkan

It looks for "ABCM" (without quotes). Are there some occurrences of that in the active document?
I see that you quoted "ACBM". I accidentally transposed the first two letters. I suggest that you correct it in the code.
WonHop (ASKER)

Hello.  
It is looping thru the Document.  It is finding and capturing the MS Word Page Number, i.e. 76, 78, 70 and 87.  That part is working great.  It is going to the correct first page.
They are showing in the Watch Window as "20-76" and so on.  What is the "20" used for?

Those are the correct pages that the data is on.  But what is needed is for it to capture the Page Number in the Footer.

GrahamSkan

How is the page numbering set up? See the Insert tab, Header and Footer group, Page Number dropdown, Page Number format.

I assumed that the prefix was the Section number, but it might be done using Heading-defined chapter numbers. You can toggle the field view between result and code using Alt + F9 on any selected text. I was expecting it to look like { Section } - { Page }.

So my code expects it to be achieved using the Section number and the Page number, with the page numbering restarting at each Section.

(I worked late last night, but I'm a bit tired now - it's 19:30 here - so I can't guarantee to stay with it until the end of your work day)
WonHop (ASKER)

This is the Page Number Setup photo.
I am glad for any help you give, whenever you can give it.
Tomorrow is fine.

Thanks
Page_Number_SetUp.PNG
WonHop (ASKER)

Here is another one.
Page_Number_SetUp_2.PNG

GrahamSkan

Sorry, I didn't explain fully.

For the sake of keeping the page numbering consistent with changes in the text and environment (especially printer changes), page numbering is normally not absolute, but is a calculation for the current circumstance, so that such things as Indexes and Tables of Contents remain accurate.

In order to achieve this objective, page numbering is usually implemented by the use of one or more fields.  Thus the page number calculation is done dynamically at the time of printing or display.

The default view of a document shows it with the field calculations done and the results showing. It is possible to switch this view for the selected part of the document (Ctrl+A selects all), so that we can alternate between what you usually see, i.e. the results of the calculation, and seeing the field codes that are used to do this calculation. In the field code display mode, fields usually look like this: { Page }, as opposed to simply showing the page number.
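
The field codes behind a footer can also be read from code rather than by toggling Alt + F9. A minimal sketch, assuming a reference to Microsoft.Office.Interop.Word and a Section object obtained from the found range (for example findRange.Sections(1)). The module and function names are hypothetical:

```vbnet
Imports System.Collections.Generic
Imports Word = Microsoft.Office.Interop.Word

Module FooterFieldSketch
    ' Hypothetical helper: lists the type and code text of every field
    ' in the primary footer of the given section.
    Public Function GetFooterFieldCodes(sec As Word.Section) As List(Of String)
        Dim codes As New List(Of String)
        Dim footer As Word.HeaderFooter = sec.Footers(Word.WdHeaderFooterIndex.wdHeaderFooterPrimary)

        For Each fld As Word.Field In footer.Range.Fields
            ' fld.Code is the range holding the field code, e.g. a PAGE or SEQ field.
            codes.Add(fld.Type.ToString() & ": " & fld.Code.Text.Trim())
        Next

        Return codes
    End Function
End Module
```

Stepping through the returned list in the debugger shows which fields make up the displayed page number, without relying on the footer's result text.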
WonHop (ASKER)

Please excuse me.  I am not the greatest with MS Word.  I will show the question to the lady I am doing this for.
In the meantime, does the photo attached give you help?

Thanks
Nate
Page_Number_SetUp_3.PNG
WonHop (ASKER)

Here is the result of the Alt + F9.

Thanks
Page_Number_SetUp_4.PNG
WonHop (ASKER)

For the 7-31 page number type, it shows {page}.

The other one I sent was for type  A-1 page numbers.

GrahamSkan

I have created a document using chapter numbers and Styleref fields so that I can test your logic, but I can't get any of the correct footers for the found text. I note that your document uses SEQ fields and I have yet to test that, but I don't think the way that the section/chapter numbers are created will affect the result.
I cannot insert an SEQ field into the footer. It shows 'Main document only' instead. This is in Word 2016, and Word 2003 does the same.
WonHop (ASKER)

I talked to the lady who creates the documents.  She is going to create a fake document with none of our real data in it.  I am hoping you can use the format to get what you need.  She said it will have to be later this afternoon.  I might have to send it to you tomorrow.  We are using Word 2016.


Thanks
Nate

GrahamSkan

That sounds like a good idea.
WonHop (ASKER)

Hello.  Here is the test document.  In the example below, the page number is collected from the middle section of the footer, as well as the data from the left side of the data.

1-2Avengers April 27, 2018

Thanks
Avengers_Infinity_War_Test.docx
ASKER CERTIFIED SOLUTION
GrahamSkan

This solution is only available to members.
WonHop (ASKER)

Hello.  I tested the code below on the fake and real documents.  It does find and go to the correct first page.  I commented out the stuff that was not needed.  At least, I saw that it had nothing to do with what I was trying to achieve.  If I am misunderstanding something with the code that I commented out, please let me know.

Can we take this step by step from here?
Getting to the correct page is the first step.  Could you please help convert this to Visual Basic?  The two lines below really give me a good start.  Once I get to select the correct page in Visual Basic, then we can start looking at capturing the document page numbers.
The code that is already there does that just fine.  It is just not selecting the correct page number.  I am hoping that if we can get the code you wrote to select the correct page, then we can use the code that is already there to capture the page number in the footer.

        pdPage = findRange.Information(wdActiveEndPageNumber)
        Selection.GoTo wdGoToPage, wdGoToAbsolute, pdPage  ' I found this code online.  This goes to the page number in the pdPage Variable.



Sub Q29099539()
Dim findRange As Word.Range
Dim strFindText As String
Dim pageNumberText As String
Dim sec As Section
Dim secRange As Range
Dim pdSecStartPage As Integer
Dim pdPage As Integer
Dim pSecPage As Integer
Dim strPageNumber As String
Dim intSearchResultsFound As Integer

Set findRange = ActiveDocument.Range
strFindText = "ACBM"
intSearchResultsFound = 0
With findRange.Find
    .Text = strFindText
    .MatchWholeWord = True
    .MatchCase = True
    Do While findRange.Find.Execute() = True
        Set sec = findRange.Sections(1)
       
       'get fields in footer
        Dim fld As Field
        Dim ftr As HeaderFooter
        Set ftr = sec.Footers(wdHeaderFooterPrimary)
       
        'Get section start page number in document
        Set secRange = sec.Range
        secRange.Collapse wdCollapseStart
        pdSecStartPage = secRange.Information(wdActiveEndPageNumber)
        pdPage = findRange.Information(wdActiveEndPageNumber)
        Selection.GoTo wdGoToPage, wdGoToAbsolute, pdPage
        intSearchResultsFound = intSearchResultsFound + 1
       
        'MsgBox "The selection is on page " & secRange.Information(wdActiveEndPageNumber) & " of page " & Selection.Information(wdNumberOfPagesInDocument)
'        MsgBox "The selection is on page " & Selection.Information(wdActiveEndPageNumber) & " of page " & Selection.Information(wdNumberOfPagesInDocument)
'       pSecPage = pdPage - pdSecStartPage + GetFirstPageNumberOfSection(ftr)
'
'
'        strPageNumber = ""
'        For Each fld In ftr.Range.Fields
'            If Len(strPageNumber) > 0 Then
'                strPageNumber = strPageNumber & "-"
'            End If
'            Select Case fld.Type
'                Case wdFieldPage
'                    strPageNumber = strPageNumber & pSecPage
'                Case wdFieldSequence
'                    strPageNumber = strPageNumber & "A"
'            End Select
'        Next fld
'
'        Debug.Print strPageNumber
    Loop
End With
'
'MsgBox intSearchResultsFound & vbCrLf & pdSecStartPage & vbCrLf & pdPage


End Sub
WonHop (ASKER)

Hey there.  I think I got it working.   The lady is going to check it for accuracy.  If she says everything is fine, then I will award you the points.

Thanks

GrahamSkan

Thank you for the update.

I had some difficulty in integrating with your code, so I was working on a Function that you could call.

I'll stand that down and await your next comment.
WonHop (ASKER)

The line of code listed below was what I needed to help me solve the problem I was having:

                pdPage = findRange.Information(wdActiveEndPageNumber)

It selected the correct page number in MS Word.  That allowed me to capture the correct page number in the footer.
Below is the final code that finds the correct page number where the Term is found.  It then goes to that page.

                'THIS FINDS THE FIRST PAGE AND THEN GOES TO THE FIRST PAGE.  IF NEEDED, IT WILL LOOP THRU THE PAGES AND FIND THEM ALL.
                Dim pdPage As Integer = CInt(findRange.Information(WdInformation.wdActiveEndPageNumber))  '05172018
                'Go to that absolute page using the Word GoTo constants  '05172018
                gAppWord.Selection.GoTo(What:=WdGoToItem.wdGoToPage, Which:=WdGoToDirection.wdGoToAbsolute, Count:=pdPage)  '05172018

Thanks for all of your help.
Thanks.

GrahamSkan

Remember that the page is not a fixed concept in MS Word. It does not correspond to a Page object. This is because the design of the application regards pagination as unfixed and always subject to recalculation. That is why it is necessary to return the current result of the calculation with the Information() function.
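
As a small illustration of that point, Information() can be asked for two different page values. wdActiveEndPageNumber counts pages from the start of the document, while wdActiveEndAdjustedPageNumber takes manual adjustments such as a restarted page numbering into account, so it may be closer to the number shown in the footer. A minimal sketch, assuming a reference to Microsoft.Office.Interop.Word. The module and function names are hypothetical, and any chapter prefix such as the "7" in "7-31" is not included in either value:

```vbnet
Imports Word = Microsoft.Office.Interop.Word

Module PageInfoSketch
    ' Page counted from the beginning of the document, ignoring restarts.
    Public Function GetAbsolutePage(rng As Word.Range) As Integer
        Return CInt(rng.Information(Word.WdInformation.wdActiveEndPageNumber))
    End Function

    ' Page number after manual adjustments (for example numbering restarted in a section).
    Public Function GetAdjustedPage(rng As Word.Range) As Integer
        Return CInt(rng.Information(Word.WdInformation.wdActiveEndAdjustedPageNumber))
    End Function
End Module
```

Both values are calculated at the moment of the call, so they should be read after the find has moved the range and not cached across edits to the document.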