Extracting text reference and table reference

Dear experts,
I need a macro to be run in MS word which look for the specific word ‘Article’.
Now once the document identifies the above word. It will then extract the entire length ‘Like Article 1485 (2)’
(Let us say article = 7 length, space 1 and then 8 lengths after this. So in total the text to be extracted is 16 length)
The reference word ‘Article999999999’ is always in a box (as shown below) and the box will have a reference right above it.

So the macro should extract both the Reference table and reference word ‘Article999999999’.

The input and the output are shown below:
As a point:
1.      There could be repetitions of the same Article across the document, but the macro should still extract the table reference every time.
2.      The macro will have to perform the above task, tidying and cleaning it up will be done by myself.
Extracting-text-reference-and-table.docx
ExcellearnerAsked:
Who is Participating?

[Product update] Infrastructure Analysis Tool is now available with Business Accounts.Learn More

x
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

DrTribosCommented:
Add a reference to RegEx (regular expressions) - in the VBA IDE Tools > References then scroll down to Microsoft VBScript Regular Expressions 1.0

To find the patter you described you can (may be able to) use this expression:

Article\s{0,1}[1-9]{1,9}\s{0,1}\({0,1}[0-9]{1,9}\){0,1}

It will, for example, let you find the following:

Article111(222)
Article 111 (222)
Article111 222
Article 111 222

You can test the regular expression using an online tester such as https://regex101.com/
DrTribosCommented:
GrahamSkanRetiredCommented:
This creates a new document with a table. In the Active document, it looks for the word 'Article' in a table cell, and puts it and the rest of the cell's text into the first cell of a new row in the new table. It then puts the text of the prior paragraph into the second cell of the row.
Sub GetArticle()
    Dim InDoc As Document
    Dim OutDoc As Document
    Dim OutTable As Table
    Dim Intable As Table
    Dim rng As Range
    Dim rnga As Range
    Dim rw As Row
    
    Set InDoc = ActiveDocument
    Set OutDoc = Documents.Add
    Set OutTable = OutDoc.Tables.Add(OutDoc.Range, 1, 2)
    OutTable.Cell(1, 1).Range.Text = "Article"
    OutTable.Cell(1, 2).Range.Text = "Table"
    Set rng = InDoc.Range
    With rng.Find
        .Text = "Article"
        Do While .Execute()
            If rng.Tables.Count = 1 Then
                Set rnga = rng.Duplicate
                rnga.End = rnga.Cells(1).Range.End - 1
                Set rw = OutTable.Rows.Add
                rw.Cells(1).Range.Text = rnga.Text
                Set rnga = rng.Tables(1).Range
                rnga.Collapse wdCollapseStart
                rnga.Move wdCharacter, -1
                rnga.Expand wdParagraph
                rnga.MoveEnd wdCharacter, -1
                rw.Cells(2).Range.Text = rnga.Text
            End If
        Loop
    End With
End Sub

Open in new window

Why Diversity in Tech Matters

Kesha Williams, certified professional and software developer, explores the imbalance of diversity in the world of technology -- especially when it comes to hiring women. She showcases ways she's making a difference through the Colors of STEM program.

DrTribosCommented:
Adding to Grahams...
Function Finder() As Range

Dim regEx1 As Object
Dim matches As Object
          
10    On Error GoTo eh
20    Set regEx1 = CreateObject("VBScript.RegExp")
30        regEx1.Global = True
40        regEx1.Pattern = "Article\s{0,1}[1-9]{1,9}\s{0,1}\({0,1}[0-9]{1,9}\){0,1}"
50        regEx1.IgnoreCase = True
60    Set matches = regEx1.Execute(ActiveDocument.Range.Text)

70    If matches.Count = 0 Then
    ' No matches
80        GoTo lbl_Exit
90    End If

100   For i = 1 To matches.Count

110       With rng.Find
120     .Text = matches(i)
130     Do While .Execute()
140         If rng.Tables.Count = 1 Then
150             Set rnga = rng.Duplicate
160             rnga.End = rnga.Cells(1).Range.End - 1
170             Set rw = OutTable.Rows.Add
180             rw.Cells(1).Range.Text = rnga.Text
190             Set rnga = rng.Tables(1).Range
200             rnga.Collapse wdCollapseStart
210             rnga.Move wdCharacter, -1
220             rnga.Expand wdParagraph
230             rnga.MoveEnd wdCharacter, -1
240             rw.Cells(2).Range.Text = rnga.Text
250         End If
260     Loop
280   End With
290   Next i




lbl_Exit:
300       Exit Function
          
eh:
310       GoTo lbl_Exit
          
End Function

Open in new window

DrTribosCommented:
Using the RegEx avoids finding the word Article elsewhere in the document...

I expect some tweaking might be required because the 'matches' object can store several identical matches, so these may need to be filtered out.

Graham - have a good one, I gotta go :-)
GrahamSkanRetiredCommented:
Hi Steve. Hope all is still OK for February
DrTribosCommented:
Thanks, full steam ahead. Code 'Pink' ;-)
ExcellearnerAuthor Commented:
Graham,

The macro extracted the date into two columns, the left hand side (column 1) had the reference to text string Article and the right hand side for the header.
Observations:
1.      The macro did not restrict to certain length of the text string, but it extract all the text in the same para
2.      The length of the header seems to be short and moreover the count appears to start from Right rather than left
3.      Secondly the pt 2 should be extract for every citation of Text string ‘Article’

Will you be able to make amendments to macro please.

Thanks for the quick turnaround
ExcellearnerAuthor Commented:
Dr Tribos, your macro has not yield any result.

Could you please tell me the exact steps to make sure that I have actually runit correctly.
GrahamSkanRetiredCommented:
The code emulates what you have shown in your sample. I described its action in the text in case I had misunderstood the problem.

1. You seem to be saying that the wrong number of characters are copied into the table, but it isn't clear what the right number of characters should be.
 
2. I don't understand why you say that the length of the header is short, or why the count appears to be in the wrong direction.

3. In your sample, the text was always in table cells. Are you saying that, in practice, we should look for all occurrences? If that is the case, what are the criteria for finding the 'Reference table' text. Is it just the text of the previous paragraph, or will there be more paragraphs in between?
ExcellearnerAuthor Commented:
Graham,

Thank you for your response.

I have attached an extract from my live sample. Sorry for not having been efficient.

Thank you and kindly help.
Live-file-example.docx
DrTribosCommented:
Hi, was missing some of the code (un-numbered lines below) from Graham's solution.  

Function Finder() As Range

Dim regEx1 As Object
Dim matches As Object
          
10    On Error GoTo eh
20    Set regEx1 = CreateObject("VBScript.RegExp")
30        regEx1.Global = True
40        regEx1.Pattern = "Article\s{0,1}[1-9]{1,9}\s{0,1}\({0,1}[0-9]{1,9}\){0,1}"
50        regEx1.IgnoreCase = True
60    Set matches = regEx1.Execute(ActiveDocument.Range.Text)

70    If matches.Count = 0 Then
    ' No matches
80        GoTo lbl_Exit
90    End If
  
Dim InDoc As Document
Dim OutDoc As Document
Dim OutTable As Table
Dim Intable As Table
Dim rng As Range
Dim rnga As Range
Dim rw As Row

Set InDoc = ActiveDocument
Set OutDoc = Documents.Add
Set OutTable = OutDoc.Tables.Add(OutDoc.Range, 1, 2)
OutTable.Cell(1, 1).Range.Text = "Article"
OutTable.Cell(1, 2).Range.Text = "Table"
Set rng = InDoc.Range


100   For i = 1 To matches.Count
110       With rng.Find
120     .Text = matches(i)
130     Do While .Execute()
140         If rng.Tables.Count = 1 Then
150             Set rnga = rng.Duplicate
160             rnga.End = rnga.Cells(1).Range.End - 1
170             Set rw = OutTable.Rows.Add
180             rw.Cells(1).Range.Text = rnga.Text
190             Set rnga = rng.Tables(1).Range
200             rnga.Collapse wdCollapseStart
210             rnga.Move wdCharacter, -1
220             rnga.Expand wdParagraph
230             rnga.MoveEnd wdCharacter, -1
240             rw.Cells(2).Range.Text = rnga.Text
250         End If
260     Loop
280   End With
290   Next i




lbl_Exit:
300       Exit Function
          
eh:
310       GoTo lbl_Exit
          
End Function

Open in new window


This is the output from your Live-file-example
Article :: Table
article 400 (4):“ or repayment difficulties. :: 1.1 Description

BTW - the 'box' is actually a single cell table
GrahamSkanRetiredCommented:
This uses Word's wildcarding instead of Regex, but the principle is the same.

It finds the 'article' text, and uses its length to start the extraction that number of characters before the text.
Sub GetArticle()
    Dim InDoc As Document
    Dim OutDoc As Document
    Dim OutTable As Table
    Dim Intable As Table
    Dim rng As Range
    Dim rnga As Range
    Dim rw As Row
    
    Set InDoc = ActiveDocument
    Set OutDoc = Documents.Add
    Set OutTable = OutDoc.Tables.Add(OutDoc.Range, 1, 2)
    OutTable.Cell(1, 2).Range.Text = "Article"
    OutTable.Cell(1, 1).Range.Text = "Table"
    Set rng = InDoc.Range
    With rng.Find
        .MatchWildcards = True
        .Text = "article [0-9]{1,} \([0-9]{1,}\)"
        Do While .Execute()
            If rng.Tables.Count = 1 Then
                Set rnga = rng.Duplicate
                rnga.Start = rnga.Start - Len(rnga)
                Set rw = OutTable.Rows.Add
                rw.Cells(2).Range.Text = rnga.Text
                Set rnga = rng.Tables(1).Range
                rnga.Collapse wdCollapseStart
                rnga.Move wdCharacter, -1
                rnga.Expand wdParagraph
                rnga.MoveEnd wdCharacter, -1
                rw.Cells(1).Range.Text = rnga.Text
            End If
        Loop
    End With
End Sub

Open in new window

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
DrTribosCommented:
Nice one Mr Skan - feeling humbled.
GrahamSkanRetiredCommented:
That wasn't my intention Steve.

Actually, I though that you had a better understanding of the requirement than I did (do)
DrTribosCommented:
No... tis cool. It never occurred to me to use wild cards. I think I had a similar problem once! 😉
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
Web Browsers

From novice to tech pro — start learning today.