Link to home
Start Free TrialLog in
Avatar of excel learner
excel learnerFlag for United Kingdom of Great Britain and Northern Ireland

asked on

Extracting text reference and table reference

Dear experts,
I need a macro to be run in MS word which look for the specific word ‘Article’.
Now once the document identifies the above word. It will then extract the entire length ‘Like Article 1485 (2)’
(Let us say article = 7 length, space 1 and then 8 lengths after this. So in total the text to be extracted is 16 length)
The reference word ‘Article999999999’ is always in a box (as shown below) and the box will have a reference right above it.

So the macro should extract both the Reference table and reference word ‘Article999999999’.

The input and the output are shown below:
As a point:
1.      There could be repetitions of the same Article across the document, but the macro should still extract the table reference every time.
2.      The macro will have to perform the above task, tidying and cleaning it up will be done by myself.
Extracting-text-reference-and-table.docx
Avatar of DrTribos
DrTribos
Flag of Australia image

Add a reference to RegEx (regular expressions) - in the VBA IDE Tools > References then scroll down to Microsoft VBScript Regular Expressions 1.0

To find the patter you described you can (may be able to) use this expression:

Article\s{0,1}[1-9]{1,9}\s{0,1}\({0,1}[0-9]{1,9}\){0,1}

It will, for example, let you find the following:

Article111(222)
Article 111 (222)
Article111 222
Article 111 222

You can test the regular expression using an online tester such as https://regex101.com/
Avatar of GrahamSkan
This creates a new document with a table. In the Active document, it looks for the word 'Article' in a table cell, and puts it and the rest of the cell's text into the first cell of a new row in the new table. It then puts the text of the prior paragraph into the second cell of the row.
Sub GetArticle()
    Dim InDoc As Document
    Dim OutDoc As Document
    Dim OutTable As Table
    Dim Intable As Table
    Dim rng As Range
    Dim rnga As Range
    Dim rw As Row
    
    Set InDoc = ActiveDocument
    Set OutDoc = Documents.Add
    Set OutTable = OutDoc.Tables.Add(OutDoc.Range, 1, 2)
    OutTable.Cell(1, 1).Range.Text = "Article"
    OutTable.Cell(1, 2).Range.Text = "Table"
    Set rng = InDoc.Range
    With rng.Find
        .Text = "Article"
        Do While .Execute()
            If rng.Tables.Count = 1 Then
                Set rnga = rng.Duplicate
                rnga.End = rnga.Cells(1).Range.End - 1
                Set rw = OutTable.Rows.Add
                rw.Cells(1).Range.Text = rnga.Text
                Set rnga = rng.Tables(1).Range
                rnga.Collapse wdCollapseStart
                rnga.Move wdCharacter, -1
                rnga.Expand wdParagraph
                rnga.MoveEnd wdCharacter, -1
                rw.Cells(2).Range.Text = rnga.Text
            End If
        Loop
    End With
End Sub

Open in new window

Adding to Grahams...
Function Finder() As Range

Dim regEx1 As Object
Dim matches As Object
          
10    On Error GoTo eh
20    Set regEx1 = CreateObject("VBScript.RegExp")
30        regEx1.Global = True
40        regEx1.Pattern = "Article\s{0,1}[1-9]{1,9}\s{0,1}\({0,1}[0-9]{1,9}\){0,1}"
50        regEx1.IgnoreCase = True
60    Set matches = regEx1.Execute(ActiveDocument.Range.Text)

70    If matches.Count = 0 Then
    ' No matches
80        GoTo lbl_Exit
90    End If

100   For i = 1 To matches.Count

110       With rng.Find
120     .Text = matches(i)
130     Do While .Execute()
140         If rng.Tables.Count = 1 Then
150             Set rnga = rng.Duplicate
160             rnga.End = rnga.Cells(1).Range.End - 1
170             Set rw = OutTable.Rows.Add
180             rw.Cells(1).Range.Text = rnga.Text
190             Set rnga = rng.Tables(1).Range
200             rnga.Collapse wdCollapseStart
210             rnga.Move wdCharacter, -1
220             rnga.Expand wdParagraph
230             rnga.MoveEnd wdCharacter, -1
240             rw.Cells(2).Range.Text = rnga.Text
250         End If
260     Loop
280   End With
290   Next i




lbl_Exit:
300       Exit Function
          
eh:
310       GoTo lbl_Exit
          
End Function

Open in new window

Using the RegEx avoids finding the word Article elsewhere in the document...

I expect some tweaking might be required because the 'matches' object can store several identical matches, so these may need to be filtered out.

Graham - have a good one, I gotta go :-)
Hi Steve. Hope all is still OK for February
Thanks, full steam ahead. Code 'Pink' ;-)
Avatar of excel learner

ASKER

Graham,

The macro extracted the date into two columns, the left hand side (column 1) had the reference to text string Article and the right hand side for the header.
Observations:
1.      The macro did not restrict to certain length of the text string, but it extract all the text in the same para
2.      The length of the header seems to be short and moreover the count appears to start from Right rather than left
3.      Secondly the pt 2 should be extract for every citation of Text string ‘Article’

Will you be able to make amendments to macro please.

Thanks for the quick turnaround
Dr Tribos, your macro has not yield any result.

Could you please tell me the exact steps to make sure that I have actually runit correctly.
The code emulates what you have shown in your sample. I described its action in the text in case I had misunderstood the problem.

1. You seem to be saying that the wrong number of characters are copied into the table, but it isn't clear what the right number of characters should be.
 
2. I don't understand why you say that the length of the header is short, or why the count appears to be in the wrong direction.

3. In your sample, the text was always in table cells. Are you saying that, in practice, we should look for all occurrences? If that is the case, what are the criteria for finding the 'Reference table' text. Is it just the text of the previous paragraph, or will there be more paragraphs in between?
Graham,

Thank you for your response.

I have attached an extract from my live sample. Sorry for not having been efficient.

Thank you and kindly help.
Live-file-example.docx
SOLUTION
Avatar of DrTribos
DrTribos
Flag of Australia image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
ASKER CERTIFIED SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Nice one Mr Skan - feeling humbled.
That wasn't my intention Steve.

Actually, I though that you had a better understanding of the requirement than I did (do)
No... tis cool. It never occurred to me to use wild cards. I think I had a similar problem once! 😉