excel learner
asked on
Extracting text reference and table reference
Dear experts,
I need a macro to be run in MS word which look for the specific word ‘Article’.
Now once the document identifies the above word. It will then extract the entire length ‘Like Article 1485 (2)’
(Let us say article = 7 length, space 1 and then 8 lengths after this. So in total the text to be extracted is 16 length)
The reference word ‘Article999999999’ is always in a box (as shown below) and the box will have a reference right above it.
So the macro should extract both the Reference table and reference word ‘Article999999999’.
The input and the output are shown below:
As a point:
1. There could be repetitions of the same Article across the document, but the macro should still extract the table reference every time.
2. The macro will have to perform the above task, tidying and cleaning it up will be done by myself.
Extracting-text-reference-and-table.docx
I need a macro to be run in MS word which look for the specific word ‘Article’.
Now once the document identifies the above word. It will then extract the entire length ‘Like Article 1485 (2)’
(Let us say article = 7 length, space 1 and then 8 lengths after this. So in total the text to be extracted is 16 length)
The reference word ‘Article999999999’ is always in a box (as shown below) and the box will have a reference right above it.
So the macro should extract both the Reference table and reference word ‘Article999999999’.
The input and the output are shown below:
As a point:
1. There could be repetitions of the same Article across the document, but the macro should still extract the table reference every time.
2. The macro will have to perform the above task, tidying and cleaning it up will be done by myself.
Extracting-text-reference-and-table.docx
To implement, Patrick Matthews article might be of interest: https://www.experts-exchange.com/articles/1336/Using-Regular-Expressions-in-Visual-Basic-for-Applications-and-Visual-Basic-6.html
This creates a new document with a table. In the Active document, it looks for the word 'Article' in a table cell, and puts it and the rest of the cell's text into the first cell of a new row in the new table. It then puts the text of the prior paragraph into the second cell of the row.
Sub GetArticle()
Dim InDoc As Document
Dim OutDoc As Document
Dim OutTable As Table
Dim Intable As Table
Dim rng As Range
Dim rnga As Range
Dim rw As Row
Set InDoc = ActiveDocument
Set OutDoc = Documents.Add
Set OutTable = OutDoc.Tables.Add(OutDoc.Range, 1, 2)
OutTable.Cell(1, 1).Range.Text = "Article"
OutTable.Cell(1, 2).Range.Text = "Table"
Set rng = InDoc.Range
With rng.Find
.Text = "Article"
Do While .Execute()
If rng.Tables.Count = 1 Then
Set rnga = rng.Duplicate
rnga.End = rnga.Cells(1).Range.End - 1
Set rw = OutTable.Rows.Add
rw.Cells(1).Range.Text = rnga.Text
Set rnga = rng.Tables(1).Range
rnga.Collapse wdCollapseStart
rnga.Move wdCharacter, -1
rnga.Expand wdParagraph
rnga.MoveEnd wdCharacter, -1
rw.Cells(2).Range.Text = rnga.Text
End If
Loop
End With
End Sub
Adding to Grahams...
Function Finder() As Range
Dim regEx1 As Object
Dim matches As Object
10 On Error GoTo eh
20 Set regEx1 = CreateObject("VBScript.RegExp")
30 regEx1.Global = True
40 regEx1.Pattern = "Article\s{0,1}[1-9]{1,9}\s{0,1}\({0,1}[0-9]{1,9}\){0,1}"
50 regEx1.IgnoreCase = True
60 Set matches = regEx1.Execute(ActiveDocument.Range.Text)
70 If matches.Count = 0 Then
' No matches
80 GoTo lbl_Exit
90 End If
100 For i = 1 To matches.Count
110 With rng.Find
120 .Text = matches(i)
130 Do While .Execute()
140 If rng.Tables.Count = 1 Then
150 Set rnga = rng.Duplicate
160 rnga.End = rnga.Cells(1).Range.End - 1
170 Set rw = OutTable.Rows.Add
180 rw.Cells(1).Range.Text = rnga.Text
190 Set rnga = rng.Tables(1).Range
200 rnga.Collapse wdCollapseStart
210 rnga.Move wdCharacter, -1
220 rnga.Expand wdParagraph
230 rnga.MoveEnd wdCharacter, -1
240 rw.Cells(2).Range.Text = rnga.Text
250 End If
260 Loop
280 End With
290 Next i
lbl_Exit:
300 Exit Function
eh:
310 GoTo lbl_Exit
End Function
Using the RegEx avoids finding the word Article elsewhere in the document...
I expect some tweaking might be required because the 'matches' object can store several identical matches, so these may need to be filtered out.
Graham - have a good one, I gotta go :-)
I expect some tweaking might be required because the 'matches' object can store several identical matches, so these may need to be filtered out.
Graham - have a good one, I gotta go :-)
Hi Steve. Hope all is still OK for February
Thanks, full steam ahead. Code 'Pink' ;-)
ASKER
Graham,
The macro extracted the date into two columns, the left hand side (column 1) had the reference to text string Article and the right hand side for the header.
Observations:
1. The macro did not restrict to certain length of the text string, but it extract all the text in the same para
2. The length of the header seems to be short and moreover the count appears to start from Right rather than left
3. Secondly the pt 2 should be extract for every citation of Text string ‘Article’
Will you be able to make amendments to macro please.
Thanks for the quick turnaround
The macro extracted the date into two columns, the left hand side (column 1) had the reference to text string Article and the right hand side for the header.
Observations:
1. The macro did not restrict to certain length of the text string, but it extract all the text in the same para
2. The length of the header seems to be short and moreover the count appears to start from Right rather than left
3. Secondly the pt 2 should be extract for every citation of Text string ‘Article’
Will you be able to make amendments to macro please.
Thanks for the quick turnaround
ASKER
Dr Tribos, your macro has not yield any result.
Could you please tell me the exact steps to make sure that I have actually runit correctly.
Could you please tell me the exact steps to make sure that I have actually runit correctly.
The code emulates what you have shown in your sample. I described its action in the text in case I had misunderstood the problem.
1. You seem to be saying that the wrong number of characters are copied into the table, but it isn't clear what the right number of characters should be.
2. I don't understand why you say that the length of the header is short, or why the count appears to be in the wrong direction.
3. In your sample, the text was always in table cells. Are you saying that, in practice, we should look for all occurrences? If that is the case, what are the criteria for finding the 'Reference table' text. Is it just the text of the previous paragraph, or will there be more paragraphs in between?
1. You seem to be saying that the wrong number of characters are copied into the table, but it isn't clear what the right number of characters should be.
2. I don't understand why you say that the length of the header is short, or why the count appears to be in the wrong direction.
3. In your sample, the text was always in table cells. Are you saying that, in practice, we should look for all occurrences? If that is the case, what are the criteria for finding the 'Reference table' text. Is it just the text of the previous paragraph, or will there be more paragraphs in between?
ASKER
Graham,
Thank you for your response.
I have attached an extract from my live sample. Sorry for not having been efficient.
Thank you and kindly help.
Live-file-example.docx
Thank you for your response.
I have attached an extract from my live sample. Sorry for not having been efficient.
Thank you and kindly help.
Live-file-example.docx
SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Nice one Mr Skan - feeling humbled.
That wasn't my intention Steve.
Actually, I though that you had a better understanding of the requirement than I did (do)
Actually, I though that you had a better understanding of the requirement than I did (do)
No... tis cool. It never occurred to me to use wild cards. I think I had a similar problem once! 😉
To find the patter you described you can (may be able to) use this expression:
Article\s{0,1}[1-9]{1,9}\s
It will, for example, let you find the following:
Article111(222)
Article 111 (222)
Article111 222
Article 111 222
You can test the regular expression using an online tester such as https://regex101.com/