Link to home
Start Free TrialLog in
Avatar of marrick13
marrick13

asked on

Word macro to extract text by style and capture numbering if style contains list numbering

I have a macro I adapted from another question on Experts Exchange (https://www.experts-exchange.com/questions/27585165/Microsoft-Word-Macro-to-Find-Copy-and-Paste-Text-with-Specific-Style-to-a-new-Document.html) that I am trying to enhance.

I want it to extract all text that has a user-defined style to a new document but also to include the numbering if the style is a numbering style. I added these lines to achieve the latter, but they do not do the job:
Set oRange = oDoc.Range(Start:=rText.Start, End:=rText.End)
oRange.ListFormat.ConvertNumbersToText wdNumberParagraph
'Then undo so that numbers revert to the original in oDoc
 oDoc.Undo

Open in new window

The idea was to convert the numbers to text to preserve them in the output document but revert to their dynamic state in the source document (this macro would be a great tool to check to see where the list numbering is off wherever it occurs in the document).

The macro also goes into an endless loop when entering ‘Body Text’, ‘Body Text 2’, ‘List Number’, List Number 2, but not ‘Body Text 3’, ‘List Number 3’ (of those I tested). I added a counter to catch the endless loop at some point and display a message box to that effect, but I don’t understand why the macro does this on some styles and not on others.

Can someone help prevent the endless loop and get the macro to properly include the numbering when the style contains numbering?
Extract_Text_by_Style-Test--Mar-30-.docm
Avatar of Jacques Geday
Jacques Geday
Flag of Canada image

I see this question has been un-answered. Are you still interested in a solution ?
Gowflow
Avatar of marrick13
marrick13

ASKER

Gowflow,

Thanks for answering. Yes, I am interested in a solution. Since no one has responded in 3 weeks, I’ve been trying to come up with a single macro that does the job.  The original task was simple to extract all text that has a user-defined style to a new document, and if that style is a numbering style, to include the numbering in the output document. Same for bullet styles – I want the bullets to be included in the output document. This would be very useful for large documents because Word has a tendency for numbered lists to become skewed even if using properly structured list number styles, and extracting such styles with their actual numbering would make it a lot easier to correct than going through the whole document). I also wanted the option of having the page and line numbers of each instance of the style test to appear above that instance in the output document (the attached macros all automatically include the page and line numbers; I am planning to add a user option to shut that off).

The original macro, which I now call “Extract_Text_by_Style_Old”, went into an endless loop when entering ‘Body Text’, ‘Body Text 2’, ‘List Number’, List Number 2, but not ‘Body Text 3’, ‘List Number 3’ (of those I tested). I added a counter to catch the endless loop at some point and display a message box to that effect, but I don’t understand why the macro does this on some styles and not on others.

I have since found that this macro does not perform equally on all styles. With some it produces no output, with others the output is inaccurate or incomplete. So I began looking at alternatives. I ended up with 3 new macros, which I have placed in the attached ‘Extract_Text_by_Style-Test-Apr 20’ file. This file also includes the test data and a table with the results of the latest tests.

One of them, the ‘Extract_Text_by_Style_Copy’, is my attempt to capture the numbers when a list number style is chosen. The only way I could achieve this was to copy the original document and remove all text with styles other than the target list style. It works, albeit a bit slowly. When I ran this for a list bullet style, the page and line number text lines became formatted as “symbol” because I was inserting a row above the style text, and the bullet style was getting carried over. I solved this by shutting off the bullets for the page and line number rows – I had to retain the style for that text so the macro would keep it while removing the text with other styles.  Not slick, not pretty, but I’m not a professional programmer (and “advanced beginner” at best), but it does work. The ‘Extract_Text_by_Style_Copy’ macro, in fact, seems to work properly with most the styles I tested – but not all. You can see where it works and doesn’t in the “Final Testing” table in the ‘Extract_Text_by_Style-Test-Apr 20’ document.

I found a macro that exports text by style to Excel that is very fast, but while it works well for Heading and Caption styles, doesn’t do much of anything with any other style.

I found that no single macro produces the desired results for all or any style, so I decided that I would build a listbox with only those styles I’ve tested. These appear in the attached doc – 46 altogether, 19 of which I’ve tested.

My next idea was to finish testing the 46 styles and seeing which macros produced the desired results for which styles, then creating a userform with a listbox of only those styles and buttons for the 3 main macros (others for the Extract EndNotes and Footnotes or Extract Footnotes to Excel and List_Cross_References macros that will extract text for styles such as Endnote Reference, Endnote Text, Footnote Text and Footnote Reference, and Intense Emphasis and Intense Reference, as shown in the testing table. I would then enable or disable the buttons depending on the selected style. One thing I don’t want is to run a macro and have it hang Word or open a new document with nothing it or inaccurate data.

The subject macros are:

Extract_Text_by_Style_Old – the original that works with some styles, is fast, but goes into an endless loop with some styles
Extract_Text_by_Style_Copy – a version that copies the source document to a new one and removes all data with all but the desired style
Extract_Text_by_Style_Excel – a version that extracts text by style to Excel but seems to work only for Heading and Caption styles
Extract_Text_by_Style_Export – a version that copies the target style text to an array but seems to work properly only with “Strong” and “Emphasis” styles

Is it possible to have one macro handle all style scenarios and quickly produce an extract in a new Word (or Excel) file? I believe so, but need help in getting there. If you are up to the challenge, please let me know.
Extract_Text_by_Style-Test-Apr-20.docm
Tks ur detailed explanation. I strongly suggest you put in your question fields word as u only have VBA there and myself being an expert in Excel i am almost novice in word and terribly sorry cannot extent any help.

I strongly suggest you ask modeeators to help you get some attention to your question by asking help to word experts.

I wish all the good luck and case u have anything in excel or vb pls feel free to msg me directly i ll be more glad to help

Regards
Gowflow
Thank you. Not sure how to ask moderator to do that or to edit and add Word key words. Perhaps a moderator will see this and step in....
I will ask help for you
Gowflow
Thanks!
Welcome. I asked for help. Lets hope someone will attend.
Gowflow
I took a look at the Extract_Text_by_Style_Old routine and believe I found/fixed the issue of the "endless loop" problem

You have this section in the code which sets the rText.End value to something other than what the value was when the style was last found

            If rText.End < oDoc.Range.End - 1 Then '-1 to ignore final paragraph
                rText.End = oDoc.Range.End
                Set oRange = oDoc.Range(Start:=Selection.Range.Start, End:=Selection.Range.End)
                oRange.ListFormat.ConvertNumbersToText wdNumberParagraph
            Else
                Exit Do
            End If

Open in new window


So, I created a variable to store the last value position and then reset the rText.End value to that afterwards:

            If rText.End < oDoc.Range.End - 1 Then '-1 to ignore final paragraph
                intTextEnd = rText.End 'intTextEnd is the new variable
                rText.End = oDoc.Range.End
                Set oRange = oDoc.Range(Start:=Selection.Range.Start, End:=Selection.Range.End)
                oRange.ListFormat.ConvertNumbersToText wdNumberParagraph
                rText.End = intTextEnd 'reset the end position to the last found position
            Else
                Exit Do
            End If

Open in new window


So the routine is now set as:

Sub Extract_Text_by_Style_Old()
'Nov 15 2018 - adapted from https://www.experts-exchange.com/questions/27585165/Microsoft-Word-Macro-to-Find-Copy-and-Paste-Text-with-Specific-Style-to-a-new-Document.html
'Works great!
'Jan 30 2019 - found some data issue in document can throw macro into endless loop. Added code to stop macro if this occurs.
'Original
Dim rText As Range, oRange As Range
Dim oDoc As Document
Dim oNewDoc As Document
Dim uStyle As String, oRng As Range
Dim pResult As String, lngStyleCount As Long, DefStyle As String
Dim vbmFlags As VbMsgBoxStyle, vbmResult As VbMsgBoxResult, strMsg As String
Dim PageNum As Integer, LineNum As String
Dim i As Integer
Dim intTextEnd As Long

DefStyle = "Heading 1"
   
uInput:
 pResult = InputBox("Enter style whose text you want to extract to a new document.", "Style Name", DefStyle)
 pResult = RealInput(pResult)
 If pResult = "Input cancelled." Then
   Exit Sub
   End If
 If pResult = "" Then GoTo uInput
  uStyle = pResult 'assign psresult to ustyle if latter is not blank

Set oDoc = ActiveDocument
'Get count of style instances
On Error GoTo StyleError

  Set oRng = oDoc.Range
  With oRng.Find
    .ClearFormatting
    .Wrap = wdFindStop
    .Forward = True
    .Format = True
    .MatchWildcards = False
    .Text = ""
    .Style = oDoc.Styles(uStyle)
    Do While .Execute
      lngStyleCount = lngStyleCount + 1
      If oRng.End = oDoc.Range.End Then Exit Do
      oRng.Collapse wdCollapseEnd
    Loop
  End With

If lngStyleCount = 0 Then
    MsgBox "No text was found to extract for the style '" & uStyle & "'.", vbOKOnly, "Style Use Search"
    Exit Sub
End If

strMsg = "Instances of '" & uStyle & "' found: " & lngStyleCount & "." & vbCr & vbCr _
& "Do you want to export the text using this style to a new document?"

vbmFlags = vbYesNo
If lngStyleCount > 0 Then
    vbmResult = MsgBox(strMsg, vbmFlags, "Instance count")
End If

Select Case vbmResult
Case vbNo
    Exit Sub
End Select
    
    Set oNewDoc = Documents.Add
    Set rText = oDoc.Range
    
    'Insert page number and count in footer of output document
oNewDoc.Sections(oNewDoc.Sections.Count) _
.Footers(wdHeaderFooterPrimary).Range.Select
With Selection
    .Paragraphs(1).Alignment = wdAlignParagraphCenter
    .TypeText Text:="p. "
    .Fields.Add Range:=Selection.Range, Type:=wdFieldEmpty, Text:="PAGE ", preserveFormatting:=True
    .TypeText Text:=" of "
    .Fields.Add Range:=Selection.Range, Type:=wdFieldEmpty, Text:="NUMPAGES ", preserveFormatting:=True
End With

If ActiveWindow.View.SplitSpecial <> wdPaneNone Then
    ActiveWindow.Panes(2).Close
End If

If ActiveWindow.ActivePane.View.Type = wdNormalView Or ActiveWindow. _
    ActivePane.View.Type = wdOutlineView Then
    ActiveWindow.ActivePane.View.Type = wdPrintView
End If

ActiveWindow.ActivePane.View.SeekView = wdSeekCurrentPageHeader
Selection.EscapeKey
ActiveWindow.ActivePane.View.SeekView = wdSeekMainDocument
If ActiveWindow.View.SplitSpecial = wdPaneNone Then
    ActiveWindow.ActivePane.View.Type = wdPrintView
Else
    ActiveWindow.View.Type = wdPrintView
End If
    'Write style name at top
    oNewDoc.Range.InsertAfter "Style: '" & uStyle & "'" & vbCr
    With rText.Find
        .Style = oDoc.Styles(uStyle)
        Do While .Execute
            PageNum = rText.Information(wdActiveEndAdjustedPageNumber)
            LineNum = rText.Information(wdFirstCharacterLineNumber)
            i = i + 1
            oNewDoc.Range.InsertAfter "Page: " & PageNum & "/" & "Line: " & LineNum & " " & rText & vbCr
            rText.Collapse wdCollapseEnd
            If rText.End < oDoc.Range.End - 1 Then '-1 to ignore final paragraph
                intTextEnd = rText.End
                rText.End = oDoc.Range.End
                Set oRange = oDoc.Range(Start:=Selection.Range.Start, End:=Selection.Range.End)
                oRange.ListFormat.ConvertNumbersToText wdNumberParagraph
                rText.End = intTextEnd
            Else
                Exit Do
            End If
            If i > lngStyleCount Then
                MsgBox "There is a data problem in this document that is causing an endless loop. Program will stop now. Look for data that is unnecessarily repeating in the extract and remove it in the source document and try again.", vbOKOnly, "Endless loop message"
            Exit Sub
            End If
        Loop
    End With



    
StyleError:
    If Err.Number <> 0 Then
        MsgBox "Style doesn't exist.", vbOKOnly, "Style Name Check"
    Exit Sub
End If
End Sub

Open in new window


Line 14 above defines the new variable which are used in Line 107 and Line 111 above

Give that a try with what you are now trying to do to see if that get you further along.
irudyk,

Thanks so much. I ran some random tests with the revised code and it never got hung up or went into an endless loop. From what I've tested so far, it looks as though it will do the job EXCEPT if the style contains Word's dynamic numbering, such as a list. Even if I add numbering to a Heading style, it extracts the heading text but not the number. So far, the only macro that does include the number is my klugey 'Extract_Text_by_Style_Copy ', because it makes a copy of the whole source doc and then removes all text with every style other than the target one (entered in the input box).

If you can figure out how to get 'Extract_Text_by_Style_Old’ to include the numbering, that would be great. The 'Copy' one can be slow and falls down on some of the other styles, which is why I've been contemplating testing the main styles to see which macros worked for them and then setting up a userform that selects the 'right' macro, depending on the style. That's not a good approach but the only one I can think of that might be useful. There must be another way of extracting the numbering with its associated text besides the copy-source-delete-other-styles method...
ASKER CERTIFIED SOLUTION
Avatar of irudyk
irudyk
Flag of Canada image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
irudyk,
This loos great! I ran a few tests and they look very good. I will test more when i have time but I will accept this revision as the solution. Thanks so much!
That should be "looks great"...