marrick13
asked on
Word macro to extract text by style and capture numbering if style contains list numbering
I have a macro I adapted from another question on Experts Exchange (https://www.experts-exchange.com/questions/27585165/Microsoft-Word-Macro-to-Find-Copy-and-Paste-Text-with-Specific-Style-to-a-new-Document.html) that I am trying to enhance.
I want it to extract all text that has a user-defined style to a new document but also to include the numbering if the style is a numbering style. I added these lines to achieve the latter, but they do not do the job:
The macro also goes into an endless loop when entering ‘Body Text’, ‘Body Text 2’, ‘List Number’, List Number 2, but not ‘Body Text 3’, ‘List Number 3’ (of those I tested). I added a counter to catch the endless loop at some point and display a message box to that effect, but I don’t understand why the macro does this on some styles and not on others.
Can someone help prevent the endless loop and get the macro to properly include the numbering when the style contains numbering?
Extract_Text_by_Style-Test--Mar-30-.docm
I want it to extract all text that has a user-defined style to a new document but also to include the numbering if the style is a numbering style. I added these lines to achieve the latter, but they do not do the job:
Set oRange = oDoc.Range(Start:=rText.Start, End:=rText.End)
oRange.ListFormat.ConvertNumbersToText wdNumberParagraph
'Then undo so that numbers revert to the original in oDoc
oDoc.Undo
The idea was to convert the numbers to text to preserve them in the output document but revert to their dynamic state in the source document (this macro would be a great tool to check to see where the list numbering is off wherever it occurs in the document).The macro also goes into an endless loop when entering ‘Body Text’, ‘Body Text 2’, ‘List Number’, List Number 2, but not ‘Body Text 3’, ‘List Number 3’ (of those I tested). I added a counter to catch the endless loop at some point and display a message box to that effect, but I don’t understand why the macro does this on some styles and not on others.
Can someone help prevent the endless loop and get the macro to properly include the numbering when the style contains numbering?
Extract_Text_by_Style-Test--Mar-30-.docm
ASKER
Gowflow,
Thanks for answering. Yes, I am interested in a solution. Since no one has responded in 3 weeks, I’ve been trying to come up with a single macro that does the job. The original task was simple to extract all text that has a user-defined style to a new document, and if that style is a numbering style, to include the numbering in the output document. Same for bullet styles – I want the bullets to be included in the output document. This would be very useful for large documents because Word has a tendency for numbered lists to become skewed even if using properly structured list number styles, and extracting such styles with their actual numbering would make it a lot easier to correct than going through the whole document). I also wanted the option of having the page and line numbers of each instance of the style test to appear above that instance in the output document (the attached macros all automatically include the page and line numbers; I am planning to add a user option to shut that off).
The original macro, which I now call “Extract_Text_by_Style_Old ”, went into an endless loop when entering ‘Body Text’, ‘Body Text 2’, ‘List Number’, List Number 2, but not ‘Body Text 3’, ‘List Number 3’ (of those I tested). I added a counter to catch the endless loop at some point and display a message box to that effect, but I don’t understand why the macro does this on some styles and not on others.
I have since found that this macro does not perform equally on all styles. With some it produces no output, with others the output is inaccurate or incomplete. So I began looking at alternatives. I ended up with 3 new macros, which I have placed in the attached ‘Extract_Text_by_Style-Tes t-Apr 20’ file. This file also includes the test data and a table with the results of the latest tests.
One of them, the ‘Extract_Text_by_Style_Cop y’, is my attempt to capture the numbers when a list number style is chosen. The only way I could achieve this was to copy the original document and remove all text with styles other than the target list style. It works, albeit a bit slowly. When I ran this for a list bullet style, the page and line number text lines became formatted as “symbol” because I was inserting a row above the style text, and the bullet style was getting carried over. I solved this by shutting off the bullets for the page and line number rows – I had to retain the style for that text so the macro would keep it while removing the text with other styles. Not slick, not pretty, but I’m not a professional programmer (and “advanced beginner” at best), but it does work. The ‘Extract_Text_by_Style_Cop y’ macro, in fact, seems to work properly with most the styles I tested – but not all. You can see where it works and doesn’t in the “Final Testing” table in the ‘Extract_Text_by_Style-Tes t-Apr 20’ document.
I found a macro that exports text by style to Excel that is very fast, but while it works well for Heading and Caption styles, doesn’t do much of anything with any other style.
I found that no single macro produces the desired results for all or any style, so I decided that I would build a listbox with only those styles I’ve tested. These appear in the attached doc – 46 altogether, 19 of which I’ve tested.
My next idea was to finish testing the 46 styles and seeing which macros produced the desired results for which styles, then creating a userform with a listbox of only those styles and buttons for the 3 main macros (others for the Extract EndNotes and Footnotes or Extract Footnotes to Excel and List_Cross_References macros that will extract text for styles such as Endnote Reference, Endnote Text, Footnote Text and Footnote Reference, and Intense Emphasis and Intense Reference, as shown in the testing table. I would then enable or disable the buttons depending on the selected style. One thing I don’t want is to run a macro and have it hang Word or open a new document with nothing it or inaccurate data.
The subject macros are:
Extract_Text_by_Style_Old – the original that works with some styles, is fast, but goes into an endless loop with some styles
Extract_Text_by_Style_Copy – a version that copies the source document to a new one and removes all data with all but the desired style
Extract_Text_by_Style_Exce l – a version that extracts text by style to Excel but seems to work only for Heading and Caption styles
Extract_Text_by_Style_Expo rt – a version that copies the target style text to an array but seems to work properly only with “Strong” and “Emphasis” styles
Is it possible to have one macro handle all style scenarios and quickly produce an extract in a new Word (or Excel) file? I believe so, but need help in getting there. If you are up to the challenge, please let me know.
Extract_Text_by_Style-Test-Apr-20.docm
Thanks for answering. Yes, I am interested in a solution. Since no one has responded in 3 weeks, I’ve been trying to come up with a single macro that does the job. The original task was simple to extract all text that has a user-defined style to a new document, and if that style is a numbering style, to include the numbering in the output document. Same for bullet styles – I want the bullets to be included in the output document. This would be very useful for large documents because Word has a tendency for numbered lists to become skewed even if using properly structured list number styles, and extracting such styles with their actual numbering would make it a lot easier to correct than going through the whole document). I also wanted the option of having the page and line numbers of each instance of the style test to appear above that instance in the output document (the attached macros all automatically include the page and line numbers; I am planning to add a user option to shut that off).
The original macro, which I now call “Extract_Text_by_Style_Old
I have since found that this macro does not perform equally on all styles. With some it produces no output, with others the output is inaccurate or incomplete. So I began looking at alternatives. I ended up with 3 new macros, which I have placed in the attached ‘Extract_Text_by_Style-Tes
One of them, the ‘Extract_Text_by_Style_Cop
I found a macro that exports text by style to Excel that is very fast, but while it works well for Heading and Caption styles, doesn’t do much of anything with any other style.
I found that no single macro produces the desired results for all or any style, so I decided that I would build a listbox with only those styles I’ve tested. These appear in the attached doc – 46 altogether, 19 of which I’ve tested.
My next idea was to finish testing the 46 styles and seeing which macros produced the desired results for which styles, then creating a userform with a listbox of only those styles and buttons for the 3 main macros (others for the Extract EndNotes and Footnotes or Extract Footnotes to Excel and List_Cross_References macros that will extract text for styles such as Endnote Reference, Endnote Text, Footnote Text and Footnote Reference, and Intense Emphasis and Intense Reference, as shown in the testing table. I would then enable or disable the buttons depending on the selected style. One thing I don’t want is to run a macro and have it hang Word or open a new document with nothing it or inaccurate data.
The subject macros are:
Extract_Text_by_Style_Old – the original that works with some styles, is fast, but goes into an endless loop with some styles
Extract_Text_by_Style_Copy
Extract_Text_by_Style_Exce
Extract_Text_by_Style_Expo
Is it possible to have one macro handle all style scenarios and quickly produce an extract in a new Word (or Excel) file? I believe so, but need help in getting there. If you are up to the challenge, please let me know.
Extract_Text_by_Style-Test-Apr-20.docm
Tks ur detailed explanation. I strongly suggest you put in your question fields word as u only have VBA there and myself being an expert in Excel i am almost novice in word and terribly sorry cannot extent any help.
I strongly suggest you ask modeeators to help you get some attention to your question by asking help to word experts.
I wish all the good luck and case u have anything in excel or vb pls feel free to msg me directly i ll be more glad to help
Regards
Gowflow
I strongly suggest you ask modeeators to help you get some attention to your question by asking help to word experts.
I wish all the good luck and case u have anything in excel or vb pls feel free to msg me directly i ll be more glad to help
Regards
Gowflow
ASKER
Thank you. Not sure how to ask moderator to do that or to edit and add Word key words. Perhaps a moderator will see this and step in....
I will ask help for you
Gowflow
Gowflow
ASKER
Thanks!
Welcome. I asked for help. Lets hope someone will attend.
Gowflow
Gowflow
I took a look at the Extract_Text_by_Style_Old routine and believe I found/fixed the issue of the "endless loop" problem
You have this section in the code which sets the rText.End value to something other than what the value was when the style was last found
So, I created a variable to store the last value position and then reset the rText.End value to that afterwards:
So the routine is now set as:
Line 14 above defines the new variable which are used in Line 107 and Line 111 above
Give that a try with what you are now trying to do to see if that get you further along.
You have this section in the code which sets the rText.End value to something other than what the value was when the style was last found
If rText.End < oDoc.Range.End - 1 Then '-1 to ignore final paragraph
rText.End = oDoc.Range.End
Set oRange = oDoc.Range(Start:=Selection.Range.Start, End:=Selection.Range.End)
oRange.ListFormat.ConvertNumbersToText wdNumberParagraph
Else
Exit Do
End If
So, I created a variable to store the last value position and then reset the rText.End value to that afterwards:
If rText.End < oDoc.Range.End - 1 Then '-1 to ignore final paragraph
intTextEnd = rText.End 'intTextEnd is the new variable
rText.End = oDoc.Range.End
Set oRange = oDoc.Range(Start:=Selection.Range.Start, End:=Selection.Range.End)
oRange.ListFormat.ConvertNumbersToText wdNumberParagraph
rText.End = intTextEnd 'reset the end position to the last found position
Else
Exit Do
End If
So the routine is now set as:
Sub Extract_Text_by_Style_Old()
'Nov 15 2018 - adapted from https://www.experts-exchange.com/questions/27585165/Microsoft-Word-Macro-to-Find-Copy-and-Paste-Text-with-Specific-Style-to-a-new-Document.html
'Works great!
'Jan 30 2019 - found some data issue in document can throw macro into endless loop. Added code to stop macro if this occurs.
'Original
Dim rText As Range, oRange As Range
Dim oDoc As Document
Dim oNewDoc As Document
Dim uStyle As String, oRng As Range
Dim pResult As String, lngStyleCount As Long, DefStyle As String
Dim vbmFlags As VbMsgBoxStyle, vbmResult As VbMsgBoxResult, strMsg As String
Dim PageNum As Integer, LineNum As String
Dim i As Integer
Dim intTextEnd As Long
DefStyle = "Heading 1"
uInput:
pResult = InputBox("Enter style whose text you want to extract to a new document.", "Style Name", DefStyle)
pResult = RealInput(pResult)
If pResult = "Input cancelled." Then
Exit Sub
End If
If pResult = "" Then GoTo uInput
uStyle = pResult 'assign psresult to ustyle if latter is not blank
Set oDoc = ActiveDocument
'Get count of style instances
On Error GoTo StyleError
Set oRng = oDoc.Range
With oRng.Find
.ClearFormatting
.Wrap = wdFindStop
.Forward = True
.Format = True
.MatchWildcards = False
.Text = ""
.Style = oDoc.Styles(uStyle)
Do While .Execute
lngStyleCount = lngStyleCount + 1
If oRng.End = oDoc.Range.End Then Exit Do
oRng.Collapse wdCollapseEnd
Loop
End With
If lngStyleCount = 0 Then
MsgBox "No text was found to extract for the style '" & uStyle & "'.", vbOKOnly, "Style Use Search"
Exit Sub
End If
strMsg = "Instances of '" & uStyle & "' found: " & lngStyleCount & "." & vbCr & vbCr _
& "Do you want to export the text using this style to a new document?"
vbmFlags = vbYesNo
If lngStyleCount > 0 Then
vbmResult = MsgBox(strMsg, vbmFlags, "Instance count")
End If
Select Case vbmResult
Case vbNo
Exit Sub
End Select
Set oNewDoc = Documents.Add
Set rText = oDoc.Range
'Insert page number and count in footer of output document
oNewDoc.Sections(oNewDoc.Sections.Count) _
.Footers(wdHeaderFooterPrimary).Range.Select
With Selection
.Paragraphs(1).Alignment = wdAlignParagraphCenter
.TypeText Text:="p. "
.Fields.Add Range:=Selection.Range, Type:=wdFieldEmpty, Text:="PAGE ", preserveFormatting:=True
.TypeText Text:=" of "
.Fields.Add Range:=Selection.Range, Type:=wdFieldEmpty, Text:="NUMPAGES ", preserveFormatting:=True
End With
If ActiveWindow.View.SplitSpecial <> wdPaneNone Then
ActiveWindow.Panes(2).Close
End If
If ActiveWindow.ActivePane.View.Type = wdNormalView Or ActiveWindow. _
ActivePane.View.Type = wdOutlineView Then
ActiveWindow.ActivePane.View.Type = wdPrintView
End If
ActiveWindow.ActivePane.View.SeekView = wdSeekCurrentPageHeader
Selection.EscapeKey
ActiveWindow.ActivePane.View.SeekView = wdSeekMainDocument
If ActiveWindow.View.SplitSpecial = wdPaneNone Then
ActiveWindow.ActivePane.View.Type = wdPrintView
Else
ActiveWindow.View.Type = wdPrintView
End If
'Write style name at top
oNewDoc.Range.InsertAfter "Style: '" & uStyle & "'" & vbCr
With rText.Find
.Style = oDoc.Styles(uStyle)
Do While .Execute
PageNum = rText.Information(wdActiveEndAdjustedPageNumber)
LineNum = rText.Information(wdFirstCharacterLineNumber)
i = i + 1
oNewDoc.Range.InsertAfter "Page: " & PageNum & "/" & "Line: " & LineNum & " " & rText & vbCr
rText.Collapse wdCollapseEnd
If rText.End < oDoc.Range.End - 1 Then '-1 to ignore final paragraph
intTextEnd = rText.End
rText.End = oDoc.Range.End
Set oRange = oDoc.Range(Start:=Selection.Range.Start, End:=Selection.Range.End)
oRange.ListFormat.ConvertNumbersToText wdNumberParagraph
rText.End = intTextEnd
Else
Exit Do
End If
If i > lngStyleCount Then
MsgBox "There is a data problem in this document that is causing an endless loop. Program will stop now. Look for data that is unnecessarily repeating in the extract and remove it in the source document and try again.", vbOKOnly, "Endless loop message"
Exit Sub
End If
Loop
End With
StyleError:
If Err.Number <> 0 Then
MsgBox "Style doesn't exist.", vbOKOnly, "Style Name Check"
Exit Sub
End If
End Sub
Line 14 above defines the new variable which are used in Line 107 and Line 111 above
Give that a try with what you are now trying to do to see if that get you further along.
ASKER
irudyk,
Thanks so much. I ran some random tests with the revised code and it never got hung up or went into an endless loop. From what I've tested so far, it looks as though it will do the job EXCEPT if the style contains Word's dynamic numbering, such as a list. Even if I add numbering to a Heading style, it extracts the heading text but not the number. So far, the only macro that does include the number is my klugey 'Extract_Text_by_Style_Cop y ', because it makes a copy of the whole source doc and then removes all text with every style other than the target one (entered in the input box).
If you can figure out how to get 'Extract_Text_by_Style_Old ’ to include the numbering, that would be great. The 'Copy' one can be slow and falls down on some of the other styles, which is why I've been contemplating testing the main styles to see which macros worked for them and then setting up a userform that selects the 'right' macro, depending on the style. That's not a good approach but the only one I can think of that might be useful. There must be another way of extracting the numbering with its associated text besides the copy-source-delete-other-s tyles method...
Thanks so much. I ran some random tests with the revised code and it never got hung up or went into an endless loop. From what I've tested so far, it looks as though it will do the job EXCEPT if the style contains Word's dynamic numbering, such as a list. Even if I add numbering to a Heading style, it extracts the heading text but not the number. So far, the only macro that does include the number is my klugey 'Extract_Text_by_Style_Cop
If you can figure out how to get 'Extract_Text_by_Style_Old
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER
irudyk,
This loos great! I ran a few tests and they look very good. I will test more when i have time but I will accept this revision as the solution. Thanks so much!
This loos great! I ran a few tests and they look very good. I will test more when i have time but I will accept this revision as the solution. Thanks so much!
ASKER
That should be "looks great"...
Gowflow