Still celebrating National IT Professionals Day with 3 months of free Premium Membership. Use Code ITDAY17

x
?
Solved

Extract particular section from word files

Posted on 2016-08-30
18
Medium Priority
?
53 Views
Last Modified: 2016-09-27
I have about 100 word (Doc, docx) files and need to extract a particular section. 2.9: Address. This "section" is in the Table of Contents as well as within the document.  How can i find and extract this section using vba, macro or python etc.
0
Comment
Question by:forumware
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 7
  • 7
  • 4
18 Comments
 
LVL 76

Expert Comment

by:GrahamSkan
ID: 41776990
Presumably the 'section' begins with an automatically-numbered Heading 2 paragraph with the text "Address". What defines the section end?
0
 

Author Comment

by:forumware
ID: 41777049
The next section also starts with Heading 2 but varies, some is general information, interconnections, logical diagrams.
0
 
LVL 18

Assisted Solution

by:xtermie
xtermie earned 800 total points (awarded by participants)
ID: 41777653
we can write a macro that finds the topic 2.9: Address and extracts anything between this heading and the next one of the same level and saves in a new document.

That would mean that you would have something like the following code:
Sub OpenAllDocumentsNow()
Dim file
Dim DocPara As Paragraph
Dim path As String
path = "C:\Test\"           'your folder here

file = Dir(path & "*.docx")
Do While file <> ""
Documents.Open fileName:=path & file
   For Each DocPara In ActiveDocument.Paragraphs
     If Left(DocPara.Range.Style, Len("Heading")) = "Heading2" Then
       'CALL THE COPY PASTE FUNCTION or CODE to select and copy to a new document
     End If
    Next
ActiveDocument.Close
file = Dir()
Loop
End Sub

Open in new window


Is that what you want?  Can you post a sample of the document with the heading?
0
Independent Software Vendors: We Want Your Opinion

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

 
LVL 76

Accepted Solution

by:
GrahamSkan earned 1200 total points (awarded by participants)
ID: 41777806
This code checks the text and the paragraph number. after finding it, the range is extended to the start of the next Heading 2 paragraph or the end of the document.

I have plugged it into a tweaked version xtermie's macro.

Sub OpenAllDocumentsNow()
Dim strFile As String
Dim rng As Range
Dim strPath As String
Dim docA As Document
Dim docB As Document

strPath = "C:\Test\"           'your folder here

Set docB = Documents.Add
strFile = Dir(strPath & "*.docx")
Do While strFile <> ""
    Set docA = Documents.Open(FileName:=path & strFile)
    
    Set rng = GetBlockRange(docA, "Address:^p", "2.9.")
    rng.Copy
    docB.Bookmarks("\EndOfDoc").Range.Paste
    docA.Close wdDoNotSaveChanges
    
    strFile = Dir()
Loop
docB.SaveAs strPath & "Extracts.docx"
docB.Close
End Sub


Function GetBlockRange(doc As Document, strParatext As String, strListString As String) As Range
    Dim rng As Range
    Dim rngBlock As Range
    
    Set doc = ActiveDocument
    Set rng = doc.Range
    With rng.Find
        .Text = strParatext
        .Style = "Heading 2"
        Do While .Execute()
            DoEvents
            If rng.Paragraphs.First.Range.ListFormat.ListString = strListString Then
                Exit Do
            Else
                rng.Start = rng.End
                rng.End = doc.Range.End
            End If
        Loop
        Set rngBlock = rng.Duplicate 'rngBlock will span the sought paragraph to the end of the document
    End With
    Set rng = rngBlock.Duplicate
    rng.Start = rng.End
    rng.End = doc.Range.End
    With rng.Find
        .Style = "Heading 2"
        If .Execute() Then
            rngBlock.End = rng.Start
        Else
            rngBlock.End = doc.Range.End 'no more level 2 paragraphs, so extend to end of document
        End If
    End With
End Function

Open in new window

1
 
LVL 18

Expert Comment

by:xtermie
ID: 41777818
forumware, let us know about GrahamSkan's macro, I think it will work just fine :)
0
 

Author Comment

by:forumware
ID: 41778185
Thanks to both, will give it a try shortly. I forgot to mention that I wanted to extract this information to a csv instead of another word document.
0
 
LVL 18

Assisted Solution

by:xtermie
xtermie earned 800 total points (awarded by participants)
ID: 41778194
No sure you can save as csv in Word
If we can, we will add a SaveAs at the end of the macro to do it
0
 
LVL 76

Expert Comment

by:GrahamSkan
ID: 41778276
xtermie is right. There is no built-in procedure to export a Word document to a CSV file. It does exist in Excel and Access because those applications deal exclusively with data in a table format.
The new task, therefore is to fit the data to a table. Then it can be converted to comma-separated text and saved as a text file.
0
 
LVL 18

Expert Comment

by:xtermie
ID: 41779333
Graham, I think it would it be easier to copy-paste the particular text into an Excel file and then save the Excel file in a .csv format. What do you think?  If we change that in your macro, everything should work exactly as forumware wants.
0
 
LVL 76

Expert Comment

by:GrahamSkan
ID: 41779703
xtermie,
Sorry, I'm a few hours behind you.
Yes, that seems like a good idea.
0
 
LVL 76

Expert Comment

by:GrahamSkan
ID: 41779854
We would need to know the rules for splitting the text into fields (columns) - by words, sentences, paragraphs or something else.
0
 

Author Comment

by:forumware
ID: 41786362
All apologies on the delay. I've tried running the script and get Run time error '5174'

This file could not be found. But i see it listed in the c:\test directory.
0
 
LVL 18

Expert Comment

by:xtermie
ID: 41786473
Check if it is .docx or .doc
If it is .doc change script accordingly
0
 

Author Comment

by:forumware
ID: 41786520
Did that, left one DOCX in c:\test and got the same runtime error.
0
 
LVL 76

Expert Comment

by:GrahamSkan
ID: 41786651
Which line does it fail on?
0
 
LVL 18

Assisted Solution

by:xtermie
xtermie earned 800 total points (awarded by participants)
ID: 41787525
This problem occurs when the following conditions are true:

The file name extension is not included as part of the FileName argument (for example: Documents.Open FileName:="mydocument")
-and-
The file name extension of the file you want to open is something other than ".doc".
0
 
LVL 76

Assisted Solution

by:GrahamSkan
GrahamSkan earned 1200 total points (awarded by participants)
ID: 41789259
I have augmented the code to look for both ,doc and .docx extensions. I have also commented out the line: Set doc = ActiveDocument in the GetBlockRange procedure which was a hangover from testing and less than helpful.

If you still have errors raised, please tell us the failing line. Other problems may require a (non-sensitive) sample document.

Sub OpenAllDocumentsNow()
Dim strFile As String
Dim rng As Range
Dim strPath As String
Dim docA As Document
Dim docB As Document

strPath = "C:\Test\"           'your folder here

Set docB = Documents.Add
strFile = Dir(strPath & "*.doc*")
Do While strFile <> ""
    Select Case GetFileExtension(strFile)
        Case "doc", "docx"
            Set docA = Documents.Open(FileName:=strPath & strFile)
            Set rng = GetBlockRange(docA, "Address:^p", "2.9.")
            If Not rng Is Nothing Then
                rng.Copy
                docB.Bookmarks("\EndOfDoc").Range.Paste
            End If
            docA.Close wdDoNotSaveChanges
        End Select
    strFile = Dir()
Loop
docB.SaveAs strPath & "Extracts.docx"
docB.Close
End Sub

Function GetFileExtension(strFileName As String) As String
Dim strParts() As String
Dim u As Integer

If Len(strFileName) > 0 Then
    strParts() = Split(strFileName, ".")
    u = UBound(strParts)
    If u > 0 Then
        GetFileExtension = LCase(strParts(u))
    End If
End If
End Function

Function GetBlockRange(doc As Document, strParatext As String, strListString As String) As Range
    Dim rng As Range
    Dim rngBlock As Range
    
    'Set doc = ActiveDocument
    Set rng = doc.Range
    With rng.Find
        .Text = strParatext
        .Style = "Heading 2"
        Do While .Execute()
            DoEvents
            If rng.Paragraphs.First.Range.ListFormat.ListString = strListString Then
                Exit Do
            Else
                rng.Start = rng.End
                rng.End = doc.Range.End
            End If
        Loop
        Set rngBlock = rng.Duplicate 'rngBlock will span the sought paragraph to the end of the document
    End With
    Set rng = rngBlock.Duplicate
    rng.Start = rng.End
    rng.End = doc.Range.End
    With rng.Find
        .Style = "Heading 2"
        If .Execute() Then
            rngBlock.End = rng.Start
        Else
            rngBlock.End = doc.Range.End 'no more level 2 paragaphs, so extend to end of document
        End If
    End With
End Function

Open in new window

1
 
LVL 18

Expert Comment

by:xtermie
ID: 41817516
experts provided solid collaborative working solution to author
0

Featured Post

Concerto's Cloud Advisory Services

Want to avoid the missteps to gaining all the benefits of the cloud? Learn more about the different assessment options from our Cloud Advisory team.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Q&A with Course Creator, Mark Lassoff, on the importance of HTML5 in the career of a modern-day developer.
We live in a world of interfaces like the one in the title picture. VBA also allows to use interfaces which offers a lot of possibilities. This article describes how to use interfaces in VBA and how to work around their bugs.
In this seventh video of the Xpdf series, we discuss and demonstrate the PDFfonts utility, which lists all the fonts used in a PDF file. It does this via a command line interface, making it suitable for use in programs, scripts, batch files — any pl…
Progress
Suggested Courses

722 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question