• Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 59
  • Last Modified:

Extract particular section from word files

I have about 100 word (Doc, docx) files and need to extract a particular section. 2.9: Address. This "section" is in the Table of Contents as well as within the document.  How can i find and extract this section using vba, macro or python etc.
0
forumware
Asked:
forumware
  • 7
  • 7
  • 4
5 Solutions
 
GrahamSkanRetiredCommented:
Presumably the 'section' begins with an automatically-numbered Heading 2 paragraph with the text "Address". What defines the section end?
0
 
forumwareAuthor Commented:
The next section also starts with Heading 2 but varies, some is general information, interconnections, logical diagrams.
0
 
xtermieCommented:
we can write a macro that finds the topic 2.9: Address and extracts anything between this heading and the next one of the same level and saves in a new document.

That would mean that you would have something like the following code:
Sub OpenAllDocumentsNow()
Dim file
Dim DocPara As Paragraph
Dim path As String
path = "C:\Test\"           'your folder here

file = Dir(path & "*.docx")
Do While file <> ""
Documents.Open fileName:=path & file
   For Each DocPara In ActiveDocument.Paragraphs
     If Left(DocPara.Range.Style, Len("Heading")) = "Heading2" Then
       'CALL THE COPY PASTE FUNCTION or CODE to select and copy to a new document
     End If
    Next
ActiveDocument.Close
file = Dir()
Loop
End Sub

Open in new window


Is that what you want?  Can you post a sample of the document with the heading?
0
VIDEO: THE CONCERTO CLOUD FOR HEALTHCARE

Modern healthcare requires a modern cloud. View this brief video to understand how the Concerto Cloud for Healthcare can help your organization.

 
GrahamSkanRetiredCommented:
This code checks the text and the paragraph number. after finding it, the range is extended to the start of the next Heading 2 paragraph or the end of the document.

I have plugged it into a tweaked version xtermie's macro.

Sub OpenAllDocumentsNow()
Dim strFile As String
Dim rng As Range
Dim strPath As String
Dim docA As Document
Dim docB As Document

strPath = "C:\Test\"           'your folder here

Set docB = Documents.Add
strFile = Dir(strPath & "*.docx")
Do While strFile <> ""
    Set docA = Documents.Open(FileName:=path & strFile)
    
    Set rng = GetBlockRange(docA, "Address:^p", "2.9.")
    rng.Copy
    docB.Bookmarks("\EndOfDoc").Range.Paste
    docA.Close wdDoNotSaveChanges
    
    strFile = Dir()
Loop
docB.SaveAs strPath & "Extracts.docx"
docB.Close
End Sub


Function GetBlockRange(doc As Document, strParatext As String, strListString As String) As Range
    Dim rng As Range
    Dim rngBlock As Range
    
    Set doc = ActiveDocument
    Set rng = doc.Range
    With rng.Find
        .Text = strParatext
        .Style = "Heading 2"
        Do While .Execute()
            DoEvents
            If rng.Paragraphs.First.Range.ListFormat.ListString = strListString Then
                Exit Do
            Else
                rng.Start = rng.End
                rng.End = doc.Range.End
            End If
        Loop
        Set rngBlock = rng.Duplicate 'rngBlock will span the sought paragraph to the end of the document
    End With
    Set rng = rngBlock.Duplicate
    rng.Start = rng.End
    rng.End = doc.Range.End
    With rng.Find
        .Style = "Heading 2"
        If .Execute() Then
            rngBlock.End = rng.Start
        Else
            rngBlock.End = doc.Range.End 'no more level 2 paragraphs, so extend to end of document
        End If
    End With
End Function

Open in new window

1
 
xtermieCommented:
forumware, let us know about GrahamSkan's macro, I think it will work just fine :)
0
 
forumwareAuthor Commented:
Thanks to both, will give it a try shortly. I forgot to mention that I wanted to extract this information to a csv instead of another word document.
0
 
xtermieCommented:
No sure you can save as csv in Word
If we can, we will add a SaveAs at the end of the macro to do it
0
 
GrahamSkanRetiredCommented:
xtermie is right. There is no built-in procedure to export a Word document to a CSV file. It does exist in Excel and Access because those applications deal exclusively with data in a table format.
The new task, therefore is to fit the data to a table. Then it can be converted to comma-separated text and saved as a text file.
0
 
xtermieCommented:
Graham, I think it would it be easier to copy-paste the particular text into an Excel file and then save the Excel file in a .csv format. What do you think?  If we change that in your macro, everything should work exactly as forumware wants.
0
 
GrahamSkanRetiredCommented:
xtermie,
Sorry, I'm a few hours behind you.
Yes, that seems like a good idea.
0
 
GrahamSkanRetiredCommented:
We would need to know the rules for splitting the text into fields (columns) - by words, sentences, paragraphs or something else.
0
 
forumwareAuthor Commented:
All apologies on the delay. I've tried running the script and get Run time error '5174'

This file could not be found. But i see it listed in the c:\test directory.
0
 
xtermieCommented:
Check if it is .docx or .doc
If it is .doc change script accordingly
0
 
forumwareAuthor Commented:
Did that, left one DOCX in c:\test and got the same runtime error.
0
 
GrahamSkanRetiredCommented:
Which line does it fail on?
0
 
xtermieCommented:
This problem occurs when the following conditions are true:

The file name extension is not included as part of the FileName argument (for example: Documents.Open FileName:="mydocument")
-and-
The file name extension of the file you want to open is something other than ".doc".
0
 
GrahamSkanRetiredCommented:
I have augmented the code to look for both ,doc and .docx extensions. I have also commented out the line: Set doc = ActiveDocument in the GetBlockRange procedure which was a hangover from testing and less than helpful.

If you still have errors raised, please tell us the failing line. Other problems may require a (non-sensitive) sample document.

Sub OpenAllDocumentsNow()
Dim strFile As String
Dim rng As Range
Dim strPath As String
Dim docA As Document
Dim docB As Document

strPath = "C:\Test\"           'your folder here

Set docB = Documents.Add
strFile = Dir(strPath & "*.doc*")
Do While strFile <> ""
    Select Case GetFileExtension(strFile)
        Case "doc", "docx"
            Set docA = Documents.Open(FileName:=strPath & strFile)
            Set rng = GetBlockRange(docA, "Address:^p", "2.9.")
            If Not rng Is Nothing Then
                rng.Copy
                docB.Bookmarks("\EndOfDoc").Range.Paste
            End If
            docA.Close wdDoNotSaveChanges
        End Select
    strFile = Dir()
Loop
docB.SaveAs strPath & "Extracts.docx"
docB.Close
End Sub

Function GetFileExtension(strFileName As String) As String
Dim strParts() As String
Dim u As Integer

If Len(strFileName) > 0 Then
    strParts() = Split(strFileName, ".")
    u = UBound(strParts)
    If u > 0 Then
        GetFileExtension = LCase(strParts(u))
    End If
End If
End Function

Function GetBlockRange(doc As Document, strParatext As String, strListString As String) As Range
    Dim rng As Range
    Dim rngBlock As Range
    
    'Set doc = ActiveDocument
    Set rng = doc.Range
    With rng.Find
        .Text = strParatext
        .Style = "Heading 2"
        Do While .Execute()
            DoEvents
            If rng.Paragraphs.First.Range.ListFormat.ListString = strListString Then
                Exit Do
            Else
                rng.Start = rng.End
                rng.End = doc.Range.End
            End If
        Loop
        Set rngBlock = rng.Duplicate 'rngBlock will span the sought paragraph to the end of the document
    End With
    Set rng = rngBlock.Duplicate
    rng.Start = rng.End
    rng.End = doc.Range.End
    With rng.Find
        .Style = "Heading 2"
        If .Execute() Then
            rngBlock.End = rng.Start
        Else
            rngBlock.End = doc.Range.End 'no more level 2 paragaphs, so extend to end of document
        End If
    End With
End Function

Open in new window

1
 
xtermieCommented:
experts provided solid collaborative working solution to author
0

Featured Post

Independent Software Vendors: We Want Your Opinion

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

  • 7
  • 7
  • 4
Tackle projects and never again get stuck behind a technical roadblock.
Join Now