Solved

Extract particular section from word files

Posted on 2016-08-30
18
30 Views
Last Modified: 2016-09-27
I have about 100 word (Doc, docx) files and need to extract a particular section. 2.9: Address. This "section" is in the Table of Contents as well as within the document.  How can i find and extract this section using vba, macro or python etc.
0
Comment
Question by:forumware
  • 7
  • 7
  • 4
18 Comments
 
LVL 76

Expert Comment

by:GrahamSkan
ID: 41776990
Presumably the 'section' begins with an automatically-numbered Heading 2 paragraph with the text "Address". What defines the section end?
0
 

Author Comment

by:forumware
ID: 41777049
The next section also starts with Heading 2 but varies, some is general information, interconnections, logical diagrams.
0
 
LVL 17

Assisted Solution

by:xtermie
xtermie earned 200 total points (awarded by participants)
ID: 41777653
we can write a macro that finds the topic 2.9: Address and extracts anything between this heading and the next one of the same level and saves in a new document.

That would mean that you would have something like the following code:
Sub OpenAllDocumentsNow()
Dim file
Dim DocPara As Paragraph
Dim path As String
path = "C:\Test\"           'your folder here

file = Dir(path & "*.docx")
Do While file <> ""
Documents.Open fileName:=path & file
   For Each DocPara In ActiveDocument.Paragraphs
     If Left(DocPara.Range.Style, Len("Heading")) = "Heading2" Then
       'CALL THE COPY PASTE FUNCTION or CODE to select and copy to a new document
     End If
    Next
ActiveDocument.Close
file = Dir()
Loop
End Sub

Open in new window


Is that what you want?  Can you post a sample of the document with the heading?
0
 
LVL 76

Accepted Solution

by:
GrahamSkan earned 300 total points (awarded by participants)
ID: 41777806
This code checks the text and the paragraph number. after finding it, the range is extended to the start of the next Heading 2 paragraph or the end of the document.

I have plugged it into a tweaked version xtermie's macro.

Sub OpenAllDocumentsNow()
Dim strFile As String
Dim rng As Range
Dim strPath As String
Dim docA As Document
Dim docB As Document

strPath = "C:\Test\"           'your folder here

Set docB = Documents.Add
strFile = Dir(strPath & "*.docx")
Do While strFile <> ""
    Set docA = Documents.Open(FileName:=path & strFile)
    
    Set rng = GetBlockRange(docA, "Address:^p", "2.9.")
    rng.Copy
    docB.Bookmarks("\EndOfDoc").Range.Paste
    docA.Close wdDoNotSaveChanges
    
    strFile = Dir()
Loop
docB.SaveAs strPath & "Extracts.docx"
docB.Close
End Sub


Function GetBlockRange(doc As Document, strParatext As String, strListString As String) As Range
    Dim rng As Range
    Dim rngBlock As Range
    
    Set doc = ActiveDocument
    Set rng = doc.Range
    With rng.Find
        .Text = strParatext
        .Style = "Heading 2"
        Do While .Execute()
            DoEvents
            If rng.Paragraphs.First.Range.ListFormat.ListString = strListString Then
                Exit Do
            Else
                rng.Start = rng.End
                rng.End = doc.Range.End
            End If
        Loop
        Set rngBlock = rng.Duplicate 'rngBlock will span the sought paragraph to the end of the document
    End With
    Set rng = rngBlock.Duplicate
    rng.Start = rng.End
    rng.End = doc.Range.End
    With rng.Find
        .Style = "Heading 2"
        If .Execute() Then
            rngBlock.End = rng.Start
        Else
            rngBlock.End = doc.Range.End 'no more level 2 paragraphs, so extend to end of document
        End If
    End With
End Function

Open in new window

1
 
LVL 17

Expert Comment

by:xtermie
ID: 41777818
forumware, let us know about GrahamSkan's macro, I think it will work just fine :)
0
 

Author Comment

by:forumware
ID: 41778185
Thanks to both, will give it a try shortly. I forgot to mention that I wanted to extract this information to a csv instead of another word document.
0
 
LVL 17

Assisted Solution

by:xtermie
xtermie earned 200 total points (awarded by participants)
ID: 41778194
No sure you can save as csv in Word
If we can, we will add a SaveAs at the end of the macro to do it
0
 
LVL 76

Expert Comment

by:GrahamSkan
ID: 41778276
xtermie is right. There is no built-in procedure to export a Word document to a CSV file. It does exist in Excel and Access because those applications deal exclusively with data in a table format.
The new task, therefore is to fit the data to a table. Then it can be converted to comma-separated text and saved as a text file.
0
 
LVL 17

Expert Comment

by:xtermie
ID: 41779333
Graham, I think it would it be easier to copy-paste the particular text into an Excel file and then save the Excel file in a .csv format. What do you think?  If we change that in your macro, everything should work exactly as forumware wants.
0
How to run any project with ease

Manage projects of all sizes how you want. Great for personal to-do lists, project milestones, team priorities and launch plans.
- Combine task lists, docs, spreadsheets, and chat in one
- View and edit from mobile/offline
- Cut down on emails

 
LVL 76

Expert Comment

by:GrahamSkan
ID: 41779703
xtermie,
Sorry, I'm a few hours behind you.
Yes, that seems like a good idea.
0
 
LVL 76

Expert Comment

by:GrahamSkan
ID: 41779854
We would need to know the rules for splitting the text into fields (columns) - by words, sentences, paragraphs or something else.
0
 

Author Comment

by:forumware
ID: 41786362
All apologies on the delay. I've tried running the script and get Run time error '5174'

This file could not be found. But i see it listed in the c:\test directory.
0
 
LVL 17

Expert Comment

by:xtermie
ID: 41786473
Check if it is .docx or .doc
If it is .doc change script accordingly
0
 

Author Comment

by:forumware
ID: 41786520
Did that, left one DOCX in c:\test and got the same runtime error.
0
 
LVL 76

Expert Comment

by:GrahamSkan
ID: 41786651
Which line does it fail on?
0
 
LVL 17

Assisted Solution

by:xtermie
xtermie earned 200 total points (awarded by participants)
ID: 41787525
This problem occurs when the following conditions are true:

The file name extension is not included as part of the FileName argument (for example: Documents.Open FileName:="mydocument")
-and-
The file name extension of the file you want to open is something other than ".doc".
0
 
LVL 76

Assisted Solution

by:GrahamSkan
GrahamSkan earned 300 total points (awarded by participants)
ID: 41789259
I have augmented the code to look for both ,doc and .docx extensions. I have also commented out the line: Set doc = ActiveDocument in the GetBlockRange procedure which was a hangover from testing and less than helpful.

If you still have errors raised, please tell us the failing line. Other problems may require a (non-sensitive) sample document.

Sub OpenAllDocumentsNow()
Dim strFile As String
Dim rng As Range
Dim strPath As String
Dim docA As Document
Dim docB As Document

strPath = "C:\Test\"           'your folder here

Set docB = Documents.Add
strFile = Dir(strPath & "*.doc*")
Do While strFile <> ""
    Select Case GetFileExtension(strFile)
        Case "doc", "docx"
            Set docA = Documents.Open(FileName:=strPath & strFile)
            Set rng = GetBlockRange(docA, "Address:^p", "2.9.")
            If Not rng Is Nothing Then
                rng.Copy
                docB.Bookmarks("\EndOfDoc").Range.Paste
            End If
            docA.Close wdDoNotSaveChanges
        End Select
    strFile = Dir()
Loop
docB.SaveAs strPath & "Extracts.docx"
docB.Close
End Sub

Function GetFileExtension(strFileName As String) As String
Dim strParts() As String
Dim u As Integer

If Len(strFileName) > 0 Then
    strParts() = Split(strFileName, ".")
    u = UBound(strParts)
    If u > 0 Then
        GetFileExtension = LCase(strParts(u))
    End If
End If
End Function

Function GetBlockRange(doc As Document, strParatext As String, strListString As String) As Range
    Dim rng As Range
    Dim rngBlock As Range
    
    'Set doc = ActiveDocument
    Set rng = doc.Range
    With rng.Find
        .Text = strParatext
        .Style = "Heading 2"
        Do While .Execute()
            DoEvents
            If rng.Paragraphs.First.Range.ListFormat.ListString = strListString Then
                Exit Do
            Else
                rng.Start = rng.End
                rng.End = doc.Range.End
            End If
        Loop
        Set rngBlock = rng.Duplicate 'rngBlock will span the sought paragraph to the end of the document
    End With
    Set rng = rngBlock.Duplicate
    rng.Start = rng.End
    rng.End = doc.Range.End
    With rng.Find
        .Style = "Heading 2"
        If .Execute() Then
            rngBlock.End = rng.Start
        Else
            rngBlock.End = doc.Range.End 'no more level 2 paragaphs, so extend to end of document
        End If
    End With
End Function

Open in new window

1
 
LVL 17

Expert Comment

by:xtermie
ID: 41817516
experts provided solid collaborative working solution to author
0

Featured Post

Highfive + Dolby Voice = No More Audio Complaints!

Poor audio quality is one of the top reasons people don’t use video conferencing. Get the crispest, clearest audio powered by Dolby Voice in every meeting. Highfive and Dolby Voice deliver the best video conferencing and audio experience for every meeting and every room.

Join & Write a Comment

You can of course define an array to hold data that is of a particular type like an array of Strings to hold customer names or an array of Doubles to hold customer sales, but what do you do if you want to coordinate that data? This article describes…
Since upgrading to Office 2013 or higher installing the Smart Indenter addin will fail. This article will explain how to install it so it will work regardless of the Office version installed.
This video walks the viewer through the process of creating envelopes and labels, with multiple names and addresses. Navigate to the “Start Mail Merge” button in the Mailings tab: Follow the step-by-step process until asked to find the address doc…
In a previous video Micro Tutorial here at Experts Exchange (http://www.experts-exchange.com/videos/1358/How-to-get-a-free-trial-of-Office-365-with-the-Office-2016-desktop-applications.html), I explained how to get a free, one-month trial of Office …

760 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

20 Experts available now in Live!

Get 1:1 Help Now