Link to home
Start Free TrialLog in
Avatar of Natashar7
Natashar7

asked on

VBA Code to extract a table that consists the heading "Critical Controls" from multiple word documents into Excel

HI I need your help as the below code doesn't work.  Thanks

Sub wordScrape()

Dim wrdDoc As Object, objFiles As Object, fso As Object, wordApp As Object
Dim sh1 As Worksheet
Dim x As Integer

FolderName = "C:\code" ' Change this to the folder containing your word documents

Set sh1 = ThisWorkbook.Sheets(1)
Set fso = CreateObject("Scripting.FileSystemObject")
Set wordApp = CreateObject("Word.application")
Set objFiles = fso.GetFolder(FolderName).Files

x = 1
For Each wd In objFiles
    If InStr(wd, ".docx") And InStr(wd, "~") = 0 Then
        Set wrdDoc = wordApp.Documents.Open(wd.Path, ReadOnly = True)
        sh1.Cells(x, 1) = wd.Name
        sh1.Cells(x, 2) = Application.WorksheetFunction.Clean(wrdDoc.Tables(3).Cell(Row:=3, Column:=2).Range)
        'sh1.Cells(x, 3) = ....more extracted data....
        x = x + 1
    wrdDoc.Close
    End If

Next wd
wordApp.Quit
End Sub

Open in new window

Avatar of GrahamSkan
GrahamSkan
Flag of United Kingdom of Great Britain and Northern Ireland image

What happens when you try it?
I declared a couple of undeclared variables because I have Option Explicit set, and it seemed to work.
Avatar of Natashar7
Natashar7

ASKER

Where do I indicate in the code to extract the table that has the heading critical controls.  If u can please show me where I indicate the folder to access the word documents and where to include the excel path.  Thanks.
Can I email u the word doc that has the table so u can see ?  

I need a vba code that will extract the tables and put Into an excel worksheet.
It would certainly help if we had a sample document to work with.

Make sure that it doesn't have any confidential data and attach it:
Attach File>Browse> Upload file
I am getting an error. Compile error variable not defined.  For folder name = I put the path to the word doc folder. Eg c:
Here is the word doc file what it looks like.  Thanks
Sorry for the delay.
I added this line to overcome that error:
Dim FolderName As String

Open in new window


Actually you didn't succeed in attaching any files.
Thank you for taking the time to answer. Where do I indicate for it to search only table headings that consist of "Critical Controls" heading?  Did you get the attachment I sent you?
confused do I paste the code in excel or word developer tab?
still getting an error it says variable not defined.  

For Each wd In ObjFiles

please help
i still have not heard back and got a response to my question?
As I indicated in comment 40950752, the attachement is not there.
You need to declare the wd variable:

Dim wd As Object

It would be easier if you could post a sample input document.
You have used the EE message system to send me two messages. Since they do not appear to be private, I will explain here that the first asks a supplementary question, while the second seems to be another attempt to attach a document.
 
The supplementary question is
"Can u pls edit my code and tell me where I should indicate the folder path for the word and excel?  Also the word docs are tables it has to extract the heading that says critical controls on to excel.  I will send u a snapshot later today "

For that supplementary, you have used the variable 'FolderName' to hold the path for the word documents. For the Excel path, you seem to be working with an active worksheet, so its path would be that of the ActiveWorkbook
 
There was (still) no attachment to illustrate exactly what the document looks like, especially what is meant by the table heading. It could be that there is a paragraph with a heading style just before the table. Or it could refer to one of the cells in the first row.

In the morning (GMT + 1) I will try to create some code that assumes the latter.
how do I send you a private message? please explain as I am new to this site thanks
Sorry. I have confused me. You did send me private messages, but you asked me technical questions that didn't have to be private and could have been asked in this thread.
Here is some code that should do the job. It doesn't use FileSystemObject, but relies in the intrinsic Dir()  function instead. Don't forget to set the reference to the Word object library or you will get a 'User-defined tyoe not defined' error.
Sub wordScrape()
'Early binding for so set a reference (Tools/References) to the Microsoft Word Object LIbrary
Dim wrdDoc As Word.Document, wordApp As Word.Application
Dim sh1 As Worksheet
Dim bWordAppStartedHere As Boolean
Dim x As Integer
Dim strFolderName As String
Dim strFileName As String
Dim tbl As Word.Table

strFolderName = "C:\code" ' Change this to the folder containing your word documents

Set sh1 = ThisWorkbook.Sheets(1)

'Try to use existing Word Application
On Error Resume Next 'supress error checking
    Set wordApp = GetObject(, "Word.Application")
On Error GoTo 0 'Resume error checking
If wordApp Is Nothing Then 'word application wasn't already running
    Set wordApp = CreateObject("Word.application")
    bWordAppStartedHere = True
End If

x = 1
strFileName = Dir(strFolderName & "\*.docx")
Do While Len(strFileName) > 0
    Set wrdDoc = wordApp.Documents.Open(strFolderName & "\" & strFileName)
    For Each tbl In wrdDoc.Tables
        If InStr(tbl.Cell(1, 1).Range.Text, "Critical Controls") > 0 Then
            sh1.Cells(x, 1).Value = strFileName
            sh1.Cells(x, 2).Value = Application.WorksheetFunction.Clean(tbl.Cell(3, 2).Range.Text)
            'sh1.Cells(x, 3).Value = ....more extracted data....
            x = x + 1
            Exit For
        End If
    Next tbl
    wrdDoc.Close wdDoNotSaveChanges
    strFileName = Dir() 'get name of next file in foldere
Loop
'only close Word if it wasn't already running
If bWordAppStartedHere Then
    wordApp.Quit
End If
End Sub

Open in new window

There is a typo in my penultimat comment:
'Sorry. I have confused me.'
Should (obviously?) read
'Sorry. I have confused you.
did you get my private message
I tried it and pasted the code in excel and it doesn't work? It showed it was running but there is nothing in sheet 1?
You have now sent me a sample document via the EE message system. Unless there is a good reason not to, please post in the question.
I have attached a copy here.
Critical-Controls.docx
The document has a paragraph in Normal style with the text 'Critical Controls', followed by an empty paragraph, and then by a table.
This version of the code looks for any table in the document that is preceded by the 'Critical Controls' paragraph. There can be any number of empty paragraphs in between.
Sub wordScrape()
'Early binding for so set a reference (Tools/References) to the Microsoft Word Object LIbrary
Dim wrdDoc As Word.Document, wordApp As Word.Application
Dim sh1 As Worksheet
Dim bWordAppStartedHere As Boolean
Dim x As Integer
Dim strFolderName As String
Dim strFileName As String
Dim tbl As Word.Table
Dim rng As Word.Range
Dim para As Word.Paragraph

Dim p As Integer
strFolderName = "C:\code" ' Change this to the folder containing your word documents

Set sh1 = ThisWorkbook.Sheets(1)

'Try to use existing Word Application
On Error Resume Next 'supress error checking
    Set wordApp = GetObject(, "Word.Application")
On Error GoTo 0 'Resume error checking
If wordApp Is Nothing Then 'word application wasn't already running
    Set wordApp = CreateObject("Word.application")
    bWordAppStartedHere = True
End If

x = 1
strFileName = Dir(strFolderName & "\*.docx") 'get first file name
Do While Len(strFileName) > 0
    Set wrdDoc = wordApp.Documents.Open(strFolderName & "\" & strFileName)
    For Each tbl In wrdDoc.Tables 'walk throught the document tables
        Set rng = tbl.Range
        rng.Collapse wdCollapseStart
        rng.Start = wrdDoc.Range.Start 'range now goes from start of document to start of table
        For p = rng.Paragraphs.Count To 1 Step -1 'move backwards through the range
            Set para = rng.Paragraphs(p)
            If Len(para.Range) > 1 Then 'skip empty paragraphs
                If para.Range.Text = "Critical Controls" & vbCr Then
                    sh1.Cells(x, 1).Value = strFileName
                    sh1.Cells(x, 2).Value = Application.WorksheetFunction.Clean(tbl.Cell(3, 2).Range.Text)
                    'sh1.Cells(x, 3).Value = ....more extracted data....
                    x = x + 1
                Else
                    Exit For 'text in prior paragraph is not "Critical Controls"
                End If
            End If
        Next p
    Next tbl
    wrdDoc.Close wdDoNotSaveChanges
    strFileName = Dir() 'get name of next file in folder
Loop
'only close Word if it wasn't already running
If bWordAppStartedHere Then
    wordApp.Quit
End If
End Sub

Open in new window

Thank you it worked part of it but not all.  It didn't bring in the Control ID and corresponding description?  Each control id has a unique description.  Please check and let me know.  

Thanks for your hardwork.
i haven't heard back from you on a corrected code?
This forum is to help anyone who is stuck on what they are trying to do and to show how to do it. You are tryng to write some macro code to do a particular job, and seemed to be stuck at several points.  Therefore I have tidied up your original code so that it works in your context.

You now have an extra requirement that should be fairly easy to achieve even to someone who is only just beginning to learn VBA programming.

Your original macro has a commented-out line that acknowledges the fact that you may need to augment the code to collect more data from the document and it shows where it should go.
 
                  'sh1.Cells(x, 3).Value = ....more extracted data....

Open in new window

The previous line shows how to get the data from one cell in the table. All you need to do is to do something very similar with another cell.

We are here to show you how - not to do your work for you.
your code still doesn't work.
Previously, you said that it was working, but yoou now wanted to transfer some more data.
Has it stopped working because you tried to make changes, or does the code that I last posted stopped?
If the former can you post the code with your changes? If the latter can you say exactly what happens?
your code doesn't bring in all the critical controls it brought it just the first line but there are many in one table and in all word docs.  I didn't make any changes to the code just used your code
"your code doesn't bring in all the critical controls it brought it just the first line but there are many in one table and in all word docs.  I didn't make any changes to the code just used your code"

Please see GrahamSkan's comment (ID: 40956075) above.

You are expected to continue to meet your requirements.

If you are unsure how to do this, that is a different issue, but Graham has advised you what needs to happen to complete your task.

It is unfair to say that his code "does not work".
it brings only the first line but not all the lines stacked below each other
it brings only the first line but not all the lines stacked below each other

Are you still using the sample file you provided to Graham'; (re-)posted in ID: 40954449?

If so, what has changed since you said the code you were using was working (in ID: 40954649)?
ASKER CERTIFIED SOLUTION
Avatar of GrahamSkan
GrahamSkan
Flag of United Kingdom of Great Britain and Northern Ireland image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial