Solved

How do I export data from a word document into excel?

Posted on 2014-12-02
20
261 Views
Last Modified: 2014-12-09
Hi,

I have many word documents in a fixed format. All documents have a set of 32 questions and their answers. The questions are identical across all documents and are in the same order, numbered from 1 to 32. The answers vary in length and formatting (some are blocks of text while some are bulleted lists). I want to automate the upload of this questionnaire into a web form and for that, I would like to export the data into excel. In excel, each question will be represented by one column. I am looking for a way to extract the answer to the first question and place it in Column A, 2nd question in Column B, and so on with one row representing one document. Can someone help?
0
Comment
Question by:Mayank Lhila
  • 5
  • 5
  • 5
  • +2
20 Comments
 
LVL 15

Expert Comment

by:Haris Djulic
ID: 40475653
Can you post sample file?
0
 
LVL 18

Expert Comment

by:SimonAdept
ID: 40475655
Please upload a sample document with sensitive information removed or replaced by dummy data.

Are you only looking at historic Word documents, or can you set up a Word form to suit this process?
0
 
LVL 14

Expert Comment

by:John-Charles-Herzberg
ID: 40475657
0
 

Author Comment

by:Mayank Lhila
ID: 40476215
Thank you for all the comments. Yes - I am looking to extract historic word files only. I am attaching a sample word file with data.
Wildlife.docx
0
 
LVL 18

Expert Comment

by:SimonAdept
ID: 40484484
Hi Mayank,

Some of the answers are extremely long.

e.g. 21. Day by day itinerary with description

Some of the answers include tabular data that cannot be fitted into Excel cells
e.g. 28. Please list the visas, travel documents and special permits required for this itinerary along with information on how to arrange for them.

To me, it's one of those cases that almost anything is do-able, but does the output warrant the time to develop it and is this the only available methodology?

Please could you answer these questions to help us suggest the best solution:
Do you have thousands of these documents?
Is Excel the only format that is acceptable?
Do you need the full answer to every question (including formatting - e.g. bullets, tables)?
0
 

Author Comment

by:Mayank Lhila
ID: 40485703
Hi Simon,

Thanks for your response. The end goal of this is to upload this data into my website (running on Drupal 7 with a custom form for uploading this data). I know of a way to upload data from excel into the web form, but could not figure out how to move data from this word file to excel.

Right now I have a few hundred of these documents only but the number keeps increasing. The formatting etc. need not be carried over. Even if some fields have too much data, they can be skipped and then uploaded manually. Even that should reduce my data uploading effort by 80-90%. What do you think of this plan?

Thanks,
Mayank
0
 
LVL 76

Expert Comment

by:GrahamSkan
ID: 40485855
Here is some code to do something of what you need. Is is written as a Word macro. It uses early binding, so there needs to be a reference  (Tools/References) to the Microsoft Excel Object Library.

The Word documents need to be in one folder.  I have hard-coded that path, so you will need to take care of that.
It does not upload the text of the questions.
Carriage returns have been removed
Option Explicit

Sub OpenFiles()

Dim strInFile As String
Dim strDocFile As String
Dim strInfolder As String
Dim doc As Word.Document
Dim xlApp As Excel.Application
Dim xlWbk As Excel.Workbook
Dim xlWks As Excel.Worksheet

Set xlApp = CreateObject("Excel.Application")
xlApp.Visible = True
Set xlWbk = xlApp.Workbooks.Add
Set xlWks = xlWbk.Sheets(1)

strInfolder = "C:\MyFolder"
strInFile = Dir(strInfolder & "\*.doc*")

Do Until Len(strInFile) < 1
    Set doc = Documents.Open(strInfolder & "\" & strInFile)
    ExtractData xlWks, doc
    doc.Close wdDoNotSaveChanges
    strInFile = Dir() ' gets next file name
Loop
End Sub

Sub ExtractData(xlWks As Excel.Worksheet, doc As Word.Document)
Dim rngStart As Range
Dim rngEnd As Range
Dim c As Integer 'for sheet column number
Dim r As Integer 'for sheet row number
Dim strText As String

'remove any spaces in front of the numbering
With doc.Range.Find
    .Text = "^13[ ]{1,}([0-9]{1,2})"
    .Replacement.Text = "^p\1"
    .MatchWildcards = True
    .Execute Replace:=wdReplaceAll
End With

Set rngStart = doc.Range
Set rngEnd = doc.Range
r = 1

'look for first empty row
Do Until xlWks.Cells(r, 1).Value = ""
    r = r + 1
Loop
For c = 1 To 35
    rngStart.End = doc.Range.End
    rngEnd.End = doc.Range.End

    With rngStart.Find
        .Text = "^p" & c & "." 'find paragraph mark, the question number, full stop
        If .Execute() Then
            rngEnd.Start = rngStart.End
            If c < 35 Then
                With rngEnd.Find
                    .Text = "^p" & c + 1 & "."
                    If .Execute() Then
                        rngStart.End = rngEnd.Start
                    Else
                        MsgBox "Question #" & c + 1 & " not found"
                        Stop
                    End If
                End With
            Else
                rngStart.End = doc.Range.End
            End If
            Debug.Print rngStart.Text
            rngStart.MoveStart wdCharacter, 1
            rngStart.MoveStart wdParagraph, 1 'drop question text
            rngStart.MoveEnd wdCharacter, -1
            If Len(rngStart) > 0 Then
                strText = Replace(rngStart.Text, vbCr, "")
                xlWks.Cells(r, c).Value = strText
            Else
                rngStart.MoveStart wdCharacter, -2
            End If
        Else
            MsgBox "Question #" & c & " not found"
            Stop
        End If
    End With
Next c
End Sub

Open in new window

0
 
LVL 18

Expert Comment

by:SimonAdept
ID: 40485881
Doh, Graham, you posted while I was still testing. Your method looks good.

I'm still going to post mine anyway as I've taken a slightly different approach - mine is in Excel with a reference to Word, and operates only on documents open in the current Word session.

EE-GetAnswersFromWordDocs.xlsm

Sub processWordDocs()
Dim objWord As Word.Application
Dim objWDoc As Word.Document
Dim CaptureSheet As Excel.Worksheet
Dim activeRow As Integer
Dim sourceName As String

Set CaptureSheet = ActiveWorkbook.Worksheets.Add 'add new worksheet to capture data

CaptureSheet.Range(Cells(1, 1), Cells(1, 36)) = Array("Source", 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35)
activeRow = 2

Set objWord = GetObject(, "Word.application")
For Each objWDoc In objWord.documents
    sourceName = objWDoc.Name
    CaptureSheet.Cells(activeRow, 1) = sourceName
    With objWDoc

        Dim rng1 As Word.Range
        Dim rng2 As Word.Range
        Dim strTheText As String
    
        Dim QuestionNo As Integer
        QuestionNo = 1
    
        Set rng1 = objWDoc.Range
        Do Until QuestionNo > 35
        If rng1.Find.Execute(FindText:="^p" & QuestionNo & ". ") Then
            rng1.EndOf Unit:=wdParagraph, Extend:=wdMove
            Set rng2 = objWDoc.Range(rng1.End, objWDoc.Range.End)
            If QuestionNo < 35 Then
                If rng2.Find.Execute(FindText:="^p" & QuestionNo + 1 & ". ") Then
                    strTheText = objWDoc.Range(rng1.End, rng2.Start).Text
                    CaptureSheet.Cells(activeRow, QuestionNo + 1) = strTheText
                End If
            Else
                strTheText = objWDoc.Range(rng1.End, objWDoc.Range.End).Text
                CaptureSheet.Cells(activeRow, QuestionNo + 1) = strTheText
            End If
        End If
        QuestionNo = QuestionNo + 1
        DoEvents
        Loop
    End With
activeRow = activeRow + 1
objWDoc.Close savechanges:=False
Next objWDoc

Set objWord = Nothing
MsgBox "Done"
End Sub

Open in new window

0
How to run any project with ease

Manage projects of all sizes how you want. Great for personal to-do lists, project milestones, team priorities and launch plans.
- Combine task lists, docs, spreadsheets, and chat in one
- View and edit from mobile/offline
- Cut down on emails

 

Author Comment

by:Mayank Lhila
ID: 40486237
Thanks a lot for the scripts guys.

@Graham - Sorry I'm a total newbie with macros. Could you please explain how to do the referencing? I am using Office 2011 on Mac. Also, how will the hardcoding work on Mac - is it "$HOME\User\Desktop\..." ?

@Simon - The script works but some fields are being skipped. (7,8,9,11,16, 35). Do you think you could look into that please?

Appreciate the help!

Mayank
0
 
LVL 76

Expert Comment

by:GrahamSkan
ID: 40486336
@ Simon
Sorry about that. I had been working on the code for a while before Mayank's last posting (#40485703)
 
@Mayank
I have no experience with a Mac, but in the Windows VBA IDE there is a menu heading called 'Tools' which contains the item 'References'. This opens a dialogue which lists all the libraries which can be added by ticking their checkboxes.
0
 
LVL 18

Accepted Solution

by:
SimonAdept earned 500 total points
ID: 40486346
Hi,
There are some inconsistencies in the formatting of the document. The questions that are being missed have newline characters instead of return characters, or a space in front of the question number

Graham's code handles the space in front of the number, but I don't think either of our suggested solutions deal with the newlines in place of return characters.

In the workbook I previously posted, insert these lines immediately under the
"  With objWDoc" line:
        'remove any spaces in front of the numbering 'from Graham Skan's code
        With objWDoc.Range.Find
            .Text = "^13[ ]{1,}([0-9]{1,2})"
            .Replacement.Text = "^p\1"
            .MatchWildcards = True
            .Execute Replace:=wdReplaceAll
        End With
        
        'replace any manual line breaks with paragraph marks
        With objWDoc.Range.Find
            .Text = "^l"
            .Replacement.Text = "^p"
            .MatchWildcards = False
            .Execute Replace:=wdReplaceAll
        End With

Open in new window


I also attach an updated workbook, but this has been compiled on a Windows version of Excel, so you'll probably have to redo the reference to the Word application if you use this rather than modifying the code in the workbook I previously sent (which was done on the Mac 2011 version of Excel).

EE-GetAnswersFromWordDocs-v2.xlsm
I can't do any further work on this for the next few hours. If the above is not effective, Graham may be able to jump in and sort it for you.
0
 

Author Comment

by:Mayank Lhila
ID: 40487349
Thanks a lot Simon - this solves it.

I'm sure Graham cracked it too but I just couldn't figure out how to make it work on my mac. My bad for not letting you know earlier.

Thanks again!

Cheers,
Mayank
0
 
LVL 76

Expert Comment

by:GrahamSkan
ID: 40487388
Mayank,
Are you the Boney M fan who contributes to FaceBook?
0
 
LVL 76

Expert Comment

by:GrahamSkan
ID: 40487390
Oops. I meant 'You tube'
0
 
LVL 18

Expert Comment

by:SimonAdept
ID: 40487452
Thanks. Glad to help. Graham's wildcard search/replace was a very neat way of overcoming the inconsistencies in the document layout. Credit to him for thinking of that approach.
0
 

Author Comment

by:Mayank Lhila
ID: 40488349
@Graham - Haha - I like Boney M but wouldn't call myself a die hard fan. I take it you are one?

Thanks again for your help on this!
0
 
LVL 76

Expert Comment

by:GrahamSkan
ID: 40488512
Not really - though I did buy a vinyl  disc in the 70's before the days of CDs or MP3s
0

Featured Post

Find Ransomware Secrets With All-Source Analysis

Ransomware has become a major concern for organizations; its prevalence has grown due to past successes achieved by threat actors. While each ransomware variant is different, we’ve seen some common tactics and trends used among the authors of the malware.

Join & Write a Comment

Microsoft Word is a program we have all encountered at some point, but very few of us have dug deep into its full scope of features, let alone customized it to suit our needs. Luckily making the ribbon (aka toolbar, first introduced in Word 2007) wo…
Outlook Free & Paid Tools
Graphs within dashboards are meant to be dynamic, representing data from a period of time that will change each time the dashboard is updated with new data. Rather than update each graph to point to a different set within a static set of data, t…
This Micro Tutorial demonstrates in Microsoft Excel how to consolidate your marketing data by creating an interactive charts using form controls. This creates cool drop-downs for viewers of your chart to choose from.

708 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

15 Experts available now in Live!

Get 1:1 Help Now