Solved

Transfer randomly organized table data from Word to Excel

Posted on 2014-01-09
Last Modified: 2014-01-28
In the attached file:

I have a number of tables with yellow titles. All cells in a table are subsets of its title. I need to get this data into Excel such that column A contains the titles (repeated for each cell entry) and column B contains the cell contents, one per cell.

The problem is that some of the tables have titles as the first cell/row of the table whereas the other tables have the title above the table.

I have coded the table data. Where the title is coded, it is part of the table. Where the title is the original text, it is not part of the table but is normal text just above the table.

I am looking for a macro to transfer this data from Word to Excel as described in the first paragraph.

Or at least a macro to make all the tables consistent in the layout of the title.
TELEPHONE-DIRECTORY.doc
Question by:Saqib Husain, Syed
 
LVL 11

Expert Comment

by:Angelp1ay
ID: 39788599
I have an almost working version (attached).

You'll need to add Microsoft Scripting Runtime and Microsoft Excel Object Library under Tools > References in the VBA editor.
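
If adding the two references is not an option, late binding is a common alternative. The following is only a sketch and not part of the posted macro: it creates the same objects with CreateObject and declares them As Object, so no references are needed. The names NewDictionary and LateBindingSketch are made up for illustration.

```vba
' Hypothetical helper: returns a Scripting.Dictionary without a project reference
Function NewDictionary() As Object
    Set NewDictionary = CreateObject("Scripting.Dictionary")
End Function

Sub LateBindingSketch()
    Dim exApp As Object, exWb As Object, exWs As Object, paras As Object

    ' Same setup as the macro, but late bound
    Set exApp = CreateObject("Excel.Application")
    Set exWb = exApp.Workbooks.Add
    Set exWs = exWb.Worksheets(1)
    exApp.Visible = True

    ' Store one paragraph and write it to cell A1
    Set paras = NewDictionary()
    paras.Add 1, "WILDLIFE..."
    exWs.Cells(1, 1).Value = paras.Item(1)
End Sub
```

The trade-off is that the editor no longer offers auto-completion for these objects, and Excel's named constants are not defined in the Word project.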

This is the macro:
Sub a()
    ' Setup Excel
    Dim exApp As Excel.Application, exWb As Excel.Workbook, exWs As Excel.Worksheet

    Set exApp = New Excel.Application
    Set exWb = exApp.Workbooks.Add
    Set exWs = exWb.ActiveSheet
    exApp.Visible = True
    
    ' Loop all paragraphs
    ' Long rather than Integer: a long document can hold more than 32,767 paragraphs
    Dim para As Paragraph, paraIndex As Long, paras As Scripting.Dictionary, content As String
    
    Set paras = New Scripting.Dictionary
    paraIndex = 1
    For Each para In ActiveDocument.Paragraphs
        content = para.Range.Text
        If Len(content) > 2 Then
            paras.Add paraIndex, content
            paraIndex = paraIndex + 1
        End If
    Next
        
    ' Loop all tables
    ' Long rather than Integer: the sheet can need more than 32,767 rows
    Dim t As Table, c As Cell, exRow As Long
    exRow = 1
    
    paraIndex = 1
    For Each t In ActiveDocument.Tables
        exWs.Cells(exRow, 1).Value = paras.Item(paraIndex)
        paraIndex = paraIndex + 1
        exRow = exRow + 1
        
        For Each c In t.Range.Cells
            content = c.Range.Text
            ' Cell text ends with a two-character end-of-cell mark (Chr(13) & Chr(7)); strip both
            exWs.Cells(exRow, 2).Value = Left(content, Len(content) - 2)
            paraIndex = paraIndex + 1
            exRow = exRow + 1
        Next
    Next
End Sub


It works up until about Itm400. Something in the structure of the document seems to change somewhere around there. If this is a one-off, perhaps you can export up to that point in one batch and the rest in another?
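
One way to find where the structure changes is to list, for every table, the text that sits directly above it. The sketch below is a diagnostic under the same assumption the macro makes (one title paragraph per table). TitleBeforeTable and ListTableTitles are hypothetical names and are not in the attached file. Tables reported with an empty title are the ones that start the document or are preceded by a blank line.

```vba
' Returns the text of the paragraph directly above table t,
' or "" when t starts the document or that paragraph is blank.
Function TitleBeforeTable(ByVal t As Table) As String
    Dim doc As Document, startPos As Long, txt As String
    Set doc = t.Range.Document
    startPos = t.Range.Start
    If startPos <= doc.Content.Start Then Exit Function

    ' The character just before the table belongs to the preceding paragraph
    txt = doc.Range(startPos - 1, startPos).Paragraphs(1).Range.Text
    TitleBeforeTable = Replace(Replace(txt, Chr$(7), ""), vbCr, "")
End Function

Sub ListTableTitles()
    Dim t As Table, i As Long, firstCell As String
    For Each t In ActiveDocument.Tables
        i = i + 1
        firstCell = t.Range.Cells(1).Range.Text
        firstCell = Left$(firstCell, Len(firstCell) - 2)   ' drop the end-of-cell mark
        ' Table number, title found above it (in brackets), first cell of the table
        Debug.Print i, "[" & TitleBeforeTable(t) & "]", firstCell
    Next t
End Sub
```

The output goes to the Immediate window (Ctrl+G in the VBA editor).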
Almost.doc
Export.xls
 
LVL 11

Expert Comment

by:Angelp1ay
ID: 39788618
A little explanation:

- The first loop iterates over all paragraphs in the document and puts them into a dictionary with incrementing integer keys
(both those outside tables AND those in tables)

- The second loop iterates through tables and their cells
(i.e. only paragraphs in tables)

- I make the assumption there is one paragraph outside a table for each table

Key             Paragraphs            Table Cells
--------------------------------------------------
1               WILDLIFE...
2               Itm1                  Itm1
3               Itm2                  Itm2
4               Itm3                  Itm3
5               Itm4                  Itm4
6               Itm5                  Itm5
7               Itm6                  Itm6
8               Itm7                  Itm7
9               Itm8                  Itm8
10              Itm9                  Itm9
11              LAHORE ZOO
12              Itm10                 Itm10
...


During the second loop, for each new table, I:
- Dump a paragraph from the dictionary
- Dump out the cells of the table, each time incrementing my paragraph counter and hence skipping items in the dictionary matching these cells

So in the above example I hit table 1 and write out "WILDLIFE...", then I loop over the cells writing Itm1-9 and skipping past them in the dictionary. Then I hit table 2 and write out the next paragraph from the dictionary, which is LAHORE ZOO.
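
The counting scheme above depends on the dictionary and the table cells staying in step. An alternative, offered here only as a sketch and not as the posted macro, is to ask Word directly whether each paragraph lies inside a table via Range.Information(wdWithInTable), remember the most recent title, and emit one title/item pair per data cell. That also repeats the title in column A, as the question asked. CollectTitleItemPairs and WritePairsToExcel are hypothetical names, and the sketch assumes that titles inside tables are marked with yellow cell shading and that every non-empty paragraph outside a table is a title.

```vba
' Walks the document once and returns a Collection of Array(title, item).
Function CollectTitleItemPairs(ByVal doc As Document) As Collection
    Dim pairs As New Collection
    Dim para As Paragraph, txt As String, title As String

    For Each para In doc.Paragraphs
        ' Remove the paragraph mark and any end-of-cell / end-of-row mark
        txt = Replace(Replace(para.Range.Text, Chr$(7), ""), vbCr, "")
        If Len(txt) > 0 Then
            If Not para.Range.Information(wdWithInTable) Then
                title = txt                          ' plain text above a table
            ElseIf para.Range.Cells(1).Shading.BackgroundPatternColorIndex = wdYellow Then
                title = txt                          ' yellow cell inside a table
            Else
                ' VBA.Array is zero-based regardless of Option Base
                pairs.Add VBA.Array(title, txt)      ' ordinary data cell
            End If
        End If
    Next para

    Set CollectTitleItemPairs = pairs
End Function

' Requires the Microsoft Excel Object Library reference.
Sub WritePairsToExcel()
    Dim exApp As Excel.Application, exWs As Excel.Worksheet
    Dim pairs As Collection, i As Long

    Set pairs = CollectTitleItemPairs(ActiveDocument)
    Set exApp = New Excel.Application
    exApp.Visible = True
    Set exWs = exApp.Workbooks.Add.Worksheets(1)

    For i = 1 To pairs.Count
        exWs.Cells(i, 1).Value = pairs(i)(0)   ' title, repeated on every row
        exWs.Cells(i, 2).Value = pairs(i)(1)   ' item
    Next i
End Sub
```

Because this never counts paragraphs, a blank line between tables does not shift the titles. A cell that contains several paragraphs would produce one row per paragraph.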
 
LVL 43

Author Comment

by:Saqib Husain, Syed
ID: 39791037
As I mentioned in the question, the items which are colored, e.g. 187, 190 and 193, are titles and not data.
 
LVL 11

Accepted Solution

by:Angelp1ay
Angelp1ay earned 500 total points
ID: 39814365
Ok, this one recognises colored titles within the tables too:
Sub ExtractData()
    ' Setup Excel
    Dim exApp As Excel.Application, exWb As Excel.Workbook, exWs As Excel.Worksheet

    Set exApp = New Excel.Application
    Set exWb = exApp.Workbooks.Add
    Set exWs = exWb.ActiveSheet
    exApp.Visible = True
    
    ' Loop all paragraphs
    ' Long rather than Integer: a long document can hold more than 32,767 paragraphs
    Dim para As Paragraph, paraIndex As Long, paras As Scripting.Dictionary, content As String
    
    Set paras = New Scripting.Dictionary
    paraIndex = 1
    For Each para In ActiveDocument.Paragraphs
        content = para.Range.Text
        If Len(content) > 2 Then
            paras.Add paraIndex, content
            paraIndex = paraIndex + 1
        End If
    Next
        
    ' Loop all tables
    ' Long rather than Integer: the sheet can need more than 32,767 rows
    Dim t As Table, c As Cell, exRow As Long
    exRow = 1
    
    paraIndex = 1
    For Each t In ActiveDocument.Tables
        content = paras.Item(paraIndex)
        paraIndex = paraIndex + 1
        
        exWs.Cells(exRow, 1).Value = content
        exRow = exRow + 1
        
        For Each c In t.Range.Cells
            content = c.Range.Text
            ' Strip the two-character end-of-cell mark, Chr(13) & Chr(7)
            content = Left(content, Len(content) - 2)
            
            If c.Shading.BackgroundPatternColorIndex = wdYellow Then ' wdYellow = 7
                exWs.Cells(exRow, 1).Value = content ' Yellow >> Title
            Else
                exWs.Cells(exRow, 2).Value = content ' Not yellow >> Data
            End If
            
            paraIndex = paraIndex + 1
            exRow = exRow + 1
        Next
    Next
    
    MsgBox "Extract Complete"
End Sub
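
A note on the Len(content) - 2 line: the text of a Word table cell ends with a two-character end-of-cell mark, Chr(13) followed by Chr(7). A small helper makes that explicit and avoids a run-time error if it is ever handed text shorter than two characters. This is a sketch, and StripCellEndMark is a made-up name, not something from the attached file.

```vba
' Returns raw without a trailing end-of-cell mark (Chr(13) & Chr(7)).
' Text that does not end with the mark is returned unchanged.
Function StripCellEndMark(ByVal raw As String) As String
    If Right$(raw, 2) = vbCr & Chr$(7) Then
        StripCellEndMark = Left$(raw, Len(raw) - 2)
    Else
        StripCellEndMark = raw
    End If
End Function
```

Inside the cell loop it would be called as content = StripCellEndMark(c.Range.Text).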


Issues:
- The title "CDA ISLAMABAD (CULTURE COMPLEX)" was split over 2 lines which caused a problem >> it works if you remove the forced line break

- There are some issues later on e.g. just before Itm578, I think due to blank lines between tables. I removed the line between the table starting Itm578 and the previous table and this solved the issue.
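
For the first issue, the forced line breaks can be replaced in one pass instead of by hand. A minimal sketch, assuming the breaks are manual line breaks (^l in Word's Find dialog, Chr(11) in VBA) and that a single space is an acceptable replacement. ReplaceManualLineBreaks is a hypothetical name.

```vba
' Replaces every manual line break in doc with a single space.
Sub ReplaceManualLineBreaks(ByVal doc As Document)
    With doc.Content.Find
        .ClearFormatting
        .Replacement.ClearFormatting
        .Text = "^l"                 ' manual line break
        .Replacement.Text = " "
        .Forward = True
        .Wrap = wdFindStop
        .Format = False
        .MatchWildcards = False
        .Execute Replace:=wdReplaceAll
    End With
End Sub
```

It can be run from the Immediate window with ReplaceManualLineBreaks ActiveDocument, preferably on a copy of the document.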

These problems are quite easy to spot in the Excel output (see picture below).
[Image: Example issue due to differing line breaks]
Almost.doc
 
LVL 11

Assisted Solution

by:Angelp1ay
Angelp1ay earned 500 total points
ID: 39814399
Final Version! :)
Please do the following:
1) Find-Replace paragraph marks
2) Run the macro "RemoveBlanks"
3) Run the macro "ExtractData"
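
The RemoveBlanks macro itself is only in the attached FinalVersion.doc and is not shown in this thread. As a rough idea of what such a step could look like, here is a sketch (not the attached code) that deletes empty paragraphs outside tables. RemoveBlankParagraphs is a hypothetical name.

```vba
' Deletes empty paragraphs that are not inside a table.
' Works backwards so deletions do not disturb the indexes still to be visited.
Sub RemoveBlankParagraphs(ByVal doc As Document)
    Dim i As Long, p As Paragraph

    ' Start at Count - 1: the final paragraph mark of a document is left alone
    For i = doc.Paragraphs.Count - 1 To 1 Step -1
        Set p = doc.Paragraphs(i)
        ' An empty paragraph outside a table consists of the paragraph mark only
        If Len(p.Range.Text) = 1 Then
            If Not p.Range.Information(wdWithInTable) Then
                p.Range.Delete
            End If
        End If
    Next i
End Sub
```

Note that when the only thing between two tables is an empty paragraph, deleting it typically makes Word join the two tables, which matches the manual fix described for the table starting at Itm578. Try it on a copy of the document first.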

[Image: Find-Replace Blank Paragraphs]
FinalVersion.doc
Extract.xls
 
LVL 43

Author Closing Comment

by:Saqib Husain, Syed
ID: 39816777
Perfect, Thanks.
 
LVL 11

Expert Comment

by:Angelp1ay
ID: 39817063
You're welcome!

At first I didn't think it was going to be possible. In the end it was an interesting challenge.

Have a great day :)