Solved

Transfer randomly organized table data from Word to Excel

Posted on 2014-01-09
Last Modified: 2014-01-28
In the attached file I have a number of tables with yellow titles. Every cell in a table is an item belonging to that table's title. I need to get this data into Excel so that column A contains the title (repeated for each cell entry) and column B contains the cell contents, one entry per row.

The problem is that some of the tables have titles as the first cell/row of the table whereas the other tables have the title above the table.

I have color-coded the table data. Where a title is colored, it is part of the table; where a title is in its original formatting, it is not part of the table but normal text just above it.

I am looking for a macro to transfer this data from Word to Excel as described in the first paragraph.

Or, at least, a macro that makes the title layout consistent across all the tables.
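The "consistent layout" fallback can be sketched outside Word. This is a minimal Python model, not Word VBA, and it assumes each table is a list of cells flagged as colored (a title stored inside the table) or not, and that an ordinary paragraph just before a table is that table's title. The names `Cell`, `Table` and `normalize` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    text: str
    colored: bool = False  # True when the cell is a title stored inside the table


@dataclass
class Table:
    cells: list  # Cell objects in reading order


def normalize(blocks):
    """Return tables that all start with a colored title cell.

    blocks models a document as a sequence of str (ordinary paragraphs)
    and Table objects. A table that does not already start with a colored
    cell gets the nearest preceding paragraph inserted as its title.
    """
    tables = []
    pending_title = None
    for block in blocks:
        if isinstance(block, str):
            pending_title = block  # candidate title for the next table
            continue
        cells = list(block.cells)  # copy so the input is not modified
        starts_with_title = bool(cells) and cells[0].colored
        if not starts_with_title and pending_title is not None:
            cells.insert(0, Cell(pending_title, colored=True))
        tables.append(Table(cells))
        pending_title = None  # a paragraph titles at most one table
    return tables


doc = [
    "TITLE A",                                             # title above the table
    Table([Cell("Itm1"), Cell("Itm2")]),
    Table([Cell("TITLE B", colored=True), Cell("Itm3")]),  # title inside the table
]
for table in normalize(doc):
    print([c.text for c in table.cells])
```

Once every table starts with its title cell, only one rule (colored cell = title) is needed when exporting.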
TELEPHONE-DIRECTORY.doc
Question by:Saqib Husain, Syed

Expert Comment

by:Angelp1ay
ID: 39788599
I have an almost working version (attached).

You'll need to add references to Microsoft Scripting Runtime and the Microsoft Excel Object Library under Tools > References in the Word VBA editor.

This is the macro:
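Sub a()
    ' Setup Excel (needs references to Microsoft Excel Object Library and Microsoft Scripting Runtime)
    Dim exApp As Excel.Application, exWb As Excel.Workbook, exWs As Excel.Worksheet

    Set exApp = New Excel.Application
    Set exWb = exApp.Workbooks.Add
    Set exWs = exWb.ActiveSheet
    exApp.Visible = True

    ' Loop all paragraphs (inside and outside tables) and store each non-blank one
    ' under an incrementing key. Blank paragraphs and empty cells hold only their
    ' end marks, so the Len > 2 test skips them.
    Dim para As Paragraph, paraIndex As Long, paras As Scripting.Dictionary, content As String

    Set paras = New Scripting.Dictionary
    paraIndex = 1
    For Each para In ActiveDocument.Paragraphs
        content = para.Range.Text
        If Len(content) > 2 Then
            paras.Add paraIndex, content
            paraIndex = paraIndex + 1
        End If
    Next

    ' Loop all tables: write the next stored paragraph as the title, then the cells
    Dim t As Table, c As Cell, exRow As Long
    exRow = 1

    paraIndex = 1
    For Each t In ActiveDocument.Tables
        exWs.Cells(exRow, 1).Value = Replace(paras.Item(paraIndex), vbCr, "")
        paraIndex = paraIndex + 1
        exRow = exRow + 1

        For Each c In t.Range.Cells
            content = c.Range.Text
            ' Cell text ends with the end-of-cell marker vbCr & Chr(7); drop both characters
            content = Left(content, Len(content) - 2)
            exWs.Cells(exRow, 2).Value = content
            ' Empty cells were never stored, so only non-empty cells skip a dictionary entry
            If Len(content) > 0 Then paraIndex = paraIndex + 1
            exRow = exRow + 1
        Next
    Next
End Sub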


It works up to about Itm400; something in the structure of the document seems to change around there. If this is a one-off, perhaps you could export up to that point in one batch and the rest in another?
Almost.doc
Export.xls

Expert Comment

by:Angelp1ay
ID: 39788618
A little explanation:

- The first loop iterates over all paragraphs in the document (both those outside tables AND those in tables) and puts every paragraph longer than two characters (i.e. skipping blank paragraphs and empty cells) into a dictionary with incrementing integer keys

- The second loop iterates through the tables and their cells (i.e. only the paragraphs in tables)

- I make the assumption there is one paragraph outside a table for each table

Key             Paragraphs            Table Cells
--------------------------------------------------
1               WILDLIFE...
2               Itm1                  Itm1
3               Itm2                  Itm2
4               Itm3                  Itm3
5               Itm4                  Itm4
6               Itm5                  Itm5
7               Itm6                  Itm6
8               Itm7                  Itm7
9               Itm8                  Itm8
10              Itm9                  Itm9
11              LAHORE ZOO
12              Itm10                 Itm10
...


During the second loop, for each new table, I:
- Dump a paragraph from the dictionary
- Dump out the cells of the table, each time incrementing my paragraph counter and hence skipping items in the dictionary matching these cells

So in the example above, when I hit table 1, I write out "WILDLIFE...", then loop through the cells writing Itm1-9 and skipping past them in the dictionary. When I hit table 2, I write out the next paragraph in the dictionary, which is LAHORE ZOO.
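The key-skipping scheme can be modeled in a few lines of Python (a simulation, not Word code). It assumes ordinary paragraphs end in `\r` and table cells end in Word's end-of-cell marker `\r\x07` (Chr(13) & Chr(7)), and it mirrors the macro's `Len(content) > 2` filter. Because that filter drops empty cells, the second loop has to advance the counter only for non-empty cells; if it advances for every cell, each later title is taken from the wrong key. `build_paras` and `table_titles` are hypothetical names.

```python
CELL_END = "\r\x07"  # end-of-cell marker: Chr(13) & Chr(7)


def build_paras(paragraph_texts):
    """First loop: store every paragraph longer than 2 characters under keys 1, 2, 3, ..."""
    paras = {}
    for text in paragraph_texts:
        if len(text) > 2:
            paras[len(paras) + 1] = text
    return paras


def table_titles(paras, tables, skip_empty_cells):
    """Second loop: take the next stored paragraph as each table's title,
    then skip one key per cell. With skip_empty_cells=True, empty cells
    (which the first loop never stored) do not advance the index."""
    titles = []
    index = 1
    for cells in tables:
        titles.append(paras.get(index, "").rstrip(CELL_END))
        index += 1
        for cell in cells:
            content = cell[:-2]  # strip the end-of-cell marker
            if content or not skip_empty_cells:
                index += 1
    return titles


# A title above table 1, a table with an empty middle cell, a blank line,
# then a title above table 2.
paragraphs = ["TITLE A\r", "Itm1" + CELL_END, CELL_END, "Itm2" + CELL_END,
              "\r", "TITLE B\r", "Itm3" + CELL_END]
tables = [["Itm1" + CELL_END, CELL_END, "Itm2" + CELL_END], ["Itm3" + CELL_END]]

paras = build_paras(paragraphs)
print(table_titles(paras, tables, skip_empty_cells=False))  # ['TITLE A', 'Itm3']
print(table_titles(paras, tables, skip_empty_cells=True))   # ['TITLE A', 'TITLE B']
```

An empty cell is only one way for the keys to drift; anything else that makes the paragraph list and the cell walk disagree has the same effect.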

Author Comment

by:Saqib Husain, Syed
ID: 39791037
As I mentioned in the question, the colored items (e.g. 187, 190 and 193) are titles, not data.

Accepted Solution

by:Angelp1ay
ID: 39814365
OK, this version also recognises colored titles inside the tables:
Sub ExtractData()
    ' Setup Excel
    Dim exApp As Excel.Application, exWb As Excel.Workbook, exWs As Excel.Worksheet

    Set exApp = New Excel.Application
    Set exWb = exApp.Workbooks.Add
    Set exWs = exWb.ActiveSheet
    exApp.Visible = True

    ' Loop all paragraphs; store the non-blank ones under incrementing keys
    Dim para As Paragraph, paraIndex As Long, paras As Scripting.Dictionary, content As String

    Set paras = New Scripting.Dictionary
    paraIndex = 1
    For Each para In ActiveDocument.Paragraphs
        content = para.Range.Text
        If Len(content) > 2 Then
            paras.Add paraIndex, content
            paraIndex = paraIndex + 1
        End If
    Next

    ' Loop all tables
    Dim t As Table, c As Cell, exRow As Long
    exRow = 1

    paraIndex = 1
    For Each t In ActiveDocument.Tables
        ' The next stored paragraph is taken as the title above this table
        content = Replace(paras.Item(paraIndex), vbCr, "")
        paraIndex = paraIndex + 1

        exWs.Cells(exRow, 1).Value = content
        exRow = exRow + 1

        For Each c In t.Range.Cells
            content = c.Range.Text
            ' Strip the end-of-cell marker (vbCr & Chr(7))
            content = Left(content, Len(content) - 2)

            ' wdYellow (7 in WdColorIndex) marks a title stored inside the table
            If c.Shading.BackgroundPatternColorIndex = wdYellow Then
                exWs.Cells(exRow, 1).Value = content ' Yellow >> Title
            Else
                exWs.Cells(exRow, 2).Value = content ' Not yellow >> Data
            End If

            ' Empty cells were never stored in the dictionary, so they must not advance the index
            If Len(content) > 0 Then paraIndex = paraIndex + 1
            exRow = exRow + 1
        Next
    Next

    MsgBox "Extract Complete"
End Sub
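This macro writes each title on its own row in column A and each data item in column B, whereas the question asked for the title repeated next to every entry. A small carry-forward pass closes that gap. The sketch below is Python working on a list of `(column A, column B)` tuples standing in for the exported sheet; `fill_down` is a hypothetical name, and a title row is assumed to be one whose column B is empty (`None`).

```python
def fill_down(rows):
    """Convert exported rows into (title, value) pairs.

    rows is a list of (col_a, col_b) tuples as written by the macro:
    a title row has col_b set to None, a data row has col_a set to None.
    Each title is repeated for the data rows beneath it, and the title
    rows themselves are dropped.
    """
    result = []
    title = None
    for col_a, col_b in rows:
        if col_b is None:
            title = col_a  # a new title starts a new group
        else:
            result.append((title, col_b))
    return result


exported = [("TITLE A", None), (None, "Itm1"), (None, "Itm2"),
            ("TITLE B", None), (None, "Itm3")]
for title, value in fill_down(exported):
    print(title, value, sep="\t")
```

Inside Excel, the same carry-forward could be done with a loop over the used range before deleting the title rows.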



Issues:
- The title "CDA ISLAMABAD (CULTURE COMPLEX)" was split over two lines, which caused a problem; it works if you remove the forced line break.

- There are some issues later on, e.g. just before Itm578, which I think are due to blank lines between tables. Removing the blank line between the table starting with Itm578 and the previous table solved the issue.

They are quite easy to spot in the Excel output (see the picture below).
Example issue due to differing line breaks
Almost.doc

Assisted Solution

by:Angelp1ay
ID: 39814399
Final Version! :)
Please do the following:
1) Find-Replace blank paragraph marks (see the screenshot below)
2) Run the macro "RemoveBlanks"
3) Run the macro "ExtractData"
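The `RemoveBlanks` macro itself is not shown in the thread, so the following is only a hypothetical Python model of what the cleanup aims for. A paragraph holding only spaces or tabs can be longer than two characters, pass the `Len(content) > 2` filter, and then be picked up as a table title. The sketch treats the document text as paragraphs separated by `\r` (Word's paragraph mark) and drops empty or whitespace-only ones; `remove_blank_paragraphs` is an assumed name.

```python
def remove_blank_paragraphs(text):
    """Drop paragraphs that are empty or contain only whitespace.

    text uses carriage returns as paragraph marks, as Word's Range.Text does.
    Non-blank paragraphs are kept in order, each still ending in a carriage return.
    """
    paragraphs = text.split("\r")
    if paragraphs and paragraphs[-1] == "":
        paragraphs.pop()  # text ended with a paragraph mark
    kept = [p for p in paragraphs if p.strip()]
    return "".join(p + "\r" for p in kept)


raw = "TITLE A\r\r   \rTITLE B\r"
print(repr(remove_blank_paragraphs(raw)))  # 'TITLE A\rTITLE B\r'
```

In Word itself, deleting the only paragraph between two tables merges them, so a real cleanup macro needs to keep one paragraph in such gaps.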

Find-Replace Blank Paragraphs
FinalVersion.doc
Extract.xls

Author Closing Comment

by:Saqib Husain, Syed
ID: 39816777
Perfect, Thanks.

Expert Comment

by:Angelp1ay
ID: 39817063
You're welcome!

At first I didn't think it was going to be possible. In the end it was an interesting challenge.

Have a great day :)