Solved

Transfer randomly organized table data from Word to Excel

Posted on 2014-01-09
Last Modified: 2014-01-28
In the attached file, I have a number of tables with yellow titles. Every cell in a table belongs under that table's title. I need to get this data into Excel such that column A contains the titles (repeated for each cell entry) and column B contains the cell contents, one per cell.

The problem is that some of the tables have titles as the first cell/row of the table whereas the other tables have the title above the table.

I have coded the table data. Where the title is coded, it is part of the table. Where the title is the original text, it is not part of the table but is normal text just above the table.

I am looking for a macro to transfer this data from Word to Excel as described in the first paragraph.

Or at least a macro to make all the tables consistent in the layout of the title.
TELEPHONE-DIRECTORY.doc
Question by:Saqib Husain, Syed

Expert Comment

by:Angelp1ay
ID: 39788599
I have an almost working version (attached).

You'll need to add references to Microsoft Scripting Runtime and the Microsoft Excel Object Library under Tools > References.
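As an aside, late binding is a common way to avoid setting those references by hand. The following is only a minimal sketch, assuming Excel is installed on the machine; the Sub name `LateBoundSetup` and the sample value are hypothetical and not part of the macro in this thread.

```vba
Sub LateBoundSetup()
    ' Late binding: objects are created by ProgID at run time,
    ' so nothing has to be ticked under Tools > References.
    Dim exApp As Object, exWb As Object, exWs As Object
    Dim paras As Object

    Set exApp = CreateObject("Excel.Application")
    Set exWb = exApp.Workbooks.Add
    Set exWs = exWb.ActiveSheet
    exApp.Visible = True

    ' The dictionary exposes the same Add/Item members as the early-bound type
    Set paras = CreateObject("Scripting.Dictionary")
    paras.Add 1, "example paragraph"   ' hypothetical sample value
    exWs.Cells(1, 1).Value = paras.Item(1)
End Sub
```

The trade-off is that variables typed `As Object` lose IntelliSense and compile-time checking, which is why the early-bound declarations are often preferred while developing.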

This is the macro:
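Sub a()
    ' Requires references to Microsoft Scripting Runtime and the Microsoft Excel Object Library
    ' Set up Excel
    Dim exApp As Excel.Application, exWb As Excel.Workbook, exWs As Excel.Worksheet

    Set exApp = New Excel.Application
    Set exWb = exApp.Workbooks.Add
    Set exWs = exWb.ActiveSheet
    exApp.Visible = True
    
    ' Loop all paragraphs (inside and outside tables) and store them under incrementing keys
    Dim para As Paragraph, paraIndex As Long, paras As Scripting.Dictionary, content As String
    
    Set paras = New Scripting.Dictionary
    paraIndex = 1
    For Each para In ActiveDocument.Paragraphs
        content = para.Range.Text
        ' Skip blank paragraphs and empty cells (an empty cell is just the 2-character end-of-cell marker)
        If Len(content) > 2 Then
            paras.Add paraIndex, content
            paraIndex = paraIndex + 1
        End If
    Next
        
    ' Loop all tables
    Dim t As Table, c As Cell, exRow As Long ' Long: Integer overflows past row 32,767
    exRow = 1
    
    paraIndex = 1
    For Each t In ActiveDocument.Tables
        ' The next unconsumed paragraph is assumed to be the title above this table
        exWs.Cells(exRow, 1).Value = paras.Item(paraIndex)
        paraIndex = paraIndex + 1
        exRow = exRow + 1
        
        For Each c In t.Range.Cells
            content = c.Range.Text
            ' Strip the end-of-cell marker (Chr(13) & Chr(7)), which is two characters long
            exWs.Cells(exRow, 2).Value = Left(content, Len(content) - 2)
            paraIndex = paraIndex + 1 ' Skip the dictionary entry matching this cell
            exRow = exRow + 1
        Next
    Next
End Sub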


It works up until about Itm400. Something in the structure of the document seems to change around there. If this is a one-off, perhaps you can export up to there in one batch and the rest in another?
Almost.doc
Export.xls

Expert Comment

by:Angelp1ay
ID: 39788618
A little explanation:

- The first loop iterates over all paragraphs in the document and puts them into a dictionary with incrementing integer keys
(both those outside tables AND those in tables)

- The second loop iterates through tables and their cells
(i.e. only paragraphs in tables)

- I make the assumption there is one paragraph outside a table for each table

Key             Paragraphs            Table Cells
--------------------------------------------------
1               WILDLIFE...
2               Itm1                  Itm1
3               Itm2                  Itm2
4               Itm3                  Itm3
5               Itm4                  Itm4
6               Itm5                  Itm5
7               Itm6                  Itm6
8               Itm7                  Itm7
9               Itm8                  Itm8
10              Itm9                  Itm9
11              LAHORE ZOO
12              Itm10                 Itm10
...


During the second loop, for each new table, I:
- Dump a paragraph from the dictionary
- Dump out the cells of the table, each time incrementing my paragraph counter and hence skipping items in the dictionary matching these cells

So in the above example, I hit table 1 and write out "WILDLIFE...", then I loop the cells, writing Itm1-9 and skipping past them in the dictionary. Then I hit table 2 and write out the next paragraph from the dictionary, which is LAHORE ZOO.

Author Comment

by:Saqib Husain, Syed
ID: 39791037
As I mentioned in the question, the colored items, e.g. 187, 190, 193, are titles and not data.

Accepted Solution

by:Angelp1ay
Angelp1ay earned 500 total points
ID: 39814365
Ok, this one recognises colored titles within the tables too:
Sub ExtractData()
    ' Requires references to Microsoft Scripting Runtime and the Microsoft Excel Object Library
    ' Set up Excel
    Dim exApp As Excel.Application, exWb As Excel.Workbook, exWs As Excel.Worksheet

    Set exApp = New Excel.Application
    Set exWb = exApp.Workbooks.Add
    Set exWs = exWb.ActiveSheet
    exApp.Visible = True
    
    ' Loop all paragraphs (inside and outside tables) and store them under incrementing keys
    Dim para As Paragraph, paraIndex As Long, paras As Scripting.Dictionary, content As String
    
    Set paras = New Scripting.Dictionary
    paraIndex = 1
    For Each para In ActiveDocument.Paragraphs
        content = para.Range.Text
        ' Skip blank paragraphs and empty cells (an empty cell is just the 2-character end-of-cell marker)
        If Len(content) > 2 Then
            paras.Add paraIndex, content
            paraIndex = paraIndex + 1
        End If
    Next
        
    ' Loop all tables
    Dim t As Table, c As Cell, exRow As Long ' Long: Integer overflows past row 32,767
    exRow = 1
    
    paraIndex = 1
    For Each t In ActiveDocument.Tables
        ' The next unconsumed paragraph is assumed to be the title above this table
        content = paras.Item(paraIndex)
        paraIndex = paraIndex + 1
        
        exWs.Cells(exRow, 1).Value = content
        exRow = exRow + 1
        
        For Each c In t.Range.Cells
            content = c.Range.Text
            ' Strip the end-of-cell marker (Chr(13) & Chr(7))
            content = Left(content, Len(content) - 2)
            
            If c.Shading.BackgroundPatternColorIndex = wdYellow Then ' wdYellow = 7
                exWs.Cells(exRow, 1).Value = content ' Yellow >> Title
            Else
                exWs.Cells(exRow, 2).Value = content ' Not yellow >> Data
            End If
            
            paraIndex = paraIndex + 1 ' Skip the dictionary entry matching this cell
            exRow = exRow + 1
        Next
    Next
    
    MsgBox "Extract Complete"
End Sub


Issues:
- The title "CDA ISLAMABAD (CULTURE COMPLEX)" was split over 2 lines which caused a problem >> it works if you remove the forced line break

- There are some issues later on, e.g. just before Itm578, I think due to blank lines between tables. Removing the blank line between the table starting at Itm578 and the previous table solved the issue.

They are quite easy to spot in the Excel output (see pic below)
[Image: Example issue due to differing line breaks]
Almost.doc
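The forced line break in the split title can also be removed with a macro rather than by hand. This is a minimal sketch, assuming the break is a manual line break (Shift+Enter, `^l` in Word's Find dialog) rather than a paragraph mark; the Sub name `ReplaceManualLineBreaks` is hypothetical and does not come from the attached files.

```vba
Sub ReplaceManualLineBreaks()
    ' Replace every manual line break (^l) in the document with a single space
    With ActiveDocument.Content.Find
        .ClearFormatting
        .Replacement.ClearFormatting
        .Text = "^l"
        .Replacement.Text = " "
        .Forward = True
        .Wrap = wdFindContinue
        .MatchWildcards = False   ' ^l is a special code, not a wildcard pattern
        .Execute Replace:=wdReplaceAll
    End With
End Sub
```

Running it on a copy of the document first is advisable, since it changes every manual line break and not only the one in the affected title.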

Assisted Solution

by:Angelp1ay
Angelp1ay earned 500 total points
ID: 39814399
Final Version! :)
Please do the following:
1) Find-Replace paragraph marks
2) Run the macro "RemoveBlanks"
3) Run the macro "ExtractData"

[Image: Find-Replace Blank Paragraphs]
FinalVersion.doc
Extract.xls
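The "RemoveBlanks" macro itself lives in the attached FinalVersion.doc and is not shown in this thread. For readers without the attachment, the following is only a guess at what such a step might look like, not the author's code: a minimal sketch that deletes empty paragraphs outside tables. The Sub name `RemoveBlankParagraphsSketch` is hypothetical.

```vba
Sub RemoveBlankParagraphsSketch()
    ' Hypothetical stand-in for the attached "RemoveBlanks" macro.
    Dim i As Long, p As Paragraph

    ' Walk backwards so a deletion does not shift the paragraphs still to be visited.
    ' Start at Count - 1: the document's final paragraph mark is left alone.
    For i = ActiveDocument.Paragraphs.Count - 1 To 1 Step -1
        Set p = ActiveDocument.Paragraphs(i)
        ' A blank paragraph outside a table consists of the paragraph mark only
        If p.Range.Text = vbCr Then
            If Not p.Range.Information(wdWithInTable) Then
                p.Range.Delete
            End If
        End If
    Next i
End Sub
```

Note that when a deleted blank paragraph is the only thing separating two tables, Word will typically join them into one table, so check the result against the title layout the extraction macro expects. Indexing `Paragraphs(i)` in a loop is slow on long documents, which is acceptable for a one-off cleanup.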

Author Closing Comment

by:Saqib Husain, Syed
ID: 39816777
Perfect, Thanks.

Expert Comment

by:Angelp1ay
ID: 39817063
You're welcome!

At first I didn't think it was going to be possible. In the end it was an interesting challenge.

Have a great day :)
Question has a verified solution.
