Transfer randomly organized table data from Word to Excel

Posted on 2014-01-09
Last Modified: 2014-01-28
In the attached file:

I have a number of tables with yellow titles, and all cells in a table are sub-items of its title. I need to get this data into Excel so that column A contains the titles (repeated for each cell entry) and column B contains the cell contents, one per cell.

The problem is that some of the tables have the title as the first cell/row of the table, whereas other tables have the title above the table.

I have coded the table data. Where a title is coded, it is part of the table. Where a title is left as the original text, it is not part of the table but is normal text just above the table.

I am looking for a macro to transfer this data from Word to Excel as described in the first paragraph.

Or at least a macro to make the layout of the title consistent across all the tables.
TELEPHONE-DIRECTORY.doc
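The requested layout can be modelled outside Word. The sketch below is written in Python, not VBA. Each table is a list of `(text, is_title)` cells, where `is_title` stands in for "this cell is coded/colored as a title". It also takes an optional paragraph above the table. The names `table_to_rows` and `title_above` and the sample strings are hypothetical. The idea is to keep a "current title" that starts as the paragraph above the table and is replaced whenever a title cell appears, so both table layouts produce the same two-column rows.

```python
from typing import List, Optional, Tuple

# Hypothetical model of one Word table: each cell is (text, is_title), where
# is_title stands in for "this cell is colored/coded as a title".
Cell = Tuple[str, bool]


def table_to_rows(cells: List[Cell], title_above: Optional[str] = None) -> List[Tuple[str, str]]:
    """Flatten one table into (column A, column B) rows.

    The current title starts as the paragraph just above the table (if any)
    and is replaced whenever a title cell is met. Tables with the title above
    them and tables with the title inside them therefore give the same layout.
    """
    rows: List[Tuple[str, str]] = []
    current_title = title_above
    for text, is_title in cells:
        if is_title:
            current_title = text
        elif current_title is None:
            raise ValueError(f"data cell {text!r} has no title")
        else:
            rows.append((current_title, text))  # title repeated for every entry
    return rows


# Made-up sample data: title above the table vs. title in the first cell.
print(table_to_rows([("Itm1", False), ("Itm2", False)], title_above="TITLE A"))
print(table_to_rows([("TITLE B", True), ("Itm10", False)]))
```

A table that contains several coded titles is handled the same way, because each title cell simply becomes the title for the cells that follow it.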
Question by: Saqib Husain, Syed

Expert Comment

by: Angelp1ay
I have an almost working version (attached).

You'll need to add references to Microsoft Scripting Runtime and the Microsoft Excel Object Library under Tools > References in the Word VBA editor.

This is the macro:
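Sub a()
    ' Setup Excel (early binding: needs the Excel Object Library reference)
    Dim exApp As Excel.Application, exWb As Excel.Workbook, exWs As Excel.Worksheet

    Set exApp = New Excel.Application
    Set exWb = exApp.Workbooks.Add
    Set exWs = exWb.ActiveSheet
    exApp.Visible = True
    
    ' Loop all paragraphs (inside and outside tables) and store each one that is
    ' longer than 2 characters under keys 1, 2, 3, ... Empty paragraphs and empty
    ' cells contain only their end marks, so they are skipped.
    Dim para As Word.Paragraph, paraIndex As Long, paras As Scripting.Dictionary, content As String
    
    Set paras = New Scripting.Dictionary
    paraIndex = 1
    For Each para In ActiveDocument.Paragraphs
        content = para.Range.Text
        If Len(content) > 2 Then
            paras.Add paraIndex, content
            paraIndex = paraIndex + 1
        End If
    Next
        
    ' Loop all tables: write the next dictionary entry (assumed to be the title)
    ' in column A, then each cell in column B. paraIndex is advanced once per cell
    ' to skip the dictionary entries that duplicate the cells.
    Dim t As Word.Table, c As Word.Cell, exRow As Long
    exRow = 1
    
    paraIndex = 1
    For Each t In ActiveDocument.Tables
        exWs.Cells(exRow, 1).Value = paras.Item(paraIndex)
        paraIndex = paraIndex + 1
        exRow = exRow + 1
        
        For Each c In t.Range.Cells
            content = c.Range.Text
            ' Strip the 2-character end-of-cell marker, Chr(13) & Chr(7)
            exWs.Cells(exRow, 2).Value = Left(content, Len(content) - 2)
            paraIndex = paraIndex + 1
            exRow = exRow + 1
        Next
    Next
End Sub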



It works up to about Itm400, where something in the structure of the document seems to change. If this is a one-off, perhaps you can export up to that point in one batch and the rest in another.
Almost.doc
Export.xls

Expert Comment

by: Angelp1ay
A little explanation:

- The first loop iterates over all paragraphs in the document (both those outside tables AND those in tables) and puts each paragraph longer than 2 characters into a dictionary with incrementing integer keys

- The second loop iterates through the tables and their cells (i.e. only the paragraphs in tables)

- I assume there is exactly one paragraph outside the tables for each table (its title)

Key             Paragraphs            Table Cells
--------------------------------------------------
1               WILDLIFE...
2               Itm1                  Itm1
3               Itm2                  Itm2
4               Itm3                  Itm3
5               Itm4                  Itm4
6               Itm5                  Itm5
7               Itm6                  Itm6
8               Itm7                  Itm7
9               Itm8                  Itm8
10              Itm9                  Itm9
11              LAHORE ZOO
12              Itm10                 Itm10
...



During the second loop, for each new table, I:
- Write out the next paragraph from the dictionary (assumed to be the table's title)
- Write out the cells of the table, incrementing my paragraph counter for each cell and so skipping the dictionary items that match these cells

So in the above example, I hit table 1 and write out "WILDLIFE...". Then I loop over its cells, writing Itm1-9 and skipping past them in the dictionary. Then I hit table 2 and write out the next paragraph from the dictionary, which is LAHORE ZOO.
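The index bookkeeping described above can be replayed on a trimmed version of the walkthrough. The sketch below is a Python model, not the macro. Paragraph texts are written roughly as Word reports them: `\r` for the paragraph mark, plus `\x07` (Chr(7)) in cells. The function names and the stray "Note" paragraph are hypothetical. The model shows that one extra non-empty paragraph between tables shifts every later title.

```python
from typing import Dict, List

END_MARKS = "\r\x07"  # paragraph mark Chr(13) and end-of-cell mark Chr(7)


def build_paragraph_dict(paragraph_texts: List[str]) -> Dict[int, str]:
    """First loop: keep paragraphs longer than 2 characters under keys 1, 2, 3, ..."""
    paras: Dict[int, str] = {}
    for text in paragraph_texts:
        if len(text) > 2:
            paras[len(paras) + 1] = text
    return paras


def extract(paras: Dict[int, str], tables: List[List[str]]) -> List[List[str]]:
    """Second loop: one title row per table, then one row per cell.

    Each row is [column A, column B]; "" stands for an empty Excel cell.
    """
    rows: List[List[str]] = []
    index = 1
    for cells in tables:
        # Assumption carried over from the macro: the next entry is this table's title.
        rows.append([paras.get(index, "").rstrip(END_MARKS), ""])
        index += 1
        for cell in cells:
            rows.append(["", cell])
            index += 1  # skip the dictionary entry that duplicates this cell
    return rows


tables = [["Itm1", "Itm2"], ["Itm10"]]
good = build_paragraph_dict(["WILDLIFE...\r", "Itm1\r\x07", "Itm2\r\x07", "\r", "LAHORE ZOO\r", "Itm10\r\x07"])
print(extract(good, tables))
# A hypothetical stray paragraph between the tables shifts the second title:
bad = build_paragraph_dict(["WILDLIFE...\r", "Itm1\r\x07", "Itm2\r\x07", "Note\r", "LAHORE ZOO\r", "Itm10\r\x07"])
print(extract(bad, tables)[3])
```

The empty paragraph (`"\r"`, length 1) is filtered out by the `> 2` test, so it does no harm. A paragraph with visible text is indexed and takes the title slot of the next table.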

Author Comment

by: Saqib Husain, Syed
As I mentioned in the question, the colored items (e.g. 187, 190 and 193) are titles, not data.

Accepted Solution

by: Angelp1ay
OK, this version also recognises colored titles inside the tables:
Sub ExtractData()
    ' Setup Excel (early binding: needs the Excel Object Library reference)
    Dim exApp As Excel.Application, exWb As Excel.Workbook, exWs As Excel.Worksheet

    Set exApp = New Excel.Application
    Set exWb = exApp.Workbooks.Add
    Set exWs = exWb.ActiveSheet
    exApp.Visible = True
    
    ' Loop all paragraphs: index every paragraph longer than 2 characters
    Dim para As Word.Paragraph, paraIndex As Long, paras As Scripting.Dictionary, content As String
    
    Set paras = New Scripting.Dictionary
    paraIndex = 1
    For Each para In ActiveDocument.Paragraphs
        content = para.Range.Text
        If Len(content) > 2 Then
            paras.Add paraIndex, content
            paraIndex = paraIndex + 1
        End If
    Next
        
    ' Loop all tables
    Dim t As Word.Table, c As Word.Cell, exRow As Long
    exRow = 1
    
    paraIndex = 1
    For Each t In ActiveDocument.Tables
        ' Next dictionary entry is assumed to be the title above this table
        content = paras.Item(paraIndex)
        paraIndex = paraIndex + 1
        
        exWs.Cells(exRow, 1).Value = content
        exRow = exRow + 1
        
        For Each c In t.Range.Cells
            content = c.Range.Text
            content = Left(content, Len(content) - 2) ' Drop end-of-cell marker Chr(13) & Chr(7)
            
            If c.Shading.BackgroundPatternColorIndex = wdYellow Then
                exWs.Cells(exRow, 1).Value = content ' Yellow >> Title
            Else
                exWs.Cells(exRow, 2).Value = content ' Not yellow >> Data
            End If
            
            paraIndex = paraIndex + 1 ' Skip the dictionary entry for this cell
            exRow = exRow + 1
        Next
    Next
    
    MsgBox "Extract Complete"
End Sub
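The inner loop of `ExtractData` does two things for each cell. It strips the 2-character end-of-cell marker, then it routes yellow cells to column A and all others to column B. The Python model below shows that step in isolation. The function name `route_cells` and the `is_yellow` flag, which stands in for `BackgroundPatternColorIndex = wdYellow`, are hypothetical, and the sample strings are made up.

```python
from typing import List, Tuple

END_OF_CELL = "\r\x07"  # Word ends each cell's Range.Text with Chr(13) & Chr(7)


def route_cells(cells: List[Tuple[str, bool]]) -> List[Tuple[str, str]]:
    """Model of the inner loop of ExtractData.

    Each input cell is (raw Range.Text, is_yellow). The end-of-cell marker is
    removed, mirroring Left(content, Len(content) - 2). Yellow cells then go to
    column A and all other cells to column B, one row per cell.
    """
    rows: List[Tuple[str, str]] = []
    for raw, is_yellow in cells:
        text = raw[:-len(END_OF_CELL)] if raw.endswith(END_OF_CELL) else raw
        rows.append((text, "") if is_yellow else ("", text))
    return rows


print(route_cells([("data 1\r\x07", False), ("TITLE\r\x07", True), ("data 2\r\x07", False)]))
```

Like the macro, this writes each title on its own row instead of repeating it next to every data value. Carrying the last title forward into column A would give the repeated layout described in the question.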



Issues:
- The title "CDA ISLAMABAD (CULTURE COMPLEX)" was split over 2 lines, which caused a problem; it works if you remove the forced line break.

- There are some issues later on, e.g. just before Itm578, which I think are caused by blank lines between tables. Removing the blank line between the table starting at Itm578 and the previous table solved the issue.

They are quite easy to spot in the Excel output (see the picture below).
[Image: Example issue due to differing line breaks]
Almost.doc

Assisted Solution

by: Angelp1ay
Final Version! :)
Please do the following:
1) Use Find and Replace on paragraph marks to remove blank paragraphs
2) Run the macro "RemoveBlanks"
3) Run the macro "ExtractData"

[Image: Find-Replace Blank Paragraphs]
FinalVersion.doc
Extract.xls
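The `RemoveBlanks` macro in FinalVersion.doc is not reproduced in the thread, so the following is only a guess at what such a cleanup has to achieve. A whitespace-only paragraph such as `"  \r"` is longer than 2 characters, so `ExtractData` would index it and break the one-paragraph-per-table assumption. The Python sketch below uses a hypothetical `remove_blank_paragraphs` function and drops every paragraph with no visible text. It assumes the cleanup applies only to paragraphs outside the tables.

```python
from typing import List

# Characters with no visible content: space, tab, manual line break Chr(11),
# paragraph mark Chr(13) and end-of-cell mark Chr(7).
INVISIBLE = " \t\x0b\r\x07"


def remove_blank_paragraphs(paragraph_texts: List[str]) -> List[str]:
    """Hypothetical stand-in for RemoveBlanks: drop paragraphs with no visible text."""
    return [text for text in paragraph_texts if text.strip(INVISIBLE)]


print(remove_blank_paragraphs(["WILDLIFE...\r", "  \r", "\r", "\t\x0b\r", "LAHORE ZOO\r"]))
```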

Author Closing Comment

by: Saqib Husain, Syed
Perfect, Thanks.

Expert Comment

by: Angelp1ay
You're welcome!

At first I didn't think it was going to be possible. In the end it was an interesting challenge.

Have a great day :)