Transfer randomly organized table data from Word to Excel

In the attached file, I have a number of tables with yellow titles. All cells in a table are subsets of its title. I need to get this data into Excel such that column A contains the title (repeated for each cell entry) and column B contains the cell contents, one per cell.

The problem is that some of the tables have titles as the first cell/row of the table whereas the other tables have the title above the table.

I have replaced the table data with codes. Where the title is coded, it is part of the table. Where the title is the original text, it is not part of the table but is normal text just above the table.

I am looking for a macro to transfer this data from Word to Excel as described in the first paragraph.

Or at least a macro to make all the tables consistent in the layout of the title.
TELEPHONE-DIRECTORY.doc
Asked by: Saqib Husain, Syed (Engineer)
 
Angelp1ay commented:
Ok, this one recognises colored titles within the tables too:
Sub ExtractData()
    ' Requires references to Microsoft Scripting Runtime and the Microsoft Excel Object Library
    ' Set up Excel
    Dim exApp As Excel.Application, exWb As Excel.Workbook, exWs As Excel.Worksheet

    Set exApp = New Excel.Application
    Set exWb = exApp.Workbooks.Add
    Set exWs = exWb.ActiveSheet
    exApp.Visible = True

    ' Loop all paragraphs (inside and outside tables) and store the non-empty ones
    ' Long rather than Integer so large documents cannot overflow the counters
    Dim para As Paragraph, paraIndex As Long, paras As Scripting.Dictionary, content As String

    Set paras = New Scripting.Dictionary
    paraIndex = 1
    For Each para In ActiveDocument.Paragraphs
        content = para.Range.Text
        If Len(content) > 2 Then
            paras.Add paraIndex, content
            paraIndex = paraIndex + 1
        End If
    Next

    ' Loop all tables
    Dim t As Table, c As Cell, exRow As Long
    exRow = 1

    paraIndex = 1
    For Each t In ActiveDocument.Tables
        ' Assumes one title paragraph precedes each table
        content = paras.Item(paraIndex)
        paraIndex = paraIndex + 1

        exWs.Cells(exRow, 1).Value = content
        exRow = exRow + 1

        For Each c In t.Range.Cells
            ' Cell text ends with a two-character end-of-cell marker; strip it
            content = c.Range.Text
            content = Left(content, Len(content) - 2)

            If c.Shading.BackgroundPatternColorIndex = wdYellow Then ' wdYellow = 7
                exWs.Cells(exRow, 1).Value = content ' Yellow >> Title
            Else
                exWs.Cells(exRow, 2).Value = content ' Not yellow >> Data
            End If

            ' Skip the dictionary entry that corresponds to this cell
            paraIndex = paraIndex + 1
            exRow = exRow + 1
        Next
    Next

    MsgBox "Extract Complete"
End Sub

Issues:
- The title "CDA ISLAMABAD (CULTURE COMPLEX)" was split over 2 lines which caused a problem >> it works if you remove the forced line break

- There are some issues later on e.g. just before Itm578, I think due to blank lines between tables. I removed the line between the table starting Itm578 and the previous table and this solved the issue.

They are quite easy to spot in the Excel output (see picture below).
[Image: Example issue due to differing line breaks]
Almost.doc
 
Angelp1ay commented:
I have an almost working version (attached).

You'll need to add references to Microsoft Scripting Runtime and the Microsoft Excel Object Library under Tools > References in the Word VBA editor.

This is the macro:
Sub a()
    ' Set up Excel
    Dim exApp As Excel.Application, exWb As Excel.Workbook, exWs As Excel.Worksheet

    Set exApp = New Excel.Application
    Set exWb = exApp.Workbooks.Add
    Set exWs = exWb.ActiveSheet
    exApp.Visible = True

    ' Loop all paragraphs (inside and outside tables) and store the non-empty ones
    ' Long rather than Integer so large documents cannot overflow the counters
    Dim para As Paragraph, paraIndex As Long, paras As Scripting.Dictionary, content As String

    Set paras = New Scripting.Dictionary
    paraIndex = 1
    For Each para In ActiveDocument.Paragraphs
        content = para.Range.Text
        If Len(content) > 2 Then
            paras.Add paraIndex, content
            paraIndex = paraIndex + 1
        End If
    Next

    ' Loop all tables
    Dim t As Table, c As Cell, exRow As Long
    exRow = 1

    paraIndex = 1
    For Each t In ActiveDocument.Tables
        ' Assumes one title paragraph precedes each table
        exWs.Cells(exRow, 1).Value = paras.Item(paraIndex)
        paraIndex = paraIndex + 1
        exRow = exRow + 1

        For Each c In t.Range.Cells
            ' Cell text ends with a two-character end-of-cell marker, so strip 2, not 1
            content = c.Range.Text
            exWs.Cells(exRow, 2).Value = Left(content, Len(content) - 2)
            paraIndex = paraIndex + 1
            exRow = exRow + 1
        Next
    Next
End Sub

It works up until about Itm400. Something in the structure of the document seems to change around there. If this is a one-off, perhaps you can export up to that point in one batch and the rest in another?
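
If adding the two references is inconvenient, late binding is a possible alternative. The following is only a minimal sketch under that assumption, not part of the posted macro: the name LateBindingSketch and the "Example title" placeholder are hypothetical, and the objects are created by ProgID with CreateObject instead of being declared with the Excel and Scripting types.

```vba
Sub LateBindingSketch()
    ' Hypothetical illustration: objects are declared As Object and created
    ' by ProgID, so no entries under Tools > References are needed.
    Dim exApp As Object, exWb As Object, exWs As Object
    Dim paras As Object

    Set exApp = CreateObject("Excel.Application")
    Set exWb = exApp.Workbooks.Add
    Set exWs = exWb.Worksheets(1)
    exApp.Visible = True

    ' Same dictionary type as in the macro, created without the reference
    Set paras = CreateObject("Scripting.Dictionary")
    paras.Add 1, "Example title"   ' placeholder value, not from the document

    exWs.Cells(1, 1).Value = paras.Item(1)
End Sub
```

The trade-off is that late-bound code loses IntelliSense and compile-time checking, and named Excel constants are not available unless they are defined by hand.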
Almost.doc
Export.xls
 
Angelp1ay commented:
A little explanation:

- The first loop iterates over all paragraphs in the document and puts them into a dictionary with incrementing integer keys
(both those outside tables AND those in tables)

- The second loop iterates through tables and their cells
(i.e. only paragraphs in tables)

- I make the assumption there is one paragraph outside a table for each table

Key             Paragraphs            Table Cells
--------------------------------------------------
1               WILDLIFE...
2               Itm1                  Itm1
3               Itm2                  Itm2
4               Itm3                  Itm3
5               Itm4                  Itm4
6               Itm5                  Itm5
7               Itm6                  Itm6
8               Itm7                  Itm7
9               Itm8                  Itm8
10              Itm9                  Itm9
11              LAHORE ZOO
12              Itm10                 Itm10
...

During the second loop, for each new table, I:
- Dump a paragraph from the dictionary
- Dump out the cells of the table, each time incrementing my paragraph counter and hence skipping items in the dictionary matching these cells

So in the above example, I hit table 1 and write out "WILDLIFE...", then I loop over the cells, writing Itm1-9 and skipping past them in the dictionary. Then I hit table 2 and write out the next paragraph from the dictionary, which is LAHORE ZOO.
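
The skipping described above depends on the dictionary and the table cells staying in step. One possible way to avoid that bookkeeping, sketched here as an assumption rather than as the approach used in the thread, is to collect only the paragraphs that lie outside tables by testing Range.Information(wdWithInTable). The function name OutsideTableTitles is hypothetical.

```vba
Function OutsideTableTitles() As Collection
    ' Hypothetical helper: returns the text of every non-empty paragraph
    ' that is NOT inside a table, in document order.
    Dim result As New Collection
    Dim para As Paragraph, txt As String

    For Each para In ActiveDocument.Paragraphs
        If Not para.Range.Information(wdWithInTable) Then
            txt = para.Range.Text
            ' Outside a table the text ends with a single paragraph mark;
            ' drop it and ignore paragraphs that contain nothing else.
            If Len(txt) > 1 Then result.Add Left(txt, Len(txt) - 1)
        End If
    Next para

    Set OutsideTableTitles = result
End Function
```

This still treats every non-empty paragraph outside a table as a title, so a title split across two paragraphs, or stray text between tables, would need the same manual clean-up as before.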
 
Saqib Husain, Syed (Engineer, author) commented:
As I mentioned in the question, the items which are colored, e.g. 187, 190, 193, are titles and not data.
 
Angelp1ay commented:
Final Version! :)
Please do the following:
1) Find-Replace paragraph marks
2) Run the macro "RemoveBlanks"
3) Run the macro "ExtractData"
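
The "RemoveBlanks" macro itself is only in the attached document and is not reproduced in this thread. Purely as a guess at what such a step could look like, the sketch below deletes empty paragraphs that lie outside tables. The name RemoveBlankParagraphsSketch is hypothetical, and the attached macro may work differently, so try it on a copy of the document first.

```vba
Sub RemoveBlankParagraphsSketch()
    ' Hypothetical sketch, NOT the attached RemoveBlanks macro.
    ' Walk backwards so deletions do not shift the paragraphs still to visit.
    Dim i As Long, p As Paragraph

    For i = ActiveDocument.Paragraphs.Count To 1 Step -1
        Set p = ActiveDocument.Paragraphs(i)
        ' An empty paragraph outside a table consists of the paragraph mark only
        If Len(p.Range.Text) = 1 And Not p.Range.Information(wdWithInTable) Then
            ' Leave the final paragraph of the document alone
            If i < ActiveDocument.Paragraphs.Count Then p.Range.Delete
        End If
    Next i
End Sub
```

Note that deleting an empty paragraph that is the only thing between two tables joins them into one table, which matches the manual fix described earlier for the table starting at Itm578.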

[Image: Find-Replace Blank Paragraphs]
FinalVersion.doc
Extract.xls
 
Saqib Husain, Syed (Engineer, author) commented:
Perfect, Thanks.
 
Angelp1ay commented:
You're welcome!

At first I didn't think it was going to be possible. In the end it was an interesting challenge.

Have a great day :)
Question has a verified solution.
