Solved

Parsing tables with VBA

Posted on 2004-10-12
Last Modified: 2007-12-19
Hello experts,

I'm currently involved in a project where I need to convert Word documents to separate Notes documents in a Lotus Notes database.
To do this, I need to parse the Word document using VBA OLE programming.

So far the project can open the document, walk through it paragraph by paragraph, and translate the text well enough that all regular text and titles are imported and formatted as they should be.

Now I get to a point where I need to import the tables in the document.

While walking through the document paragraph by paragraph, I can tell when I enter a table with the following:
Application.Selection.Information(wdWithinTable) returns True if the paragraph I'm looking at is inside a table.

Now it gets complicated: I can walk through all the paragraphs, but I have no idea where I am inside the table.
Is there a way, once I am inside a table, to parse it paragraph by paragraph and cell by cell, and to see in advance which cells are merged, using VBA code?

Any help would be greatly appreciated.

Question by:Jean Marie Geeraerts

30 Comments

Expert Comment

by:ehout
Like paragraphs, tables are also available as part of a collection.

Take for example the following snippet.


Sub test()
  Debug.Print ActiveDocument.Tables.Count
  Debug.Print ActiveDocument.Tables(1).Cell(1, 1).Range.Text
End Sub

(Debug.Print writes to the Immediate window; if you're not familiar with it, use MsgBox instead.)
The first line gives the number of tables in the document.
The second and third lines count the number of columns and rows in table 1. The last line reads the contents of the first cell of table 1.

Hope this helps,

Kind regards.

Expert Comment

by:ehout
Sorry, Forgot to update the code.

Sub test()
  Debug.Print ActiveDocument.Tables.Count
  Debug.Print ActiveDocument.Tables(1).Columns.Count
  Debug.Print ActiveDocument.Tables(1).Rows.Count
  Debug.Print ActiveDocument.Tables(1).Cell(1, 1).Range.Text
End Sub

kind regards

Expert Comment

by:Steiner
Instead of checking whether the current paragraph is inside a table, you could just iterate over the Tables collection:
   Dim Tbl As Table
   
   For Each Tbl In ActiveDocument.Tables
      Debug.Print Tbl.Rows.Count
   Next Tbl

Each table in turn consists of several Cell objects, which you can address by row and column:
Tbl.Cell(1, 1).Range.Text

Or again, iterate over the cells:
      For Each Cl In Tbl.Range.Cells
         Debug.Print "Cell (Row " & Cl.RowIndex & ", Col " & Cl.ColumnIndex & ") contains " & _
            Cl.Range.Text
      Next Cl

So, as a whole, this one goes over all tables in a document, parses all cells, and writes their address and content to the debug window:
Sub Test()
   Dim Tbl As Table, Cl As Cell
   
   For Each Tbl In ActiveDocument.Tables
      Debug.Print Tbl.Rows.Count
      For Each Cl In Tbl.Range.Cells
         Debug.Print "Cell (Row " & Cl.RowIndex & ", Col " & Cl.ColumnIndex & ") contains " & _
            Cl.Range.Text
      Next Cl
   Next Tbl
End Sub

Maybe this helps you get started.

Daniel

Author Comment

by:Jean Marie Geeraerts
Valuable hints. The problem is that I need to parse the whole document from top to bottom and cut it into several smaller documents, depending on the headings that are defined.
Text between headings should be inserted into the current document. When I encounter a new heading, a new document is created and placed in a hierarchy.

So you see, I need to go through the document, paragraph by paragraph and when I get to a table, I need to be able to read the table and parse it.

As said, inserting the various text blocks poses no problem, it's when I get to the table, that I encounter a problem.

My main loop looks like this:
For Each Paragraph in ActiveDocument.Paragraphs
   paragraph.Range.Select
   If Application.Selection.Information(wdWithinTable) Then
      Call InsertTable(Application.Selection)
   Else
      Call InsertParagraph(Application.Selection)
   End If
Next

Question:
Suppose that in the routine InsertTable I would:
1. Extend the selection to the whole table
2. Then parse the table inside the selection (with the tables property)
3. At the end delete the table

How would this impact the for each loop in the calling routine ?
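
A possible alternative that avoids deleting anything, offered only as a rough sketch: remember where the table ends and skip the paragraphs that fall inside it. HandleTable and HandleParagraph are hypothetical stand-ins for the InsertTable and InsertParagraph routines above, and the sketch assumes there are no nested tables.

```vba
Sub WalkDocument()
    Dim para As Paragraph
    Dim tbl As Table
    Dim skipUntil As Long   ' character position where the last handled table ends

    skipUntil = -1
    For Each para In ActiveDocument.Paragraphs
        If para.Range.Start < skipUntil Then
            ' Still inside a table that was already handled: do nothing.
        ElseIf para.Range.Information(wdWithInTable) Then
            ' First paragraph of a table: fetch the table that contains it.
            Set tbl = para.Range.Tables(1)
            skipUntil = tbl.Range.End
            ' HandleTable tbl               ' hypothetical routine
            Debug.Print "Table with " & tbl.Range.Cells.Count & " cells"
        Else
            ' HandleParagraph para.Range    ' hypothetical routine
            Debug.Print "Paragraph: " & Left$(para.Range.Text, 30)
        End If
    Next para
End Sub
```

Because the document is never modified, the For Each loop in the calling routine is not affected at all; each table is simply processed once, when its first paragraph is reached.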

Expert Comment

by:ehout
If you delete the table, you may change the document layout for the remaining part of the document, especially when the document has different sections with different paper formats (e.g. portrait vs. landscape). I'd suggest not deleting anything until you've finished parsing the whole document.

By the way, why do you constantly select the paragraph you parse? No need for that unless you especially need it for something?

Maybe you could use the .Next method inside the loop.
This looks ahead to the next paragraph and may help you gather some info before you enter or leave a table.

Look at this example:
Sub test()
  Dim para As Paragraph
  Dim nextPara As Paragraph
  For Each para In ActiveDocument.Paragraphs
    If para.Range.Information(wdWithInTable) Then
     '  Call InsertTable(para.Range)
    Else
     '  Call InsertParagraph(para.Range)
    End If
    ' Next returns Nothing after the last paragraph, so test for that
    ' rather than relying on On Error Resume Next (which would fall
    ' through into the "inside table" branch when the call fails).
    Set nextPara = para.Next
    If nextPara Is Nothing Then
      Debug.Print "No next paragraph"
    ElseIf nextPara.Range.Information(wdWithInTable) Then
      Debug.Print "Next will be inside table"
    Else
      Debug.Print "Next will be outside table"
    End If
  Next
End Sub

Hope this helps.

Author Comment

by:Jean Marie Geeraerts
Well, I've written a small sample script, and apparently control returns correctly to the calling For Each loop, so the deletion is no problem.
I'm nearly there now; the only thing left to do is to get the individual cell widths and contents.

I used to set the widths like this :
With table
   For i = 1 To .Columns.Count
      rtlibTable.Columns(i).CellWidth = .Columns(i).PreferredWidth * ONE_CM / POINTSPERCENTIMER
   Next
End With

where rtlibTable is my own structure to hold and manipulate the table inside the Notes environment and table is the table object inside Word.

This works fine until there are merged cells inside the table; then I get an error saying that you cannot access individual columns because the table has mixed column widths.

I'll just have to see if I can read the properties cell by cell and adjust them cell by cell in the Notes structure.

Thanks for your help so far!!

Expert Comment

by:ehout
Hmm... I tried it here, but I get no errors when referencing individual cells with ActiveDocument.Tables(1).Cell(2, 2) (where 2, 2 = row, column).

So the problem is probably that you are referencing the column as a whole.

Maybe you should set only the widths of individual cells and not the columns?

Or use error handling to catch this. Try the column option first. If this fails, go to the error handler and process the individual cells.
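
As a minimal sketch of that second idea (the routine name ReadWidths is made up for illustration, and widths are simply printed rather than stored): try the whole-column route first and fall back to individual cells if Word raises an error.

```vba
Sub ReadWidths(tbl As Table)
    Dim i As Long
    Dim cl As Cell

    ' First attempt: whole columns. Word raises a run-time error here
    ' when the table has mixed cell widths (for example merged cells).
    On Error GoTo UseCells
    For i = 1 To tbl.Columns.Count
        Debug.Print "Column " & i & ": " & tbl.Columns(i).Width & " pt"
    Next i
    Exit Sub

UseCells:
    ' Leave the error handler properly, then continue cell by cell.
    Resume CellLoop

CellLoop:
    On Error GoTo 0
    For Each cl In tbl.Range.Cells
        Debug.Print "Cell (" & cl.RowIndex & ", " & cl.ColumnIndex & "): " & _
            cl.Width & " pt"
    Next cl
End Sub
```

It could be called as ReadWidths ActiveDocument.Tables(1). The Resume statement is there because VBA stays in "error mode" until the handler is left with Resume or the procedure exits.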

Kind regards

Author Comment

by:Jean Marie Geeraerts
You are absolutely correct.
It's when I try to set the column widths that I get the error. I guess this is to be expected, since you can't select a column with merged cells either.

I am getting very near to the solution now :-D

I'll see how things behave if I use individual cell widths.

Author Comment

by:Jean Marie Geeraerts
Is there a way to see whether the current cell is a merged cell, and then to see from where to where it is merged?
For example, if I merge the cells from (2,2) to (3,3) in a table?

Expert Comment

by:Steiner
I don't think there is a way, because in Word you can shape the cells as you wish. In Excel you have basic cells with a given width for each column that is the same in every row. In Word each row can have different column widths, so it is impossible to distinguish between a wide column 1 and a merge of columns 1 and 2.

So I don't believe that Word stores how a cell obtained its current form (maybe by merging, maybe just by increasing the width of that cell...), and therefore you might not be able to determine the difference.

But these are just my 2 cents...

Author Comment

by:Jean Marie Geeraerts
I'm still looking into this. If I find a solution (or somebody else suggests one), I'll be sure to post it here for future reference.

At the moment the code copies the text from the individual cells into my Notes object. Now I'm trying to carry the formatting over (no need to help me there, I've got that covered, it just takes some work), and I still have to figure out which cells were merged and how.

This final step will get me to the finish line.

I'm not looking into nested tables for the moment, that would probably get me in even more trouble ;-)

Accepted Solution

by:ehout (earned 500 total points)
Maybe you could use a loop to find out the number of cells in a row. But the only info you then have is the number of cells.
You still need to process each cell individually in order to calculate the width.

I think that's the best way to go. Just loop through the table and store the row number, column number, and width of each cell. This way you can calculate the dimensions of the table you need to create. If you really want to recreate the layout as closely as possible, you should compare each row with the previous one and use the split and merge methods on the cells to achieve what you want.

But keep in mind that tables in Word are creeps. Even the outside borders don't have to be straight. The table can be wider in one row than in another, depending (for example) on whether a user deleted cells.
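
Roughly, the loop described above might look like the following sketch. The routine name ListCellsPerRow is invented for illustration, and it assumes Rows.Count and Columns.Count can still be read on a table with merged cells (the code later in this thread relies on the same thing).

```vba
Sub ListCellsPerRow(tbl As Table)
    Dim cl As Cell
    Dim counts() As Long    ' number of cells found in each row
    Dim r As Long

    ReDim counts(1 To tbl.Rows.Count)

    ' Visit every cell once; record its position and width.
    For Each cl In tbl.Range.Cells
        counts(cl.RowIndex) = counts(cl.RowIndex) + 1
        Debug.Print "Row " & cl.RowIndex & ", Col " & cl.ColumnIndex & _
            ", width " & cl.Width & " pt"
    Next cl

    ' A row with fewer cells than the widest row is a candidate for merged cells.
    For r = 1 To UBound(counts)
        If counts(r) < tbl.Columns.Count Then
            Debug.Print "Row " & r & " has only " & counts(r) & _
                " cell(s); it may contain merged cells"
        End If
    Next r
End Sub
```

A low cell count only says that something is different in that row; as noted above, it cannot tell a merge apart from a deleted cell.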

Kind regards

Author Comment

by:Jean Marie Geeraerts
Sounds like this will be my best bet. I'll look into it and get back to you later. (Might be tomorrow)
Thanks for all the help!

Author Comment

by:Jean Marie Geeraerts
I'm sorry I didn't get back to you sooner, but another project had priority over this one.
I'm still working on it. I hope to get some progress this week...

Author Comment

by:Jean Marie Geeraerts
Okay, here I am again.
I'm a bit snowed under with work at the moment, and on Friday I'm leaving for my vacation in Costa Rica, so I was wondering if any of you have some spare time to write a VBA routine that gives me the following information:
- the width of each column in the table (we can assume that there are no missing cells in any column, only merged ones)
- a list of merged cells in a format like x1,y1,x2,y2, where x1,y1 are the coordinates where the merged cell starts and x2,y2 are the coordinates where it ends.

It turns out to be quite complicated to determine the width of columns if you have merged cells spanning both columns and rows :(
Once you have any merged cells, you can no longer access an individual row or column, so you are stuck going through the cells of the table one by one.

Any help would be greatly appreciated since I am a Lotus Notes Developer and not a MS Office VBA expert :(
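
One possible starting point for the horizontal half of this request, as a rough and untested-in-production sketch: add up the cell widths in each row to find every distinct right edge in the table, then count how many of those edges fall inside each cell. The routine name ListColumnSpans and the 1 point tolerance are assumptions, and the sketch deliberately ignores vertically merged cells, indented rows, and cells whose width Word reports as undefined.

```vba
Sub ListColumnSpans(tbl As Table)
    Const TOL As Single = 1     ' tolerance in points (assumed value)
    Dim edges() As Single       ' distinct right edges found in the table
    Dim edgeCount As Long
    Dim cl As Cell
    Dim curRow As Long, k As Long
    Dim leftPos As Single, rightPos As Single
    Dim found As Boolean
    Dim before As Long, span As Long

    ReDim edges(1 To tbl.Range.Cells.Count)

    ' Pass 1: collect every distinct right edge (accumulated width per row).
    curRow = 0
    For Each cl In tbl.Range.Cells
        If cl.RowIndex <> curRow Then
            curRow = cl.RowIndex
            leftPos = 0
        End If
        leftPos = leftPos + cl.Width
        found = False
        For k = 1 To edgeCount
            If Abs(edges(k) - leftPos) <= TOL Then
                found = True
                Exit For
            End If
        Next k
        If Not found Then
            edgeCount = edgeCount + 1
            edges(edgeCount) = leftPos
        End If
    Next cl

    ' Pass 2: edges at or before a cell's left side give its first grid
    ' column; edges inside the cell give the number of columns it spans.
    curRow = 0
    For Each cl In tbl.Range.Cells
        If cl.RowIndex <> curRow Then
            curRow = cl.RowIndex
            leftPos = 0
        End If
        rightPos = leftPos + cl.Width
        before = 0
        span = 0
        For k = 1 To edgeCount
            If edges(k) <= leftPos + TOL Then
                before = before + 1
            ElseIf edges(k) <= rightPos + TOL Then
                span = span + 1
            End If
        Next k
        If span > 1 Then
            ' x1,y1,x2,y2 with x = grid column and y = row
            Debug.Print (before + 1) & "," & curRow & "," & _
                (before + span) & "," & curRow
        End If
        leftPos = rightPos
    Next cl
End Sub
```

For a three-column table whose second row has its first two cells merged, this would print 1,2,2,2. Whether the result is meaningful for a real document depends on the edges actually lining up between rows, which Word does not guarantee.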

Author Comment

by:Jean Marie Geeraerts
Okay, I've gotten to the point where I have found an algorithm to determine the width of the columns.
Here's the VBA code for that part already:

   Dim intCellWidths() As Integer
   Dim intColumnWidths() As Integer
   Dim intRows As Integer, intColumns As Integer
   Dim intRow As Integer, intColumn As Integer
   Dim intPrevRow As Integer, intPrevColumn As Integer
   
   Dim strMessage As String
   
   For Each Table In ActiveDocument.Tables
      intRows = Table.Rows.Count
      intColumns = Table.Columns.Count
      intPrevRow = 1
      ReDim intCellWidths(1 To intRows, 1 To intColumns)
      ReDim intColumnWidths(1 To intColumns)
      For Each vCell In Table.Range.Cells
         intRow = vCell.RowIndex
         intColumn = vCell.ColumnIndex
         intCellWidths(intRow, intColumn) = vCell.PreferredWidth
         If intPrevRow <> intRow Then
            If intPrevColumn < intColumns Then
               For i = 1 To intColumns
                  intCellWidths(intPrevRow, i) = 0
               Next
            End If
            intPrevRow = intRow
         End If
         intPrevColumn = intColumn
      Next
      For j = 1 To intColumns
         intColumnWidths(j) = intCellWidths(1, j)
         For i = 2 To intRows
            If intCellWidths(i, j) < intColumnWidths(j) And intCellWidths(i, j) <> 0 Then
               intColumnWidths(j) = intCellWidths(i, j)
            End If
         Next
      Next
   Next
   
   strMessage = ""
   For i = 1 To intColumns
      strMessage = strMessage & Chr(10) & intColumnWidths(i)
   Next
   MsgBox "Column widths :" & strMessage

The code above displays the width of each column in points (assuming the cells' preferred widths are specified in points rather than as a percentage).

Author Comment

by:Jean Marie Geeraerts
A small correction to the script above.
In order to display the results per table, the MsgBox part should be inside the For Each Table loop:

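   ' Loop variables, declared so the code also compiles under Option Explicit
   Dim Table As Table, vCell As Cell
   Dim i As Integer, j As Integer
   Dim intCellWidths() As Integer
   Dim intColumnWidths() As Integer
   Dim intRows As Integer, intColumns As Integer
   Dim intRow As Integer, intColumn As Integer
   Dim intPrevRow As Integer, intPrevColumn As Integer
   
   Dim strMessage As String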
   
   For Each Table In ActiveDocument.Tables
      intRows = Table.Rows.Count
      intColumns = Table.Columns.Count
      intPrevRow = 1
      ReDim intCellWidths(1 To intRows, 1 To intColumns)
      ReDim intColumnWidths(1 To intColumns)
      For Each vCell In Table.Range.Cells
         intRow = vCell.RowIndex
         intColumn = vCell.ColumnIndex
         intCellWidths(intRow, intColumn) = vCell.PreferredWidth
         If intPrevRow <> intRow Then
            If intPrevColumn < intColumns Then
               For i = 1 To intColumns
                  intCellWidths(intPrevRow, i) = 0
               Next
            End If
            intPrevRow = intRow
         End If
         intPrevColumn = intColumn
      Next
      For j = 1 To intColumns
         intColumnWidths(j) = intCellWidths(1, j)
         For i = 2 To intRows
            If intCellWidths(i, j) < intColumnWidths(j) And intCellWidths(i, j) <> 0 Then
               intColumnWidths(j) = intCellWidths(i, j)
            End If
         Next
      Next
      strMessage = ""
      For i = 1 To intColumns
         strMessage = strMessage & Chr(10) & intColumnWidths(i)
      Next
      MsgBox "Column widths :" & strMessage
   Next

Author Comment

by:Jean Marie Geeraerts
I still haven't received a complete answer to my question, so I haven't awarded points yet.
I'd appreciate it if anybody could tell me how I can find out in Word VBA which cells in a table have been merged, so I can derive an algorithm from there to parse the table.

Expert Comment

by:ehout
Sorry Jerrith,
I haven't had the time to dive into this topic any further.

If it can wait until the second half of January, I may have time to pick it up again. Maybe another expert will help in the meantime.

To Dalorean:
My recommendation would be to keep this question open for a while, if possible. If it really has to be closed, then PAQ and refund. There is interesting info and code in this topic, so it may be helpful for others too. A refund seems fair considering jerrith's own progress on this.

Kind regards

Author Comment

by:Jean Marie Geeraerts
Thanks ehout,
It can probably wait until then, since I have urged my users not to use merged cells in tables for the time being, and they are okay with that for now.
I have thought about the solution of counting the number of cells per column, but it would get very complicated to determine which cells are actually merged, and then I'd still have to consider cells that are merged across rows or, even worse, cells that are merged across both rows and columns.
It gave me too much trouble, so I abandoned the idea until I have more time to investigate :-)

Have a nice year end and see you in January \:-D

Expert Comment

by:ehout
Hi,

Yeah, I know. It should be forbidden to use merged cells and stuff like that anyway. It would make life a lot easier.

Not to mention the fact that every single row, column, or cell can have its very own dimensions and layout.

Happy new year for now!

Kind regards

Author Comment

by:Jean Marie Geeraerts
My thoughts exactly :-)

Author Comment

by:Jean Marie Geeraerts
Please leave this question open to allow ehout to get back to this when he has the time.

Expert Comment

by:ehout
Hi,

Sorry, but I couldn't find the time to investigate further. My company is in the middle of a merger, so many conversion projects from one system to another have taken my attention.

Nonetheless, my recommendation would still be PAQ with refund, but let's see what jerrith thinks of this himself.

Kind regards

Author Comment

by:Jean Marie Geeraerts
Hm, Why not award you the points for the trouble you went through so far.
I managed to convince people to not use merged cells for the time being, so the problem is bypassed.

Expert Comment

by:ehout
Well... I'm surprised. Thanks a lot.

It's good you convinced people not to use merged cells.
Sometimes people need to learn that just because a possibility exists in a program, this does NOT mean it's always handy to use it.

(I'm seeing this time after time in the conversion processes I'm going through at the moment. Some things could make you walk straight out of the window on the third floor, for example, a person who does not understand that his 1 GB mail archive does NOT fit in a 100 MB regular mailbox)

Good luck.

Author Comment

by:Jean Marie Geeraerts
I know what you mean. End users, huh ;-D