Solved

With VBA is there a faster way to extract data from a Word table

Posted on 2013-12-17
16
1,213 Views
Last Modified: 2013-12-20
I have a table in Word that has about 10,500 cells (15 columns, 700 rows). With VBA I want to extract the information in about 80% of those cells. Each time I do so, I have to chop the last two characters off of the string:

Sub Demo()
Dim x As String
Dim i As Integer
Dim j As Integer
Dim Tbl As Table
      Set Tbl = ActiveDocument.Tables(1)
      For i = 1 To Tbl.Rows.Count
            For j = 1 To Tbl.Columns.Count
                  x = Tbl.Cell(i, j).Range.Text     ' Get the content of the cell
                  x = Left(x, Len(x) - 2)              ' Get rid of useless stuff at the end
            Next j
      Next i
End Sub
     It takes a lot of time to go through each cell chopping off the last two characters. Is there any better way to get the content of a number of cells? Would it be faster to convert each row to a string and use Split to extract the information into an array? If so, then how?
Thanks!
--j.r. in Priddis, Alberta
0
Comment
Question by:JohnRobinAllen
  • 6
  • 6
  • 2
  • +1
16 Comments
 
LVL 13

Expert Comment

by:Alexander Eßer [Alex140181]
ID: 39724102
0
 
LVL 76

Assisted Solution

by:GrahamSkan
GrahamSkan earned 50 total points
ID: 39724154
I have a function which I use frequently to do the job, but I don't suppose that it is much faster:
Function GetCellText(cl As Word.Cell) As String
    Dim rng As Range
    Set rng = cl.Range
    
    'Drop cell delimiter
    rng.MoveEnd wdCharacter, -1
    GetCellText = rng.Text
End Function

Open in new window

When stepping through Word object collections is usually faster in to use the For Each construct instead of indexing, so try something like this:
Sub Demo()
    Dim strArray() As String
    Dim cl As Cell
    Dim tbl As Table
    Dim rw As Row
    
    Set tbl = ActiveDocument.Tables(1)
    ReDim strArray(tbl.Rows.Count, tbl.Columns.Count)
    For Each rw In tbl.Rows
        For Each cl In rw.Cells
            strArray(cl.RowIndex, cl.ColumnIndex) = GetCellText(cl)
        Next cl
    Next rw
End Sub

Open in new window


Depending on your situation, it might also be worth trying to work with (a copy of?) the table after converting it to text.
0
 
LVL 13

Assisted Solution

by:Alexander Eßer [Alex140181]
Alexander Eßer [Alex140181] earned 50 total points
ID: 39724196
As long as you iterate over/through the VBA objects (like Cell, Table etc...), you'll hit performance issues processing large/big data.
Copy your workset into memory (or at least chunks of it) and use them instead ;-)
0
 
LVL 59

Expert Comment

by:Chris Bottomley
ID: 39724324
John

This can be done to the same as your own snippets scope as:

Sub testTable()
Dim arr As Variant
Dim intcols As Integer
Dim lngRows As Long
Dim lngCounter As Long
    
    lngRows = ActiveDocument.Tables(1).Rows.Count
    intcols = ActiveDocument.Tables(1).Columns.Count
    arr = Split(Replace(ActiveDocument.Tables(1).Range.Text, Chr(7), ""), Chr(13))
    For rw = 1 To lngRows
        For col = 1 To intcols
            Debug.Print "Table 1, Row " & rw & ", column " & col; " data is " & arr(lngCounter)
            lngCounter = lngCounter + 1
        Next
        lngCounter = lngCounter + 1
    Next
End Sub

Open in new window


Chris
0
 
LVL 59

Expert Comment

by:Chris Bottomley
ID: 39724374
For info I placed a simple table and compared the timing (1s accuracy) and for a table of about 3000 cells I got 3s for my code and about 37s for yours.

Sub testTable()
Dim varStart As Variant
Dim varStop As Variant

Dim arr As Variant
Dim intcols As Integer
Dim lngRows As Long
Dim lngCounter As Long
    
    varStart = Now
    lngRows = ActiveDocument.Tables(1).Rows.Count
    intcols = ActiveDocument.Tables(1).Columns.Count
    arr = Split(Replace(ActiveDocument.Tables(1).Range.Text, Chr(7), ""), Chr(13))
    For rw = 1 To lngRows
        For col = 1 To intcols
            Debug.Print "Table 1, Row " & rw & ", column " & col; " data is " & arr(lngCounter)
            lngCounter = lngCounter + 1
        Next
        lngCounter = lngCounter + 1
    Next
    varStop = Now
    MsgBox ("Elapsed time was " & DateDiff("s", varStart, varStop) & " seconds.")
End Sub

Sub Demo()
Dim varStart As Variant
Dim varStop As Variant

Dim x As String
Dim i As Integer
Dim j As Integer
Dim Tbl As Table
            
    varStart = Now()
      
      Set Tbl = ActiveDocument.Tables(1)
      For i = 1 To Tbl.Rows.Count
            For j = 1 To Tbl.Columns.Count
                  x = Tbl.Cell(i, j).Range.Text     ' Get the content of the cell
                  x = Left(x, Len(x) - 2)              ' Get rid of useless stuff at the end
            Next j
      Next i
    varStop = Now()
    MsgBox ("Elapsed time was " & DateDiff("s", varStart, varStop) & " seconds.")
End Sub

Open in new window


Chris
0
 

Author Comment

by:JohnRobinAllen
ID: 39725392
Chris Bottomley's solution is elegant. I'm working on adapting it to my own needs which are to record into a document variable the data from selected columns in my table. The essence of Chris's solution is to convert the table as a whole into accessible data. I can use the same technique to convert a Row into an array, but when I try to use that with a column, the computer balks. Row.Range.Text is OK but Column.Range.Text is not.
     What I have to do is figure out how to get information from specific columns, which should not be too hard, but then to time whether it would be better to to store the entire table as a single document variable or take the time to extract only the needed columns. I should have that solved by tomorrow afternoon and then I'll report and assign credit.
    Thanks for all the help from Chris, Graham, and Alex. It is really helping me.
0
 
LVL 59

Accepted Solution

by:
Chris Bottomley earned 400 total points
ID: 39726572
ON a quick test i'd still grab the whole table and only process the columns if its one of those i'm interested in i.e. arrCols defines columns 2 and 5, skipping over the other columns will be quickly done in the array ....

Sub testTable2()
Dim arr As Variant
Dim intcols As Integer
Dim lngRows As Long
Dim lngCounter As Long
Dim arrCols() As Variant
Dim dict As Object
    
    Set dict = CreateObject("scripting.dictionary")
    arrCols = Array(2, 5)
    For Each itm In arrCols
        If Not dict.Exists(itm) Then
            dict.Add itm, itm
        End If
    Next
    lngRows = ActiveDocument.Tables(1).Rows.Count
    intcols = ActiveDocument.Tables(1).Columns.Count
    arr = Split(Replace(ActiveDocument.Tables(1).Range.Text, Chr(7), ""), Chr(13))
    For rw = 1 To lngRows
        For col = 1 To intcols
            If dict.Exists(col) Then
                Debug.Print "Table 1, Row " & rw & ", column " & col; " data is " & arr(lngCounter)
            End If
            lngCounter = lngCounter + 1
        Next
        lngCounter = lngCounter + 1
    Next
End Sub

Open in new window


Chris
0
 

Author Comment

by:JohnRobinAllen
ID: 39726871
That is perfect. I will test it in about three hours after I finish some other work, and then I think my problem will be solved. I'll confirm that soon.
          --j.r.
0
Highfive Gives IT Their Time Back

Highfive is so simple that setting up every meeting room takes just minutes and every employee will be able to start or join a call from any room with ease. Never be called into a meeting just to get it started again. This is how video conferencing should work!

 

Author Comment

by:JohnRobinAllen
ID: 39727768
I admire the elegance of the solution that Chris proposed. It does the trick and is very simple. What I like best in all in his code is the line
   
     arr = Split(Replace(ActiveDocument.Tables(1).Range.Text, Chr(7), ""), Chr(13))


which instantly changes the entire table into an array without requiring one to chop the ends off of each cell.
      What I am less happy with is that when all one wants are data from certain columns, the code increments lngCounter as many times as there are cells in the table.
      I would suggest using a routine to set lngCounter to point to just those cells that have information needed. The code below is a slight variant of Chris's code. I added an integer variable, i, to count the column numbers stored in arrCols. (That lets us skip using “dict” and “itm.”)

Sub TestTable3()
Dim arr As Variant
Dim intcols As Integer
Dim lngRows As Long
Dim lngCounter As Long
Dim arrCols() As Variant
Dim i As Integer
Dim Col As Integer
Dim rw As Integer
      arrCols = Array(2, 5)
      lngRows = ActiveDocument.Tables(1).Rows.Count
      intcols = ActiveDocument.Tables(1).Columns.Count
      arr = Split(Replace(ActiveDocument.Tables(1).Range.Text, Chr(7), ""), Chr(13))
     
      For i = 0 To UBound(arrCols)
            '     arrCols is a zero-based array, but the columns listed in it count the first
            '     column as "1." Our calculation of lngCounter start from the previous
            '     column, so we set Col = current column minus 1.
            Col = arrCols(i) - 1
            For rw = 1 To lngRows
                  lngCounter = (rw - 1) * (intcols + 1) + (Col + 1) - 1
                  Debug.Print "Tbl 1, Row " & rw & ", column " & Col + 1 & " data is " & arr(lngCounter)
            Next rw
      Next i
End Sub

      My only problem now is that I need the data in a row / column order rather than the above column / row order. I’ll write that variant and post it in a subsequent message, but for now I am assigning full credit (400 points) to Chris plus an extra fifty points each to Graham and Alex for their contributions to the problem.

     Thanks for the help.
0
 

Author Closing Comment

by:JohnRobinAllen
ID: 39727778
My comments are in a different message. I will be posting another variant solution shortly. Many thanks to the generous help from all three magi who responded to my question.
    j.r.a. in Priddis, Alberta, Canada
0
 
LVL 59

Expert Comment

by:Chris Bottomley
ID: 39727826
It is perfectly viable to step through the array looking for specific columns doing away with the counter, the method chosen was to to be more flexible overall ... if you need help with removal of the counter just ask.

Chris
0
 
LVL 76

Expert Comment

by:GrahamSkan
ID: 39728269
JRA,

It is always gratifying when the questioner, such as yourself,  takes a considered view and judgement of the suggestions offered. This adds considerable value to the question for later viewers. Thanks
0
 

Author Comment

by:JohnRobinAllen
ID: 39729816
In comment 39727768 above I posted some code that will get data from a Word table, restricting the data to certain columns. The data retrieved are in a column / row order.

I promised to post code that would do the same thing but in a row / column order. The following code goes through the rows of a table to get, for each row, the contents of the specified columns.
     
Sub TestTable3b()
'     Given a Word document table with at least five columns,  the following code
'     accesses information in the table on a row by row basis, but the data is limited
'     to columns indicated by arrCols (generated in the code below). The following is based
'     on code originally written by Chris Bottomley.
Dim arr As Variant
Dim intcols As Integer
Dim lngRows As Long
Dim lngCounter As Long
Dim arrCols() As Variant
Dim i As Integer
Dim Col As Integer
Dim rw As Integer
      arrCols = Array(2, 5) '    Specify here which columns to access
      lngRows = ActiveDocument.Tables(1).Rows.Count
      intcols = ActiveDocument.Tables(1).Columns.Count
      arr = Split(Replace(ActiveDocument.Tables(1).Range.Text, Chr(7), ""), Chr(13))
     
      For rw = 1 To lngRows
            For i = 0 To UBound(arrCols)
                  Col = arrCols(i)
                  lngCounter = (rw - 1) * (intcols + 1) + (Col) - 1
                  Debug.Print "Tbl 1, Row " & rw & ", column " & Col & " data is " & arr(lngCounter)
            Next i
      Next rw
End Sub
0
 
LVL 59

Expert Comment

by:Chris Bottomley
ID: 39730168
Much as I was thinking and avoided since getting the index for the columns was so 'fiddly'.  I must admit I was thinking more along the lines of: ((rw - 1) * intcols) + Col - 1 ... so its just as well I didn't post it up!

Thinking about it though I wasn't allowing for the end of row marker which is where you get the number of columns + 1.  Ho hum like I said - glad I didn't post it up ... OOPS!

Chris
0
 

Author Comment

by:JohnRobinAllen
ID: 39730407
A further comment:
     When I first posted the problem, it was taking my program a bit more than ten minutes to store the essential data from the table of 805 lines as a document variable. Using Chris's code as modified above, it is now taking five SECONDS!
     One other factor that probably helps speed up the time is that the document variable I was saving became too long to save. Graham Skan pointed out in a response to another query that the max length a document variable can be is 65,280 characters, but what I was trying to save was 113,490 characters long. Accordingly, now each time the variable I'm creating with Chris's code gets longer than 65,000 characters, I dump it to disk and start making a new variable. That must free up some memory and make the program run faster.
     Thanks again for all your help, both Graham and Chris. I hope others can use your discoveries. I certainly have and will do so in the future.

     J.R.A. in Priddis, Alberta, Canada
0
 
LVL 59

Expert Comment

by:Chris Bottomley
ID: 39731149
Like Graham said, your reflection and comments are most interesting and relevant for future readers of the thread making the slightly esoteric code meaningful in a real world context so it is more likely useful to other users because of it.

Always a pleasure to help you,
Chris
0

Featured Post

What Security Threats Are You Missing?

Enhance your security with threat intelligence from the web. Get trending threat insights on hackers, exploits, and suspicious IP addresses delivered to your inbox with our free Cyber Daily.

Join & Write a Comment

Preface: When I started this series, I used the term CommandBars because that is the Office Object class that it discusses. Unfortunately, when Microsoft introduced Office 2007, they replaced the standard Commandbar menus with "The Ribbon" and rem…
Technology opened people to different means of presenting information, but PowerPoint remains to be above competition. Know why PPT still works today.
The viewer will learn how to simulate a series of coin tosses with the rand() function and learn how to make these “tosses” depend on a predetermined probability. Flipping Coins in Excel: Enter =RAND() into cell A2: Recalculate the random variable…
In a previous video Micro Tutorial here at Experts Exchange (http://www.experts-exchange.com/videos/1358/How-to-get-a-free-trial-of-Office-365-with-the-Office-2016-desktop-applications.html), I explained how to get a free, one-month trial of Office …

706 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

19 Experts available now in Live!

Get 1:1 Help Now