Parse Huge Word Document by evaluating each word

OK, I have most of this figured out, but I ran into a huge performance problem.  I am parsing a Word document by looking at each word and sorting it by font color.  It works great until I hit around word 5000, then it really starts to slow down.  Here is an idea of how I am coding it.

Dim appWord As Word.Application
Dim wrdDoc As Word.Document
Dim strFileName As String
Dim txtText As String
Dim lngColor As Long
Dim lngWord As Long
strFileName = "D:test.doc"
Set appWord = New Word.Application
Set wrdDoc = appWord.Documents.Open(strFileName)

lngWord = 1

Do While lngWord <= wrdDoc.Words.Count    'Words is 1-based, so include the last word

    txtText = wrdDoc.Words(lngWord).Text
    lngColor = wrdDoc.Words(lngWord).Font.Color

    Select Case lngColor
        Case black
            'do some code
    End Select

    lngWord = lngWord + 1
Loop


Like I said, my code is working fine.  It slows down terribly once I get above 5000 words, and the document contains over 28000, so I really need to figure something else out.  I am guessing that Word doesn't keep track of the last word I read, so it starts from the very beginning of the file each time to get to the next word specified.

Hope this makes sense and someone has a suggestion.

Thanks,
APlusComp247Asked:
 
GrahamSkanRetiredCommented:
It's much faster to use For Each on the Words collection:

Dim appWord As Word.Application
Dim wrdDoc As Word.Document
Dim strFileName As String
Dim txtText As String
Dim lngColor As Long
Dim lngWord As Long
Dim rngWord As Word.Range
strFileName = "D:test.doc"
Set appWord = New Word.Application
Set wrdDoc = appWord.Documents.Open(strFileName)

lngWord = 1

For Each rngWord In wrdDoc.Words

    txtText = rngWord.Text
    lngColor = rngWord.Font.Color

    Select Case lngColor
        Case black
            'do some code
    End Select

    lngWord = lngWord + 1
Next rngWord
 
jimbobmcgeeCommented:
As GrahamSkan says, it is better to use For Each/Next...

One thing I have noticed, however, is that you have not defined the variable 'black'.  Thankfully, the number for black text is 0, so it is still working but you may want to amend this if you intend to use other colors.

You may also want to check if the text colour is set to Automatic...

J.
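As a rough illustration of the point about black versus Automatic, here is a minimal sketch that classifies the value returned by Font.Color. It assumes early binding to the Word object library (so that wdColorBlack, wdColorAutomatic and wdUndefined are available); the function name ClassifyColor and the returned labels are hypothetical, not part of the original code.

```vb
' Hypothetical helper: maps a Font.Color value to a label.
' Requires a reference to the Microsoft Word object library.
Public Function ClassifyColor(ByVal lngColor As Long) As String
    Select Case lngColor
        Case wdColorAutomatic       ' -16777216: "Automatic", usually displayed as black
            ClassifyColor = "automatic"
        Case wdColorBlack           ' 0: explicitly black
            ClassifyColor = "black"
        Case wdUndefined            ' 9999999: the range contains more than one color
            ClassifyColor = "mixed"
        Case Else                   ' any other explicit color value
            ClassifyColor = "other"
    End Select
End Function
```

Inside the loop this could be called as ClassifyColor(rngWord.Font.Color). Treating Automatic and black as separate cases keeps text that merely looks black from falling into the wrong bucket.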
 
APlusComp247Author Commented:
This looks like it can help, but I have a few questions.  

1.  How do I know when I get to the end of the document?  I would like to put some kind of progress bar to show how far along I am.
2.  I will need to exit the for each/next loop at some point and start it up again.  Does the rngWord know where it left off?

FYI jimbobmcgee, thanks for pointing out that the "black" variable is not defined.  That isn't actually part of my code.  I just used it for convenience.  I actually have about 5 different numbers that represent the actual colors.  Once again, thanks for being observant.  :)
 
GrahamSkanRetiredCommented:
1. You are at the end when it stops, of course! Actually, you can still use the Words.count to define the target and keep track of the progress with the counter - lngWord, as you have called it. I left it in the example.
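A minimal sketch of that idea, assuming the document is already open and passed in as wrdDoc: Words.Count is read once before the loop, and the counter drives the progress report. The procedure name, the reporting interval of 500 words and the use of Debug.Print in place of a real progress bar control are all assumptions for illustration.

```vb
' Hypothetical procedure: walks the Words collection and reports progress.
Public Sub ParseWithProgress(ByVal wrdDoc As Word.Document)
    Dim rngWord As Word.Range
    Dim lngWord As Long
    Dim lngTotal As Long

    lngTotal = wrdDoc.Words.Count          ' read once, outside the loop
    lngWord = 0

    For Each rngWord In wrdDoc.Words
        lngWord = lngWord + 1

        ' ... per-word processing (Text, Font.Color) goes here ...

        If lngWord Mod 500 = 0 Then        ' update only every 500 words
            Debug.Print Format$(lngWord / lngTotal, "0%")
            DoEvents                       ' let the form repaint
        End If
    Next rngWord
End Sub
```

Updating only every few hundred words keeps the progress display from becoming a bottleneck of its own.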
 
GrahamSkanRetiredCommented:
2. No. But you could save the index and create a new range based on it. You would obviously need to create an inner and an outer loop.

Dim rngWords As Word.Range
For Each rngWord In wrdDoc.Words

    txtText = rngWord.Text
    lngColor = rngWord.Font.Color

    Select Case lngColor
        Case black
            Exit For
    End Select

    lngWord = lngWord + 1
Next rngWord
....

Set rngWords = wrdDoc.Words(lngWord)    'last word processed
rngWords.Collapse wdCollapseEnd         'set start of range to end of last word processed
rng.End = wrdDoc.End                    'set end of range to end of doc

For Each rngWord In rngWords.Words

    txtText = rngWord.Text
 


 
APlusComp247Author Commented:
The initial code for the for each/next works great!!  That is the speed I was looking for.  Got a problem with creating the new range to start the loop again.  This is the code I have

Set rngWords = wrdDoc.Words(lngWord)
rngWords.Collapse wdCollapseEnd              'set start of range to end of last word processed
rngWords.End = wrdDoc.Range.End                               'set end of range to end of doc

I had to modify the last line because "rng" was not a defined variable (I assumed you meant "rngWords") and "wrdDoc" does not have an "End" property.  The problem is that it still runs from the very first word, so I don't make any progress after exiting the loop and trying to enter another one.

Thanks in advance for all the help
 
GrahamSkanRetiredCommented:
Yes, you were right about my typo.

For your loop, I don't know exactly what your code is, but it could be that the Range object is going out of scope between calls.
In fact I now think that nested loops are inappropriate. I think you would need to separate the initiation from the parsing (single loop) procedure, which is provided with a variable starting point.


 
APlusComp247Author Commented:
I probably didn't properly explain my question.  Let me try again.  I have the following variables declared at the top of the form so they don't lose scope until the form is closed.

Dim appWord As Word.Application
Dim wrdDoc As Word.Document
Dim lngWord As Long, lngWordTotal As Long

I have a command button that I declare and use the range in and then enter the loop.

Dim rngWords As Range

Set rngWords = wrdDoc.Words(lngWord)
rngWords.Collapse wdCollapseEnd              'set start of range to end of last word processed
rngWords.End = wrdDoc.Range.End                               'set end of range to end of doc

For Each rngWords In wrdDoc.Words

    txtText = rngWords.Text
    lngColor = rngWords.Font.Color

    Select Case lngColor
        Case 8388736
            ' do code.....
    End Select
Next


I want to set the range each time the command button is pressed, so it will continue through the document.

Thanks
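One likely reason the loop above restarts at the first word: For Each rngWords In wrdDoc.Words enumerates the whole document and reassigns rngWords on every pass, so the range built just before the loop is never used. The sketch below shows one way to resume instead. It swaps in a different technique from the one discussed so far: it stores the character offset where the last processed word ended and iterates the Words of a range that starts there. The names lngNextPos, ProcessNextBatch and BATCH_SIZE are hypothetical, and wrdDoc is assumed to be set elsewhere in the form.

```vb
' Form-level state (assumed names, kept alive between button clicks)
Private wrdDoc As Word.Document   ' assumed to be opened elsewhere
Private lngNextPos As Long        ' character offset where the next run starts
Private lngWord As Long           ' number of words processed so far

Private Sub ProcessNextBatch()
    Const BATCH_SIZE As Long = 1000   ' hypothetical number of words per click
    Dim rngRest As Word.Range
    Dim rngWord As Word.Range
    Dim lngDone As Long

    ' Nothing left once the stored offset reaches the end of the document
    If lngNextPos >= wrdDoc.Content.End Then Exit Sub

    ' Range covering everything after the last processed word
    Set rngRest = wrdDoc.Range(lngNextPos, wrdDoc.Content.End)

    For Each rngWord In rngRest.Words     ' iterate the range, not wrdDoc.Words
        ' ... inspect rngWord.Text and rngWord.Font.Color here ...

        lngWord = lngWord + 1
        lngNextPos = rngWord.End          ' remember where this word ended
        lngDone = lngDone + 1
        If lngDone >= BATCH_SIZE Then Exit For
    Next rngWord
End Sub
```

Calling ProcessNextBatch from the command button's Click event would then continue from the stored offset on each press, without an indexed Words(lngWord) lookup.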
 
jimbobmcgeeCommented:
Maintain a counter to store the current position:

      lngWord = 0
      For Each rngWords in wrdDoc.Words
            lngWord = lngWord + 1
            ...

At any point that you want to exit the loop, with Exit For, store the index of the current word in the Tag property of the button:

      myForm.myCommandButton.Tag = lngWord

Now enclose the existing inner contents of the For Each/Next block inside an If statement:

      If lngWord > Val(myForm.myCommandButton.Tag) Then
         txtText = rngWords.Text
         ...

This will loop through all words, only performing code once the new word counter exceeds the old one...

HTH

J.
 
APlusComp247Author Commented:
Yes, I have lngWord incrementing as I go through it, but I can't seem to set the range to start at the last word (lngWord) once I click the command button again.
 
APlusComp247Author Commented:
Well, it turns out I didn't need to exit my loop.  It would have been helpful while testing, but no big deal.

Thanks for all the help.  It works great!!!