APlusComp247

asked on

Parse Huge Word Document by evaluating each word

Ok, I have most of this figured out, but ran into a huge performance problem.  I am parsing a Word Document by looking at each word and sorting it depending on the font color.  It works GREAT until I hit around word 5000 then it really starts to slow down.  Here is an idea of how I am coding it.

Dim appWord As Word.Application
Dim wrdDoc As Word.Document
Dim strFileName As String
Dim txtText As String
Dim lngColor As Long
Dim lngWord As Long

strFileName = "D:test.doc"
Set appWord = New Word.Application
Set wrdDoc = appWord.Documents.Open(strFileName)

lngWord = 1

Do While lngWord <= wrdDoc.Words.Count   '<= so the last word is included

    txtText = wrdDoc.Words(lngWord).Text
    lngColor = wrdDoc.Words(lngWord).Font.Color

    Select Case lngColor
        Case black
            'do some code
    End Select

    lngWord = lngWord + 1
Loop


Like I said, my code is working fine.  It slows down terribly once I get above 5000 words, and the document contains over 28000, so I really need to figure something else out.  I am guessing that Word doesn't keep track of the last word I read, so it starts from the very beginning of the file every time to get to the next word specified.

Hope this makes sense and someone has a suggestion.

Thanks,
ASKER CERTIFIED SOLUTION
GrahamSkan
This solution is only available to members.
As GrahamSkan says, it is better to use For Each/Next...

One thing I have noticed, however, is that you have not defined the variable 'black'.  Thankfully, the number for black text is 0, so it is still working but you may want to amend this if you intend to use other colors.

You may also want to check if the text colour is set to Automatic...

J.
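A minimal sketch of the Automatic check mentioned above, assuming the project references the Word object library so that the WdColor constants wdColorAutomatic and wdColorBlack resolve (with late binding they would have to be declared by hand), and reusing rngWord as the loop variable. Text set to Automatic usually displays as black but does not report the value 0, so it needs its own case:

    Select Case rngWord.Font.Color
        Case wdColorAutomatic
            'default color, normally displayed as black
            'treat as black here if that is the intent
        Case wdColorBlack
            'explicitly black (value 0)
        Case Else
            'any other color
    End Select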
APlusComp247

ASKER

This looks like it can help, but I have a few questions.  

1.  How do I know when I get to the end of the document?  I would like to put some kind of progress bar to show how far along I am.
2.  I will need to exit the for each/next loop at some point and start it up again.  Does the rngWord know where it left off?

FYI  jimbobmcqee, thanks for pointing out the "black variable" not being defined.  That isn't actually part of my code.  I just used it for convenience.  I actually have about 5 different numbers that represent the actual colors.  Once again thanks for being observant.  :)

1. You are at the end when it stops, of course! Actually, you can still use the Words.Count to define the target and keep track of the progress with the counter - lngWord, as you have called it. I left it in the example.
2. No. But you could save the index and create a new range based on it. You would obviously need to create an inner and an outer loop.

Dim rngWord As Range     'loop variable
Dim rngWords As Range    'used further down to hold the rest of the document
For Each rngWord In wrdDoc.Words

    txtText = rngWord.Text
    lngColor = rngWord.Font.Color

    Select Case lngColor
        Case black
            Exit For
    End Select

    lngWord = lngWord + 1
Next
....

set rngWords = wrdDoc.Words(lngWord)    'last word processed
rngWords.Collapse wdCollapseEnd              'set start of range to end of last word processed
rng.End = wrdDoc.End                               'set end of range to end of doc

For Each rngWord In rngWords.Words

    txtText = rngWord.Text
 


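Point 1 above can be sketched as follows, assuming a hypothetical label control named lblProgress on the form; any progress control would do. Words.Count is read once before the loop, and the display is refreshed only every 500 words to keep the overhead small (500 is an arbitrary choice):

    Dim rngWord As Word.Range
    Dim lngWord As Long
    Dim lngWordTotal As Long

    lngWordTotal = wrdDoc.Words.Count   'read once, outside the loop
    lngWord = 0

    For Each rngWord In wrdDoc.Words
        lngWord = lngWord + 1

        'process rngWord here

        If lngWord Mod 500 = 0 Then
            'lblProgress is a hypothetical control, not part of the original code
            lblProgress.Caption = lngWord & " of " & lngWordTotal
            DoEvents    'give the form a chance to repaint
        End If
    Next
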
The initial code for the for each/next works great!!  That is the speed I was looking for.  Got a problem with creating the new range to start the loop again.  This is the code I have

Set rngWords = wrdDoc.Words(lngWord)
rngWords.Collapse wdCollapseEnd              'set start of range to end of last word processed
rngWords.End = wrdDoc.Range.End                               'set end of range to end of doc

I had to modify the last line because "rng" was not a defined variable (I assumed you meant "rngWords") and "wrdDoc" does not have an "End" property.  The problem is that it still runs from the very first word, so I don't make any progress when I exit the loop and then try to enter another one.

Thanks in advance for all the help

Yes, you were right about my typo.

For your loop, I don't know exactly what your code is, but it could be that the Range object is going out of scope between calls.
In fact I now think that nested loops are inappropriate. I think you would need to separate the initiation from the parsing (single loop) procedure, which is provided with a variable starting point.


I probably didn't properly explain my question.  Let me try again.  I have the following variables declared at the top of the form so they don't lose scope until the form is closed.

Dim appWord As Word.Application
Dim wrdDoc As Word.Document
Dim lngWord As Long, lngWordTotal As Long   'each variable needs its own As Long, otherwise lngWord is a Variant

In the click handler of a command button I declare the range, set it, and then enter the loop.

Dim rngWords As Range

Set rngWords = wrdDoc.Words(lngWord)
rngWords.Collapse wdCollapseEnd              'set start of range to end of last word processed
rngWords.End = wrdDoc.Range.End                               'set end of range to end of doc

For Each rngWords In wrdDoc.Words

    txtText = rngWords.Text
    lngColor = rngWords.Font.Color
   
    Select Case lngColor
        Case 8388736
            ' do code.....
    End Select
Next


I want to set the range each time the command button is pressed, so it will continue through the document.

Thanks
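One likely reason the loop above restarts at the first word: For Each rngWords In wrdDoc.Words iterates the whole document's Words collection and reuses rngWords as the loop variable, so the range built in the three preceding lines is never actually used. A minimal sketch of resuming from a saved position, assuming a hypothetical button named cmdParse, a separate loop variable rngWord, and a hypothetical form-level variable lngNextStart that stores a character offset rather than a word index (the Exit For condition is only illustrative):

    'Form level: character position at which the next click should resume
    Dim lngNextStart As Long

    Private Sub cmdParse_Click()    'cmdParse is a hypothetical button name
        Dim rngRest As Word.Range   'the unprocessed tail of the document
        Dim rngWord As Word.Range   'loop variable, kept separate from rngRest

        'Build a range from the saved position to the end of the document
        Set rngRest = wrdDoc.Range(Start:=lngNextStart, End:=wrdDoc.Content.End)

        For Each rngWord In rngRest.Words   'iterate the tail, not wrdDoc.Words
            Select Case rngWord.Font.Color
                Case 8388736
                    lngNextStart = rngWord.End   'resume after this word next time
                    Exit For
            End Select
        Next
    End Sub

Because lngNextStart starts at 0, the first click begins at the top of the document; later clicks pick up just after the word that triggered Exit For.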
Maintain a counter to store the current position:

      lngWord = 0
      For Each rngWords in wrdDoc.Words
            lngWord = lngWord + 1
            ...

At any point that you want to exit the loop, with Exit For, store the index of the current word in the Tag property of the button:

      myForm.myCommandButton.Tag = lngWord

Now enclose the existing inner contents of the For Each/Next block inside an If statement (Val converts the Tag string to a number and treats an empty Tag as 0):

      If lngWord > Val(myForm.myCommandButton.Tag) Then
         txtText = rngWords.Text
         ...

This will loop through all words, only performing code once the new word counter exceeds the old one...

HTH

J.

Yes, I have the lngWord incrementing as I go through it, but I can't seem to set the range to start at the last word (lngWord) once I click back on the command button.

Well, it turns out I didn't need to exit my loop.  It would have been helpful while testing, but no big deal.

Thanks for all the help.  It works great!!!