Parse a huge Word document by evaluating each word

Ok, I have most of this figured out, but ran into a huge performance problem.  I am parsing a Word Document by looking at each word and sorting it depending on the font color.  It works GREAT until I hit around word 5000 then it really starts to slow down.  Here is an idea of how I am coding it.

Dim appWord As Word.Application
Dim wrdDoc As Word.Document
Dim strFileName As String
Dim txtText As String
Dim lngColor As Long
Dim lngWord As Long
strFileName = "D:test.doc"
Set appWord = New Word.Application
Set wrdDoc = appWord.Documents.Open(strFileName)

lngWord = 1

Do While lngWord < wrdDoc.Words.Count

    txtText = wrdDoc.Words(lngWord).Text
    lngColor = wrdDoc.Words(lngWord).Font.Color

    Select Case lngColor
        Case black
            'do some code
    End Select

    lngWord = lngWord + 1
Loop


Like I said, my code is working fine.  It slows down terribly once I get above 5000 words, and the document contains over 28000, so I really need to figure something else out.  I am guessing that Word doesn't keep track of the last word I read, so it starts from the very beginning of the file each time to reach the next word specified.

Hope this makes sense and someone has a suggestion.

Thanks,
APlusComp247

GrahamSkan (Retired) commented:
It's much faster to use For Each on the Words collection:

Dim appWord As Word.Application
Dim wrdDoc As Word.Document
Dim strFileName As String
Dim txtText As String
Dim lngColor As Long
Dim lngWord As Long
Dim rngWord As Range
strFileName = "D:test.doc"
Set appWord = New Word.Application
Set wrdDoc = appWord.Documents.Open(strFileName)

lngWord = 1

For Each rngWord In wrdDoc.Words

    txtText = rngWord.Text
    lngColor = rngWord.Font.Color

    Select Case lngColor
        Case black
            'do some code
    End Select

    lngWord = lngWord + 1
Next rngWord

jimbobmcgee commented:
As GrahamSkan says, it is better to use For Each/Next...

One thing I have noticed, however, is that you have not defined the variable 'black'.  Thankfully, the number for black text is 0, so it is still working but you may want to amend this if you intend to use other colors.

You may also want to check if the text colour is set to Automatic...
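
As a rough, untested sketch of that check: the function below uses the named constants from the Word object library instead of an undeclared variable. `ColourName` is a hypothetical helper, not part of the posted code, and it assumes the project references the Microsoft Word object library.

```vb
' Hypothetical helper: names the colour of a single word.
' Assumes a project reference to the Microsoft Word object library.
Private Function ColourName(ByVal rngWord As Word.Range) As String
    Select Case rngWord.Font.Color
        Case wdColorAutomatic      ' "Automatic" has its own value; it is not 0
            ColourName = "automatic"
        Case wdColorBlack          ' explicitly black text, value 0
            ColourName = "black"
        Case wdUndefined           ' the range mixes more than one colour
            ColourName = "mixed"
        Case Else
            ColourName = "other"
    End Select
End Function
```

Adding Option Explicit at the top of the module would also make the compiler flag an undeclared name such as 'black'.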

J.

APlusComp247 (Author) commented:
This looks like it can help, but I have a few questions.  

1.  How do I know when I get to the end of the document?  I would like to put some kind of progress bar to show how far along I am.
2.  I will need to exit the for each/next loop at some point and start it up again.  Does the rngWord know where it left off?

FYI jimbobmcgee, thanks for pointing out that the "black" variable is not defined.  That isn't actually part of my code; I just used it for convenience.  I actually have about 5 different numbers that represent the actual colors.  Once again, thanks for being observant.  :)

GrahamSkan (Retired) commented:
1. You are at the end when it stops, of course! Actually, you can still use Words.Count to define the target and keep track of the progress with the counter (lngWord, as you have called it). I left it in the example.
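
A minimal, untested sketch of that progress tracking, assuming the total is read once before the loop. `ParseWithProgress` and the 500-word reporting interval are made up for illustration; Debug.Print stands in for whatever progress bar the form uses.

```vb
' Hypothetical routine: walks every word and reports progress as it goes.
Private Sub ParseWithProgress(ByVal wrdDoc As Word.Document)
    Dim rngWord As Word.Range
    Dim lngWord As Long
    Dim lngTotal As Long

    lngTotal = wrdDoc.Words.Count      ' read once, outside the loop

    For Each rngWord In wrdDoc.Words
        lngWord = lngWord + 1
        ' ... sort rngWord.Text by rngWord.Font.Color here ...

        ' Assumed interval: report every 500 words.
        If lngWord Mod 500 = 0 Then
            Debug.Print lngWord & " of " & lngTotal & " words"
            DoEvents                   ' give the form a chance to repaint
        End If
    Next rngWord
End Sub
```

Reporting only every few hundred words keeps the progress update itself from slowing the loop down.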

GrahamSkan (Retired) commented:
2. No. But you could save the index and create a new range based on it. You would obviously need to create an inner and an outer loop.

Dim rngWords As Range
For Each rngWord In wrdDoc.Words

    txtText = rngWord.Text
    lngColor = rngWord.Font.Color

    Select Case lngColor
        Case black
            Exit For
    End Select

    lngWord = lngWord + 1
Next rngWord
....

set rngWords = wrdDoc.Words(lngWord)    'last word processed
rngWords.Collapse wdCollapseEnd              'set start of range to end of last word processed
rng.End = wrdDoc.End                               'set end of range to end of doc

For Each rngWord In rngWords.Words

    txtText = rngWord.Text
 


APlusComp247 (Author) commented:
The initial code for the for each/next works great!!  That is the speed I was looking for.  Got a problem with creating the new range to start the loop again.  This is the code I have

Set rngWords = wrdDoc.Words(lngWord)
rngWords.Collapse wdCollapseEnd              'set start of range to end of last word processed
rngWords.End = wrdDoc.Range.End                               'set end of range to end of doc

I had to modify the last line because "rng" was not a defined variable, so I assumed you meant "rngWords", and "wrdDoc" does not have an "End" property.  The problem is that it still runs from the very first word, so I don't make any progress when I exit the loop and try to enter another one.

Thanks in advance for all the help

GrahamSkan (Retired) commented:
Yes, you were right about my typo.

For your loop, I don't know exactly what your code is, but it could be that the Range object is going out of scope between calls.
In fact I now think that nested loops are inappropriate. I think you would need to separate the initiation from the parsing (single loop) procedure, which is provided with a variable starting point.
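
A minimal, untested sketch of that separation, assuming the resume point is stored as a character offset rather than a word index. `mlngResumePos` and `ParseNextChunk` are hypothetical names that do not appear in the code posted above.

```vb
' Hypothetical module-level state: character offset at which to resume.
Private mlngResumePos As Long

' Processes at most lngMaxWords words, starting at mlngResumePos.
' Returns False once the end of the document has been reached.
Private Function ParseNextChunk(ByVal wrdDoc As Word.Document, _
                                ByVal lngMaxWords As Long) As Boolean
    Dim rngRest As Word.Range
    Dim rngWord As Word.Range
    Dim lngDone As Long

    ' Nothing left to do if the resume point is already at the end.
    If mlngResumePos >= wrdDoc.Content.End Then Exit Function

    ' Range from the resume point to the end of the document.
    Set rngRest = wrdDoc.Range(mlngResumePos, wrdDoc.Content.End)

    ' Iterate the Words of that range, not wrdDoc.Words.
    For Each rngWord In rngRest.Words
        ' ... sort rngWord.Text by rngWord.Font.Color here ...
        mlngResumePos = rngWord.End    ' remember where this word ended
        lngDone = lngDone + 1
        If lngDone >= lngMaxWords Then Exit For
    Next rngWord

    ParseNextChunk = (mlngResumePos < wrdDoc.Content.End)
End Function
```

A command button's Click handler could call ParseNextChunk repeatedly until it returns False. Storing a character offset avoids looking up wrdDoc.Words(lngWord) again, which is the slow indexed access from the original question.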


APlusComp247 (Author) commented:
I probably didn't properly explain my question.  Let me try again.  I have the following variables declared at the top of the form so they don't lose scope until the form is closed.

Dim appWord As Word.Application
Dim wrdDoc As Word.Document
Dim lngWord As Long, lngWordTotal As Long

I have a command button that I declare and use the range in and then enter the loop.

Dim rngWords As Range

Set rngWords = wrdDoc.Words(lngWord)
rngWords.Collapse wdCollapseEnd              'set start of range to end of last word processed
rngWords.End = wrdDoc.Range.End                               'set end of range to end of doc

For Each rngWords In wrdDoc.Words

    txtText = rngWords.Text
    lngColor = rngWords.Font.Color
   
    Select Case lngColor
        Case 8388736
            ' do code.....
    End Select
Next


I want to set the range each time the command button is pressed, so it will continue through the document.

Thanks

jimbobmcgee commented:
Maintain a counter to store the current position:

      lngWord = 0
      For Each rngWords in wrdDoc.Words
            lngWord = lngWord + 1
            ...

At any point that you want to exit the loop, with Exit For, store the index of the current word in the Tag property of the button:

      myForm.myCommandButton.Tag = lngWord

Now enclose the existing inner contents of the For Each/Next block inside an If statement:

      If lngWord > Val(myForm.myCommandButton.Tag) Then
         txtText = rngWords.Text
         ...

This will loop through all words, only performing code once the new word counter exceeds the old one...

HTH

J.

APlusComp247 (Author) commented:
Yes, I have lngWord incrementing as I go through it, but I can't seem to set the range to start at the last word (lngWord) once I click the command button again.

APlusComp247 (Author) commented:
Well, it turns out I didn't need to exit my loop.  It would have been helpful while testing, but no big deal.

Thanks for all the help.  It works great!!!

Question has a verified solution.
