Solved

Parse Huge Word Document by Evaluating Each Word

Posted on 2004-10-27
Last Modified: 2010-05-02
OK, I have most of this figured out, but I have run into a serious performance problem.  I am parsing a Word document by looking at each word and sorting it according to its font color.  It works great until I reach roughly word 5000, then it slows down badly.  Here is roughly how I am coding it:

Dim appWord As Word.Application
Dim wrdDoc As Word.Document
Dim strFileName As String
Dim txtText As String
Dim lngColor As Long
Dim lngWord As Long
strFileName = "D:test.doc"
Set appWord = New Word.Application
Set wrdDoc = appWord.Documents.Open(strFileName)

lngWord = 1

Do While lngWord < wrdDoc.Words.Count

    txtText = wrdDoc.Words(lngWord).Text
    lngColor = wrdDoc.Words(lngWord).Font.Color

    Select Case lngColor
        Case black
            'do some code
    End Select

    lngWord = lngWord + 1
Loop


As I said, the code works.  It slows down terribly once I get past 5000 words, and the document contains over 28000, so I really need a different approach.  My guess is that Word doesn't keep track of the last word I read, so each indexed lookup starts again from the very beginning of the file to reach the requested word.

Hope this makes sense and someone has a suggestion.

Thanks,
Question by:APlusComp247
11 Comments
 
LVL 76

Accepted Solution

by:
GrahamSkan earned 2000 total points
ID: 12430667
It's much faster to use For Each on the Words collection:

Dim appWord As Word.Application
Dim wrdDoc As Word.Document
Dim strFileName As String
Dim txtText As String
Dim lngColor As Long
Dim lngWord As Long
Dim rngWord As Word.Range
strFileName = "D:test.doc"
Set appWord = New Word.Application
Set wrdDoc = appWord.Documents.Open(strFileName)

lngWord = 1

For Each rngWord In wrdDoc.Words

    txtText = rngWord.Text
    lngColor = rngWord.Font.Color

    Select Case lngColor
        Case black
            'do some code
    End Select

    lngWord = lngWord + 1
Next rngWord   'a For Each block is closed with Next, not Loop
 
LVL 16

Expert Comment

by:jimbobmcgee
ID: 12431950
As GrahamSkan says, it is better to use For Each/Next...

One thing I have noticed, however, is that you have not defined the variable 'black'.  Thankfully, the number for black text is 0, so it is still working but you may want to amend this if you intend to use other colors.

You may also want to check if the text colour is set to Automatic...
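A minimal sketch of that idea, assuming the project references the Word object library so that the built-in WdColor constants are available. The function name ClassifyWord and the labels it returns are made up for illustration. With Option Explicit in place, an undeclared name such as 'black' is rejected by the compiler instead of being treated as an empty Variant that happens to compare equal to 0.

```vb
Option Explicit   ' undeclared names such as "black" now fail at compile time

' Hypothetical helper: returns a label for the font colour of one word.
Private Function ClassifyWord(ByVal rngWord As Word.Range) As String
    Select Case rngWord.Font.Color
        Case wdColorAutomatic
            ' "Automatic" is its own value and is not the same as black
            ClassifyWord = "automatic"
        Case wdColorBlack
            ClassifyWord = "black"
        Case Else
            ' any other colour
            ClassifyWord = "other"
    End Select
End Function
```

Further Case branches can be added for whichever colour numbers the document actually uses.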

J.
 

Author Comment

by:APlusComp247
ID: 12432998
This looks like it can help, but I have a few questions.  

1.  How do I know when I get to the end of the document?  I would like to put some kind of progress bar to show how far along I am.
2.  I will need to exit the for each/next loop at some point and start it up again.  Does the rngWord know where it left off?

FYI jimbobmcgee, thanks for pointing out that the "black" variable is not defined.  That isn't actually part of my code; I just used it for convenience.  I actually have about 5 different numbers that represent the actual colors.  Once again, thanks for being observant.  :)
 
LVL 76

Expert Comment

by:GrahamSkan
ID: 12433190
1. You are at the end when it stops, of course! Actually, you can still use Words.Count to define the target and keep track of the progress with the counter (lngWord, as you have called it). I left it in the example.
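As a rough sketch of that, assuming the form has a progress bar control named ProgressBar1 with the default range of 0 to 100 (the control name, the procedure name and the update interval of 100 words are assumptions, not part of the original code):

```vb
' Sketch only: walks every word and reports progress as a percentage.
Private Sub ParseWithProgress(ByVal wrdDoc As Word.Document)
    Dim rngWord As Word.Range
    Dim lngWord As Long
    Dim lngTotal As Long

    lngTotal = wrdDoc.Words.Count   ' read the total once, outside the loop
    lngWord = 0

    For Each rngWord In wrdDoc.Words
        lngWord = lngWord + 1

        ' ... examine rngWord.Text and rngWord.Font.Color here ...

        ' Refresh the display every 100 words rather than on every word
        If lngWord Mod 100 = 0 Then
            ProgressBar1.Value = (lngWord * 100) \ lngTotal
            DoEvents   ' let the form repaint
        End If
    Next rngWord

    ProgressBar1.Value = 100
End Sub
```

Reading Words.Count a single time keeps the count out of the per-word work, which is where the original slowdown came from.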
 
LVL 76

Expert Comment

by:GrahamSkan
ID: 12433326
2. No. But you could save the index and create a new range based on it. You would obviously need to create an inner and an outer loop.

Dim rngWords As Range
For Each rngWord In wrdDoc.Words

    txtText = rngWord.Text
    lngColor = rngWord.Font.Color

    Select Case lngColor
        Case black
            Exit For
    End Select

    lngWord = lngWord + 1
Next rngWord
....

Set rngWords = wrdDoc.Words(lngWord)    'last word processed
rngWords.Collapse wdCollapseEnd              'set start of range to end of last word processed
rng.End = wrdDoc.End                               'set end of range to end of doc

For Each rngWord In rngWords.Words   'iterate the words inside the new range

    txtText = rngWord.Text
 


 

Author Comment

by:APlusComp247
ID: 12440210
The initial code for the For Each/Next works great!!  That is the speed I was looking for.  I have a problem with creating the new range to start the loop again, though.  This is the code I have:

Set rngWords = wrdDoc.Words(lngWord)
rngWords.Collapse wdCollapseEnd              'set start of range to end of last word processed
rngWords.End = wrdDoc.Range.End                               'set end of range to end of doc

I had to modify the last line because "rng" was not a defined variable (I assumed you meant "rngWords") and "wrdDoc" does not have an "End" property.  The problem is that it still runs from the very first word, so I don't make any progress when I exit the loop and then try to enter another one.

Thanks in advance for all the help
 
LVL 76

Expert Comment

by:GrahamSkan
ID: 12442307
Yes, you were right about my typo.

For your loop, I don't know exactly what your code is, but it could be that the Range object is going out of scope between calls.
In fact I now think that nested loops are inappropriate. I think you would need to separate the initiation from the parsing (single loop) procedure, which is provided with a variable starting point.
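One way that separation might look, as a sketch rather than tested code. It remembers a character position instead of a word index, so resuming does not need the slow Words(lngWord) lookup. The names mlngResumePos, ParseFrom and lngMaxWords are hypothetical, and wdColorBlack stands in for whichever colour numbers are really being tested.

```vb
' Module-level state: character position at which the next call resumes.
' Zero means the start of the document. (Hypothetical name.)
Private mlngResumePos As Long

' Parses up to lngMaxWords words from the saved position.
' Returns True once the end of the document has been reached.
Private Function ParseFrom(ByVal wrdDoc As Word.Document, _
                           ByVal lngMaxWords As Long) As Boolean
    Dim rngRest As Word.Range
    Dim rngWord As Word.Range
    Dim lngDone As Long

    ' Range running from the saved position to the end of the document
    Set rngRest = wrdDoc.Range(Start:=mlngResumePos, End:=wrdDoc.Content.End)

    ' Iterate the words of that range, not wrdDoc.Words
    For Each rngWord In rngRest.Words
        Select Case rngWord.Font.Color
            Case wdColorBlack
                ' do some code
        End Select

        mlngResumePos = rngWord.End   ' remember where this word finished
        lngDone = lngDone + 1
        If lngDone >= lngMaxWords Then Exit For
    Next rngWord

    ParseFrom = (mlngResumePos >= wrdDoc.Content.End)
End Function
```

A command button's Click handler could then call something like ParseFrom(wrdDoc, 1000) on each press and stop calling it once it returns True.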


 

Author Comment

by:APlusComp247
ID: 12444337
I probably didn't properly explain my question.  Let me try again.  I have the following variables declared at the top of the form so they don't lose scope until the form is closed.

Dim appWord As Word.Application
Dim wrdDoc As Word.Document
Dim lngWord As Long, lngWordTotal As Long   'each variable needs its own As Long, otherwise lngWord is a Variant

I have a command button that I declare and use the range in and then enter the loop.

Dim rngWords As Range

Set rngWords = wrdDoc.Words(lngWord)
rngWords.Collapse wdCollapseEnd              'set start of range to end of last word processed
rngWords.End = wrdDoc.Range.End                               'set end of range to end of doc

For Each rngWords In wrdDoc.Words

    txtText = rngWords.Text
    lngColor = rngWords.Font.Color
   
    Select Case lngColor
        Case 8388736
               ' do code.....
    End Select
Next


I want to set the range each time the command button is pressed, so it will continue through the document.

Thanks
 
LVL 16

Expert Comment

by:jimbobmcgee
ID: 12445584
Maintain a counter to store the current position:

      lngWord = 0
      For Each rngWords in wrdDoc.Words
            lngWord = lngWord + 1
            ...

At any point that you want to exit the loop, with Exit For, store the index of the current word in the Tag property of the button:

      myForm.myCommandButton.Tag = lngWord

Now enclose the existing inner contents of the For Each/Next block inside an If statement:

      If lngWord > Val(myForm.myCommandButton.Tag) Then
         txtText = rngWords.Text
         ...

This will loop through all words, only performing code once the new word counter exceeds the old one...

HTH

J.
 

Author Comment

by:APlusComp247
ID: 12449241
Yes, I have lngWord incrementing as I go through it, but I can't seem to set the range to start at the last word (lngWord) once I click the command button again.
 

Author Comment

by:APlusComp247
ID: 12457351
Well, ends up I didn't need to exit my loop.  It would have been helpful while testing, but no big deal.

Thanks for all the help.  It works great!!!