Solved

Parse Huge Word Document by evaluating each word

Posted on 2004-10-27
219 Views
Last Modified: 2010-05-02
Ok, I have most of this figured out, but ran into a huge performance problem.  I am parsing a Word Document by looking at each word and sorting it depending on the font color.  It works GREAT until I hit around word 5000 then it really starts to slow down.  Here is an idea of how I am coding it.

Dim appWord As Word.Application
Dim wrdDoc As Word.Document
Dim strFileName As String
Dim txtText As String
Dim lngColor As Long
Dim lngWord As Long
strFileName = "D:test.doc"
Set appWord = New Word.Application
Set wrdDoc = appWord.Documents.Open(strFileName)

lngWord = 1

Do While lngWord <= wrdDoc.Words.Count    '<= so the last word is not skipped

    txtText = wrdDoc.Words(lngWord).Text
    lngColor = wrdDoc.Words(lngWord).Font.Color

    Select Case lngColor
        Case black
            'do some code
    End Select

    lngWord = lngWord + 1
Loop


Like I said, my code is working fine.  It slows down terribly once I get above 5000 words, and the document contains over 28000, so I really need to figure something else out.  I am guessing that Word doesn't keep track of the last word I read, so it starts over from the very beginning of the file each time to get to the next word specified.

Hope this makes sense and someone has a suggestion.

Thanks,
Question by:APlusComp247
    11 Comments
     

    Accepted Solution

    by:
    It's much faster to use For Each on the Words collection

    Dim appWord As Word.Application
    Dim wrdDoc As Word.Document
    Dim strFileName As String
    Dim txtText As String
    Dim lngColor As Long
    Dim lngWord As Long
    Dim rngWord As Word.Range
    strFileName = "D:test.doc"
    Set appWord = New Word.Application
    Set wrdDoc = appWord.Documents.Open(strFileName)

    lngWord = 1

    For Each rngWord In wrdDoc.Words

        txtText = rngWord.Text
        lngColor = rngWord.Font.Color

        Select Case lngColor
            Case black
                'do some code
        End Select

        lngWord = lngWord + 1
    Next rngWord    'For Each is closed with Next, not Loop

    Expert Comment

    by:jimbobmcgee
    As GrahamSkan says, it is better to use For Each/Next...

    One thing I have noticed, however, is that you have not defined the variable 'black'.  Thankfully, the number for black text is 0, so it is still working but you may want to amend this if you intend to use other colors.

    You may also want to check if the text colour is set to Automatic...
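
As a rough illustration of the two points above, here is a sketch that tests against Word's built-in colour constants instead of an undeclared name. wdColorBlack and wdColorAutomatic are members of Word's WdColor enumeration. The procedure name ClassifyWord and the Debug.Print actions are placeholders, not part of the original code, and a reference to the Word object library is assumed.

```vba
Option Explicit   ' makes an undeclared name such as "black" a compile error

' Hypothetical helper: the name and the Debug.Print actions are illustrative only.
Sub ClassifyWord(ByVal rngWord As Word.Range)
    Select Case rngWord.Font.Color
        Case wdColorAutomatic
            ' Text left at the default colour reports Automatic, not black
            Debug.Print "automatic: " & rngWord.Text
        Case wdColorBlack
            Debug.Print "black: " & rngWord.Text
        Case Else
            ' Any other colour, including a word with mixed formatting
            Debug.Print "other (" & rngWord.Font.Color & "): " & rngWord.Text
    End Select
End Sub
```

It could be called from inside the For Each loop as ClassifyWord rngWord.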

    J.
     

    Author Comment

    by:APlusComp247
    This looks like it can help, but I have a few questions.  

    1.  How do I know when I get to the end of the document?  I would like to put some kind of progress bar to show how far along I am.
    2.  I will need to exit the for each/next loop at some point and start it up again.  Does the rngWord know where it left off?

    FYI  jimbobmcgee, thanks for pointing out the "black" variable not being defined.  That isn't actually part of my code.  I just used it for convenience.  I actually have about 5 different numbers that represent the actual colors.  Once again thanks for being observant.  :)

    Expert Comment

    by:GrahamSkan
    1. You are at the end when it stops, of course! Actually, you can still use Words.Count to define the target and keep track of the progress with the counter (lngWord, as you have called it). I left it in the example.
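
A minimal sketch of that progress idea, under some assumptions: the procedure name ParseWithProgress, the 500-word reporting interval and the use of Debug.Print in place of a real progress bar are all illustrative. The total is read once before the loop so that Words.Count is not re-evaluated for every word.

```vba
' Hypothetical sketch: the name and the 500-word interval are assumptions.
Sub ParseWithProgress(ByVal wrdDoc As Word.Document)
    Dim rngWord As Word.Range
    Dim lngWord As Long
    Dim lngTotal As Long

    lngTotal = wrdDoc.Words.Count   ' read the count once, outside the loop

    For Each rngWord In wrdDoc.Words
        lngWord = lngWord + 1

        ' ... process rngWord here ...

        ' Report every 500 words and once more at the very end
        If lngWord Mod 500 = 0 Or lngWord = lngTotal Then
            Debug.Print "Processed " & lngWord & " of " & lngTotal & _
                        " (" & Format$(lngWord / lngTotal, "0%") & ")"
            DoEvents    ' give the form a chance to repaint
        End If
    Next rngWord
End Sub
```

On a form, the Debug.Print line would be swapped for whatever updates the progress bar control.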

    Expert Comment

    by:GrahamSkan
    2. No. But you could save the index and create a new range based on it. You would obviously need to create an inner and an outer loop.

    Dim rngWords As Word.Range
    For Each rngWord In wrdDoc.Words

        txtText = rngWord.Text
        lngColor = rngWord.Font.Color

        Select Case lngColor
            Case black
                Exit For
        End Select

        lngWord = lngWord + 1
    Next rngWord
    ....

    set rngWords = wrdDoc.Words(lngWord)    'last word processed
    rngWords.Collapse wdCollapseEnd              'set start of range to end of last word processed
    rng.End = wrdDoc.End                               'set end of range to end of doc

    For Each rngWord In rngWords.Words    'iterate the Words of the new range

        txtText = rngWord.Text
     


     

    Author Comment

    by:APlusComp247
    The initial code for the for each/next works great!!  That is the speed I was looking for.  Got a problem with creating the new range to start the loop again.  This is the code I have

    Set rngWords = wrdDoc.Words(lngWord)
    rngWords.Collapse wdCollapseEnd              'set start of range to end of last word processed
    rngWords.End = wrdDoc.Range.End                               'set end of range to end of doc

    I had to modify the last line because "rng" was not a defined variable (I assumed you meant "rngWords") and "wrdDoc" does not have an "End" property.  The problem is that it still runs from the very first word, so I don't make any progress after exiting the loop and trying to enter another one.

    Thanks in advance for all the help

    Expert Comment

    by:GrahamSkan
    Yes, you were right about my typo.

    For your loop, I don't know exactly what your code is, but it could be that the Range object is going out of scope between calls.
    In fact I now think that nested loops are inappropriate. I think you would need to separate the initiation from the parsing (single loop) procedure, which is provided with a variable starting point.
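
One way to read the "variable starting point" idea is sketched below. It assumes the resume position is kept as a character offset in a module-level variable, and that each call processes a limited number of words. The names mlngResumePos, ParseChunk and lngMaxWords are hypothetical. The loop runs over the Words of a sub-range rather than over wrdDoc.Words, which is what keeps it from starting at the first word again.

```vba
' Hypothetical sketch: mlngResumePos, ParseChunk and lngMaxWords are assumptions.
Private mlngResumePos As Long   ' character offset where the next call starts

Sub ParseChunk(ByVal wrdDoc As Word.Document, ByVal lngMaxWords As Long)
    Dim rngRest As Word.Range
    Dim rngWord As Word.Range
    Dim lngDone As Long

    ' Nothing left once the saved offset has reached the end of the document
    If mlngResumePos >= wrdDoc.Content.End Then Exit Sub

    ' Range from the saved offset to the end of the document
    Set rngRest = wrdDoc.Range(Start:=mlngResumePos, End:=wrdDoc.Content.End)

    ' Iterate the Words of the sub-range, not wrdDoc.Words
    For Each rngWord In rngRest.Words
        ' ... process rngWord here ...

        mlngResumePos = rngWord.End     ' remember where this word ended
        lngDone = lngDone + 1
        If lngDone >= lngMaxWords Then Exit For
    Next rngWord
End Sub
```

A command button's Click handler could then call ParseChunk wrdDoc, 1000 on each press, and every call would pick up where the previous one stopped.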


     

    Author Comment

    by:APlusComp247
    I probably didn't properly explain my question.  Let me try again.  I have the following variables declared at the top of the form so they don't lose scope until the form is closed.

    Dim appWord As Word.Application
    Dim wrdDoc As Word.Document
    Dim lngWord As Long, lngWordTotal As Long

    I have a command button that I declare and use the range in and then enter the loop.

    Dim rngWords As Range

    Set rngWords = wrdDoc.Words(lngWord)
    rngWords.Collapse wdCollapseEnd              'set start of range to end of last word processed
    rngWords.End = wrdDoc.Range.End                               'set end of range to end of doc

    For Each rngWords In wrdDoc.Words

        txtText = rngWords.Text
        lngColor = rngWords.Font.Color
       
        Select Case lngColor
            Case 8388736
                   ' do code.....
         end select
    Next


    I want to set the range each time the command button is pressed, so it will continue through the document.

    Thanks

    Expert Comment

    by:jimbobmcgee
    Maintain a counter to store the current position:

          lngWord = 0
          For Each rngWords in wrdDoc.Words
                lngWord = lngWord + 1
                ...

    At any point that you want to exit the loop, with Exit For, store the index of the current word in the Tag property of the button:

          myForm.myCommandButton.Tag = lngWord

    Now enclose the existing inner contents of the For Each/Next block inside an If statement:

          If lngWord > Val(myForm.myCommandButton.Tag) Then
             txtText = rngWords.Text
             ...

    This will loop through all words, only performing code once the new word counter exceeds the old one...

    HTH

    J.
     

    Author Comment

    by:APlusComp247
    Yes, I have lngWord incrementing as I go through it, but I can't seem to set the range to start at the last word (lngWord) once I click back on the command button.
     

    Author Comment

    by:APlusComp247
    Well, ends up I didn't need to exit my loop.  It would have been helpful while testing, but no big deal.

    Thanks for all the help.  It works great!!!