APlusComp247
asked on
Parse Huge Word Document by evaluating each word
Ok, I have most of this figured out, but I ran into a huge performance problem. I am parsing a Word document by looking at each word and sorting it depending on the font color. It works GREAT until I hit around word 5000, then it really starts to slow down. Here is an idea of how I am coding it:
Dim appWord As Word.Application
Dim wrdDoc As Word.Document
Dim strFileName As String
Dim txtText As String
Dim lngColor As Long
Dim lngWord As Long
strFileName = "D:test.doc"
Set appWord = New Word.Application
Set wrdDoc = appWord.Documents.Open(strFileName)
lngWord = 1
Do While lngWord < wrdDoc.Words.Count
    txtText = wrdDoc.Words(lngWord).Text
    lngColor = wrdDoc.Words(lngWord).Font.Color
    Select Case lngColor
        Case black
            'do some code
    End Select
    lngWord = lngWord + 1
Loop
Like I said, my code is working fine. It slows down terribly once I get above 5000 words, and the document contains over 28000, so I really need to figure something else out. I am guessing that Word doesn't keep track of the last word I read, so it starts from the very beginning of the file every time to get to the next word specified.
Hope this makes sense and someone has a suggestion.
Thanks,
ASKER
This looks like it can help, but I have a few questions.
1. How do I know when I get to the end of the document? I would like to put some kind of progress bar to show how far along I am.
2. I will need to exit the for each/next loop at some point and start it up again. Does the rngWord know where it left off?
FYI jimbobmcqee, thanks for pointing out that the "black" variable is not defined. That isn't actually part of my code. I just used it for convenience. I actually have about 5 different numbers that represent the actual colors. Once again, thanks for being observant. :)
1. You are at the end when it stops, of course! Actually, you can still use Words.Count to define the target and keep track of the progress with the counter (lngWord, as you have called it). I left it in the example.
2. No. But you could save the index and create a new range based on it. You would obviously need to create an inner and an outer loop.
Dim rngWord As Range
Dim rngWords As Range
For Each rngWord In wrdDoc.Words
    txtText = rngWord.Text
    lngColor = rngWord.Font.Color
    Select Case lngColor
        Case black
            Exit For
    End Select
    lngWord = lngWord + 1
Next
....
Set rngWords = wrdDoc.Words(lngWord) 'last word processed
rngWords.Collapse wdCollapseEnd 'set start of range to end of last word processed
rng.End = wrdDoc.End 'set end of range to end of doc
For Each rngWord In rngWords.Words
    txtText = rngWord.Text
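For the progress question in point 1, a minimal sketch might look like the following. It assumes the document is already open as in the original question. The procedure name ParseWithProgress, the 500-word reporting interval, and the use of Debug.Print in place of a real progress bar are illustrative assumptions, not something from this thread.

```vba
Sub ParseWithProgress(ByVal wrdDoc As Word.Document)
    Dim rngWord As Word.Range
    Dim lngWord As Long
    Dim lngWordTotal As Long

    ' Read the total once; it is only used to report progress
    lngWordTotal = wrdDoc.Words.Count
    lngWord = 0

    ' For Each walks the collection in order instead of looking up Words(n) each time
    For Each rngWord In wrdDoc.Words
        lngWord = lngWord + 1

        Select Case rngWord.Font.Color
            Case wdColorBlack
                ' handle black text here
        End Select

        ' Hypothetical stand-in for updating a progress bar
        If lngWord Mod 500 = 0 Then
            Debug.Print lngWord & " of " & lngWordTotal
            DoEvents ' give the form a chance to repaint
        End If
    Next rngWord
End Sub
```

The counter is kept only for reporting; the loop itself never indexes into Words, which is what made the original version slow.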
ASKER
The initial code for the For Each/Next works great!! That is the speed I was looking for. I have a problem with creating the new range to start the loop again, though. This is the code I have:
Set rngWords = wrdDoc.Words(lngWord)
rngWords.Collapse wdCollapseEnd 'set start of range to end of last word processed
rngWords.End = wrdDoc.Range.End 'set end of range to end of doc
I had to modify the last line because "rng" was not a defined variable (I assumed you meant "rngWords") and "wrdDoc" does not have an "End" property. The problem is that it still runs from the very first word, so I don't make any progress after exiting the loop and trying to enter another one.
Thanks in advance for all the help
Yes, you were right about my typo.
For your loop, I don't know exactly what your code is, but it could be that the Range object is going out of scope between calls.
In fact I now think that nested loops are inappropriate. I think you would need to separate the initiation from the parsing (single loop) procedure, which is provided with a variable starting point.
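One way to read that suggestion is sketched below: a single parsing routine that starts from a saved position and records where it stopped. This is an assumption-laden illustration, not the poster's code. The names mlngNextStart and ParseChunk and the lngMaxWords parameter are hypothetical, and the sketch saves a character position (Range.End) rather than a word index so that resuming does not need a Words(n) lookup.

```vba
' Module-level: character position where the next pass should start (0 = start of document)
Private mlngNextStart As Long

' Parses up to lngMaxWords words starting at mlngNextStart.
' Returns True once the end of the document has been reached.
Function ParseChunk(ByVal wrdDoc As Word.Document, ByVal lngMaxWords As Long) As Boolean
    Dim rngRest As Word.Range
    Dim rngWord As Word.Range
    Dim lngDone As Long

    ' Nothing left to do if the saved position is already at the end
    If mlngNextStart >= wrdDoc.Content.End Then
        ParseChunk = True
        Exit Function
    End If

    ' Range covering everything from the saved position to the end of the document
    Set rngRest = wrdDoc.Range(Start:=mlngNextStart, End:=wrdDoc.Content.End)

    ' Iterate the words of the remaining range, not wrdDoc.Words
    For Each rngWord In rngRest.Words
        ' ... sort by rngWord.Font.Color here ...
        mlngNextStart = rngWord.End ' remember where this word ended
        lngDone = lngDone + 1
        If lngDone >= lngMaxWords Then Exit For
    Next rngWord

    ParseChunk = (mlngNextStart >= wrdDoc.Content.End)
End Function
```

A command button's click handler could then call something like `If ParseChunk(wrdDoc, 1000) Then MsgBox "Done"` each time it is pressed. The key point is that the loop enumerates the words of the new range; looping over wrdDoc.Words again would always restart from the first word.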
ASKER
I probably didn't properly explain my question. Let me try again. I have the following variables declared at the top of the form so they don't lose scope until the form is closed.
Dim appWord As Word.Application
Dim wrdDoc As Word.Document
Dim lngWord As Long, lngWordTotal As Long
In a command button's click handler I declare and set the range, and then enter the loop:
Dim rngWords As Range
Set rngWords = wrdDoc.Words(lngWord)
rngWords.Collapse wdCollapseEnd 'set start of range to end of last word processed
rngWords.End = wrdDoc.Range.End 'set end of range to end of doc
For Each rngWords In wrdDoc.Words
txtText = rngWords.Text
lngColor = rngWords.Font.Color
Select Case lngColor
Case 8388736
' do code.....
end select
Next
I want to set the range each time the command button is pressed, so it will continue through the document.
Thanks
Maintain a counter to store the current position:
lngWord = 0
For Each rngWords in wrdDoc.Words
lngWord = lngWord + 1
...
At any point that you want to exit the loop, with Exit For, store the index of the current word in the Tag property of the button:
myForm.myCommandButton.Tag = lngWord
Now enclose the existing inner contents of the For Each/Next block inside an If statement:
If lngWord > Val(myForm.myCommandButton.Tag) Then
txtText = rngWords.Text
...
This will loop through all words, only performing code once the new word counter exceeds the old one...
HTH
J.
ASKER
Yes, I have lngWord incrementing as I go through it, but I can't seem to set the range to start at the last word (lngWord) once I click the command button again.
ASKER
Well, ends up I didn't need to exit my loop. It would have been helpful while testing, but no big deal.
Thanks for all the help. It works great!!!
One thing I have noticed, however, is that you have not defined the variable 'black'. Thankfully, the number for black text is 0, so it is still working but you may want to amend this if you intend to use other colors.
You may also want to check if the text colour is set to Automatic...
J.
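As a rough illustration of the point about Automatic, the colours can be matched against Word's built-in WdColor constants instead of an undeclared variable. The function name DescribeColor and the returned labels are hypothetical. The wdUndefined case rests on the assumption that Word reports that value when a single word contains more than one colour, which is worth verifying against your own documents.

```vba
Function DescribeColor(ByVal rngWord As Word.Range) As String
    Select Case rngWord.Font.Color
        Case wdColorAutomatic
            ' Text left at the default "Automatic" colour; not the same value as black
            DescribeColor = "automatic"
        Case wdColorBlack
            ' Explicitly black text (value 0)
            DescribeColor = "black"
        Case wdColorViolet
            ' 8388736, the value used in the asker's Select Case
            DescribeColor = "violet"
        Case wdUndefined
            ' Assumed: returned when the word has mixed colours
            DescribeColor = "mixed"
        Case Else
            DescribeColor = "other"
    End Select
End Function
```

Using the named constants also makes it obvious that text which merely looks black may be set to Automatic and would fall through a `Case 0` test.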