huntson
asked on
How can I optimize this VBA script?
Below is a VBA script I've been working on that I'd like to optimize. It runs against a word doc (I'll provide a small sample file) of captions. When running against 51 pages, it takes forever (just under 3 minutes). I'm not a programmer at all so in any answers you give, please be a bit patient and descriptive.
Sub RemoveParaOrLineBreaks()
Dim staticTC As String
staticTC = "--:--:--:--,--:--:--:--,10"
Dim static_next_line As String
static_next_line = "80 80 80"
Dim found, boolvar, nulls2 As Boolean
Dim tpar, parnum, slidenum As Variant
Dim slide_s1 As String
Dim slide_s2 As String
Dim parasel, new_replace As String
Dim vararray() As String
Dim newsel, whole_slide As Range
tpar = ActiveDocument.Paragraphs.count
boolvar = True
nulls2 = False
slide_s1 = ""
slide_s2 = ""
Set whole_doc = Selection.Range
whole_doc.WholeStory
Do Until boolvar = False
found = False
Set whole_slide = Selection.Range
whole_slide.WholeStory
'find the first pattern of text: 4 digits followed by a space followed by a colon followed by a line break
With whole_slide.Find
.ClearFormatting
.Execute FindText:="^#^#^#^#" & Chr(32) & ":" & Chr(13), Forward:=True, _
Format:=False, Wrap:=wdFindStop
'if we find it, tell us what we found and set found to true
If .found = True Then
whole_slide.Find.Execute FindText:="^#^#^#^#"
slidenum = whole_slide
continue = True
found = True
firstsel = False
'MsgBox slidenum
End If
End With
'if found is still not true, find the second pattern: 4 digits, followed by a letter, followed by a colon followed by a line break
If found = False Then
With whole_slide.Find
.ClearFormatting
.Execute FindText:="^#^#^#^#^$" & ":" & Chr(13), Forward:=True, _
Format:=False, Wrap:=wdFindStop
'if we find it, tell us what we found and set found to true. Also, parse found data and set slidenum to actual slide num
If .found = True Then
whole_slide.Find.Execute FindText:="^#^#^#^#^$"
slidenum = whole_slide
continue = True
found = True
firstsel = False
'MsgBox slidenum
End If
End With
End If
If found = False Then
boolvar = False
Exit Sub
End If
If (continue = True And boolvar = True) Then
'Assign sentence after slide number to slide_s1 var
slide_s1 = whole_slide.Next(Unit:=wdSentence, count:=1)
'if there are elipses, it's a special case we need to account for
If slilde_s1 = " ..." Then
slide_s1 = slide_s1 & whole_slide.Next(Unit:=wdSentence, count:=2)
End If
'If there are two spaces, it's a special case we need to account for
If Right(slide_s1, 2) = Chr(32) & Chr(32) Then
slide_s1 = slide_s1 & whole_slide.Next(Unit:=wdSentence, count:=2)
nulls2 = True
End If
'MsgBox "|" + slide_s1 + "|"
'Assign second sentence to slide_s2 var
slide_s2 = whole_slide.Next(Unit:=wdSentence, count:=2)
'Check if slide_s2 winds up being part of the next slide or if nulls2 is valid. If so, set it to null.
If (slide_s2 Like "*####*" = True Or nulls2 = True) Then
slide_s2 = ""
End If
' 'MsgBox "|" + slide_s2 + "|"
'Decide if slide has one or two sentences
If Len(slide_s1) <> 0 And Len(slide_s2) < 1 Then
'replace text and move whole_slide selection to end of slide
whole_slide.MoveEnd wdParagraph, 2
With whole_doc.Find
.Text = whole_slide
.ClearFormatting
.Replacement.Text = Chr(13) & Chr(13) & slidenum & " : " & staticTC & Chr(11) & static_next_line & Chr(11) & "C2N03 " & slidenum & " : " & Chr(11) & "C2N03" & Chr(32) & Chr(32) & slide_s1
.Execute Replace:=wdReplaceAll
End With
ElseIf Len(slide_s1) <> 0 And Len(slide_s2) <> 0 Then
'truncate line breaks at end of second sentence
slide_s2 = Replace(slide_s2, Chr(13), "")
slide_s2 = Left$(slide_s2, Len(slide_s2) - 1)
'replace text and move whole_slide selection to end of slide
whole_slide.MoveEnd wdParagraph, 3
With whole_doc.Find
.Text = whole_slide
.ClearFormatting
.Replacement.Text = Chr(13) & Chr(13) & slidenum & " : " & staticTC & Chr(11) & static_next_line & Chr(11) & "C2N03 " & slidenum & " : " & Chr(11) & "C2N03" & Chr(32) & Chr(32) & slide_s1 & "C2N03 " & slide_s2
.Execute Replace:=wdReplaceAll
End With
End If
End If
Loop
End Sub
1) Dim in VBA works needs an explicit data type per variable name otherwise it is implicit Variant, Thus you
is in reality
2) What is it supposed to do? At the first glance it looks like your always searching all over the entire document, this should no be necessary.
Dim parasel, new_replace As String
is in reality
Dim parasol As Variant, new_replace As String
So always declare the type explicitly.2) What is it supposed to do? At the first glance it looks like your always searching all over the entire document, this should no be necessary.
ASKER
This, I did not know - thank you.
It's supposed to go through the 'input' and find slides with text and convert them to a specific format. Initially, I had done something like the code below where it would calculate where it was and then only go further down the page rather than starting from the top however I found that this actually took longer.
It's supposed to go through the 'input' and find slides with text and convert them to a specific format. Initially, I had done something like the code below where it would calculate where it was and then only go further down the page rather than starting from the top however I found that this actually took longer.
Sub RemoveParaOrLineBreaks()
Dim staticTC As String
staticTC = "--:--:--:--,--:--:--:--,10"
Dim static_next_line As String
static_next_line = "80 80 80"
Dim found, boolvar, firstsel As Boolean
Dim tpar, parnum, slidenum As Variant
Dim slide_s1 As String
Dim slide_s2 As String
Dim parasel, new_replace As String
Dim vararray() As String
Dim newsel, whole_slide As Range
tpar = ActiveDocument.Paragraphs.count
boolvar = True
firstsel = True
found = False
slide_s1 = ""
slide_s2 = ""
Set whole_doc = Selection.Range
whole_doc.WholeStory
Set whole_slide = Selection.Range
whole_slide.WholeStory
Do Until boolvar = False
found = False
'if this is the first time we're doing program, eval the first section
If firstsel = True Then
'find the first pattern of text: 4 digits followed by a space followed by a colon followed by a line break
With whole_slide.Find
.ClearFormatting
.Execute FindText:="^#^#^#^#" & Chr(32) & ":" & Chr(13), Forward:=True, _
Format:=False, Wrap:=wdFindStop
'if we find it, tell us what we found and set found to true
If .found = True Then
whole_slide.Find.Execute FindText:="^#^#^#^#"
slidenum = whole_slide
continue = True
found = True
firstsel = False
parnum = ActiveDocument.Range(0, whole_slide.Paragraphs(1).Range.End).Paragraphs.count
'MsgBox slidenum
'MsgBox "whole slide is " & Chr(13) & whole_slide
End If
End With
'if found is still not true, find the second pattern: 4 digits, followed by a letter, followed by a colon followed by a line break
If found = False Then
With whole_slide.Find
.ClearFormatting
.Execute FindText:="^#^#^#^#^$" & Chr(32) & ":" & Chr(13), Forward:=True, _
Format:=False, Wrap:=wdFindStop
'if we find it, tell us what we found and set found to true. Also, parse found data and set slidenum to actual slide num
If .found = True Then
whole_slide.Find.Execute FindText:="^#^#^#^#^$"
slidenum = whole_slide
continue = True
found = True
firstsel = False
parnum = ActiveDocument.Range(0, whole_slide.Paragraphs(1).Range.End).Paragraphs.count
'MsgBox slidenum
End If
End With
End If
'if this is not the first time running the program
Else
Set whole_slide = ActiveDocument.Range(ActiveDocument.Paragraphs(parnum).Range.Start, ActiveDocument.Paragraphs(tpar).Range.End)
With whole_slide.Find
.ClearFormatting
.Execute FindText:="^#^#^#^#" & Chr(32) & ":" & Chr(13), Forward:=True, _
Format:=False, Wrap:=wdFindStop
'if we find it, tell us what we found and set found to true
If .found = True Then
whole_slide.Find.Execute FindText:="^#^#^#^#"
slidenum = whole_slide
continue = True
found = True
'MsgBox whole_slide
' MsgBox slidenum
End If
End With
'if found is still not true, find the second pattern: 4 digits, followed by a letter, followed by a colon followed by a line break
If found = False Then
With whole_slide.Find
.ClearFormatting
.Execute FindText:="^#^#^#^#^$" & Chr(32) & ":" & Chr(13), Forward:=True, _
Format:=False, Wrap:=wdFindStop
'if we find it, tell us what we found and set found to true. Also, parse found data and set slidenum to actual slide num
If .found = True Then
wholse_slide.Find.Execute FindText:="^#^#^#^#^$"
slidenum = whole_slide
continue = True
found = True
'MsgBox slidenum
End If
End With
End If
End If
If found = False Then
boolvar = False
Exit Sub
End If
If (continue = True And boolvar = True) Then
'Assign sentence after slide number to slide_s1 var
slide_s1 = whole_slide.Next(Unit:=wdSentence, count:=1)
'Assign second sentence to slide_s2 var
slide_s2 = whole_slide.Next(Unit:=wdSentence, count:=2)
'Check if slide_s2 winds up being part of the next slide. If so, set it to null.
If (slide_s2 Like "*####*" = True) Then
slide_s2 = ""
End If
'Decide if slide has one or two sentences
If Len(slide_s1) <> 0 And Len(slide_s2) < 1 Then
'replace text and move whole_slide selection to end of slide
whole_slide.MoveEnd wdParagraph, 2
With whole_doc.Find
.Text = whole_slide
.ClearFormatting
.Replacement.Text = Chr(13) & Chr(13) & slidenum & " : " & staticTC & Chr(11) & static_next_line & Chr(11) & "C2N03 " & slidenum & " : " & Chr(11) & "C2N03" & Chr(32) & Chr(32) & slide_s1
.Execute Replace:=wdReplaceAll
End With
ElseIf Len(slide_s1) <> 0 And Len(slide_s2) <> 0 Then
'truncate line breaks at end of second sentence
slide_s2 = Replace(slide_s2, Chr(13), "")
slide_s2 = Left$(slide_s2, Len(slide_s2) - 1)
'replace text and move whole_slide selection to end of slide
whole_slide.MoveEnd wdParagraph, 3
With whole_doc.Find
.Text = whole_slide
.ClearFormatting
.Replacement.Text = Chr(13) & Chr(13) & slidenum & " : " & staticTC & Chr(11) & static_next_line & Chr(11) & "C2N03 " & slidenum & " : " & Chr(11) & "C2N03" & Chr(32) & Chr(32) & slide_s1 & "C2N03 " & slide_s2
.Execute Replace:=wdReplaceAll
End With
End If
End If
Loop
End Sub
You talk about "slides", but your sample document only has a table. Please clarify.
Another important thing:
Every VBA code module should/must start with Option Explicit. This enforces the explicit declaration of variables. Otherwise VBA declares them implicitly when used. Or rephrased, without you can have typos in your variable names, which go undetected and then you use the wrong values in the end. You have misspelled slide_s1.
And generally:
It's hard for a beginner to structure code right, But I would tackle this first. You should separate the searching from the formatting. It should look like this:
Every VBA code module should/must start with Option Explicit. This enforces the explicit declaration of variables. Otherwise VBA declares them implicitly when used. Or rephrased, without you can have typos in your variable names, which go undetected and then you use the wrong values in the end. You have misspelled slide_s1.
And generally:
It's hard for a beginner to structure code right, But I would tackle this first. You should separate the searching from the formatting. It should look like this:
Option Explicit
Public Sub ProcessDocument()
Selection.WholeStory
Selection.Find.ClearFormatting
Selection.Find.Execute FindText:="^#^#^#^#" & Chr(32) & ":" & Chr(13), Forward:=True, Format:=False, Wrap:=wdFindStop
Do While Selection.Find.found
ProcessRange Selection.Range
Selection.Find.Execute
Loop
End Sub
Private Sub ProcessRange(ByVal CMatch As Word.Range)
Const STATIC_TC As String = "--:--:--:--,--:--:--:--,10"
Const STATIC_NEXT_LINE As String = "80 80 80"
Dim nulls2 As Boolean
Dim slide_s1 As String
Dim slide_s2 As String
nulls2 = False
slide_s1 = ""
slide_s2 = ""
'Assign sentence after slide number to slide_s1 var
slide_s1 = CMatch.Next(Unit:=wdSentence, Count:=1)
'if there are elipses, it's a special case we need to account for
If slide_s1 = " ..." Then
slide_s1 = slide_s1 & CMatch.Next(Unit:=wdSentence, Count:=2)
End If
'If there are two spaces, it's a special case we need to account for
If Right(slide_s1, 2) = Chr(32) & Chr(32) Then
slide_s1 = slide_s1 & CMatch.Next(Unit:=wdSentence, Count:=2)
nulls2 = True
End If
'Assign second sentence to slide_s2 var
slide_s2 = CMatch.Next(Unit:=wdSentence, Count:=2)
'Check if slide_s2 winds up being part of the next slide or if nulls2 is valid. If so, set it to null.
If (slide_s2 Like "*####*" = True Or nulls2 = True) Then
slide_s2 = ""
End If
CMatch.Text = "<" & Replace(CMatch.Text, vbCr, "") & ">" & vbCr
'Decide if slide has one or two sentences
If Len(slide_s1) > 0 And Len(slide_s2) < 1 Then
'replace text and move whole_slide selection to end of slide
CMatch.MoveEnd wdParagraph, 2
CMatch.Find.Text = CMatch
CMatch.Find.ClearFormatting
CMatch.Find.Replacement.Text = Chr(13) & Chr(13) & CMatch & " : " & STATIC_TC & Chr(11) & STATIC_NEXT_LINE & Chr(11) & "C2N03 " & CMatch & " : " & Chr(11) & "C2N03" & Chr(32) & Chr(32) & slide_s1
CMatch.Find.Execute Replace:=wdReplaceAll
ElseIf Len(slide_s1) > 0 And Len(slide_s2) > 0 Then
'truncate line breaks at end of second sentence
slide_s2 = Replace(slide_s2, Chr(13), "")
slide_s2 = Left$(slide_s2, Len(slide_s2) - 1)
'replace text and move whole_slide selection to end of slide
CMatch.MoveEnd wdParagraph, 3
CMatch.Find.Text = CMatch
CMatch.Find.ClearFormatting
CMatch.Find.Replacement.Text = Chr(13) & Chr(13) & CMatch & " : " & STATIC_TC & Chr(11) & STATIC_NEXT_LINE & Chr(11) & "C2N03 " & CMatch & " : " & Chr(11) & "C2N03" & Chr(32) & Chr(32) & slide_s1 & "C2N03 " & slide_s2
CMatch.Find.Execute Replace:=wdReplaceAll
End If
End Sub
It's not exactly what you code does, you need to adjust the internals.
ASKER
While I understand organization is important, I haven't really started getting into multiple functions so obviously this is both a great help and a bit advanced. That being said, do you think it will speed things up or just make it easier to read?
What does this do exactly? ByVal CMatch As Word.Range
What does this do exactly? ByVal CMatch As Word.Range
In this case:
It should speed up things, cause it only runs once over the entire doc. So it should be faster,
Line 15 defines a parameter named CMatch of the type Range. This contains the actual/current hit:
It should speed up things, cause it only runs once over the entire doc. So it should be faster,
Line 15 defines a parameter named CMatch of the type Range. This contains the actual/current hit:
Option Explicit
Public Sub ProcessDocument()
Selection.WholeStory
Selection.Find.ClearFormatting
Selection.Find.Execute FindText:="^#^#^#^#" & Chr(32) & ":" & Chr(13), Forward:=True, Format:=False, Wrap:=wdFindStop
Do While Selection.Find.found
ProcessRange Selection.Range
Selection.Find.Execute
Loop
End Sub
Private Sub ProcessRange(ByVal CMatch As Word.Range)
Debug.Print "Match: '" & CMatch.Text & "'."
End Sub
You can see the output of Debug.Print in the immediate window (CTRL-G, when not open).
ASKER
I see. I guess I had to do some studying. A simple copy and paste of the code shows some errors and now I have to figure out how to debug this as it's complicated.
ASKER
Does this account for the cases where the slide number is four digits and a letter?
ASKER
Can you explain the necessity of this:
CMatch.Text = "<" & Replace(CMatch.Text, vbCr, "") & ">" & vbCr
I can see what it does however I don't see why when I delete it the code just hangs. TO me, it's pointless as I can't have those brackets around the slide numbers anyway.
CMatch.Text = "<" & Replace(CMatch.Text, vbCr, "") & ">" & vbCr
I can see what it does however I don't see why when I delete it the code just hangs. TO me, it's pointless as I can't have those brackets around the slide numbers anyway.
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER
Neil, this was really helpful. While ste5an re-wrote the code in what I'm being told is an optimized way, your version is much easier to understand, especially with the comments. I really appreciate it!!!
I'm curious why you chose ^p instead of Chr(32) for a carriage return in the .execute FindText section?
Also, let me know if I should ask a new question with this, however while I was doing some troubleshooting in my code, I found I needed to add some corner case scenarios as follows:
Since, in your version, no parsing of sentences are occurring, I presume these are totally unnecessary?
I'm curious why you chose ^p instead of Chr(32) for a carriage return in the .execute FindText section?
Also, let me know if I should ask a new question with this, however while I was doing some troubleshooting in my code, I found I needed to add some corner case scenarios as follows:
'if there are elipses, it's a special case we need to account for
If slide_s1 = " ..." Then
slide_s1 = slide_s1 & CMatch.Next(Unit:=wdSentence, count:=2)
End If
'If there are two spaces, it's a special case we need to account for
If Right(slide_s1, 2) = Chr(32) & Chr(32) Then
slide_s1 = slide_s1 & CMatch.Next(Unit:=wdSentence, count:=2)
nulls2 = True
End If
Since, in your version, no parsing of sentences are occurring, I presume these are totally unnecessary?
Chr(32) returns a space (i think you meant chr(13)) in VBA but won't work in a Word find-replace string.
Just as you needed "^#" for a digit in the string, so you need "^p" for a carriage return in the find-replace syntax.
And yes, by treating the whole of the slide text in that way, there is no need to figure out the number of sentences, or check for double spaces etc.
The thing that should speed it up most is avoiding the Selection object and built-in Word functions like "Find", "Move" etc.
The other thing you might try is to turn off screen updating while the routine runs. That will reduce the time overhead caused by Word constantly moving text around, which may be significant if the document is 50 pages long.
The following two routines turn off the screen and any alert, and then turn them back on and refresh the screen.
So you could add "allOff" at the start of your routine and "allOn" at the end. But I would advise adding an error trap routine that calls allOn in the event of your code breaking.
Thus:
Just as you needed "^#" for a digit in the string, so you need "^p" for a carriage return in the find-replace syntax.
And yes, by treating the whole of the slide text in that way, there is no need to figure out the number of sentences, or check for double spaces etc.
The thing that should speed it up most is avoiding the Selection object and built-in Word functions like "Find", "Move" etc.
The other thing you might try is to turn off screen updating while the routine runs. That will reduce the time overhead caused by Word constantly moving text around, which may be significant if the document is 50 pages long.
The following two routines turn off the screen and any alert, and then turn them back on and refresh the screen.
Sub allOff()
Application.ScreenUpdating = False
Application.DisplayAlerts = False
End Sub
Sub allOn()
Application.ScreenUpdating = True
Application.DisplayAlerts = True
Application.ScreenRefresh
End Sub
So you could add "allOff" at the start of your routine and "allOn" at the end. But I would advise adding an error trap routine that calls allOn in the event of your code breaking.
Thus:
Sub RemoveParaOrLineBreaks()
'constant and variable declarations here...
'trap errors:
On Error GoTo errortrap
'turn off the screen
alloff
'code here:
'xxxxxxx
'end of code
'turn screen back on
allOn
'exit routine if all ok
Exit Sub
'handle error if not ok
errortrap:
allon
MsgBox ("An error has occurred") 'or whatever..
End Sub
ASKER
Yes, I did mean chr(13).
I'll experiment with the updating aspect however since using your code, we're seeing around a 150% increase in speed which is much more in line with the target I was looking for so users wouldn't become impatient.
To your original post, I do agree that the range objects are tricky and I appreciate you breaking them down for me.
Another question about the code your wrote:
rslide.End = rdoc.start -1
Minus 1 what, exactly? I'm used to setting units like sentences or words.
I'll experiment with the updating aspect however since using your code, we're seeing around a 150% increase in speed which is much more in line with the target I was looking for so users wouldn't become impatient.
To your original post, I do agree that the range objects are tricky and I appreciate you breaking them down for me.
Another question about the code your wrote:
rslide.End = rdoc.start -1
Minus 1 what, exactly? I'm used to setting units like sentences or words.
the .start and .end properties of a range are simply the character numbers at which the range starts and ends. My code is manually adjusting those directly, rather than using ".move" etc, which seems to run much faster.
I just built a dummy version of your document with 16 pages in it, and embedded a millisecond timer in it. Seems to be 11% faster with the screen off.
Screen updating on throughout : 9.102 secs.
Screen updating off : 7.213 secs.
I just built a dummy version of your document with 16 pages in it, and embedded a millisecond timer in it. Seems to be 11% faster with the screen off.
Screen updating on throughout : 9.102 secs.
Screen updating off : 7.213 secs.
ASKER
Wow. That might be worth it and save us another 10-15 seconds on our 50page test script. Thanks so much for walking me through this and being patient. If you're interested, I posted another question related to same code that I'd love for you to look at.
Just to clarify:
rslide.End = rdoc.start -1
sets the end of the slide range, on each loop, to the character before the start of the "found range" in rdoc. since rslide.start has already been set, this makes a range that exactly covers the characters in the slide which precedes the range which results from the "find" command.
rslide.End = rdoc.start -1
sets the end of the slide range, on each loop, to the character before the start of the "found range" in rdoc. since rslide.start has already been set, this makes a range that exactly covers the characters in the slide which precedes the range which results from the "find" command.
I posted this on your other question too, but it really belongs here: The thing that speeds the code up more than ANYTHING, I discover, is eliminating the Word table in which you currently have the text. I converted the table to text and enlarged the file to 50 pages.
Then I I tweaked the code to handle both a table and ordinary word text.
An "unprocessed" sample file, with no table, is attached to this comment. Also with code module in it.
I'd suggest converting the slides in non-table form. If you really need a table, then you can either accept the slower processing time, or write a short routine to convert the table to text, process the text, and then convert back to a table.
LA-FILLE-DU-REGIMENT-NF.docm
Then I I tweaked the code to handle both a table and ordinary word text.
If rSlide.Tables.Count = 0 Then
rSlide.End = ActiveDocument.Range.End - 1
Else
rSlide.End = rSlide.Cells(1).Range.End - 2
End If
A 50 page document with no table now processes in 7.93 seconds on my laptop. This is probably because we are saving the time overhead cased by Word adjusting the table size, even with screen updates turned off.An "unprocessed" sample file, with no table, is attached to this comment. Also with code module in it.
I'd suggest converting the slides in non-table form. If you really need a table, then you can either accept the slower processing time, or write a short routine to convert the table to text, process the text, and then convert back to a table.
LA-FILLE-DU-REGIMENT-NF.docm
ASKER
LA-FILLE-DU-REGIMENT.docx