Link to home
Start Free TrialLog in
Avatar of huntson
huntson

asked on

How can I optimize this VBA script?

Below is a VBA script I've been working on that I'd like to optimize.  It runs against a word doc (I'll provide a small sample file) of captions.  When running against 51 pages, it takes forever (just under 3 minutes).  I'm not a programmer at all so in any answers you give, please be a bit patient and descriptive.  

Sub RemoveParaOrLineBreaks()



Dim staticTC As String
staticTC = "--:--:--:--,--:--:--:--,10"

Dim static_next_line As String
static_next_line = "80 80 80"



Dim found, boolvar, nulls2 As Boolean


Dim tpar, parnum, slidenum As Variant


Dim slide_s1 As String
Dim slide_s2 As String



Dim parasel, new_replace As String


Dim vararray() As String


Dim newsel, whole_slide As Range


tpar = ActiveDocument.Paragraphs.count

boolvar = True
nulls2 = False


slide_s1 = ""
slide_s2 = ""

Set whole_doc = Selection.Range
whole_doc.WholeStory





Do Until boolvar = False

found = False
Set whole_slide = Selection.Range
whole_slide.WholeStory


'find the first pattern of text: 4 digits followed by a space followed by a colon followed by a line break



    With whole_slide.Find
 
            .ClearFormatting
            .Execute FindText:="^#^#^#^#" & Chr(32) & ":" & Chr(13), Forward:=True, _
            Format:=False, Wrap:=wdFindStop
    
            

'if we find it, tell us what we found and set found to true
            If .found = True Then

                whole_slide.Find.Execute FindText:="^#^#^#^#"
                slidenum = whole_slide
                continue = True
                found = True
                firstsel = False
           
         
                'MsgBox slidenum
                
                
            End If
                
    End With
    
  
    
'if found is still not true, find the second pattern: 4 digits, followed by a letter, followed by a colon followed by a line break
    If found = False Then

        With whole_slide.Find
            .ClearFormatting
            .Execute FindText:="^#^#^#^#^$" & ":" & Chr(13), Forward:=True, _
                Format:=False, Wrap:=wdFindStop
                

                
 'if we find it, tell us what we found and set found to true.  Also, parse found data and set slidenum to actual slide num
         If .found = True Then
                    
                whole_slide.Find.Execute FindText:="^#^#^#^#^$"
                slidenum = whole_slide
                continue = True
                found = True
                firstsel = False
                
            
                'MsgBox slidenum
                           
                End If
        End With
        
    End If

    


If found = False Then

    boolvar = False
    Exit Sub
End If


If (continue = True And boolvar = True) Then


                'Assign sentence after slide number to slide_s1 var

                                slide_s1 = whole_slide.Next(Unit:=wdSentence, count:=1)
                                
                'if there are elipses, it's a special case we need to account for
                                If slilde_s1 = " ..." Then
                                    slide_s1 = slide_s1 & whole_slide.Next(Unit:=wdSentence, count:=2)
                                End If
                'If there are two spaces, it's a special case we need to account for
                                If Right(slide_s1, 2) = Chr(32) & Chr(32) Then
                                    slide_s1 = slide_s1 & whole_slide.Next(Unit:=wdSentence, count:=2)
                                    nulls2 = True
                                End If
                            
                            'MsgBox "|" + slide_s1 + "|"


                'Assign second sentence to slide_s2 var
                                slide_s2 = whole_slide.Next(Unit:=wdSentence, count:=2)


                'Check if slide_s2 winds up being part of the next slide or if nulls2 is valid.  If so, set it to null.
                          
                           If (slide_s2 Like "*####*" = True Or nulls2 = True) Then
                                slide_s2 = ""
                            End If
                            
   '  'MsgBox "|" + slide_s2 + "|"
                    

                'Decide if slide has one or two sentences
                
                            If Len(slide_s1) <> 0 And Len(slide_s2) < 1 Then

                                'replace text and move whole_slide selection to end of slide

                                        whole_slide.MoveEnd wdParagraph, 2



                                        With whole_doc.Find

                                            .Text = whole_slide
                                            .ClearFormatting
                                            .Replacement.Text = Chr(13) & Chr(13) & slidenum & " : " & staticTC & Chr(11) & static_next_line & Chr(11) & "C2N03 " & slidenum & " : " & Chr(11) & "C2N03" & Chr(32) & Chr(32) & slide_s1
                                            .Execute Replace:=wdReplaceAll


                                        End With

                            ElseIf Len(slide_s1) <> 0 And Len(slide_s2) <> 0 Then


                                 'truncate line breaks at end of second sentence
                                           slide_s2 = Replace(slide_s2, Chr(13), "")
                                           slide_s2 = Left$(slide_s2, Len(slide_s2) - 1)

                                 'replace text and move whole_slide selection to end of slide

                                 whole_slide.MoveEnd wdParagraph, 3

                                        With whole_doc.Find

                                            .Text = whole_slide
                                            .ClearFormatting
                                            .Replacement.Text = Chr(13) & Chr(13) & slidenum & " : " & staticTC & Chr(11) & static_next_line & Chr(11) & "C2N03 " & slidenum & " : " & Chr(11) & "C2N03" & Chr(32) & Chr(32) & slide_s1 & "C2N03  " & slide_s2
                                           .Execute Replace:=wdReplaceAll

                                        End With

                                    End If
                                 
End If

Loop
                            
        
End Sub

Open in new window

Avatar of huntson
huntson

ASKER

Avatar of ste5an
1) Dim in VBA works needs an explicit data type per variable name otherwise it is implicit Variant, Thus you

Dim parasel, new_replace As String

Open in new window


is in reality
Dim parasol As Variant, new_replace As String

Open in new window

So always declare the type explicitly.

2) What is it supposed to do? At the first glance it looks like your always searching all over the entire document, this should no be necessary.
Avatar of huntson

ASKER

This, I did not know - thank you.  

It's supposed to go through the 'input' and find slides with text and convert them to a specific format.  Initially, I had done something like the code below where it would calculate where it was and then only go further down the page rather than starting from the top however I found that this actually took longer.

Sub RemoveParaOrLineBreaks()



Dim staticTC As String
staticTC = "--:--:--:--,--:--:--:--,10"

Dim static_next_line As String
static_next_line = "80 80 80"



Dim found, boolvar, firstsel As Boolean


Dim tpar, parnum, slidenum As Variant


Dim slide_s1 As String
Dim slide_s2 As String



Dim parasel, new_replace As String


Dim vararray() As String


Dim newsel, whole_slide As Range


tpar = ActiveDocument.Paragraphs.count

boolvar = True
firstsel = True
found = False

slide_s1 = ""
slide_s2 = ""

Set whole_doc = Selection.Range
whole_doc.WholeStory


Set whole_slide = Selection.Range
whole_slide.WholeStory



Do Until boolvar = False
found = False



'if this is the first time we're doing program, eval the first section

If firstsel = True Then

'find the first pattern of text: 4 digits followed by a space followed by a colon followed by a line break




    With whole_slide.Find
 
            .ClearFormatting
            .Execute FindText:="^#^#^#^#" & Chr(32) & ":" & Chr(13), Forward:=True, _
            Format:=False, Wrap:=wdFindStop
            

'if we find it, tell us what we found and set found to true
            If .found = True Then

                whole_slide.Find.Execute FindText:="^#^#^#^#"
                slidenum = whole_slide
                continue = True
                found = True
                firstsel = False
                parnum = ActiveDocument.Range(0, whole_slide.Paragraphs(1).Range.End).Paragraphs.count
         
                'MsgBox slidenum
                'MsgBox "whole slide is " & Chr(13) & whole_slide
                
            End If
                
    End With
    
  
    
'if found is still not true, find the second pattern: 4 digits, followed by a letter, followed by a colon followed by a line break
    If found = False Then

        With whole_slide.Find
            .ClearFormatting
            .Execute FindText:="^#^#^#^#^$" & Chr(32) & ":" & Chr(13), Forward:=True, _
                Format:=False, Wrap:=wdFindStop
                

                
 'if we find it, tell us what we found and set found to true.  Also, parse found data and set slidenum to actual slide num
         If .found = True Then
                    
                whole_slide.Find.Execute FindText:="^#^#^#^#^$"
                slidenum = whole_slide
                continue = True
                found = True
                firstsel = False
                parnum = ActiveDocument.Range(0, whole_slide.Paragraphs(1).Range.End).Paragraphs.count
            
                'MsgBox slidenum
                           
                End If
        End With
        
    End If


    
'if this is not the first time running the program

Else

Set whole_slide = ActiveDocument.Range(ActiveDocument.Paragraphs(parnum).Range.Start, ActiveDocument.Paragraphs(tpar).Range.End)


    With whole_slide.Find
            .ClearFormatting
            .Execute FindText:="^#^#^#^#" & Chr(32) & ":" & Chr(13), Forward:=True, _
                Format:=False, Wrap:=wdFindStop



'if we find it, tell us what we found and set found to true
            If .found = True Then

                whole_slide.Find.Execute FindText:="^#^#^#^#"
                slidenum = whole_slide
                continue = True
                found = True
                'MsgBox whole_slide

               
             
               ' MsgBox slidenum

            End If

    End With



'if found is still not true, find the second pattern: 4 digits, followed by a letter, followed by a colon followed by a line break
    If found = False Then

        With whole_slide.Find
            .ClearFormatting
            .Execute FindText:="^#^#^#^#^$" & Chr(32) & ":" & Chr(13), Forward:=True, _
                Format:=False, Wrap:=wdFindStop



 'if we find it, tell us what we found and set found to true.  Also, parse found data and set slidenum to actual slide num
         If .found = True Then

                wholse_slide.Find.Execute FindText:="^#^#^#^#^$"
                slidenum = whole_slide
                continue = True
                found = True
                'MsgBox slidenum

        End If
        
        End With

    End If
    
End If
    


If found = False Then
    boolvar = False
    Exit Sub
End If


If (continue = True And boolvar = True) Then


                'Assign sentence after slide number to slide_s1 var

                                slide_s1 = whole_slide.Next(Unit:=wdSentence, count:=1)


                'Assign second sentence to slide_s2 var
                                slide_s2 = whole_slide.Next(Unit:=wdSentence, count:=2)

                'Check if slide_s2 winds up being part of the next slide.  If so, set it to null.
                          
                           If (slide_s2 Like "*####*" = True) Then
                                slide_s2 = ""
                            End If
                                    
                    

                'Decide if slide has one or two sentences
                
                            If Len(slide_s1) <> 0 And Len(slide_s2) < 1 Then

                                'replace text and move whole_slide selection to end of slide

                                        whole_slide.MoveEnd wdParagraph, 2



                                        With whole_doc.Find

                                            .Text = whole_slide
                                            .ClearFormatting
                                            .Replacement.Text = Chr(13) & Chr(13) & slidenum & " : " & staticTC & Chr(11) & static_next_line & Chr(11) & "C2N03 " & slidenum & " : " & Chr(11) & "C2N03" & Chr(32) & Chr(32) & slide_s1
                                            .Execute Replace:=wdReplaceAll


                                        End With

                            ElseIf Len(slide_s1) <> 0 And Len(slide_s2) <> 0 Then


                                 'truncate line breaks at end of second sentence
                                           slide_s2 = Replace(slide_s2, Chr(13), "")
                                           slide_s2 = Left$(slide_s2, Len(slide_s2) - 1)

                                 'replace text and move whole_slide selection to end of slide

                                 whole_slide.MoveEnd wdParagraph, 3

                                        With whole_doc.Find

                                            .Text = whole_slide
                                            .ClearFormatting
                                            .Replacement.Text = Chr(13) & Chr(13) & slidenum & " : " & staticTC & Chr(11) & static_next_line & Chr(11) & "C2N03 " & slidenum & " : " & Chr(11) & "C2N03" & Chr(32) & Chr(32) & slide_s1 & "C2N03  " & slide_s2
                                           .Execute Replace:=wdReplaceAll

                                        End With

                                    End If
                                 
   End If

 Loop
                            
        
End Sub

Open in new window

You talk about "slides", but your sample document only has a table.  Please clarify.
Another important thing:

Every VBA code module should/must start with Option Explicit. This enforces the explicit declaration of variables. Otherwise VBA declares them implicitly when used. Or rephrased, without you can have typos in your variable names, which go undetected and then you use the wrong values in the end. You have misspelled slide_s1.

And generally:
It's hard for a beginner to structure code right, But I would tackle this first. You should separate the searching from the formatting. It should look like this:

Option Explicit

Public Sub ProcessDocument()
  
  Selection.WholeStory
  Selection.Find.ClearFormatting
  Selection.Find.Execute FindText:="^#^#^#^#" & Chr(32) & ":" & Chr(13), Forward:=True, Format:=False, Wrap:=wdFindStop
  Do While Selection.Find.found
    ProcessRange Selection.Range
    Selection.Find.Execute
  Loop
 
End Sub

Private Sub ProcessRange(ByVal CMatch As Word.Range)

  Const STATIC_TC As String = "--:--:--:--,--:--:--:--,10"
  Const STATIC_NEXT_LINE As String = "80 80 80"
   
  Dim nulls2 As Boolean
  Dim slide_s1 As String
  Dim slide_s2 As String
    
  nulls2 = False
  slide_s1 = ""
  slide_s2 = ""

  'Assign sentence after slide number to slide_s1 var
  slide_s1 = CMatch.Next(Unit:=wdSentence, Count:=1)
  'if there are elipses, it's a special case we need to account for
  If slide_s1 = " ..." Then
    slide_s1 = slide_s1 & CMatch.Next(Unit:=wdSentence, Count:=2)
  End If

  'If there are two spaces, it's a special case we need to account for
  If Right(slide_s1, 2) = Chr(32) & Chr(32) Then
    slide_s1 = slide_s1 & CMatch.Next(Unit:=wdSentence, Count:=2)
    nulls2 = True
  End If

  'Assign second sentence to slide_s2 var
  slide_s2 = CMatch.Next(Unit:=wdSentence, Count:=2)
  'Check if slide_s2 winds up being part of the next slide or if nulls2 is valid.  If so, set it to null.
  If (slide_s2 Like "*####*" = True Or nulls2 = True) Then
    slide_s2 = ""
  End If

  CMatch.Text = "<" & Replace(CMatch.Text, vbCr, "") & ">" & vbCr

  'Decide if slide has one or two sentences
  If Len(slide_s1) > 0 And Len(slide_s2) < 1 Then
    'replace text and move whole_slide selection to end of slide
    CMatch.MoveEnd wdParagraph, 2
    CMatch.Find.Text = CMatch
    CMatch.Find.ClearFormatting
    CMatch.Find.Replacement.Text = Chr(13) & Chr(13) & CMatch & " : " & STATIC_TC & Chr(11) & STATIC_NEXT_LINE & Chr(11) & "C2N03 " & CMatch & " : " & Chr(11) & "C2N03" & Chr(32) & Chr(32) & slide_s1
    CMatch.Find.Execute Replace:=wdReplaceAll
  ElseIf Len(slide_s1) > 0 And Len(slide_s2) > 0 Then
    'truncate line breaks at end of second sentence
    slide_s2 = Replace(slide_s2, Chr(13), "")
    slide_s2 = Left$(slide_s2, Len(slide_s2) - 1)
    'replace text and move whole_slide selection to end of slide
    CMatch.MoveEnd wdParagraph, 3
    CMatch.Find.Text = CMatch
    CMatch.Find.ClearFormatting
    CMatch.Find.Replacement.Text = Chr(13) & Chr(13) & CMatch & " : " & STATIC_TC & Chr(11) & STATIC_NEXT_LINE & Chr(11) & "C2N03 " & CMatch & " : " & Chr(11) & "C2N03" & Chr(32) & Chr(32) & slide_s1 & "C2N03  " & slide_s2
    CMatch.Find.Execute Replace:=wdReplaceAll
  End If
    
End Sub

Open in new window

It's not exactly what you code does, you need to adjust the internals.
Avatar of huntson

ASKER

While I understand organization is important, I haven't really started getting into multiple functions so obviously this is both a great help and a bit advanced.  That being said, do you think it will speed things up or just make it easier to read?

What does this do exactly? ByVal CMatch As Word.Range
In this case:

It should speed up things, cause it only runs once over the entire doc. So it should be faster,

Line 15 defines a parameter named CMatch of the type Range. This contains the actual/current hit:

Option Explicit

Public Sub ProcessDocument()
  
  Selection.WholeStory
  Selection.Find.ClearFormatting
  Selection.Find.Execute FindText:="^#^#^#^#" & Chr(32) & ":" & Chr(13), Forward:=True, Format:=False, Wrap:=wdFindStop
  Do While Selection.Find.found
    ProcessRange Selection.Range
    Selection.Find.Execute
  Loop
 
End Sub

Private Sub ProcessRange(ByVal CMatch As Word.Range)

  Debug.Print "Match: '" & CMatch.Text & "'."
    
End Sub

Open in new window

You can see the output of Debug.Print in the immediate window (CTRL-G, when not open).
Avatar of huntson

ASKER

I see.  I guess I had to do some studying.  A simple copy and paste of the code shows some errors and now I have to figure out how to debug this as it's complicated.
Avatar of huntson

ASKER

Does this account for the cases where the slide number is four digits and a letter?
Avatar of huntson

ASKER

Can you explain the necessity of this:

CMatch.Text = "<" & Replace(CMatch.Text, vbCr, "") & ">" & vbCr

I can see what it does however I don't see why when I delete it the code just hangs.  TO me, it's pointless as I can't have those brackets around the slide numbers anyway.
ASKER CERTIFIED SOLUTION
Avatar of Neil Fleming
Neil Fleming
Flag of United Kingdom of Great Britain and Northern Ireland image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of huntson

ASKER

Neil, this was really helpful.  While ste5an re-wrote the code in what I'm being told is an optimized way, your version is much easier to understand, especially with the comments.  I really appreciate it!!!

I'm curious why you chose ^p instead of Chr(32) for a carriage return in the .execute FindText section?

Also, let me know if I should ask a new question with this, however while I was doing some troubleshooting in my code, I found I needed to add some corner case scenarios as follows:

'if there are elipses, it's a special case we need to account for
  If slide_s1 = " ..." Then
    slide_s1 = slide_s1 & CMatch.Next(Unit:=wdSentence, count:=2)
  End If

  'If there are two spaces, it's a special case we need to account for
  If Right(slide_s1, 2) = Chr(32) & Chr(32) Then
    slide_s1 = slide_s1 & CMatch.Next(Unit:=wdSentence, count:=2)
    nulls2 = True
  End If

Open in new window


Since, in your version, no parsing of sentences are occurring, I presume these are totally unnecessary?
Chr(32) returns a space (i think you meant chr(13)) in VBA but won't work in a Word find-replace string.

Just as you needed "^#" for a digit in the string, so you need "^p" for a carriage return in the find-replace syntax.

And yes, by treating the whole of the slide text in that way, there is no need to figure out the number of sentences, or check for double spaces etc.

The thing that should speed it up most is avoiding the Selection object and built-in Word functions like "Find", "Move" etc.

The other thing you might try is to turn off screen updating while the routine runs. That will reduce the time overhead caused by Word constantly moving text around, which may be significant if the document is 50 pages long.

The following two routines turn off the screen and any alert, and then turn them back on and refresh the screen.

Sub allOff()
Application.ScreenUpdating = False
Application.DisplayAlerts = False
End Sub

Sub allOn()
Application.ScreenUpdating = True
Application.DisplayAlerts = True
Application.ScreenRefresh
End Sub

Open in new window


So you could add "allOff" at the start of your routine and "allOn" at the end. But I would advise adding an error trap routine that calls allOn in the event of your code breaking.

Thus:

Sub RemoveParaOrLineBreaks()

'constant and variable declarations here...

'trap errors:
On Error GoTo errortrap
'turn off the screen
alloff

'code here:
'xxxxxxx
'end of code

'turn screen back on
allOn
'exit routine if all ok
Exit Sub

'handle error if not ok
errortrap:
allon
MsgBox ("An error has occurred") 'or whatever..
End Sub

Open in new window

Avatar of huntson

ASKER

Yes, I did mean chr(13).  

I'll experiment with the updating aspect however since using your code, we're seeing around a 150% increase in speed which is much more in line with the target I was looking for so users wouldn't become impatient.  

To your original post, I do agree that the range objects are tricky and I appreciate you breaking them down for me.

Another question about the code your wrote:

rslide.End  = rdoc.start -1

Minus 1 what, exactly?  I'm used to setting units like sentences or words.
the .start and .end properties of a range are simply the character numbers at which the range starts and ends. My code is manually adjusting those directly, rather than using ".move" etc, which seems to run much faster.

I just built a dummy version of your document with 16 pages in it, and embedded a millisecond timer in it. Seems to be 11% faster with the screen off.

Screen updating  on throughout   : 9.102 secs.
Screen updating off                         : 7.213 secs.
Avatar of huntson

ASKER

Wow.  That might be worth it and save us another 10-15 seconds on our 50page test script.  Thanks so much for walking me through this and being patient.  If you're interested, I posted another question related to same code that I'd love for you to look at.
Just to clarify:
rslide.End  = rdoc.start -1

sets the end of the slide range, on each loop, to the character before the start of the "found range" in rdoc. since rslide.start has already been set, this makes a range that exactly covers the characters in the slide which precedes the range which  results from the "find" command.
I posted this on your other question too, but it really belongs here:  The thing that speeds the code up more than ANYTHING, I discover, is eliminating the Word table in which you currently have the text. I converted the table to text and enlarged the file to 50 pages.

Then I I tweaked the code to handle both a table and ordinary word text.

If rSlide.Tables.Count = 0 Then
                rSlide.End = ActiveDocument.Range.End - 1
                Else
                rSlide.End = rSlide.Cells(1).Range.End - 2
                End If

Open in new window

A 50 page document with no table now processes in 7.93 seconds on my laptop. This is probably because we are saving the time overhead cased by Word adjusting the table size, even with screen updates turned off.

An "unprocessed" sample file, with no table, is attached to this comment. Also with code module in it.

I'd suggest converting the slides in non-table form. If you really need a table, then you can either accept the slower processing time, or write a short routine to convert the table to text, process the text, and then convert back to a table.
LA-FILLE-DU-REGIMENT-NF.docm