How can I optimize this VBA script?

huntson
huntson used Ask the Experts™
on
Below is a VBA script I've been working on that I'd like to optimize.  It runs against a word doc (I'll provide a small sample file) of captions.  When running against 51 pages, it takes forever (just under 3 minutes).  I'm not a programmer at all so in any answers you give, please be a bit patient and descriptive.  

Sub RemoveParaOrLineBreaks()



Dim staticTC As String
staticTC = "--:--:--:--,--:--:--:--,10"

Dim static_next_line As String
static_next_line = "80 80 80"



Dim found, boolvar, nulls2 As Boolean


Dim tpar, parnum, slidenum As Variant


Dim slide_s1 As String
Dim slide_s2 As String



Dim parasel, new_replace As String


Dim vararray() As String


Dim newsel, whole_slide As Range


tpar = ActiveDocument.Paragraphs.count

boolvar = True
nulls2 = False


slide_s1 = ""
slide_s2 = ""

Set whole_doc = Selection.Range
whole_doc.WholeStory





Do Until boolvar = False

found = False
Set whole_slide = Selection.Range
whole_slide.WholeStory


'find the first pattern of text: 4 digits followed by a space followed by a colon followed by a line break



    With whole_slide.Find
 
            .ClearFormatting
            .Execute FindText:="^#^#^#^#" & Chr(32) & ":" & Chr(13), Forward:=True, _
            Format:=False, Wrap:=wdFindStop
    
            

'if we find it, tell us what we found and set found to true
            If .found = True Then

                whole_slide.Find.Execute FindText:="^#^#^#^#"
                slidenum = whole_slide
                continue = True
                found = True
                firstsel = False
           
         
                'MsgBox slidenum
                
                
            End If
                
    End With
    
  
    
'if found is still not true, find the second pattern: 4 digits, followed by a letter, followed by a colon followed by a line break
    If found = False Then

        With whole_slide.Find
            .ClearFormatting
            .Execute FindText:="^#^#^#^#^$" & ":" & Chr(13), Forward:=True, _
                Format:=False, Wrap:=wdFindStop
                

                
 'if we find it, tell us what we found and set found to true.  Also, parse found data and set slidenum to actual slide num
         If .found = True Then
                    
                whole_slide.Find.Execute FindText:="^#^#^#^#^$"
                slidenum = whole_slide
                continue = True
                found = True
                firstsel = False
                
            
                'MsgBox slidenum
                           
                End If
        End With
        
    End If

    


If found = False Then

    boolvar = False
    Exit Sub
End If


If (continue = True And boolvar = True) Then


                'Assign sentence after slide number to slide_s1 var

                                slide_s1 = whole_slide.Next(Unit:=wdSentence, count:=1)
                                
                'if there are elipses, it's a special case we need to account for
                                If slilde_s1 = " ..." Then
                                    slide_s1 = slide_s1 & whole_slide.Next(Unit:=wdSentence, count:=2)
                                End If
                'If there are two spaces, it's a special case we need to account for
                                If Right(slide_s1, 2) = Chr(32) & Chr(32) Then
                                    slide_s1 = slide_s1 & whole_slide.Next(Unit:=wdSentence, count:=2)
                                    nulls2 = True
                                End If
                            
                            'MsgBox "|" + slide_s1 + "|"


                'Assign second sentence to slide_s2 var
                                slide_s2 = whole_slide.Next(Unit:=wdSentence, count:=2)


                'Check if slide_s2 winds up being part of the next slide or if nulls2 is valid.  If so, set it to null.
                          
                           If (slide_s2 Like "*####*" = True Or nulls2 = True) Then
                                slide_s2 = ""
                            End If
                            
   '  'MsgBox "|" + slide_s2 + "|"
                    

                'Decide if slide has one or two sentences
                
                            If Len(slide_s1) <> 0 And Len(slide_s2) < 1 Then

                                'replace text and move whole_slide selection to end of slide

                                        whole_slide.MoveEnd wdParagraph, 2



                                        With whole_doc.Find

                                            .Text = whole_slide
                                            .ClearFormatting
                                            .Replacement.Text = Chr(13) & Chr(13) & slidenum & " : " & staticTC & Chr(11) & static_next_line & Chr(11) & "C2N03 " & slidenum & " : " & Chr(11) & "C2N03" & Chr(32) & Chr(32) & slide_s1
                                            .Execute Replace:=wdReplaceAll


                                        End With

                            ElseIf Len(slide_s1) <> 0 And Len(slide_s2) <> 0 Then


                                 'truncate line breaks at end of second sentence
                                           slide_s2 = Replace(slide_s2, Chr(13), "")
                                           slide_s2 = Left$(slide_s2, Len(slide_s2) - 1)

                                 'replace text and move whole_slide selection to end of slide

                                 whole_slide.MoveEnd wdParagraph, 3

                                        With whole_doc.Find

                                            .Text = whole_slide
                                            .ClearFormatting
                                            .Replacement.Text = Chr(13) & Chr(13) & slidenum & " : " & staticTC & Chr(11) & static_next_line & Chr(11) & "C2N03 " & slidenum & " : " & Chr(11) & "C2N03" & Chr(32) & Chr(32) & slide_s1 & "C2N03  " & slide_s2
                                           .Execute Replace:=wdReplaceAll

                                        End With

                                    End If
                                 
End If

Loop
                            
        
End Sub

Open in new window

Comment
Watch Question

Do more with

Expert Office
EXPERT OFFICE® is a registered trademark of EXPERTS EXCHANGE®

Author

Commented:
ste5anSenior Developer

Commented:
1) Dim in VBA works needs an explicit data type per variable name otherwise it is implicit Variant, Thus you

Dim parasel, new_replace As String

Open in new window


is in reality
Dim parasol As Variant, new_replace As String

Open in new window

So always declare the type explicitly.

2) What is it supposed to do? At the first glance it looks like your always searching all over the entire document, this should no be necessary.

Author

Commented:
This, I did not know - thank you.  

It's supposed to go through the 'input' and find slides with text and convert them to a specific format.  Initially, I had done something like the code below where it would calculate where it was and then only go further down the page rather than starting from the top however I found that this actually took longer.

Sub RemoveParaOrLineBreaks()



Dim staticTC As String
staticTC = "--:--:--:--,--:--:--:--,10"

Dim static_next_line As String
static_next_line = "80 80 80"



Dim found, boolvar, firstsel As Boolean


Dim tpar, parnum, slidenum As Variant


Dim slide_s1 As String
Dim slide_s2 As String



Dim parasel, new_replace As String


Dim vararray() As String


Dim newsel, whole_slide As Range


tpar = ActiveDocument.Paragraphs.count

boolvar = True
firstsel = True
found = False

slide_s1 = ""
slide_s2 = ""

Set whole_doc = Selection.Range
whole_doc.WholeStory


Set whole_slide = Selection.Range
whole_slide.WholeStory



Do Until boolvar = False
found = False



'if this is the first time we're doing program, eval the first section

If firstsel = True Then

'find the first pattern of text: 4 digits followed by a space followed by a colon followed by a line break




    With whole_slide.Find
 
            .ClearFormatting
            .Execute FindText:="^#^#^#^#" & Chr(32) & ":" & Chr(13), Forward:=True, _
            Format:=False, Wrap:=wdFindStop
            

'if we find it, tell us what we found and set found to true
            If .found = True Then

                whole_slide.Find.Execute FindText:="^#^#^#^#"
                slidenum = whole_slide
                continue = True
                found = True
                firstsel = False
                parnum = ActiveDocument.Range(0, whole_slide.Paragraphs(1).Range.End).Paragraphs.count
         
                'MsgBox slidenum
                'MsgBox "whole slide is " & Chr(13) & whole_slide
                
            End If
                
    End With
    
  
    
'if found is still not true, find the second pattern: 4 digits, followed by a letter, followed by a colon followed by a line break
    If found = False Then

        With whole_slide.Find
            .ClearFormatting
            .Execute FindText:="^#^#^#^#^$" & Chr(32) & ":" & Chr(13), Forward:=True, _
                Format:=False, Wrap:=wdFindStop
                

                
 'if we find it, tell us what we found and set found to true.  Also, parse found data and set slidenum to actual slide num
         If .found = True Then
                    
                whole_slide.Find.Execute FindText:="^#^#^#^#^$"
                slidenum = whole_slide
                continue = True
                found = True
                firstsel = False
                parnum = ActiveDocument.Range(0, whole_slide.Paragraphs(1).Range.End).Paragraphs.count
            
                'MsgBox slidenum
                           
                End If
        End With
        
    End If


    
'if this is not the first time running the program

Else

Set whole_slide = ActiveDocument.Range(ActiveDocument.Paragraphs(parnum).Range.Start, ActiveDocument.Paragraphs(tpar).Range.End)


    With whole_slide.Find
            .ClearFormatting
            .Execute FindText:="^#^#^#^#" & Chr(32) & ":" & Chr(13), Forward:=True, _
                Format:=False, Wrap:=wdFindStop



'if we find it, tell us what we found and set found to true
            If .found = True Then

                whole_slide.Find.Execute FindText:="^#^#^#^#"
                slidenum = whole_slide
                continue = True
                found = True
                'MsgBox whole_slide

               
             
               ' MsgBox slidenum

            End If

    End With



'if found is still not true, find the second pattern: 4 digits, followed by a letter, followed by a colon followed by a line break
    If found = False Then

        With whole_slide.Find
            .ClearFormatting
            .Execute FindText:="^#^#^#^#^$" & Chr(32) & ":" & Chr(13), Forward:=True, _
                Format:=False, Wrap:=wdFindStop



 'if we find it, tell us what we found and set found to true.  Also, parse found data and set slidenum to actual slide num
         If .found = True Then

                wholse_slide.Find.Execute FindText:="^#^#^#^#^$"
                slidenum = whole_slide
                continue = True
                found = True
                'MsgBox slidenum

        End If
        
        End With

    End If
    
End If
    


If found = False Then
    boolvar = False
    Exit Sub
End If


If (continue = True And boolvar = True) Then


                'Assign sentence after slide number to slide_s1 var

                                slide_s1 = whole_slide.Next(Unit:=wdSentence, count:=1)


                'Assign second sentence to slide_s2 var
                                slide_s2 = whole_slide.Next(Unit:=wdSentence, count:=2)

                'Check if slide_s2 winds up being part of the next slide.  If so, set it to null.
                          
                           If (slide_s2 Like "*####*" = True) Then
                                slide_s2 = ""
                            End If
                                    
                    

                'Decide if slide has one or two sentences
                
                            If Len(slide_s1) <> 0 And Len(slide_s2) < 1 Then

                                'replace text and move whole_slide selection to end of slide

                                        whole_slide.MoveEnd wdParagraph, 2



                                        With whole_doc.Find

                                            .Text = whole_slide
                                            .ClearFormatting
                                            .Replacement.Text = Chr(13) & Chr(13) & slidenum & " : " & staticTC & Chr(11) & static_next_line & Chr(11) & "C2N03 " & slidenum & " : " & Chr(11) & "C2N03" & Chr(32) & Chr(32) & slide_s1
                                            .Execute Replace:=wdReplaceAll


                                        End With

                            ElseIf Len(slide_s1) <> 0 And Len(slide_s2) <> 0 Then


                                 'truncate line breaks at end of second sentence
                                           slide_s2 = Replace(slide_s2, Chr(13), "")
                                           slide_s2 = Left$(slide_s2, Len(slide_s2) - 1)

                                 'replace text and move whole_slide selection to end of slide

                                 whole_slide.MoveEnd wdParagraph, 3

                                        With whole_doc.Find

                                            .Text = whole_slide
                                            .ClearFormatting
                                            .Replacement.Text = Chr(13) & Chr(13) & slidenum & " : " & staticTC & Chr(11) & static_next_line & Chr(11) & "C2N03 " & slidenum & " : " & Chr(11) & "C2N03" & Chr(32) & Chr(32) & slide_s1 & "C2N03  " & slide_s2
                                           .Execute Replace:=wdReplaceAll

                                        End With

                                    End If
                                 
   End If

 Loop
                            
        
End Sub

Open in new window

Amazon Web Services

Are you thinking about creating an Amazon Web Services account for your business? Not sure where to start? In this course you’ll get an overview of the history of AWS and take a tour of their user interface.

Top Expert 2014

Commented:
You talk about "slides", but your sample document only has a table.  Please clarify.
ste5anSenior Developer

Commented:
Another important thing:

Every VBA code module should/must start with Option Explicit. This enforces the explicit declaration of variables. Otherwise VBA declares them implicitly when used. Or rephrased, without you can have typos in your variable names, which go undetected and then you use the wrong values in the end. You have misspelled slide_s1.

And generally:
It's hard for a beginner to structure code right, But I would tackle this first. You should separate the searching from the formatting. It should look like this:

Option Explicit

Public Sub ProcessDocument()
  
  Selection.WholeStory
  Selection.Find.ClearFormatting
  Selection.Find.Execute FindText:="^#^#^#^#" & Chr(32) & ":" & Chr(13), Forward:=True, Format:=False, Wrap:=wdFindStop
  Do While Selection.Find.found
    ProcessRange Selection.Range
    Selection.Find.Execute
  Loop
 
End Sub

Private Sub ProcessRange(ByVal CMatch As Word.Range)

  Const STATIC_TC As String = "--:--:--:--,--:--:--:--,10"
  Const STATIC_NEXT_LINE As String = "80 80 80"
   
  Dim nulls2 As Boolean
  Dim slide_s1 As String
  Dim slide_s2 As String
    
  nulls2 = False
  slide_s1 = ""
  slide_s2 = ""

  'Assign sentence after slide number to slide_s1 var
  slide_s1 = CMatch.Next(Unit:=wdSentence, Count:=1)
  'if there are elipses, it's a special case we need to account for
  If slide_s1 = " ..." Then
    slide_s1 = slide_s1 & CMatch.Next(Unit:=wdSentence, Count:=2)
  End If

  'If there are two spaces, it's a special case we need to account for
  If Right(slide_s1, 2) = Chr(32) & Chr(32) Then
    slide_s1 = slide_s1 & CMatch.Next(Unit:=wdSentence, Count:=2)
    nulls2 = True
  End If

  'Assign second sentence to slide_s2 var
  slide_s2 = CMatch.Next(Unit:=wdSentence, Count:=2)
  'Check if slide_s2 winds up being part of the next slide or if nulls2 is valid.  If so, set it to null.
  If (slide_s2 Like "*####*" = True Or nulls2 = True) Then
    slide_s2 = ""
  End If

  CMatch.Text = "<" & Replace(CMatch.Text, vbCr, "") & ">" & vbCr

  'Decide if slide has one or two sentences
  If Len(slide_s1) > 0 And Len(slide_s2) < 1 Then
    'replace text and move whole_slide selection to end of slide
    CMatch.MoveEnd wdParagraph, 2
    CMatch.Find.Text = CMatch
    CMatch.Find.ClearFormatting
    CMatch.Find.Replacement.Text = Chr(13) & Chr(13) & CMatch & " : " & STATIC_TC & Chr(11) & STATIC_NEXT_LINE & Chr(11) & "C2N03 " & CMatch & " : " & Chr(11) & "C2N03" & Chr(32) & Chr(32) & slide_s1
    CMatch.Find.Execute Replace:=wdReplaceAll
  ElseIf Len(slide_s1) > 0 And Len(slide_s2) > 0 Then
    'truncate line breaks at end of second sentence
    slide_s2 = Replace(slide_s2, Chr(13), "")
    slide_s2 = Left$(slide_s2, Len(slide_s2) - 1)
    'replace text and move whole_slide selection to end of slide
    CMatch.MoveEnd wdParagraph, 3
    CMatch.Find.Text = CMatch
    CMatch.Find.ClearFormatting
    CMatch.Find.Replacement.Text = Chr(13) & Chr(13) & CMatch & " : " & STATIC_TC & Chr(11) & STATIC_NEXT_LINE & Chr(11) & "C2N03 " & CMatch & " : " & Chr(11) & "C2N03" & Chr(32) & Chr(32) & slide_s1 & "C2N03  " & slide_s2
    CMatch.Find.Execute Replace:=wdReplaceAll
  End If
    
End Sub

Open in new window

It's not exactly what you code does, you need to adjust the internals.

Author

Commented:
While I understand organization is important, I haven't really started getting into multiple functions so obviously this is both a great help and a bit advanced.  That being said, do you think it will speed things up or just make it easier to read?

What does this do exactly? ByVal CMatch As Word.Range
ste5anSenior Developer

Commented:
In this case:

It should speed up things, cause it only runs once over the entire doc. So it should be faster,

Line 15 defines a parameter named CMatch of the type Range. This contains the actual/current hit:

Option Explicit

Public Sub ProcessDocument()
  
  Selection.WholeStory
  Selection.Find.ClearFormatting
  Selection.Find.Execute FindText:="^#^#^#^#" & Chr(32) & ":" & Chr(13), Forward:=True, Format:=False, Wrap:=wdFindStop
  Do While Selection.Find.found
    ProcessRange Selection.Range
    Selection.Find.Execute
  Loop
 
End Sub

Private Sub ProcessRange(ByVal CMatch As Word.Range)

  Debug.Print "Match: '" & CMatch.Text & "'."
    
End Sub

Open in new window

You can see the output of Debug.Print in the immediate window (CTRL-G, when not open).

Author

Commented:
I see.  I guess I had to do some studying.  A simple copy and paste of the code shows some errors and now I have to figure out how to debug this as it's complicated.

Author

Commented:
Does this account for the cases where the slide number is four digits and a letter?

Author

Commented:
Can you explain the necessity of this:

CMatch.Text = "<" & Replace(CMatch.Text, vbCr, "") & ">" & vbCr

I can see what it does however I don't see why when I delete it the code just hangs.  TO me, it's pointless as I can't have those brackets around the slide numbers anyway.
Consultant and developer
Commented:
The main thing that causes a piece of Word VBA to run slowly is accessing built-in objects like the "Selection" object, using the "Find" routine, using "Move" commands etc, or manipulating text directly in a Word range.

With this in mind, I've rebuilt your code below to minimise the use of the Selection object, and routines Find, Find/replace, Move, Next, etc and to perform text manipulation in VBA string variables which, once complete, are then fed back into the document.

Declaring variables by type and using Option Explicit, as Ste5an say, is always a good idea but will not actually speed up your code very much, I suspect.

I took the following approach:
  • You only need to search for the four numbers that mark a slide start once per slide - since these four numbers always occur after a carriage return you can reliably find the start of the slide by searching for "^p^#^#^#^#"
  • You can then set a range that is a little larger than the search result and check directly whether there is an extra character after the four digits (to handle, eg "0004A")
  • By passing the text from this range to a string, you can extract the desired slide number without performing the further Word search you were undertaking
  • There is no need to discover whether the slide has one sentence or two, I think. The code below simply captures the start and end point of each slide, passes the text to a string variable, inserts the time code and your other marker code as required, and then replaces all the text between the start and end point of the slide with the new text from the manipulated string
  • This avoids using the "find/replace" command that you were using, which also slows up Word

So in summary the code below loops through the document but only uses the find or find/replace command ONCE per slide, to find the start. All manipulation occurs in string variables, and the results are then put directly back into Word ranges.

There are a couple of key things --
  1. the code works by finding a new slide number and then processing the PREVIOUS slide -- because finding a new slide automatically gives you the end point of the previous one, and means you can skip all that code that tries to figure out how many sentences there are. This code will also work with unlimited numbers of sentences, rather than just two.
  2. The code needs to avoid trying to process any text before the first slide. It does this by initially setting the range "rSlide" to start at character zero. An "if" clause then skips processing if rSlide.start=0.
  3. The code needs to handle the final slide correctly. When no further slide numbers are found, therefore, it needs to use the end of the text (in this case the end of the table cell you are using) as the end point of the final slide. It does this in the "Else" part of the "If .found" clause.
  4. Word's "range" objects are notoriously tricky and counter-intuitive to use.  I've used three: one for the document range to search through (rDoc), one for the slide to process (rSlide), and one for extracting the next slide number (rNext). For these to work properly it's important to define them separately - ie not "set" one range to equal another. And for the "range.find" loop to work as desire, you cannot manually tinker with the start and end properties of the range you are searching through. Weirdly, the .execute command does temporarily change the start and end points of the rDoc range, which I've exploited to adjust rNext, but as soon as you "execute" another search on rDoc it reverts to searching the full original range.

As Ste5an says, it's a good idea to do the manipulation in a separate subroutine, for clarity, but the way I have built the code below does not make it obligatory.

Lastly I change a lot of the variable names to make the code easier to read. So all range variables now start with an "r", all strings with an "s", booleans with a "b", integers with an "i" etc. I removed all the variables that were not in use, and changed the fixed elements to constants at the top, including the C2N03 marker code.


Sub RemoveParaOrLineBreaks()

'use constants here instead of variables
Const staticTC = "--:--:--:--,--:--:--:--,10"
Const static_next_line = "80 80 80"
'added this
Const marker_code = "C2N03"

Dim bStop As Boolean
'variant for previous and next slide number
Dim vSlide As Variant, vNext As Variant
'we need 3 range variables:
Dim rSlide As Range, rNext As Range, rDoc As Range
'start and end long integer markers for rSlide range:
Dim iStart As Long, iEnd As Long
'length of slide number line:
Dim iPos As Long
'variable to hold replacement text:
Dim sText As String

'initial setup
Set rDoc = ActiveDocument.Range
'create rSlide initially as zero size. NB do NOT initialise both rNext and rSlide to the whole document range as well, since
'this non-obviously makes them both the same range, and changing one will change the other

Set rSlide = ActiveDocument.Range(0, 0)

'set up the find parameters before you loop through the document, so you don't repeat unnecessary setup on each loop:
    With rDoc.Find
    .ClearFormatting
    .Format = False
    .Wrap = wdFindStop
    .Forward = True
    .MatchWildcards = False
    End With

Do
'find the next (or first) slide, using carriage-return number number number number
    With rDoc.Find
            .Execute FindText:="^p^#^#^#^#"
            If .found Then
                'set end of previous slide range
                rSlide.End = rDoc.Start - 1
                'set range of rNext to the temporary values for rDoc that result from the "find", and then extend to include 4 more characters
                Set rNext = ActiveDocument.Range(rDoc.Start, rDoc.End)
                rNext.End = rNext.End + 4
               'grab the resulting text to variable vNext
                vNext = rNext.Text
                    'check the sixth character and include it as part of the slide number if it's not a space, using VBA string function Mid
                    If Mid(vNext, 6, 1) <> Chr(32) Then
                    vNext = Mid(vNext, 2, 5) 
                    Else
                    'otherwise just use the first four characters (omitting the carriage return at the start)
                    vNext = Mid(vNext, 2, 4)
                    End If
            'if no further slide numbers found, then exit at end of loop
            Else
            bStop = True
            'set final slide range.. assuming text is in a table, as in your example, this needs to be the end of the table cell
            'if text is not in a table then use activedocument.range.end-1
            rSlide.End = rSlide.Cells(1).Range.End - 2
            End If
    End With
'process the previous slide -
'this eliminates all the stuff with deciding how many sentences there are etc, because we are
'simply replacing the previous slide's range with new text

    If rSlide.Start > 0 Then 'check start >0 so we don't try to process anything before slide 1
    
    'work where possible with a string variable rather than a document range. MUCH faster
    sText = rSlide.Text
    
    'cut the initial number line altogether by finding a colon and carriage return
    iPos = InStr(sText, ":" & Chr(13))
    sText = Mid(sText, iPos + 1) ' returns everything to the right of iPos+1
    
    'remove unwanted double carriage returns
    While InStr(sText, Chr(13) & Chr(13))
    sText = Replace(sText, Chr(13) & Chr(13), Chr(13))
    Wend
    
    'cut any final CR
    If Right(sText, 1) = Chr(13) Then sText = Left(sText, Len(sText) - 1)
    
    'add marker code at each text line break - replacing need to count lines etc
    sText = Replace(sText, Chr(13), Chr(11) & marker_code & "  ")
    
    'assemble full text, including the sText we have already built
    sText = vSlide & " : " & staticTC & Chr(11) & static_next_line & Chr(11) & marker_code & " " & _
            vSlide & " : " & sText & Chr(13) & Chr(13)
            
            
    'replace the text in the pre-defined range rSlide. Much quicker than using the built-in Word find and replace
    rSlide.Text = sText
    End If

'set new start point for previous slide's range
rSlide.Start = rSlide.End + 1
'make vNext into vSlide
vSlide = vNext
    
Loop Until bStop
        
End Sub

Open in new window


Hope this helps

Author

Commented:
Neil, this was really helpful.  While ste5an re-wrote the code in what I'm being told is an optimized way, your version is much easier to understand, especially with the comments.  I really appreciate it!!!

I'm curious why you chose ^p instead of Chr(32) for a carriage return in the .execute FindText section?

Also, let me know if I should ask a new question with this, however while I was doing some troubleshooting in my code, I found I needed to add some corner case scenarios as follows:

'if there are elipses, it's a special case we need to account for
  If slide_s1 = " ..." Then
    slide_s1 = slide_s1 & CMatch.Next(Unit:=wdSentence, count:=2)
  End If

  'If there are two spaces, it's a special case we need to account for
  If Right(slide_s1, 2) = Chr(32) & Chr(32) Then
    slide_s1 = slide_s1 & CMatch.Next(Unit:=wdSentence, count:=2)
    nulls2 = True
  End If

Open in new window


Since, in your version, no parsing of sentences are occurring, I presume these are totally unnecessary?
Neil FlemingConsultant and developer
Chr(32) returns a space (i think you meant chr(13)) in VBA but won't work in a Word find-replace string.

Just as you needed "^#" for a digit in the string, so you need "^p" for a carriage return in the find-replace syntax.

And yes, by treating the whole of the slide text in that way, there is no need to figure out the number of sentences, or check for double spaces etc.

The thing that should speed it up most is avoiding the Selection object and built-in Word functions like "Find", "Move" etc.

The other thing you might try is to turn off screen updating while the routine runs. That will reduce the time overhead caused by Word constantly moving text around, which may be significant if the document is 50 pages long.

The following two routines turn off the screen and any alert, and then turn them back on and refresh the screen.

Sub allOff()
Application.ScreenUpdating = False
Application.DisplayAlerts = False
End Sub

Sub allOn()
Application.ScreenUpdating = True
Application.DisplayAlerts = True
Application.ScreenRefresh
End Sub

Open in new window


So you could add "allOff" at the start of your routine and "allOn" at the end. But I would advise adding an error trap routine that calls allOn in the event of your code breaking.

Thus:

Sub RemoveParaOrLineBreaks()

'constant and variable declarations here...

'trap errors:
On Error GoTo errortrap
'turn off the screen
alloff

'code here:
'xxxxxxx
'end of code

'turn screen back on
allOn
'exit routine if all ok
Exit Sub

'handle error if not ok
errortrap:
allon
MsgBox ("An error has occurred") 'or whatever..
End Sub

Open in new window

Author

Commented:
Yes, I did mean chr(13).  

I'll experiment with the updating aspect however since using your code, we're seeing around a 150% increase in speed which is much more in line with the target I was looking for so users wouldn't become impatient.  

To your original post, I do agree that the range objects are tricky and I appreciate you breaking them down for me.

Another question about the code your wrote:

rslide.End  = rdoc.start -1

Minus 1 what, exactly?  I'm used to setting units like sentences or words.
Neil FlemingConsultant and developer
the .start and .end properties of a range are simply the character numbers at which the range starts and ends. My code is manually adjusting those directly, rather than using ".move" etc, which seems to run much faster.

I just built a dummy version of your document with 16 pages in it, and embedded a millisecond timer in it. Seems to be 11% faster with the screen off.

Screen updating  on throughout   : 9.102 secs.
Screen updating off                         : 7.213 secs.

Author

Commented:
Wow.  That might be worth it and save us another 10-15 seconds on our 50page test script.  Thanks so much for walking me through this and being patient.  If you're interested, I posted another question related to same code that I'd love for you to look at.
Neil FlemingConsultant and developer
Just to clarify:
rslide.End  = rdoc.start -1

sets the end of the slide range, on each loop, to the character before the start of the "found range" in rdoc. since rslide.start has already been set, this makes a range that exactly covers the characters in the slide which precedes the range which  results from the "find" command.
Neil FlemingConsultant and developer
I posted this on your other question too, but it really belongs here:  The thing that speeds the code up more than ANYTHING, I discover, is eliminating the Word table in which you currently have the text. I converted the table to text and enlarged the file to 50 pages.

Then I I tweaked the code to handle both a table and ordinary word text.

If rSlide.Tables.Count = 0 Then
                rSlide.End = ActiveDocument.Range.End - 1
                Else
                rSlide.End = rSlide.Cells(1).Range.End - 2
                End If

Open in new window

A 50 page document with no table now processes in 7.93 seconds on my laptop. This is probably because we are saving the time overhead cased by Word adjusting the table size, even with screen updates turned off.

An "unprocessed" sample file, with no table, is attached to this comment. Also with code module in it.

I'd suggest converting the slides in non-table form. If you really need a table, then you can either accept the slower processing time, or write a short routine to convert the table to text, process the text, and then convert back to a table.
LA-FILLE-DU-REGIMENT-NF.docm

Do more with

Expert Office
Submit tech questions to Ask the Experts™ at any time to receive solutions, advice, and new ideas from leading industry professionals.

Start 7-Day Free Trial