Community Pick: Many members of our community have endorsed this article.
Editor's Choice: This article has been selected by our editors as an exceptional contribution.

Using Regular Expressions in Visual Basic for Applications and Visual Basic 6

Patrick Matthews
CERTIFIED EXPERT
Published:
Updated:
Using Regular Expressions in VBA and VB6 Projects

By Patrick Matthews (matthewspatrick, https://www.experts-exchange.com/M_2428436.html)

Introduction

Regular Expressions (RegExp) is a class that offers incredibly powerful text parsing capabilities, in particular allowing users to find (and even replace) substrings within larger text entries that fit a particular pattern.  This article provides basic information on the VBScript implementation of RegExp, and how you can use it to gain robust text parsing capabilities in Visual Basic 6 (VB6) and Visual Basic for Applications (VBA) projects.  With a little knowledge of the rules and some imagination, you can build incredibly flexible and/or complex patterns.

While this article may be most useful to users with intermediate or advanced programming skills, users of all skill levels (beginners included) can benefit from using RegExp by implementing the "wrapper" functions included in this article.  (This article includes a sample Excel workbook that provides examples for each function.)  By adding these functions to your VB6/VBA projects, you can quickly and easily incorporate RegExp functionality into your applications.  If you are a Microsoft Office user, you can even use these functions in worksheet formulas for Microsoft Excel or in queries in Microsoft Access.  

These new functions are:
RegExpFind (Returns information for a single match from the input text, or an array of all the matches.  Info returned can be the match Value, starting character position, or Length for a single match, or for all matches)
RegExpFindExtended (Returns a 2-D array of the Value, starting character position, or Length)
RegExpFindSubmatch (Returns a 2-D array of submatches for the various matches in the input text)
RegExpReplace (Returns a string in which either all matches are replaced with the replacement text, or just the first match)
RegExpReplaceExpression (Excel only, allows you to build an expression using arithmetic operators and/or Excel functions to manipulate the replacement text)
RegExpReplaceRange (Returns a string in which the Xth to Yth matches are replaced)

Quick Examples Using RegExp

The following two examples briefly illustrate the usefulness of RegExp, and also you might use some of the included wrapper functions to address each problem.

Consider the following text:

"He hit the ball.  Then he ran.  The crowd was cheering!  How did he feel?  I felt so energized!"

Suppose you needed to extract the first word from each sentence.  You could use a pattern such as:

"(^|([\.!?] +))([a-z]+)"

Open in new window


That pattern matches the following:
The beginning of the string OR (period, exclamation point, or question mark followed by 1+ spaces) {This whole item taken together is the first submatch; the period, exclamation point, or question mark followed by 1+ spaces is the 2nd submatch}
Then, 1+ characters from a-z {3rd submatch}

Using one of the wrapper functions, this would return an array of the first words in each sentence:

RegExpFindSubmatch(MyText, "(^|([\.!?] +))([a-z]+)", , 3, False)
                      Returns "He", "Then", "The", "How", "I"

Open in new window


If you wanted to extract the last word in each sentence, you could use this pattern instead:

"[a-z]+(?=[\.!\?])"

Open in new window


That pattern matches the following:
1+ characters from a-z
Then,  a "look ahead" to see if it is followed by a period, exclamation point, or question mark.  In a look ahead, RegExp checks to see whether the look ahead condition is met, but does not include the look ahead in the match that gets returned

Using a wrapper function, this returns an array of the last word in each sentence:

RegExpFind(MyText, "[a-z]+(?=[\.!\?])", , False)
                      Returns "ball", "ran", "cheering", "feel", "energized"

Open in new window


For another example (based on EE question https://www.experts-exchange.com/questions/24367818/Microsoft-Access-Extract-Domain-Name-from-Hyperlink-or-text-field.html), suppose you had the following log entries:

http://www.abc.com
www.xyz.com
123.mno.net
http://www.abc.com/category
www.abc.net/category/sub-category/index.html
xyz.info/index.php
www.abc.com:8082/

Suppose that, for each entry, you had to extract the site address (e.g., www.abc.com).  You could use the following pattern, replacing all text matching that pattern with a zero-length string:

"http://|[:/].*"

Open in new window


That pattern matches the following:
http:// OR
(colon or forward slash) followed by zero or more characters of any type

If you were looping through an array of log entries, you could use this to extract the address:

RegExpReplace(LogArr(i), "http://|[:/].*", "", True, False)

Open in new window


If instead you needed to extract just the top-level domain (e.g., com, net, info), you could use this pattern instead:

"(http://)?([\w-]+\.)([\w-]+\.)?([\w-]+)([:/].*)?"

Open in new window


That pattern matches the following:
http:// appearing zero or one time {1st submatch}
Then, a word character or dash appearing 1+ times followed by a period {2nd submatch}
Then, a word character or dash appearing 1+ times followed by a period, the whole block appearing zero or one time {3rd submatch}
Then, a word character or dash appearing 1+ times {4th submatch}
Then, a colon or forward slash followed by zero or more characters of any type {5th submatch}

In this case, the 4th submatch will be the top-level domain.  To get it, you could use one of two expressions:

RegExpReplace(LogArr(i), "(http://)?([\w-]+\.)([\w-]+\.)?([\w-]+)([:/].*)?", "$4", True, False)
                      
                      or
                      
                      =RegExpFindSubmatch(LogArr(i), "(http://)?([\w-]+\.)([\w-]+\.)?([\w-]+)([:/].*)?", 1, 4, False, False)

Open in new window


If instead this log were in an Access table, you could use these wrapper functions in a query such as this:

SELECT LogEntry, 
                          RegExpReplace(LogArr(i), "http://|[:/].*", "", True, False) AS Address,
                          RegExpReplace(LogArr(i), "(http://)?([\w-]+\.)([\w-]+\.)?([\w-]+)([:/].*)?", 
                              "$4", True, False) AS TopLevel
                      FROM LogTable

Open in new window


The VBScript RegExp Class

Regular Expressions may be most familiar to users of the Perl programming language, but many other languages incorporate similar functionality.  This article concerns how to use the RegExp class implemented in VBScript 5.5 to enhance your text parsing capabilities in your Visual Basic for Applications (and thus Microsoft Excel, Access, or any other application that supports VBA) and Visual Basic 6 projects.  Because this article focuses on the specific VBScript implementation, there may be some differences in syntax and/or features when compared to other Regular Expressions implementations.  (For example, the VBScript implementation does not support the "look behind" capabilities in Perl and the .Net framework.)

 RegExp Object Model
To use RegExp in your VB6 or VBA projects, you must instantiate a RegExp object.  If you are using early binding, then you would add a reference to the "Microsoft VBScript Regular Expressions 5.5" library.  (Do not use the version 1.0 library, as it lacks many of the best features of RegExp and is provided solely for backward compatibility.)  You would then declare the variable in your code this way:

Dim RegX As VBScript_RegExp_55.RegExp
                      Set RegX = New VBScript_RegExp_55.RegExp

Open in new window


Using late binding, you would do it this way:

Dim RegX As Object
                      Set RegX = CreateObject("VBScript.RegExp")

Open in new window


Once you have the object instantiated, you will have to set certain properties of the RegExp class:
Global: This determines whether a RegExp method will try to find only the first match (False), or all matches (True).  The default value is False.
IgnoreCase: This determines whether the pattern match is case-sensitive; the default is False (thus making pattern matching case-sensitive by default).  If IgnoreCase is True, then a pattern of [a-z] or [A-Z] is equivalent to a pattern of [a-zA-Z].
Multiline: Indicates how RegExp should handle input text with line breaks.  In particular, it determines how the ^ (start of input) and $ (end of input) behave: if False, then these match only the very beginning and very end of the entire input string.  If True, then ^ and $ match the beginning and end of each line in the input string.  The default value is False.
Pattern: This is the pattern that RegExp tries to match.  Patterns can be relatively simple, or highly complex.  For example, the pattern

circus
will look only for the six consecutive characters "circus" (and, if IgnoreCase = False, then those characters would have to be lower case to make a match).  On the other hand:

circus(es)?
will match "circus" or "circuses".  Further, the following pattern matches circus, circuses, pony, or ponies:

circus(es)?|pon(y|ies)

Please see the Quick Primer on Pattern Syntax for a few simple tips on Pattern syntax to get you started.  The functionality is incredibly rich, however, and for fuller treatments of the subject you should look in the Further Reading section of this article for web sites and books that provide excellent information on the RegExp Pattern syntax.

After you have set the RegExp properties to appropriate values, you then have to decide which method to use.  The RegExp class exposes three methods:
Execute: This method returns a Matches collection (of Match objects), containing the various matches to the specified pattern found in the input string (which is passed to the Execute method as an argument).  The Global property determines whether the collection contains all matches (True) or just the first match (False).
Replace: This method takes two arguments.  The first argument is the string to be searched for pattern matches, and the second is the replacement text to be used if a match is found.  The Global property determines whether RegExp replaces all matches (True) or just the first match (False).  The Replace method returns a string that includes all applicable replacements.  (Note that if there are no matches found, then the output and the original input are identical.)  To incorporate the entire match into the replacement text, use $& to stand in for the match, and to use a submatch in the replacement text, use $1, $2, $3, etc. to use the nth submatch (or $+ for the last submatch).  You can also use $` for the text prior to the match, $' for the text after the match, and $_ for the entire input string
Test: This method takes a single argument, the input string, and returns a Boolean True/False to indicate whether there is at least one match in the input string.

If you need to work with the various matches, then you work through the Matches collection and the various Match objects it contains.  The Matches collection has the following properties:
Count: This is the number of Match objects contained in the collection
Item: This returns the indicated Match object based on an index (the argument for this property).  Please note that the first item takes the index number zero, and thus the index for the last item is one less than the Count.  The Item property is the default property for the Matches collection, so MyMatches(0) is equivalent to MyMatches.Item(0).

For example, this code enumerates the Matches:

Dim RegX As Object, Mats As Object, Counter As Long
                      
                      Set RegX = CreateObject("VBScript.RegExp")
                      With RegX
                          .Global = True
                          .Pattern = "\d+"
                          Set Mats = .Execute("1 turtle, 15 rabbits, and 1078 swans")
                      End With
                      
                      For Counter = 0 To Mats.Count - 1
                          MsgBox Mats(Counter)
                      Next
                      
                      Set Mats = Nothing
                      Set RegX = Nothing

Open in new window


That code will present three MsgBoxes, showing 1, 15, and 1078 respectively.

To work further with the individual Match objects, you can use the following properties:
FirstIndex: The starting position of the Match value within the original input string.  Please be aware that this property treats the first character in the input string as position zero, whereas VB6 and VBA (as well as Access and Excel) typically treat the first character as position 1.  The sample functions I provide here correct for this.
Length: The number of characters in the Match value
Submatches: This returns a Submatches collection composed of the various Submatch objects found within the Match.
Value: The string of the Match itself.  Value is the Match object's default property, so in the code example above Mats(Counter) is equivalent to Mats(Counter).Value.

Drilling down even further, a Match may have Submatches.  A Submatch is a portion of a Match that corresponds to part of the Pattern bracketed by parentheses.  For example, you could use a pattern such as this to match IP addresses in an input string:

(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})

Open in new window


Suppose that the input string were:

      
I thought her IP address was 100.200.300.400, but it was really 99.99.99.99

There would be two matches, 100.200.300.400 and 99.99.99.99.  In addition, each match would have four submatches; in the first case these submatches are 100, 200, 300, and 400.  These submatches correspond to the portions of the pattern enclosed in parentheses.

The Submatches collection has the following properties:
Count: This is the number of Submatch objects contained in the collection.  There will always be as many Submatch objects as there are groups in the pattern enclosed in parentheses.
Item: This returns the indicated Submatch object based on an index (the argument for this property).  Please note that the first item takes the index number zero, and thus the index for the last item is one less than the Count.  The Item property is the default property for the Submatches collection, so MyMatch.Submatches(0) is equivalent to MyMatch.Submatches.Item(0).

The Submatch object has a single property, Value.  This Value is simply the text comprising the Submatch.  If there is no match found for the indicated group in the pattern, then the Submatch value is a zero-length string.

Quick Primer on Pattern Syntax

This section includes some basic pointers on Pattern syntax to help you get started.  It is, however, by no means complete, and for more detailed treatments of the subject you should review the Web sites and books in the Further Reading section of this article.

Keep in mind that in all cases, whether a match is case sensitive or not depends on the setting for RegExp.IgnoreCase.

Symbol		Notes
                      --------------------------------------------------------------------------------------------
                      abc, ABC	Matches "abc" or "ABC"
                      [abc]		Matches any character within the brackets
                      [^abc]		Matches any character not within the brackets
                      [a-n], [0-5]	Matches any character within the range of letters or digits in brackets
                      [^a-n], [^0-5]	Matches any character not within the range of letters or digits in brackets
                      .		Matches any single character except a new line character
                      A|B		Matches "A" or "B"
                      ()		Creates a subgroup within the pattern
                      {x}		Matches the preceding element exactly x times
                      {x,y}		Matches the preceding element between x and y times (greedy)
                      {x,y}?		Matches the preceding element between x and y times (non-greedy)
                      {x,}		Matches the preceding element at least x times (greedy)
                      {x,}?		Matches the preceding element at least x times (non-greedy)
                      {,y}		Matches the preceding element between zero and y times (greedy)
                      {,y}?		Matches the preceding element between zero and y times (non-greedy)
                      *		Matches preceding element zero or more times (greedy)
                      *?		Matches preceding element zero or more times (non-greedy)
                      ?		Matches preceding element zero or one times
                      +		Matches preceding element one or more times (greedy)
                      +?		Matches preceding element one or more times (non-greedy)
                      ^		Matches start of input (or start of line if RegExp.Multiline = True)
                      $		Matches end of input (or end of line if RegExp.Multiline = True)
                      \d		Matches a digit (same as [0-9])
                      \D		Matches a non-digit (same as [^0-9])
                      \b		Matches a word boundary
                      \B		Matches a non-word boundary
                      \f		Form feed
                      \n		New line
                      \r		Carriage return
                      \t		Tab
                      \v		Vertical tab
                      \s		Matches a "white space" character, same as [ \f\n\r\t\v]
                      \S		Matches a non-"white space" character, same as [^ \f\n\r\t\v]
                      \w		Matches a "word" character plus underscore, same as [A-Za-z0-9_]
                      \W		Matches a non-"word" character plus underscore, same as [^A-Za-z0-9_]
                      \		"Escapes" a special character.  
                      		For example:
                      		   To match the backslash, use \\
                      		   To match the $, use \$
                      		   To match the period, use \.
                      
                      x(?=y)		Positive lookahead
                      		   X is a pattern you want to match
                      		   Y is a pattern you want to see immediately follow the pattern you are 
                      			looking for, but is not included in the match
                      		Example: Suppose you have a part number that is always four digits, 
                      		   with an optional suffix a, b, c, etc.  You want to grab all matches 
                      		   that look like part numbers with suffixes, but you want to exclude 
                      		   the suffix from the match.  
                      		   You could use a pattern \d{4}(?=[a-z]).  
                      		   If the input string is "1234, 2345x, 3456", it matches "2345".
                      
                      x(?!y)		Negative lookahead
                      		   X is a pattern you want to match
                      		   Y is a pattern you do not want to see immediately follow the pattern 
                      		   	you are looking for, but is not included in the match
                      		Example: Suppose you have a part number that is always four digits, 
                      		   with an optional suffix a, b, c, etc.  You want to grab all matches that 
                      		   look like part numbers, but you want to exclude part numbers with suffixes.  
                      		   You could use a pattern \d{4}(?![a-z])
                      		   If the input string is "1234, 2345x, 3456", it matches "1234" and "3456".

Open in new window


Further Reading

For further reading, I recommend the following web sites and books:

For an excellent tutorial on RegExp, including information on the various language-specific implementations, go to: "Using Regular Expressions with Client Side Scripting", http://www.regular-expressions.info/vbscript.html

This site provides a very complete listing of the syntax for RegExp as implemented in VBScript and Jscript: "Regular Expression Syntax (Scripting)", http://msdn.microsoft.com/en-us/library/1400241x(VS.85).aspx

For general information on VBScript RegExp: "Microsoft Beefs Up VBScript with Regular Expressions", http://msdn.microsoft.com/en-us/library/ms974570.aspx

Another site for general information: "An Introduction to Regular Expression with VBScript", http://www.4guysfromrolla.com/webtech/090199-1.shtml

I keep the following book at my desk for quick reference on RegExp and other VBScript matters: VBScript in a Nutshell, written by Lomax, Childs, and Petrusha.  Its descriptions of the various RegExp-related objects, properties, and methods are useful, but are a bit incomplete.  (For example, the book makes no mention of submatches or "look ahead".)

If you really want to master RegExp, then I suggest Mastering Regular Expressions, by Friedl.  That book is not for the faint-hearted, but if you really want to know about how RegExp works its magic, and how you can get the most out of RegExp, it is worth the effort.

For additional examples and practical applications, I also recommend the Regular Expressions Cookbook, by Goyvaerts and Levithan.

Acknowledgements

I would like to thank EE Member brettdj (https://www.experts-exchange.com/M_770818.html) for introducing me to the wonderful world of Regular Expressions, and for his generous assistance in reviewing this article before submission.  His very helpful comments helped improve the example workbook markedly.  webtubbs (https://www.experts-exchange.com/M_3470508.html) also reviewed this article before submission.  Any mistakes in this article are most certainly mine, and not theirs.

Code Samples

Below is source code for six "wrapper" user defined functions you may use and freely distribute, as long as you properly attribute authorship and where you found the code.  You will also find these functions in the included sample file "Regular Expressions Examples.xls".

Function RegExpFind(LookIn As String, PatternStr As String, Optional Pos, _
                          Optional MatchCase As Boolean = True, Optional ReturnType As Long = 0, _
                          Optional MultiLine As Boolean = False)
                          
                          ' Function written by Patrick G. Matthews.  You may use and distribute this code freely,
                          ' as long as you properly credit and attribute authorship and the URL of where you
                          ' found the code
                          
                          ' This function relies on the VBScript version of Regular Expressions, and thus some of
                          ' the functionality available in Perl and/or .Net may not be available.  The full extent
                          ' of what functionality will be available on any given computer is based on which version
                          ' of the VBScript runtime is installed on that computer
                          
                          ' This function uses Regular Expressions to parse a string (LookIn), and return matches to a
                          ' pattern (PatternStr).  Use Pos to indicate which match you want:
                          ' Pos omitted               : function returns a zero-based array of all matches
                          ' Pos = 1                   : the first match
                          ' Pos = 2                   : the second match
                          ' Pos = <positive integer>  : the Nth match
                          ' Pos = 0                   : the last match
                          ' Pos = -1                  : the last match
                          ' Pos = -2                  : the 2nd to last match
                          ' Pos = <negative integer>  : the Nth to last match
                          ' If Pos is non-numeric, or if the absolute value of Pos is greater than the number of
                          ' matches, the function returns an empty string.  If no match is found, the function returns
                          ' an empty string.  (Earlier versions of this code used zero for the last match; this is
                          ' retained for backward compatibility)
                          
                          ' If MatchCase is omitted or True (default for RegExp) then the Pattern must match case (and
                          ' thus you may have to use [a-zA-Z] instead of just [a-z] or [A-Z]).
                          
                          ' ReturnType indicates what information you want to return:
                          ' ReturnType = 0            : the matched values
                          ' ReturnType = 1            : the starting character positions for the matched values
                          ' ReturnType = 2            : the lengths of the matched values
                          
                          ' If you use this function in Excel, you can use range references for any of the arguments.
                          ' If you use this in Excel and return the full array, make sure to set up the formula as an
                          ' array formula.  If you need the array formula to go down a column, use TRANSPOSE()
                          
                          ' Note: RegExp counts the character positions for the Match.FirstIndex property as starting
                          ' at zero.  Since VB6 and VBA has strings starting at position 1, I have added one to make
                          ' the character positions conform to VBA/VB6 expectations
                          
                          ' Normally as an object variable I would set the RegX variable to Nothing; however, in cases
                          ' where a large number of calls to this function are made, making RegX a static variable that
                          ' preserves its state in between calls significantly improves performance
                          
                          Static RegX As Object
                          Dim TheMatches As Object
                          Dim Answer()
                          Dim Counter As Long
                          
                          ' Evaluate Pos.  If it is there, it must be numeric and converted to Long
                          
                          If Not IsMissing(Pos) Then
                              If Not IsNumeric(Pos) Then
                                  RegExpFind = ""
                                  Exit Function
                              Else
                                  Pos = CLng(Pos)
                              End If
                          End If
                          
                          ' Evaluate ReturnType
                          
                          If ReturnType < 0 Or ReturnType > 2 Then
                              RegExpFind = ""
                              Exit Function
                          End If
                          
                          ' Create instance of RegExp object if needed, and set properties
                          
                          If RegX Is Nothing Then Set RegX = CreateObject("VBScript.RegExp")
                          With RegX
                              .Pattern = PatternStr
                              .Global = True
                              .IgnoreCase = Not MatchCase
                              .MultiLine = MultiLine
                          End With
                              
                          ' Test to see if there are any matches
                          
                          If RegX.Test(LookIn) Then
                              
                              ' Run RegExp to get the matches, which are returned as a zero-based collection
                              
                              Set TheMatches = RegX.Execute(LookIn)
                              
                              ' Test to see if Pos is negative, which indicates the user wants the Nth to last
                              ' match.  If it is, then based on the number of matches convert Pos to a positive
                              ' number, or zero for the last match
                              
                              If Not IsMissing(Pos) Then
                                  If Pos < 0 Then
                                      If Pos = -1 Then
                                          Pos = 0
                                      Else
                                          
                                          ' If Abs(Pos) > number of matches, then the Nth to last match does not
                                          ' exist.  Return a zero-length string
                                          
                                          If Abs(Pos) <= TheMatches.Count Then
                                              Pos = TheMatches.Count + Pos + 1
                                          Else
                                              RegExpFind = ""
                                              GoTo Cleanup
                                          End If
                                      End If
                                  End If
                              End If
                              
                              ' If Pos is missing, user wants array of all matches.  Build it and assign it as the
                              ' function's return value
                              
                              If IsMissing(Pos) Then
                                  ReDim Answer(0 To TheMatches.Count - 1)
                                  For Counter = 0 To UBound(Answer)
                                      Select Case ReturnType
                                          Case 0: Answer(Counter) = TheMatches(Counter)
                                          Case 1: Answer(Counter) = TheMatches(Counter).FirstIndex + 1
                                          Case 2: Answer(Counter) = TheMatches(Counter).Length
                                      End Select
                                  Next
                                  RegExpFind = Answer
                              
                              ' User wanted the Nth match (or last match, if Pos = 0).  Get the Nth value, if possible
                              
                              Else
                                  Select Case Pos
                                      Case 0                          ' Last match
                                          Select Case ReturnType
                                              Case 0: RegExpFind = TheMatches(TheMatches.Count - 1)
                                              Case 1: RegExpFind = TheMatches(TheMatches.Count - 1).FirstIndex + 1
                                              Case 2: RegExpFind = TheMatches(TheMatches.Count - 1).Length
                                          End Select
                                      Case 1 To TheMatches.Count      ' Nth match
                                          Select Case ReturnType
                                              Case 0: RegExpFind = TheMatches(Pos - 1)
                                              Case 1: RegExpFind = TheMatches(Pos - 1).FirstIndex + 1
                                              Case 2: RegExpFind = TheMatches(Pos - 1).Length
                                          End Select
                                      Case Else                       ' Invalid item number
                                          RegExpFind = ""
                                  End Select
                              End If
                          
                          ' If there are no matches, return empty string
                          
                          Else
                              RegExpFind = ""
                          End If
                          
                      Cleanup:
                          ' Release object variables
                          
                          Set TheMatches = Nothing
                          
                      End Function

Open in new window


Function RegExpFindExtended(LookIn As String, PatternStr As String, Optional Pos, _
                          Optional MatchCase As Boolean = True, Optional MultiLine As Boolean = False)
                          
                          ' Function written by Patrick G. Matthews.  You may use and distribute this code freely,
                          ' as long as you properly credit and attribute authorship and the URL of where you
                          ' found the code
                          
                          ' This function relies on the VBScript version of Regular Expressions, and thus some of
                          ' the functionality available in Perl and/or .Net may not be available.  The full extent
                          ' of what functionality will be available on any given computer is based on which version
                          ' of the VBScript runtime is installed on that computer
                          
                          ' This function uses Regular Expressions to parse a string (LookIn), and returns a 0-(N-1), 0-2
                          ' array of the matched values (position 0 for the 2nd dimension), the starting character
                          ' positions (position 1 for the 2nd dimension), and the length of the matched values (position 2
                          ' for the 2nd dimension)
                          
                          ' Use Pos to indicate which match you want:
                          ' Pos omitted               : function returns a zero-based array of all matches
                          ' Pos = 1                   : the first match
                          ' Pos = 2                   : the second match
                          ' Pos = <positive integer>  : the Nth match
                          ' Pos = 0                   : the last match
                          ' Pos = -1                  : the last match
                          ' Pos = -2                  : the 2nd to last match
                          ' Pos = <negative integer>  : the Nth to last match
                          ' If Pos is non-numeric, or if the absolute value of Pos is greater than the number of
                          ' matches, the function returns an empty string.  If no match is found, the function returns
                          ' an empty string.
                          
                          ' If MatchCase is omitted or True (default for RegExp) then the Pattern must match case (and
                          ' thus you may have to use [a-zA-Z] instead of just [a-z] or [A-Z]).
                          
                          ' If you use this function in Excel, you can use range references for any of the arguments.
                          ' If you use this in Excel and return the full array, make sure to set up the formula as an
                          ' array formula.  If you need the array formula to go down a column, use TRANSPOSE()
                          
                          ' Note: RegExp counts the character positions for the Match.FirstIndex property as starting
                          ' at zero.  Since VB6 and VBA has strings starting at position 1, I have added one to make
                          ' the character positions conform to VBA/VB6 expectations
                          
                          ' Normally as an object variable I would set the RegX variable to Nothing; however, in cases
                          ' where a large number of calls to this function are made, making RegX a static variable that
                          ' preserves its state in between calls significantly improves performance
                          
                          Static RegX As Object
                          Dim TheMatches As Object
                          Dim Answer()
                          Dim Counter As Long
                          
                          ' Evaluate Pos.  If it is there, it must be numeric and converted to Long
                          
                          If Not IsMissing(Pos) Then
                              If Not IsNumeric(Pos) Then
                                  RegExpFindExtended = ""
                                  Exit Function
                              Else
                                  Pos = CLng(Pos)
                              End If
                          End If
                          
                          ' Create instance of RegExp object
                          
                          If RegX Is Nothing Then Set RegX = CreateObject("VBScript.RegExp")
                          With RegX
                              .Pattern = PatternStr
                              .Global = True
                              .IgnoreCase = Not MatchCase
                              .MultiLine = MultiLine
                          End With
                              
                          ' Test to see if there are any matches
                          
                          If RegX.Test(LookIn) Then
                              
                              ' Run RegExp to get the matches, which are returned as a zero-based collection
                              
                              Set TheMatches = RegX.Execute(LookIn)
                              
                              If Not IsMissing(Pos) Then
                                  If Pos < 0 Then
                                      If Pos = -1 Then
                                          Pos = 0
                                      Else
                                          
                                          ' If Abs(Pos) > number of matches, then the Nth to last match does not
                                          ' exist.  Return a zero-length string
                                          
                                          If Abs(Pos) <= TheMatches.Count Then
                                              Pos = TheMatches.Count + Pos + 1
                                          Else
                                              RegExpFindExtended = ""
                                              GoTo Cleanup
                                          End If
                                      End If
                                  End If
                              End If
                              
                              ' If Pos is missing, user wants array of all matches.  Build it and assign it as the
                              ' function's return value
                              
                              If IsMissing(Pos) Then
                                  ReDim Answer(0 To TheMatches.Count - 1, 0 To 2)
                                  For Counter = 0 To UBound(Answer)
                                      Answer(Counter, 0) = TheMatches(Counter)
                                      Answer(Counter, 1) = TheMatches(Counter).FirstIndex + 1
                                      Answer(Counter, 2) = TheMatches(Counter).Length
                                  Next
                                  RegExpFindExtended = Answer
                              
                              ' User wanted the Nth match (or last match, if Pos = 0).  Get the Nth value, if possible
                              
                              Else
                                  Select Case Pos
                                      Case 0                          ' Last match
                                          ReDim Answer(0 To 0, 0 To 2)
                                          Answer(0, 0) = TheMatches(TheMatches.Count - 1)
                                          Answer(0, 1) = TheMatches(TheMatches.Count - 1).FirstIndex + 1
                                          Answer(0, 2) = TheMatches(TheMatches.Count - 1).Length
                                          RegExpFindExtended = Answer
                                      Case 1 To TheMatches.Count      ' Nth match
                                          ReDim Answer(0 To 0, 0 To 2)
                                          Answer(0, 0) = TheMatches(Pos - 1)
                                          Answer(0, 1) = TheMatches(Pos - 1).FirstIndex + 1
                                          Answer(0, 2) = TheMatches(Pos - 1).Length
                                          RegExpFindExtended = Answer
                                      Case Else                       ' Invalid item number
                                          RegExpFindExtended = ""
                                  End Select
                              End If
                          
                          ' If there are no matches, return empty string
                          
                          Else
                              RegExpFindExtended = ""
                          End If
                          
                      Cleanup:
                      
                          ' Release object variables
                          
                          Set TheMatches = Nothing
                          
                      End Function

Open in new window


Function RegExpFindSubmatch(LookIn As String, PatternStr As String, Optional MatchPos, _
                          Optional SubmatchPos, Optional MatchCase As Boolean = True, _
                          Optional MultiLine As Boolean = False)
                          
                          ' Function written by Patrick G. Matthews.  You may use and distribute this code freely,
                          ' as long as you properly credit and attribute authorship and the URL of where you
                          ' found the code
                          
                          ' This function relies on the VBScript version of Regular Expressions, and thus some of
                          ' the functionality available in Perl and/or .Net may not be available.  The full extent
                          ' of what functionality will be available on any given computer is based on which version
                          ' of the VBScript runtime is installed on that computer
                          
                          ' This function uses Regular Expressions to parse a string (LookIn), and return "submatches"
                          ' from the various matches to a pattern (PatternStr).  In RegExp, submatches within a pattern
                          ' are defined by grouping portions of the pattern within parentheses.
                          
                          ' Use MatchPos to indicate which match you want:
                          ' MatchPos omitted               : function returns results for all matches
                          ' MatchPos = 1                   : the first match
                          ' MatchPos = 2                   : the second match
                          ' MatchPos = <positive integer>  : the Nth match
                          ' MatchPos = 0                   : the last match
                          ' MatchPos = -1                  : the last match
                          ' MatchPos = -2                  : the 2nd to last match
                          ' MatchPos = <negative integer>  : the Nth to last match
                          
                          ' Use SubmatchPos to indicate which match you want:
                          ' SubmatchPos omitted               : function returns results for all submatches
                          ' SubmatchPos = 1                   : the first submatch
                          ' SubmatchPos = 2                   : the second submatch
                          ' SubmatchPos = <positive integer>  : the Nth submatch
                          ' SubmatchPos = 0                   : the last submatch
                          ' SubmatchPos = -1                  : the last submatch
                          ' SubmatchPos = -2                  : the 2nd to last submatch
                          ' SubmatchPos = <negative integer>  : the Nth to last submatch
                          
                          ' The return type for this function depends on whether your choice for MatchPos is looking for
                          ' a single value or for potentially many.  All arrays returned by this function are zero-based.
                          ' When the function returns a 2-D array, the first dimension is for the matches and the second
                          ' dimension is for the submatches
                          ' MatchPos omitted, SubmatchPos omitted: 2-D array of submatches for each match.  First dimension
                          '                                        based on number of matches (0 to N-1), second dimension
                          '                                        based on number of submatches (0 to N-1)
                          ' MatchPos omitted, SubmatchPos used   : 2-D array (0 to N-1, 0 to 0) of the specified submatch
                          '                                        from each match
                          ' MatchPos used, SubmatchPos omitted   : 2-D array (0 to 0, 0 to N-1) of the submatches from the
                          '                                        specified match
                          ' MatchPos used, SubmatchPos used      : String with specified submatch from specified match
                          
                          ' For any submatch that is not found, the function treats the result as a zero-length string
                          
                          ' If MatchCase is omitted or True (default for RegExp) then the Pattern must match case (and
                          ' thus you may have to use [a-zA-Z] instead of just [a-z] or [A-Z]).
                          
                          ' If you use this function in Excel, you can use range references for any of the arguments.
                          ' If you use this in Excel and return the full array, make sure to set up the formula as an
                          ' array formula.  If you need the array formula to go down a column, use TRANSPOSE()
                          
                          ' Normally as an object variable I would set the RegX variable to Nothing; however, in cases
                          ' where a large number of calls to this function are made, making RegX a static variable that
                          ' preserves its state in between calls significantly improves performance
                          
                          Static RegX As Object
                          Dim TheMatches As Object
                          Dim Mat As Object
                          Dim Answer() As String
                          Dim Counter As Long
                          Dim SubCounter As Long
                          
                          ' Evaluate MatchPos.  If it is there, it must be numeric and converted to Long
                          
                          If Not IsMissing(MatchPos) Then
                              If Not IsNumeric(MatchPos) Then
                                  RegExpFindSubmatch = ""
                                  Exit Function
                              Else
                                  MatchPos = CLng(MatchPos)
                              End If
                          End If
                          
                          ' Evaluate SubmatchPos.  If it is there, it must be numeric and converted to Long
                          
                          If Not IsMissing(SubmatchPos) Then
                              If Not IsNumeric(SubmatchPos) Then
                                  RegExpFindSubmatch = ""
                                  Exit Function
                              Else
                                  SubmatchPos = CLng(SubmatchPos)
                              End If
                          End If
                          
                          ' Create instance of RegExp object
                          
                          If RegX Is Nothing Then Set RegX = CreateObject("VBScript.RegExp")
                          With RegX
                              .Pattern = PatternStr
                              .Global = True
                              .IgnoreCase = Not MatchCase
                              .MultiLine = MultiLine
                          End With
                              
                          ' Test to see if there are any matches
                          
                          If RegX.Test(LookIn) Then
                              
                              ' Run RegExp to get the matches, which are returned as a zero-based collection
                              
                              Set TheMatches = RegX.Execute(LookIn)
                              
                              ' If MatchPos is missing, user either wants array of all the submatches for each match, or an
                              ' array of all the specified submatches for each match.  Build it and assign it as the
                              ' function's return value
                              
                              If IsMissing(MatchPos) Then
                                  
                                  ' Return value is a 2-D array of all the submatches for each match
                                  
                                  If IsMissing(SubmatchPos) Then
                                      For Counter = 0 To TheMatches.Count - 1
                                          Set Mat = TheMatches(Counter)
                                          
                                          ' To determine how many submatches there are we need to first evaluate a match.  That
                                          ' is why we redim the array inside the for/next loop
                                          
                                          If Counter = 0 Then
                                              ReDim Answer(0 To TheMatches.Count - 1, 0 To Mat.Submatches.Count - 1) As String
                                          End If
                                          
                                          ' Loop through the submatches and populate the array.  If the Nth submatch is not
                                          ' found, RegExp returns a zero-length string
                                          
                                          For SubCounter = 0 To UBound(Answer, 2)
                                              Answer(Counter, SubCounter) = Mat.Submatches(SubCounter)
                                          Next
                                      Next
                                  
                                  ' Return value is a 2-D array of the specified submatch for each match.
                                  
                                  Else
                                      For Counter = 0 To TheMatches.Count - 1
                                          Set Mat = TheMatches(Counter)
                                          
                                          ' To determine how many submatches there are we need to first evaluate a match.  That
                                          ' is why we redim the array inside the for/next loop.  If SubmatchPos = 0, then we want
                                          ' the last submatch.  In that case reset SubmatchPos so it equals the submatch count.
                                          ' Negative number indicates Nth to last; convert that to applicable "positive" position
                                          
                                          If Counter = 0 Then
                                              ReDim Answer(0 To TheMatches.Count - 1, 0 To 0) As String
                                              Select Case SubmatchPos
                                                  Case Is > 0: 'no adjustment needed
                                                  Case 0, -1: SubmatchPos = Mat.Submatches.Count
                                                  Case Is < -Mat.Submatches.Count: SubmatchPos = -SubmatchPos
                                                  Case Else: SubmatchPos = Mat.Submatches.Count + SubmatchPos + 1
                                              End Select
                                          End If
                                          
                                          ' Populate array with the submatch value.  If the submatch value is not found, or if
                                          ' SubmatchPos > the count of submatches, populate with a zero-length string
                                          
                                          If SubmatchPos <= Mat.Submatches.Count Then
                                              Answer(Counter, 0) = Mat.Submatches(SubmatchPos - 1)
                                          Else
                                              Answer(Counter, 0) = ""
                                          End If
                                      Next
                                  End If
                                  RegExpFindSubmatch = Answer
                                  
                              ' User wanted the info associated with the Nth match (or last match, if MatchPos = 0)
                              
                              Else
                                  
                                  ' If MatchPos = 0 then make MatchPos equal the match count.  If negative (indicates Nth
                                  ' to last), convert to equivalent position.
                                  
                                  Select Case MatchPos
                                      Case Is > 0: 'no adjustment needed
                                      Case 0, -1: MatchPos = TheMatches.Count
                                      Case Is < -TheMatches.Count: MatchPos = -MatchPos
                                      Case Else: MatchPos = TheMatches.Count + MatchPos + 1
                                  End Select
                                  
                                  ' As long as MatchPos does not exceed the match count, process the Nth match.  If the
                                  ' match count is exceeded, return a zero-length string
                                  
                                  If MatchPos <= TheMatches.Count Then
                                      Set Mat = TheMatches(MatchPos - 1)
                                      
                                      ' User wants a 2-D array of all submatches for the specified match; populate array.  If
                                      ' a particular submatch is not found, RegExp treats it as a zero-length string
                                      
                                      If IsMissing(SubmatchPos) Then
                                          ReDim Answer(0 To 0, 0 To Mat.Submatches.Count - 1)
                                          For SubCounter = 0 To UBound(Answer, 2)
                                              Answer(0, SubCounter) = Mat.Submatches(SubCounter)
                                          Next
                                          RegExpFindSubmatch = Answer
                                      
                                      ' User wants a single value
                                      
                                      Else
                                          
                                          ' If SubmatchPos = 0 then make it equal count of submatches.  If negative, this
                                          ' indicates Nth to last; convert to equivalent positive position
                                          
                                          Select Case SubmatchPos
                                              Case Is > 0: 'no adjustment needed
                                              Case 0, -1: SubmatchPos = Mat.Submatches.Count
                                              Case Is < -Mat.Submatches.Count: SubmatchPos = -SubmatchPos
                                              Case Else: SubmatchPos = Mat.Submatches.Count + SubmatchPos + 1
                                          End Select
                                          
                                          ' If SubmatchPos <= count of submatches, then get that submatch for the specified
                                          ' match.  If the submatch value is not found, or if SubmathPos exceeds count of
                                          ' submatches, return a zero-length string.  In testing, it appeared necessary to
                                          ' use CStr to coerce the return to be a zero-length string instead of zero
                                          
                                          If SubmatchPos <= Mat.Submatches.Count Then
                                              RegExpFindSubmatch = CStr(Mat.Submatches(SubmatchPos - 1))
                                          Else
                                              RegExpFindSubmatch = ""
                                          End If
                                      End If
                                  Else
                                      RegExpFindSubmatch = ""
                                  End If
                              End If
                          
                          ' If there are no matches, return empty string
                          
                          Else
                              RegExpFindSubmatch = ""
                          End If
                          
                      Cleanup:
                          ' Release object variables
                          Set Mat = Nothing
                          Set TheMatches = Nothing
                          
                      End Function

Open in new window


Function RegExpReplace(LookIn As String, PatternStr As String, Optional ReplaceWith As String = "", _
                          Optional ReplaceAll As Boolean = True, Optional MatchCase As Boolean = True, _
                          Optional MultiLine As Boolean = False)
                          
                          ' Function written by Patrick G. Matthews.  You may use and distribute this code freely,
                          ' as long as you properly credit and attribute authorship and the URL of where you
                          ' found the code
                          
                          ' This function relies on the VBScript version of Regular Expressions, and thus some of
                          ' the functionality available in Perl and/or .Net may not be available.  The full extent
                          ' of what functionality will be available on any given computer is based on which version
                          ' of the VBScript runtime is installed on that computer
                          
                          ' This function uses Regular Expressions to parse a string, and replace parts of the string
                          ' matching the specified pattern with another string.  The optional argument ReplaceAll
                          ' controls whether all instances of the matched string are replaced (True) or just the first
                          ' instance (False)
                          
                          ' If you need to replace the Nth match, or a range of matches, then use RegExpReplaceRange
                          ' instead
                          
                          ' By default, RegExp is case-sensitive in pattern-matching.  To keep this, omit MatchCase or
                          ' set it to True
                          
                          ' If you use this function from Excel, you may substitute range references for all the arguments
                          
                          ' Normally as an object variable I would set the RegX variable to Nothing; however, in cases
                          ' where a large number of calls to this function are made, making RegX a static variable that
                          ' preserves its state in between calls significantly improves performance
                          
                          Static RegX As Object
                          
                          If RegX Is Nothing Then Set RegX = CreateObject("VBScript.RegExp")
                          With RegX
                              .Pattern = PatternStr
                              .Global = ReplaceAll
                              .IgnoreCase = Not MatchCase
                              .MultiLine = MultiLine
                          End With
                          
                          RegExpReplace = RegX.Replace(LookIn, ReplaceWith)
                          
                      End Function

Open in new window


Function RegExpReplaceRange(LookIn As String, PatternStr As String, Optional ReplaceWith As String = "", _
                          Optional StartAt As Long = 1, Optional EndAt As Long = 0, Optional MatchCase As Boolean = True, _
                          Optional MultiLine As Boolean = False)
                          
                          ' Function written by Patrick G. Matthews.  You may use and distribute this code freely,
                          ' as long as you properly credit and attribute authorship and the URL of where you
                          ' found the code
                          
                          ' This function relies on the VBScript version of Regular Expressions, and thus some of
                          ' the functionality available in Perl and/or .Net may not be available.  The full extent
                          ' of what functionality will be available on any given computer is based on which version
                          ' of the VBScript runtime is installed on that computer
                          
                          ' This function uses Regular Expressions to parse a string, and replace parts of the string
                          ' matching the specified pattern with another string.  In particular, this function replaces
                          ' the specified range of matched values with the designated replacement string.
                          
                          ' StartAt indicates the start of the range of matches to be replaced.  Thus, 2 indicates
                          ' that the second match gets replaced starts the range of matches to be replaced.  Use zero
                          ' to specify the last match.
                          
                          ' EndAt indicates the end of the range of matches to be replaced.  Thus, a 5 would indicate
                          ' that the 5th match is the last one to be replaced. Use zero to specify the last match.
                          
                          ' Thus, if you use StartAt = 2 and EndAt = 5, then the 2nd through 5th matches will be
                          ' replaced.
                          
                          ' By default, RegExp is case-sensitive in pattern-matching.  To keep this, omit MatchCase
                          ' or set it to True
                          
                          ' If you use this function from Excel, you may substitute range references for all the
                          ' arguments
                          
                          ' Note: Match.FirstIndex assumes that the first character position in a string is zero.
                          ' This differs from VBA and VB6, which has the first character at position 1
                          
                          ' Normally as an object variable I would set the RegX variable to Nothing; however, in
                          ' cases where a large number of calls to this function are made, making RegX a static
                          ' variable that preserves its state in between calls significantly improves performance
                          
                          Static RegX As Object
                          Dim TheMatches As Object
                          Dim StartStr As String
                          Dim WorkingStr As String
                          Dim Counter As Long
                          Dim arr() As String
                          Dim StrStart As Long
                          Dim StrEnd As Long
                          
                          ' Instantiate RegExp object
                          
                          If RegX Is Nothing Then Set RegX = CreateObject("VBScript.RegExp")
                          With RegX
                              .Pattern = PatternStr
                              
                              ' First search needs to find all matches
                              
                              .Global = True
                              .IgnoreCase = Not MatchCase
                              .MultiLine = MultiLine
                              
                              ' Run RegExp to find the matches
                              
                              Set TheMatches = .Execute(LookIn)
                              
                              ' If there are no matches, no replacement need to happen
                          
                              If TheMatches.Count = 0 Then
                                  RegExpReplaceRange = LookIn
                                  GoTo Cleanup
                              End If
                              
                              ' Reset StartAt and EndAt if necessary based on matches actually found.  Escape if StartAt > EndAt
                              ' 0 or -1 indicates last match.  Negative number indicates Nth to last
                              
                              Select Case StartAt
                                  Case Is > 0: 'no adjustment needed
                                  Case 0, -1: StartAt = TheMatches.Count
                                  Case Is < -TheMatches.Count: StartAt = -StartAt
                                  Case Else: StartAt = TheMatches.Count + StartAt + 1
                              End Select
                              
                              Select Case EndAt
                                  Case Is > 0: 'no adjustment needed
                                  Case 0, -1: EndAt = TheMatches.Count
                                  Case Is < -TheMatches.Count: EndAt = -EndAt
                                  Case Else: EndAt = TheMatches.Count + EndAt + 1
                              End Select
                              
                              If StartAt > EndAt Then
                                  RegExpReplaceRange = LookIn
                                  GoTo Cleanup
                              End If
                              
                              ' Now create an array for the partial strings.  The elements of the array correspond to...
                              ' 0         : text before the 1st match
                              ' 1         : the first match
                              ' 2 * N - 2 : text between the (N - 1)th and the Nth match (repeat as needed)
                              ' 2 * N - 1 : the Nth match (repeat as needed)
                              ' X         : text after the last match (X = 2 * number of matches)
                              
                              ReDim arr(0 To 2 * TheMatches.Count) As String
                              
                              ' Loop through the matches to populate the array
                              
                              For Counter = 1 To TheMatches.Count
                                  
                                  ' If Counter = 1 then it's the first match, and we need the text before the first match.
                                  ' If not, then we need the text between the (N - 1)th and the Nth match
                                  
                                  If Counter = 1 Then
                                      arr(0) = Left(LookIn, TheMatches(0).FirstIndex)
                                  Else
                                      
                                      ' Starting character position for text between the (N - 1)th and the Nth match
                                      
                                      StrStart = TheMatches(Counter - 2).FirstIndex + TheMatches(Counter - 2).Length + 1
                                      
                                      ' Length of text between the (N - 1)th and the Nth match
                                      
                                      StrEnd = TheMatches(Counter - 1).FirstIndex - StrStart + 1
                                      arr(2 * Counter - 2) = Mid(LookIn, StrStart, StrEnd)
                                  End If
                                  
                                  ' Now we process the match.  If the match number is within the replacement range,
                                  ' then put the replacement value into the array.  If not, put in the match value
                                  
                                  If Counter >= StartAt And Counter <= EndAt Then
                      '                arr(2 * Counter - 1) = ReplaceWith
                                      arr(2 * Counter - 1) = .Replace(TheMatches(Counter - 1), ReplaceWith)
                                  Else
                                      arr(2 * Counter - 1) = TheMatches(Counter - 1)
                                  End If
                                  
                                  ' If Counter = TheMatches.Count then we need to get the text after the last match
                                  
                                  If Counter = TheMatches.Count Then
                                      StrStart = TheMatches(Counter - 1).FirstIndex + TheMatches(Counter - 1).Length + 1
                                      arr(UBound(arr)) = Mid(LookIn, StrStart)
                                  End If
                              Next
                          End With
                          
                          ' Use Join to concatenate the elements of the array for our answer
                          
                          RegExpReplaceRange = Join(arr, "")
                          
                      Cleanup:
                          
                          ' Clear object variables
                          
                          Set TheMatches = Nothing
                          
                      End Function

Open in new window


Function RegExpReplaceExpression(LookIn As String, PatternStr As String, Expression As String, _
                          Optional StartAt As Long = 1, Optional EndAt As Long = 0, _
                          Optional MatchCase As Boolean = True, Optional MultiLine As Boolean = False)
                          
                          ' Function written by Patrick G. Matthews.  You may use and distribute this code freely,
                          ' as long as you properly credit and attribute authorship and the URL of where you
                          ' found the code
                          
                          ' This function relies on the VBScript version of Regular Expressions, and thus some of
                          ' the functionality available in Perl and/or .Net may not be available.  The full extent
                          ' of what functionality will be available on any given computer is based on which version
                          ' of the VBScript runtime is installed on that computer
                          
                          ' This function is intended for use only in Excel-based VBA projects, since it relies on the
                          ' Excel Application.Evaluate method to process the expression.  The expression must use only
                          ' normal arithmetic operators and/or native Excel functions.  Use $& to indicate where the
                          ' entire match value should go, or $1 through $9 to use submatches 1 through 9
                          
                          ' This function uses Regular Expressions to parse a string, and replace parts of the string
                          ' matching the specified pattern with another string.  In particular, this function replaces
                          ' the specified range of matched values with the designated replacement string.  In a twist,
                          ' though
                          
                          ' StartAt indicates the start of the range of matches to be replaced.  Thus, 2 indicates that
                          ' the second match gets replaced.  Use zero to specify the last match.  Negative numbers
                          ' indicate the Nth to last: -1 is the last, -2 the 2nd to last, etc
                          
                          ' EndAt indicates the end of the range of matches to be replaced.  Thus, a 5 would indicate
                          ' that the 5th match is the last one to be replaced. Use zero to specify the last match.
                          ' Negative numbers indicate the Nth to last: -1 is the last, -2 the 2nd to last, etc
                          
                          ' Thus, if you use StartAt = 2 and EndAt = 5, then the 2nd through 5th matches will be replaced.
                          
                          ' By default, RegExp is case-sensitive in pattern-matching.  To keep this, omit MatchCase or
                          ' set it to True
                          
                          ' Note: Match.FirstIndex assumes that the first character position in a string is zero.  This
                          ' differs from VBA and VB6, which has the first character at position 1
                          
                          ' Normally as an object variable I would set the RegX variable to Nothing; however, in cases
                          ' where a large number of calls to this function are made, making RegX a static variable that
                          ' preserves its state in between calls significantly improves performance
                          
                          Static RegX As Object
                          Dim TheMatches As Object
                          Dim StartStr As String
                          Dim WorkingStr As String
                          Dim Counter As Long
                          Dim arr() As String
                          Dim StrStart As Long
                          Dim StrEnd As Long
                          Dim Counter2 As Long
                          
                          ' Instantiate RegExp object
                          
                          If RegX Is Nothing Then Set RegX = CreateObject("VBScript.RegExp")
                          With RegX
                              .Pattern = PatternStr
                              
                              ' First search needs to find all matches
                              
                              .Global = True
                              .IgnoreCase = Not MatchCase
                              .MultiLine = MultiLine
                              
                              ' Run RegExp to find the matches
                              
                              Set TheMatches = .Execute(LookIn)
                              
                              ' If there are no matches, no replacement need to happen
                          
                              If TheMatches.Count = 0 Then
                                  RegExpReplaceExpression = LookIn
                                  GoTo Cleanup
                              End If
                              
                              ' Reset StartAt and EndAt if necessary based on matches actually found.  Escape if StartAt > EndAt
                              
                              Select Case StartAt
                                  Case Is > 0: 'no adjustment needed
                                  Case 0, -1: StartAt = TheMatches.Count
                                  Case Is < -TheMatches.Count: StartAt = -StartAt
                                  Case Else: StartAt = TheMatches.Count + StartAt + 1
                              End Select
                                  
                              Select Case EndAt
                                  Case Is > 0: 'no adjustment needed
                                  Case 0, -1: EndAt = TheMatches.Count
                                  Case Is < -TheMatches.Count: EndAt = -EndAt
                                  Case Else: EndAt = TheMatches.Count + EndAt + 1
                              End Select
                              
                              If StartAt > EndAt Then
                                  RegExpReplaceExpression = LookIn
                                  GoTo Cleanup
                              End If
                              
                              ' Now create an array for the partial strings.  The elements of the array correspond to...
                              ' 0         : text before the 1st match
                              ' 1         : the first match
                              ' 2 * N - 2 : text between the (N - 1)th and the Nth match (repeat as needed)
                              ' 2 * N - 1 : the Nth match (repeat as needed)
                              ' X         : text after the last match (X = 2 * number of matches)
                              
                              ReDim arr(0 To 2 * TheMatches.Count) As String
                              
                              ' Loop through the matches to populate the array
                              
                              For Counter = 1 To TheMatches.Count
                                  
                                  ' If Counter = 1 then it's the first match, and we need the text before the first match.
                                  ' If not, then we need the text between the (N - 1)th and the Nth match
                                  
                                  If Counter = 1 Then
                                      arr(0) = Left(LookIn, TheMatches(0).FirstIndex)
                                  Else
                                      
                                      ' Starting character position for text between the (N - 1)th and the Nth match
                                      
                                      StrStart = TheMatches(Counter - 2).FirstIndex + TheMatches(Counter - 2).Length + 1
                                      
                                      ' Length of text between the (N - 1)th and the Nth match
                                      
                                      StrEnd = TheMatches(Counter - 1).FirstIndex - StrStart + 1
                                      arr(2 * Counter - 2) = Mid(LookIn, StrStart, StrEnd)
                                  End If
                                  
                                  ' Now we process the match.  If the match number is within the replacement range,
                                  ' then put the replacement value into an Evaluate expression, and place the result
                                  ' into the array.  If not, put in the match value
                                  
                                  If Counter >= StartAt And Counter <= EndAt Then
                                  
                                      ' $& stands in for the entire match
                                      
                                      Expression = Replace(Expression, "$&", TheMatches(Counter - 1))
                                      
                                      ' Now loop through the Submatches, if applicable, and make replacements
                                      
                                      For Counter2 = 1 To TheMatches(Counter - 1).Submatches.Count
                                          Expression = Replace(Expression, "$" & Counter2, TheMatches(Counter - 1).Submatches(Counter2 - 1))
                                      Next
                                      
                                      ' Evaluate the expression
                                      
                                      arr(2 * Counter - 1) = Evaluate(Expression)
                                  Else
                                      arr(2 * Counter - 1) = TheMatches(Counter - 1)
                                  End If
                                  
                                  ' If Counter = TheMatches.Count then we need to get the text after the last match
                                  
                                  If Counter = TheMatches.Count Then
                                      StrStart = TheMatches(Counter - 1).FirstIndex + TheMatches(Counter - 1).Length + 1
                                      arr(UBound(arr)) = Mid(LookIn, StrStart)
                                  End If
                              Next
                          End With
                          
                          ' Use Join to concatenate the elements of the array for our answer
                          
                          RegExpReplaceExpression = Join(arr, "")
                          
                      Cleanup:
                          
                          ' Clear object variables
                          
                          Set TheMatches = Nothing
                          Set RegX = Nothing
                          
                      End Function

Open in new window



=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
If you liked this article and want to see more from this author,  please click here.

If you found this article helpful, please click the Yes button near the:

      Was this article helpful?

label that is just below and to the right of this text.   Thanks!
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Regular-Expressions-Examples.xls
80
133,247 Views
Patrick Matthews
CERTIFIED EXPERT

Comments (38)

Most Valuable Expert 2012
Top Expert 2012

Commented:
Patrick - I use several of your articles as a regular resource (and some methods just roll off the fingertips, today :).  In my most recent article, https://www.experts-exchange.com/Software/Office_Productivity/Office_Suites/MS_Office/Excel/A_8933-How-to-quickly-and-accurately-populate-Word-documents-with-Excel-data-charts-and-images-including-Automated-Bookmark-generation.html, I used your code (see comment below):

*Note:  The code for finding bookmarks in Word was achieved leveraging RegEx – specifically the RegExpFind function (see matthewspatrick's most excellent article on the subject: http:\A_1336.html).  My RegEx pattern \[{2}[A-Za-z0-9_]+\]{2} looks for the [[ ]] brackets to define where the bookmark name would be, with alphanumeric and underscore allowed inside the brackets.  Bookmark indicators with any other characters or spaces are included inside the brackets will be ignored (Word doesn't allow spaces in Bookmark names, and through convenience, disallowed any other characters).  Check your template_BM.dot to ensure all your Bookmarks were created, just to be sure you didn’t use any disallowed characters!

I did, however, need to make one fairly minor change to work most effectively in my routines, basically to return NULL as opposed to empty string (example below):
 
' If there are no matches, return empty string
    
    Else
        RegExpFind = Null 'updated by dlmille of E-E for testing of null result if empty or array without generating error
    End If

Open in new window


This seems fairly obvious, and anyone can modify to suit the needs of their calling apps with reuseable code.  The point I'd like to make is that your code was so well documented, it was a pleasure jumping right in and tweaking with literally no trouble at all.  In fact, your code has made me a bit lazy, as I've not put much attention into learning the RegEx object as your routines come to mind, every time, lol ;)

Cheers,

Dave
' If there are no matches, return empty string
    
    Else
        RegExpFind = Null 'updated by dlmille of E-E for testing of null result if empty or array without generating error
    End If

Open in new window

CERTIFIED EXPERT
Top Expert 2010

Author

Commented:
Dave,

I can think of no higher praise for my articles than for colleagues like you to say that you use them and learn from them!

Nice work on amending that to return a null rather than an empty string.  I can see how that would be handy.

You've become quite the accomplished author yourself!

Cheers,

Patrick
CERTIFIED EXPERT
Top Expert 2010

Author

Commented:
This article has been updated to indicate that "lazy" quantifiers--that is, non-greedy--are supported, and that the following additional tokens are available in the Replace method:

$+   Last submatch
$`   Text prior to match
$'   Text after match
$_   Entire input string

Open in new window

Stephen ByromWarehouse/Shipping

Commented:
This computer hobby of mine is slowly becoming very time consuming with stuff for work, and other people. It's a good job there are people out there like Patrick who are willing to impart invaluable knowledge for us lesser mortals, who are just scraping at the edge of coding.

Thanks
SteveCost Accountant
CERTIFIED EXPERT
Top Expert 2012

Commented:
As ever a very concise clear and useful article.
Prety much all I needed to know in one place.
Nice job,
Steve.

View More

Have a question about something in this article? You can receive help directly from the article author. Sign up for a free trial to get started.