<

Using Regular Expressions in Visual Basic for Applications and Visual Basic 6

Published on
180,137 Points
89,336 Views
78 Endorsements
Last Modified:
Awarded
Editor's Choice
Community Pick
Using Regular Expressions in VBA and VB6 Projects

By Patrick Matthews (matthewspatrick, http://www.experts-exchange.com/M_2428436.html)

Introduction

Regular Expressions (RegExp) is a class that offers incredibly powerful text parsing capabilities, in particular allowing users to find (and even replace) substrings within larger text entries that fit a particular pattern.  This article provides basic information on the VBScript implementation of RegExp, and how you can use it to gain robust text parsing capabilities in Visual Basic 6 (VB6) and Visual Basic for Applications (VBA) projects.  With a little knowledge of the rules and some imagination, you can build incredibly flexible and/or complex patterns.

While this article may be most useful to users with intermediate or advanced programming skills, users of all skill levels (beginners included) can benefit from using RegExp by implementing the "wrapper" functions included in this article.  (This article includes a sample Excel workbook that provides examples for each function.)  By adding these functions to your VB6/VBA projects, you can quickly and easily incorporate RegExp functionality into your applications.  If you are a Microsoft Office user, you can even use these functions in worksheet formulas for Microsoft Excel or in queries in Microsoft Access.  

These new functions are:
RegExpFind (Returns information for a single match from the input text, or an array of all the matches.  Info returned can be the match Value, starting character position, or Length for a single match, or for all matches)
RegExpFindExtended (Returns a 2-D array of the Value, starting character position, or Length)
RegExpFindSubmatch (Returns a 2-D array of submatches for the various matches in the input text)
RegExpReplace (Returns a string in which either all matches are replaced with the replacement text, or just the first match)
RegExpReplaceExpression (Excel only, allows you to build an expression using arithmetic operators and/or Excel functions to manipulate the replacement text)
RegExpReplaceRange (Returns a string in which the Xth to Yth matches are replaced)

Quick Examples Using RegExp

The following two examples briefly illustrate the usefulness of RegExp, and also you might use some of the included wrapper functions to address each problem.

Consider the following text:

"He hit the ball.  Then he ran.  The crowd was cheering!  How did he feel?  I felt so energized!"

Suppose you needed to extract the first word from each sentence.  You could use a pattern such as:

"(^|([\.!?] +))([a-z]+)"

Open in new window


That pattern matches the following:
The beginning of the string OR (period, exclamation point, or question mark followed by 1+ spaces) {This whole item taken together is the first submatch; the period, exclamation point, or question mark followed by 1+ spaces is the 2nd submatch}
Then, 1+ characters from a-z {3rd submatch}

Using one of the wrapper functions, this would return an array of the first words in each sentence:

RegExpFindSubmatch(MyText, "(^|([\.!?] +))([a-z]+)", , 3, False)
Returns "He", "Then", "The", "How", "I"

Open in new window


If you wanted to extract the last word in each sentence, you could use this pattern instead:

"[a-z]+(?=[\.!\?])"

Open in new window


That pattern matches the following:
1+ characters from a-z
Then,  a "look ahead" to see if it is followed by a period, exclamation point, or question mark.  In a look ahead, RegExp checks to see whether the look ahead condition is met, but does not include the look ahead in the match that gets returned

Using a wrapper function, this returns an array of the last word in each sentence:

RegExpFind(MyText, "[a-z]+(?=[\.!\?])", , False)
Returns "ball", "ran", "cheering", "feel", "energized"

Open in new window


For another example (based on EE question http://www.experts-exchange.com/Q_24367818.html), suppose you had the following log entries:

http://www.abc.com
www.xyz.com
123.mno.net
http://www.abc.com/category
www.abc.net/category/sub-category/index.html
xyz.info/index.php
www.abc.com:8082/

Suppose that, for each entry, you had to extract the site address (e.g., www.abc.com).  You could use the following pattern, replacing all text matching that pattern with a zero-length string:

"http://|[:/].*"

Open in new window


That pattern matches the following:
http:// OR
(colon or forward slash) followed by zero or more characters of any type

If you were looping through an array of log entries, you could use this to extract the address:

RegExpReplace(LogArr(i), "http://|[:/].*", "", True, False)

Open in new window


If instead you needed to extract just the top-level domain (e.g., com, net, info), you could use this pattern instead:

"(http://)?([\w-]+\.)([\w-]+\.)?([\w-]+)([:/].*)?"

Open in new window


That pattern matches the following:
http:// appearing zero or one time {1st submatch}
Then, a word character or dash appearing 1+ times followed by a period {2nd submatch}
Then, a word character or dash appearing 1+ times followed by a period, the whole block appearing zero or one time {3rd submatch}
Then, a word character or dash appearing 1+ times {4th submatch}
Then, a colon or forward slash followed by zero or more characters of any type {5th submatch}

In this case, the 4th submatch will be the top-level domain.  To get it, you could use one of two expressions:

RegExpReplace(LogArr(i), "(http://)?([\w-]+\.)([\w-]+\.)?([\w-]+)([:/].*)?", "$4", True, False)

or

=RegExpFindSubmatch(LogArr(i), "(http://)?([\w-]+\.)([\w-]+\.)?([\w-]+)([:/].*)?", 1, 4, False, False)

Open in new window


If instead this log were in an Access table, you could use these wrapper functions in a query such as this:

SELECT LogEntry, 
    RegExpReplace(LogArr(i), "http://|[:/].*", "", True, False) AS Address,
    RegExpReplace(LogArr(i), "(http://)?([\w-]+\.)([\w-]+\.)?([\w-]+)([:/].*)?", 
        "$4", True, False) AS TopLevel
FROM LogTable

Open in new window


The VBScript RegExp Class

Regular Expressions may be most familiar to users of the Perl programming language, but many other languages incorporate similar functionality.  This article concerns how to use the RegExp class implemented in VBScript 5.5 to enhance your text parsing capabilities in your Visual Basic for Applications (and thus Microsoft Excel, Access, or any other application that supports VBA) and Visual Basic 6 projects.  Because this article focuses on the specific VBScript implementation, there may be some differences in syntax and/or features when compared to other Regular Expressions implementations.  (For example, the VBScript implementation does not support the "look behind" capabilities in Perl and the .Net framework.)

 RegExp Object Model
To use RegExp in your VB6 or VBA projects, you must instantiate a RegExp object.  If you are using early binding, then you would add a reference to the "Microsoft VBScript Regular Expressions 5.5" library.  (Do not use the version 1.0 library, as it lacks many of the best features of RegExp and is provided solely for backward compatibility.)  You would then declare the variable in your code this way:

Dim RegX As VBScript_RegExp_55.RegExp
Set RegX = New VBScript_RegExp_55.RegExp

Open in new window


Using late binding, you would do it this way:

Dim RegX As Object
Set RegX = CreateObject("VBScript.RegExp")

Open in new window


Once you have the object instantiated, you will have to set certain properties of the RegExp class:
Global: This determines whether a RegExp method will try to find only the first match (False), or all matches (True).  The default value is False.
IgnoreCase: This determines whether the pattern match is case-sensitive; the default is False (thus making pattern matching case-sensitive by default).  If IgnoreCase is True, then a pattern of [a-z] or [A-Z] is equivalent to a pattern of [a-zA-Z].
Multiline: Indicates how RegExp should handle input text with line breaks.  In particular, it determines how the ^ (start of input) and $ (end of input) behave: if False, then these match only the very beginning and very end of the entire input string.  If True, then ^ and $ match the beginning and end of each line in the input string.  The default value is False.
Pattern: This is the pattern that RegExp tries to match.  Patterns can be relatively simple, or highly complex.  For example, the pattern

circus
will look only for the six consecutive characters "circus" (and, if IgnoreCase = False, then those characters would have to be lower case to make a match).  On the other hand:

circus(es)?
will match "circus" or "circuses".  Further, the following pattern matches circus, circuses, pony, or ponies:

circus(es)?|pon(y|ies)

Please see the Quick Primer on Pattern Syntax for a few simple tips on Pattern syntax to get you started.  The functionality is incredibly rich, however, and for fuller treatments of the subject you should look in the Further Reading section of this article for web sites and books that provide excellent information on the RegExp Pattern syntax.

After you have set the RegExp properties to appropriate values, you then have to decide which method to use.  The RegExp class exposes three methods:
Execute: This method returns a Matches collection (of Match objects), containing the various matches to the specified pattern found in the input string (which is passed to the Execute method as an argument).  The Global property determines whether the collection contains all matches (True) or just the first match (False).
Replace: This method takes two arguments.  The first argument is the string to be searched for pattern matches, and the second is the replacement text to be used if a match is found.  The Global property determines whether RegExp replaces all matches (True) or just the first match (False).  The Replace method returns a string that includes all applicable replacements.  (Note that if there are no matches found, then the output and the original input are identical.)  To incorporate the entire match into the replacement text, use $& to stand in for the match, and to use a submatch in the replacement text, use $1, $2, $3, etc. to use the nth submatch (or $+ for the last submatch).  You can also use $` for the text prior to the match, $' for the text after the match, and $_ for the entire input string
Test: This method takes a single argument, the input string, and returns a Boolean True/False to indicate whether there is at least one match in the input string.

If you need to work with the various matches, then you work through the Matches collection and the various Match objects it contains.  The Matches collection has the following properties:
Count: This is the number of Match objects contained in the collection
Item: This returns the indicated Match object based on an index (the argument for this property).  Please note that the first item takes the index number zero, and thus the index for the last item is one less than the Count.  The Item property is the default property for the Matches collection, so MyMatches(0) is equivalent to MyMatches.Item(0).

For example, this code enumerates the Matches:

Dim RegX As Object, Mats As Object, Counter As Long

Set RegX = CreateObject("VBScript.RegExp")
With RegX
    .Global = True
    .Pattern = "\d+"
    Set Mats = .Execute("1 turtle, 15 rabbits, and 1078 swans")
End With

For Counter = 0 To Mats.Count - 1
    MsgBox Mats(Counter)
Next

Set Mats = Nothing
Set RegX = Nothing

Open in new window


That code will present three MsgBoxes, showing 1, 15, and 1078 respectively.

To work further with the individual Match objects, you can use the following properties:
FirstIndex: The starting position of the Match value within the original input string.  Please be aware that this property treats the first character in the input string as position zero, whereas VB6 and VBA (as well as Access and Excel) typically treat the first character as position 1.  The sample functions I provide here correct for this.
Length: The number of characters in the Match value
Submatches: This returns a Submatches collection composed of the various Submatch objects found within the Match.
Value: The string of the Match itself.  Value is the Match object's default property, so in the code example above Mats(Counter) is equivalent to Mats(Counter).Value.

Drilling down even further, a Match may have Submatches.  A Submatch is a portion of a Match that corresponds to part of the Pattern bracketed by parentheses.  For example, you could use a pattern such as this to match IP addresses in an input string:

(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})

Open in new window


Suppose that the input string were:

      
I thought her IP address was 100.200.300.400, but it was really 99.99.99.99

There would be two matches, 100.200.300.400 and 99.99.99.99.  In addition, each match would have four submatches; in the first case these submatches are 100, 200, 300, and 400.  These submatches correspond to the portions of the pattern enclosed in parentheses.

The Submatches collection has the following properties:
Count: This is the number of Submatch objects contained in the collection.  There will always be as many Submatch objects as there are groups in the pattern enclosed in parentheses.
Item: This returns the indicated Submatch object based on an index (the argument for this property).  Please note that the first item takes the index number zero, and thus the index for the last item is one less than the Count.  The Item property is the default property for the Submatches collection, so MyMatch.Submatches(0) is equivalent to MyMatch.Submatches.Item(0).

The Submatch object has a single property, Value.  This Value is simply the text comprising the Submatch.  If there is no match found for the indicated group in the pattern, then the Submatch value is a zero-length string.

Quick Primer on Pattern Syntax

This section includes some basic pointers on Pattern syntax to help you get started.  It is, however, by no means complete, and for more detailed treatments of the subject you should review the Web sites and books in the Further Reading section of this article.

Keep in mind that in all cases, whether a match is case sensitive or not depends on the setting for RegExp.IgnoreCase.

Symbol		Notes
--------------------------------------------------------------------------------------------
abc, ABC	Matches "abc" or "ABC"
[abc]		Matches any character within the brackets
[^abc]		Matches any character not within the brackets
[a-n], [0-5]	Matches any character within the range of letters or digits in brackets
[^a-n], [^0-5]	Matches any character not within the range of letters or digits in brackets
.		Matches any single character except a new line character
A|B		Matches "A" or "B"
()		Creates a subgroup within the pattern
{x}		Matches the preceding element exactly x times
{x,y}		Matches the preceding element between x and y times (greedy)
{x,y}?		Matches the preceding element between x and y times (non-greedy)
{x,}		Matches the preceding element at least x times (greedy)
{x,}?		Matches the preceding element at least x times (non-greedy)
{,y}		Matches the preceding element between zero and y times (greedy)
{,y}?		Matches the preceding element between zero and y times (non-greedy)
*		Matches preceding element zero or more times (greedy)
*?		Matches preceding element zero or more times (non-greedy)
?		Matches preceding element zero or one times
+		Matches preceding element one or more times (greedy)
+?		Matches preceding element one or more times (non-greedy)
^		Matches start of input (or start of line if RegExp.Multiline = True)
$		Matches end of input (or end of line if RegExp.Multiline = True)
\d		Matches a digit (same as [0-9])
\D		Matches a non-digit (same as [^0-9])
\b		Matches a word boundary
\B		Matches a non-word boundary
\f		Form feed
\n		New line
\r		Carriage return
\t		Tab
\v		Vertical tab
\s		Matches a "white space" character, same as [ \f\n\r\t\v]
\S		Matches a non-"white space" character, same as [^ \f\n\r\t\v]
\w		Matches a "word" character plus underscore, same as [A-Za-z0-9_]
\W		Matches a non-"word" character plus underscore, same as [^A-Za-z0-9_]
\		"Escapes" a special character.  
		For example:
		   To match the backslash, use \\
		   To match the $, use \$
		   To match the period, use \.

x(?=y)		Positive lookahead
		   X is a pattern you want to match
		   Y is a pattern you want to see immediately follow the pattern you are 
			looking for, but is not included in the match
		Example: Suppose you have a part number that is always four digits, 
		   with an optional suffix a, b, c, etc.  You want to grab all matches 
		   that look like part numbers with suffixes, but you want to exclude 
		   the suffix from the match.  
		   You could use a pattern \d{4}(?=[a-z]).  
		   If the input string is "1234, 2345x, 3456", it matches "2345".

x(?!y)		Negative lookahead
		   X is a pattern you want to match
		   Y is a pattern you do not want to see immediately follow the pattern 
		   	you are looking for, but is not included in the match
		Example: Suppose you have a part number that is always four digits, 
		   with an optional suffix a, b, c, etc.  You want to grab all matches that 
		   look like part numbers, but you want to exclude part numbers with suffixes.  
		   You could use a pattern \d{4}(?![a-z])
		   If the input string is "1234, 2345x, 3456", it matches "1234" and "3456".

Open in new window


Further Reading

For further reading, I recommend the following web sites and books:

For an excellent tutorial on RegExp, including information on the various language-specific implementations, go to: "Using Regular Expressions with Client Side Scripting", http://www.regular-expressions.info/vbscript.html

This site provides a very complete listing of the syntax for RegExp as implemented in VBScript and Jscript: "Regular Expression Syntax (Scripting)", http://msdn.microsoft.com/en-us/library/1400241x(VS.85).aspx

For general information on VBScript RegExp: "Microsoft Beefs Up VBScript with Regular Expressions", http://msdn.microsoft.com/en-us/library/ms974570.aspx

Another site for general information: "An Introduction to Regular Expression with VBScript", http://www.4guysfromrolla.com/webtech/090199-1.shtml

I keep the following book at my desk for quick reference on RegExp and other VBScript matters: VBScript in a Nutshell, written by Lomax, Childs, and Petrusha.  Its descriptions of the various RegExp-related objects, properties, and methods are useful, but are a bit incomplete.  (For example, the book makes no mention of submatches or "look ahead".)

If you really want to master RegExp, then I suggest Mastering Regular Expressions, by Friedl.  That book is not for the faint-hearted, but if you really want to know about how RegExp works its magic, and how you can get the most out of RegExp, it is worth the effort.

For additional examples and practical applications, I also recommend the Regular Expressions Cookbook, by Goyvaerts and Levithan.

Acknowledgements

I would like to thank EE Member brettdj (http://www.experts-exchange.com/M_770818.html) for introducing me to the wonderful world of Regular Expressions, and for his generous assistance in reviewing this article before submission.  His very helpful comments helped improve the example workbook markedly.  webtubbs (http://www.experts-exchange.com/M_3470508.html) also reviewed this article before submission.  Any mistakes in this article are most certainly mine, and not theirs.

Code Samples

Below is source code for six "wrapper" user defined functions you may use and freely distribute, as long as you properly attribute authorship and where you found the code.  You will also find these functions in the included sample file "Regular Expressions Examples.xls".

Function RegExpFind(LookIn As String, PatternStr As String, Optional Pos, _
    Optional MatchCase As Boolean = True, Optional ReturnType As Long = 0, _
    Optional MultiLine As Boolean = False)
    
    ' Function written by Patrick G. Matthews.  You may use and distribute this code freely,
    ' as long as you properly credit and attribute authorship and the URL of where you
    ' found the code
    
    ' This function relies on the VBScript version of Regular Expressions, and thus some of
    ' the functionality available in Perl and/or .Net may not be available.  The full extent
    ' of what functionality will be available on any given computer is based on which version
    ' of the VBScript runtime is installed on that computer
    
    ' This function uses Regular Expressions to parse a string (LookIn), and return matches to a
    ' pattern (PatternStr).  Use Pos to indicate which match you want:
    ' Pos omitted               : function returns a zero-based array of all matches
    ' Pos = 1                   : the first match
    ' Pos = 2                   : the second match
    ' Pos = <positive integer>  : the Nth match
    ' Pos = 0                   : the last match
    ' Pos = -1                  : the last match
    ' Pos = -2                  : the 2nd to last match
    ' Pos = <negative integer>  : the Nth to last match
    ' If Pos is non-numeric, or if the absolute value of Pos is greater than the number of
    ' matches, the function returns an empty string.  If no match is found, the function returns
    ' an empty string.  (Earlier versions of this code used zero for the last match; this is
    ' retained for backward compatibility)
    
    ' If MatchCase is omitted or True (default for RegExp) then the Pattern must match case (and
    ' thus you may have to use [a-zA-Z] instead of just [a-z] or [A-Z]).
    
    ' ReturnType indicates what information you want to return:
    ' ReturnType = 0            : the matched values
    ' ReturnType = 1            : the starting character positions for the matched values
    ' ReturnType = 2            : the lengths of the matched values
    
    ' If you use this function in Excel, you can use range references for any of the arguments.
    ' If you use this in Excel and return the full array, make sure to set up the formula as an
    ' array formula.  If you need the array formula to go down a column, use TRANSPOSE()
    
    ' Note: RegExp counts the character positions for the Match.FirstIndex property as starting
    ' at zero.  Since VB6 and VBA has strings starting at position 1, I have added one to make
    ' the character positions conform to VBA/VB6 expectations
    
    ' Normally as an object variable I would set the RegX variable to Nothing; however, in cases
    ' where a large number of calls to this function are made, making RegX a static variable that
    ' preserves its state in between calls significantly improves performance
    
    Static RegX As Object
    Dim TheMatches As Object
    Dim Answer()
    Dim Counter As Long
    
    ' Evaluate Pos.  If it is there, it must be numeric and converted to Long
    
    If Not IsMissing(Pos) Then
        If Not IsNumeric(Pos) Then
            RegExpFind = ""
            Exit Function
        Else
            Pos = CLng(Pos)
        End If
    End If
    
    ' Evaluate ReturnType
    
    If ReturnType < 0 Or ReturnType > 2 Then
        RegExpFind = ""
        Exit Function
    End If
    
    ' Create instance of RegExp object if needed, and set properties
    
    If RegX Is Nothing Then Set RegX = CreateObject("VBScript.RegExp")
    With RegX
        .Pattern = PatternStr
        .Global = True
        .IgnoreCase = Not MatchCase
        .MultiLine = MultiLine
    End With
        
    ' Test to see if there are any matches
    
    If RegX.Test(LookIn) Then
        
        ' Run RegExp to get the matches, which are returned as a zero-based collection
        
        Set TheMatches = RegX.Execute(LookIn)
        
        ' Test to see if Pos is negative, which indicates the user wants the Nth to last
        ' match.  If it is, then based on the number of matches convert Pos to a positive
        ' number, or zero for the last match
        
        If Not IsMissing(Pos) Then
            If Pos < 0 Then
                If Pos = -1 Then
                    Pos = 0
                Else
                    
                    ' If Abs(Pos) > number of matches, then the Nth to last match does not
                    ' exist.  Return a zero-length string
                    
                    If Abs(Pos) <= TheMatches.Count Then
                        Pos = TheMatches.Count + Pos + 1
                    Else
                        RegExpFind = ""
                        GoTo Cleanup
                    End If
                End If
            End If
        End If
        
        ' If Pos is missing, user wants array of all matches.  Build it and assign it as the
        ' function's return value
        
        If IsMissing(Pos) Then
            ReDim Answer(0 To TheMatches.Count - 1)
            For Counter = 0 To UBound(Answer)
                Select Case ReturnType
                    Case 0: Answer(Counter) = TheMatches(Counter)
                    Case 1: Answer(Counter) = TheMatches(Counter).FirstIndex + 1
                    Case 2: Answer(Counter) = TheMatches(Counter).Length
                End Select
            Next
            RegExpFind = Answer
        
        ' User wanted the Nth match (or last match, if Pos = 0).  Get the Nth value, if possible
        
        Else
            Select Case Pos
                Case 0                          ' Last match
                    Select Case ReturnType
                        Case 0: RegExpFind = TheMatches(TheMatches.Count - 1)
                        Case 1: RegExpFind = TheMatches(TheMatches.Count - 1).FirstIndex + 1
                        Case 2: RegExpFind = TheMatches(TheMatches.Count - 1).Length
                    End Select
                Case 1 To TheMatches.Count      ' Nth match
                    Select Case ReturnType
                        Case 0: RegExpFind = TheMatches(Pos - 1)
                        Case 1: RegExpFind = TheMatches(Pos - 1).FirstIndex + 1
                        Case 2: RegExpFind = TheMatches(Pos - 1).Length
                    End Select
                Case Else                       ' Invalid item number
                    RegExpFind = ""
            End Select
        End If
    
    ' If there are no matches, return empty string
    
    Else
        RegExpFind = ""
    End If
    
Cleanup:
    ' Release object variables
    
    Set TheMatches = Nothing
    
End Function

Open in new window


Function RegExpFindExtended(LookIn As String, PatternStr As String, Optional Pos, _
    Optional MatchCase As Boolean = True, Optional MultiLine As Boolean = False)
    
    ' Function written by Patrick G. Matthews.  You may use and distribute this code freely,
    ' as long as you properly credit and attribute authorship and the URL of where you
    ' found the code
    
    ' This function relies on the VBScript version of Regular Expressions, and thus some of
    ' the functionality available in Perl and/or .Net may not be available.  The full extent
    ' of what functionality will be available on any given computer is based on which version
    ' of the VBScript runtime is installed on that computer
    
    ' This function uses Regular Expressions to parse a string (LookIn), and returns a 0-(N-1), 0-2
    ' array of the matched values (position 0 for the 2nd dimension), the starting character
    ' positions (position 1 for the 2nd dimension), and the length of the matched values (position 2
    ' for the 2nd dimension)
    
    ' Use Pos to indicate which match you want:
    ' Pos omitted               : function returns a zero-based array of all matches
    ' Pos = 1                   : the first match
    ' Pos = 2                   : the second match
    ' Pos = <positive integer>  : the Nth match
    ' Pos = 0                   : the last match
    ' Pos = -1                  : the last match
    ' Pos = -2                  : the 2nd to last match
    ' Pos = <negative integer>  : the Nth to last match
    ' If Pos is non-numeric, or if the absolute value of Pos is greater than the number of
    ' matches, the function returns an empty string.  If no match is found, the function returns
    ' an empty string.
    
    ' If MatchCase is omitted or True (default for RegExp) then the Pattern must match case (and
    ' thus you may have to use [a-zA-Z] instead of just [a-z] or [A-Z]).
    
    ' If you use this function in Excel, you can use range references for any of the arguments.
    ' If you use this in Excel and return the full array, make sure to set up the formula as an
    ' array formula.  If you need the array formula to go down a column, use TRANSPOSE()
    
    ' Note: RegExp counts the character positions for the Match.FirstIndex property as starting
    ' at zero.  Since VB6 and VBA has strings starting at position 1, I have added one to make
    ' the character positions conform to VBA/VB6 expectations
    
    ' Normally as an object variable I would set the RegX variable to Nothing; however, in cases
    ' where a large number of calls to this function are made, making RegX a static variable that
    ' preserves its state in between calls significantly improves performance
    
    Static RegX As Object
    Dim TheMatches As Object
    Dim Answer()
    Dim Counter As Long
    
    ' Evaluate Pos.  If it is there, it must be numeric and converted to Long
    
    If Not IsMissing(Pos) Then
        If Not IsNumeric(Pos) Then
            RegExpFindExtended = ""
            Exit Function
        Else
            Pos = CLng(Pos)
        End If
    End If
    
    ' Create instance of RegExp object
    
    If RegX Is Nothing Then Set RegX = CreateObject("VBScript.RegExp")
    With RegX
        .Pattern = PatternStr
        .Global = True
        .IgnoreCase = Not MatchCase
        .MultiLine = MultiLine
    End With
        
    ' Test to see if there are any matches
    
    If RegX.Test(LookIn) Then
        
        ' Run RegExp to get the matches, which are returned as a zero-based collection
        
        Set TheMatches = RegX.Execute(LookIn)
        
        If Not IsMissing(Pos) Then
            If Pos < 0 Then
                If Pos = -1 Then
                    Pos = 0
                Else
                    
                    ' If Abs(Pos) > number of matches, then the Nth to last match does not
                    ' exist.  Return a zero-length string
                    
                    If Abs(Pos) <= TheMatches.Count Then
                        Pos = TheMatches.Count + Pos + 1
                    Else
                        RegExpFindExtended = ""
                        GoTo Cleanup
                    End If
                End If
            End If
        End If
        
        ' If Pos is missing, user wants array of all matches.  Build it and assign it as the
        ' function's return value
        
        If IsMissing(Pos) Then
            ReDim Answer(0 To TheMatches.Count - 1, 0 To 2)
            For Counter = 0 To UBound(Answer)
                Answer(Counter, 0) = TheMatches(Counter)
                Answer(Counter, 1) = TheMatches(Counter).FirstIndex + 1
                Answer(Counter, 2) = TheMatches(Counter).Length
            Next
            RegExpFindExtended = Answer
        
        ' User wanted the Nth match (or last match, if Pos = 0).  Get the Nth value, if possible
        
        Else
            Select Case Pos
                Case 0                          ' Last match
                    ReDim Answer(0 To 0, 0 To 2)
                    Answer(0, 0) = TheMatches(TheMatches.Count - 1)
                    Answer(0, 1) = TheMatches(TheMatches.Count - 1).FirstIndex + 1
                    Answer(0, 2) = TheMatches(TheMatches.Count - 1).Length
                    RegExpFindExtended = Answer
                Case 1 To TheMatches.Count      ' Nth match
                    ReDim Answer(0 To 0, 0 To 2)
                    Answer(0, 0) = TheMatches(Pos - 1)
                    Answer(0, 1) = TheMatches(Pos - 1).FirstIndex + 1
                    Answer(0, 2) = TheMatches(Pos - 1).Length
                    RegExpFindExtended = Answer
                Case Else                       ' Invalid item number
                    RegExpFindExtended = ""
            End Select
        End If
    
    ' If there are no matches, return empty string
    
    Else
        RegExpFindExtended = ""
    End If
    
Cleanup:

    ' Release object variables
    
    Set TheMatches = Nothing
    
End Function

Open in new window


Function RegExpFindSubmatch(LookIn As String, PatternStr As String, Optional MatchPos, _
    Optional SubmatchPos, Optional MatchCase As Boolean = True, _
    Optional MultiLine As Boolean = False)
    
    ' Function written by Patrick G. Matthews.  You may use and distribute this code freely,
    ' as long as you properly credit and attribute authorship and the URL of where you
    ' found the code
    
    ' This function relies on the VBScript version of Regular Expressions, and thus some of
    ' the functionality available in Perl and/or .Net may not be available.  The full extent
    ' of what functionality will be available on any given computer is based on which version
    ' of the VBScript runtime is installed on that computer
    
    ' This function uses Regular Expressions to parse a string (LookIn), and return "submatches"
    ' from the various matches to a pattern (PatternStr).  In RegExp, submatches within a pattern
    ' are defined by grouping portions of the pattern within parentheses.
    
    ' Use MatchPos to indicate which match you want:
    ' MatchPos omitted               : function returns results for all matches
    ' MatchPos = 1                   : the first match
    ' MatchPos = 2                   : the second match
    ' MatchPos = <positive integer>  : the Nth match
    ' MatchPos = 0                   : the last match
    ' MatchPos = -1                  : the last match
    ' MatchPos = -2                  : the 2nd to last match
    ' MatchPos = <negative integer>  : the Nth to last match
    
    ' Use SubmatchPos to indicate which match you want:
    ' SubmatchPos omitted               : function returns results for all submatches
    ' SubmatchPos = 1                   : the first submatch
    ' SubmatchPos = 2                   : the second submatch
    ' SubmatchPos = <positive integer>  : the Nth submatch
    ' SubmatchPos = 0                   : the last submatch
    ' SubmatchPos = -1                  : the last submatch
    ' SubmatchPos = -2                  : the 2nd to last submatch
    ' SubmatchPos = <negative integer>  : the Nth to last submatch
    
    ' The return type for this function depends on whether your choice for MatchPos is looking for
    ' a single value or for potentially many.  All arrays returned by this function are zero-based.
    ' When the function returns a 2-D array, the first dimension is for the matches and the second
    ' dimension is for the submatches
    ' MatchPos omitted, SubmatchPos omitted: 2-D array of submatches for each match.  First dimension
    '                                        based on number of matches (0 to N-1), second dimension
    '                                        based on number of submatches (0 to N-1)
    ' MatchPos omitted, SubmatchPos used   : 2-D array (0 to N-1, 0 to 0) of the specified submatch
    '                                        from each match
    ' MatchPos used, SubmatchPos omitted   : 2-D array (0 to 0, 0 to N-1) of the submatches from the
    '                                        specified match
    ' MatchPos used, SubmatchPos used      : String with specified submatch from specified match
    
    ' For any submatch that is not found, the function treats the result as a zero-length string
    
    ' If MatchCase is omitted or True (default for RegExp) then the Pattern must match case (and
    ' thus you may have to use [a-zA-Z] instead of just [a-z] or [A-Z]).
    
    ' If you use this function in Excel, you can use range references for any of the arguments.
    ' If you use this in Excel and return the full array, make sure to set up the formula as an
    ' array formula.  If you need the array formula to go down a column, use TRANSPOSE()
    
    ' Normally as an object variable I would set the RegX variable to Nothing; however, in cases
    ' where a large number of calls to this function are made, making RegX a static variable that
    ' preserves its state in between calls significantly improves performance
    
    Static RegX As Object
    Dim TheMatches As Object
    Dim Mat As Object
    Dim Answer() As String
    Dim Counter As Long
    Dim SubCounter As Long
    
    ' Evaluate MatchPos.  If it is there, it must be numeric and converted to Long
    
    If Not IsMissing(MatchPos) Then
        If Not IsNumeric(MatchPos) Then
            RegExpFindSubmatch = ""
            Exit Function
        Else
            MatchPos = CLng(MatchPos)
        End If
    End If
    
    ' Evaluate SubmatchPos.  If it is there, it must be numeric and converted to Long
    
    If Not IsMissing(SubmatchPos) Then
        If Not IsNumeric(SubmatchPos) Then
            RegExpFindSubmatch = ""
            Exit Function
        Else
            SubmatchPos = CLng(SubmatchPos)
        End If
    End If
    
    ' Create instance of RegExp object
    
    If RegX Is Nothing Then Set RegX = CreateObject("VBScript.RegExp")
    With RegX
        .Pattern = PatternStr
        .Global = True
        .IgnoreCase = Not MatchCase
        .MultiLine = MultiLine
    End With
        
    ' Test to see if there are any matches
    
    If RegX.Test(LookIn) Then
        
        ' Run RegExp to get the matches, which are returned as a zero-based collection
        
        Set TheMatches = RegX.Execute(LookIn)
        
        ' If MatchPos is missing, user either wants array of all the submatches for each match, or an
        ' array of all the specified submatches for each match.  Build it and assign it as the
        ' function's return value
        
        If IsMissing(MatchPos) Then
            
            ' Return value is a 2-D array of all the submatches for each match
            
            If IsMissing(SubmatchPos) Then
                For Counter = 0 To TheMatches.Count - 1
                    Set Mat = TheMatches(Counter)
                    
                    ' To determine how many submatches there are we need to first evaluate a match.  That
                    ' is why we redim the array inside the for/next loop
                    
                    If Counter = 0 Then
                        ReDim Answer(0 To TheMatches.Count - 1, 0 To Mat.Submatches.Count - 1) As String
                    End If
                    
                    ' Loop through the submatches and populate the array.  If the Nth submatch is not
                    ' found, RegExp returns a zero-length string
                    
                    For SubCounter = 0 To UBound(Answer, 2)
                        Answer(Counter, SubCounter) = Mat.Submatches(SubCounter)
                    Next
                Next
            
            ' Return value is a 2-D array of the specified submatch for each match.
            
            Else
                For Counter = 0 To TheMatches.Count - 1
                    Set Mat = TheMatches(Counter)
                    
                    ' To determine how many submatches there are we need to first evaluate a match.  That
                    ' is why we redim the array inside the for/next loop.  If SubmatchPos = 0, then we want
                    ' the last submatch.  In that case reset SubmatchPos so it equals the submatch count.
                    ' Negative number indicates Nth to last; convert that to applicable "positive" position
                    
                    If Counter = 0 Then
                        ReDim Answer(0 To TheMatches.Count - 1, 0 To 0) As String
                        Select Case SubmatchPos
                            Case Is > 0: 'no adjustment needed
                            Case 0, -1: SubmatchPos = Mat.Submatches.Count
                            Case Is < -Mat.Submatches.Count: SubmatchPos = -SubmatchPos
                            Case Else: SubmatchPos = Mat.Submatches.Count + SubmatchPos + 1
                        End Select
                    End If
                    
                    ' Populate array with the submatch value.  If the submatch value is not found, or if
                    ' SubmatchPos > the count of submatches, populate with a zero-length string
                    
                    If SubmatchPos <= Mat.Submatches.Count Then
                        Answer(Counter, 0) = Mat.Submatches(SubmatchPos - 1)
                    Else
                        Answer(Counter, 0) = ""
                    End If
                Next
            End If
            RegExpFindSubmatch = Answer
            
        ' User wanted the info associated with the Nth match (or last match, if MatchPos = 0)
        
        Else
            
            ' If MatchPos = 0 then make MatchPos equal the match count.  If negative (indicates Nth
            ' to last), convert to equivalent position.
            
            Select Case MatchPos
                Case Is > 0: 'no adjustment needed
                Case 0, -1: MatchPos = TheMatches.Count
                Case Is < -TheMatches.Count: MatchPos = -MatchPos
                Case Else: MatchPos = TheMatches.Count + MatchPos + 1
            End Select
            
            ' As long as MatchPos does not exceed the match count, process the Nth match.  If the
            ' match count is exceeded, return a zero-length string
            
            If MatchPos <= TheMatches.Count Then
                Set Mat = TheMatches(MatchPos - 1)
                
                ' User wants a 2-D array of all submatches for the specified match; populate array.  If
                ' a particular submatch is not found, RegExp treats it as a zero-length string
                
                If IsMissing(SubmatchPos) Then
                    ReDim Answer(0 To 0, 0 To Mat.Submatches.Count - 1)
                    For SubCounter = 0 To UBound(Answer, 2)
                        Answer(0, SubCounter) = Mat.Submatches(SubCounter)
                    Next
                    RegExpFindSubmatch = Answer
                
                ' User wants a single value
                
                Else
                    
                    ' If SubmatchPos = 0 then make it equal count of submatches.  If negative, this
                    ' indicates Nth to last; convert to equivalent positive position
                    
                    Select Case SubmatchPos
                        Case Is > 0: 'no adjustment needed
                        Case 0, -1: SubmatchPos = Mat.Submatches.Count
                        Case Is < -Mat.Submatches.Count: SubmatchPos = -SubmatchPos
                        Case Else: SubmatchPos = Mat.Submatches.Count + SubmatchPos + 1
                    End Select
                    
                    ' If SubmatchPos <= count of submatches, then get that submatch for the specified
                    ' match.  If the submatch value is not found, or if SubmathPos exceeds count of
                    ' submatches, return a zero-length string.  In testing, it appeared necessary to
                    ' use CStr to coerce the return to be a zero-length string instead of zero
                    
                    If SubmatchPos <= Mat.Submatches.Count Then
                        RegExpFindSubmatch = CStr(Mat.Submatches(SubmatchPos - 1))
                    Else
                        RegExpFindSubmatch = ""
                    End If
                End If
            Else
                RegExpFindSubmatch = ""
            End If
        End If
    
    ' If there are no matches, return empty string
    
    Else
        RegExpFindSubmatch = ""
    End If
    
Cleanup:
    ' Release object variables
    Set Mat = Nothing
    Set TheMatches = Nothing
    
End Function

Open in new window


Function RegExpReplace(LookIn As String, PatternStr As String, Optional ReplaceWith As String = "", _
    Optional ReplaceAll As Boolean = True, Optional MatchCase As Boolean = True, _
    Optional MultiLine As Boolean = False)
    
    ' Function written by Patrick G. Matthews.  You may use and distribute this code freely,
    ' as long as you properly credit and attribute authorship and the URL of where you
    ' found the code
    
    ' This function relies on the VBScript version of Regular Expressions, and thus some of
    ' the functionality available in Perl and/or .Net may not be available.  The full extent
    ' of what functionality will be available on any given computer is based on which version
    ' of the VBScript runtime is installed on that computer
    
    ' This function uses Regular Expressions to parse a string, and replace parts of the string
    ' matching the specified pattern with another string.  The optional argument ReplaceAll
    ' controls whether all instances of the matched string are replaced (True) or just the first
    ' instance (False)
    
    ' If you need to replace the Nth match, or a range of matches, then use RegExpReplaceRange
    ' instead
    
    ' By default, RegExp is case-sensitive in pattern-matching.  To keep this, omit MatchCase or
    ' set it to True
    
    ' If you use this function from Excel, you may substitute range references for all the arguments
    
    ' Normally as an object variable I would set the RegX variable to Nothing; however, in cases
    ' where a large number of calls to this function are made, making RegX a static variable that
    ' preserves its state in between calls significantly improves performance
    
    Static RegX As Object
    
    If RegX Is Nothing Then Set RegX = CreateObject("VBScript.RegExp")
    With RegX
        .Pattern = PatternStr
        .Global = ReplaceAll
        .IgnoreCase = Not MatchCase
        .MultiLine = MultiLine
    End With
    
    RegExpReplace = RegX.Replace(LookIn, ReplaceWith)
    
End Function

Open in new window


Function RegExpReplaceRange(LookIn As String, PatternStr As String, Optional ReplaceWith As String = "", _
    Optional StartAt As Long = 1, Optional EndAt As Long = 0, Optional MatchCase As Boolean = True, _
    Optional MultiLine As Boolean = False)
    
    ' Function written by Patrick G. Matthews.  You may use and distribute this code freely,
    ' as long as you properly credit and attribute authorship and the URL of where you
    ' found the code
    
    ' This function relies on the VBScript version of Regular Expressions, and thus some of
    ' the functionality available in Perl and/or .Net may not be available.  The full extent
    ' of what functionality will be available on any given computer is based on which version
    ' of the VBScript runtime is installed on that computer
    
    ' This function uses Regular Expressions to parse a string, and replace parts of the string
    ' matching the specified pattern with another string.  In particular, this function replaces
    ' the specified range of matched values with the designated replacement string.
    
    ' StartAt indicates the start of the range of matches to be replaced.  Thus, 2 indicates
    ' that the second match gets replaced starts the range of matches to be replaced.  Use zero
    ' to specify the last match.
    
    ' EndAt indicates the end of the range of matches to be replaced.  Thus, a 5 would indicate
    ' that the 5th match is the last one to be replaced. Use zero to specify the last match.
    
    ' Thus, if you use StartAt = 2 and EndAt = 5, then the 2nd through 5th matches will be
    ' replaced.
    
    ' By default, RegExp is case-sensitive in pattern-matching.  To keep this, omit MatchCase
    ' or set it to True
    
    ' If you use this function from Excel, you may substitute range references for all the
    ' arguments
    
    ' Note: Match.FirstIndex assumes that the first character position in a string is zero.
    ' This differs from VBA and VB6, which has the first character at position 1
    
    ' Normally as an object variable I would set the RegX variable to Nothing; however, in
    ' cases where a large number of calls to this function are made, making RegX a static
    ' variable that preserves its state in between calls significantly improves performance
    
    Static RegX As Object
    Dim TheMatches As Object
    Dim StartStr As String
    Dim WorkingStr As String
    Dim Counter As Long
    Dim arr() As String
    Dim StrStart As Long
    Dim StrEnd As Long
    
    ' Instantiate RegExp object
    
    If RegX Is Nothing Then Set RegX = CreateObject("VBScript.RegExp")
    With RegX
        .Pattern = PatternStr
        
        ' First search needs to find all matches
        
        .Global = True
        .IgnoreCase = Not MatchCase
        .MultiLine = MultiLine
        
        ' Run RegExp to find the matches
        
        Set TheMatches = .Execute(LookIn)
        
        ' If there are no matches, no replacement need to happen
    
        If TheMatches.Count = 0 Then
            RegExpReplaceRange = LookIn
            GoTo Cleanup
        End If
        
        ' Reset StartAt and EndAt if necessary based on matches actually found.  Escape if StartAt > EndAt
        ' 0 or -1 indicates last match.  Negative number indicates Nth to last
        
        Select Case StartAt
            Case Is > 0: 'no adjustment needed
            Case 0, -1: StartAt = TheMatches.Count
            Case Is < -TheMatches.Count: StartAt = -StartAt
            Case Else: StartAt = TheMatches.Count + StartAt + 1
        End Select
        
        Select Case EndAt
            Case Is > 0: 'no adjustment needed
            Case 0, -1: EndAt = TheMatches.Count
            Case Is < -TheMatches.Count: EndAt = -EndAt
            Case Else: EndAt = TheMatches.Count + EndAt + 1
        End Select
        
        If StartAt > EndAt Then
            RegExpReplaceRange = LookIn
            GoTo Cleanup
        End If
        
        ' Now create an array for the partial strings.  The elements of the array correspond to...
        ' 0         : text before the 1st match
        ' 1         : the first match
        ' 2 * N - 2 : text between the (N - 1)th and the Nth match (repeat as needed)
        ' 2 * N - 1 : the Nth match (repeat as needed)
        ' X         : text after the last match (X = 2 * number of matches)
        
        ReDim arr(0 To 2 * TheMatches.Count) As String
        
        ' Loop through the matches to populate the array
        
        For Counter = 1 To TheMatches.Count
            
            ' If Counter = 1 then it's the first match, and we need the text before the first match.
            ' If not, then we need the text between the (N - 1)th and the Nth match
            
            If Counter = 1 Then
                arr(0) = Left(LookIn, TheMatches(0).FirstIndex)
            Else
                
                ' Starting character position for text between the (N - 1)th and the Nth match
                
                StrStart = TheMatches(Counter - 2).FirstIndex + TheMatches(Counter - 2).Length + 1
                
                ' Length of text between the (N - 1)th and the Nth match
                
                StrEnd = TheMatches(Counter - 1).FirstIndex - StrStart + 1
                arr(2 * Counter - 2) = Mid(LookIn, StrStart, StrEnd)
            End If
            
            ' Now we process the match.  If the match number is within the replacement range,
            ' then put the replacement value into the array.  If not, put in the match value
            
            If Counter >= StartAt And Counter <= EndAt Then
'                arr(2 * Counter - 1) = ReplaceWith
                arr(2 * Counter - 1) = .Replace(TheMatches(Counter - 1), ReplaceWith)
            Else
                arr(2 * Counter - 1) = TheMatches(Counter - 1)
            End If
            
            ' If Counter = TheMatches.Count then we need to get the text after the last match
            
            If Counter = TheMatches.Count Then
                StrStart = TheMatches(Counter - 1).FirstIndex + TheMatches(Counter - 1).Length + 1
                arr(UBound(arr)) = Mid(LookIn, StrStart)
            End If
        Next
    End With
    
    ' Use Join to concatenate the elements of the array for our answer
    
    RegExpReplaceRange = Join(arr, "")
    
Cleanup:
    
    ' Clear object variables
    
    Set TheMatches = Nothing
    
End Function

Open in new window


Function RegExpReplaceExpression(LookIn As String, PatternStr As String, Expression As String, _
    Optional StartAt As Long = 1, Optional EndAt As Long = 0, _
    Optional MatchCase As Boolean = True, Optional MultiLine As Boolean = False)
    
    ' Function written by Patrick G. Matthews.  You may use and distribute this code freely,
    ' as long as you properly credit and attribute authorship and the URL of where you
    ' found the code
    
    ' This function relies on the VBScript version of Regular Expressions, and thus some of
    ' the functionality available in Perl and/or .Net may not be available.  The full extent
    ' of what functionality will be available on any given computer is based on which version
    ' of the VBScript runtime is installed on that computer
    
    ' This function is intended for use only in Excel-based VBA projects, since it relies on the
    ' Excel Application.Evaluate method to process the expression.  The expression must use only
    ' normal arithmetic operators and/or native Excel functions.  Use $& to indicate where the
    ' entire match value should go, or $1 through $9 to use submatches 1 through 9
    
    ' This function uses Regular Expressions to parse a string, and replace parts of the string
    ' matching the specified pattern with another string.  In particular, this function replaces
    ' the specified range of matched values with the designated replacement string.  In a twist,
    ' though
    
    ' StartAt indicates the start of the range of matches to be replaced.  Thus, 2 indicates that
    ' the second match gets replaced.  Use zero to specify the last match.  Negative numbers
    ' indicate the Nth to last: -1 is the last, -2 the 2nd to last, etc
    
    ' EndAt indicates the end of the range of matches to be replaced.  Thus, a 5 would indicate
    ' that the 5th match is the last one to be replaced. Use zero to specify the last match.
    ' Negative numbers indicate the Nth to last: -1 is the last, -2 the 2nd to last, etc
    
    ' Thus, if you use StartAt = 2 and EndAt = 5, then the 2nd through 5th matches will be replaced.
    
    ' By default, RegExp is case-sensitive in pattern-matching.  To keep this, omit MatchCase or
    ' set it to True
    
    ' Note: Match.FirstIndex assumes that the first character position in a string is zero.  This
    ' differs from VBA and VB6, which has the first character at position 1
    
    ' Normally as an object variable I would set the RegX variable to Nothing; however, in cases
    ' where a large number of calls to this function are made, making RegX a static variable that
    ' preserves its state in between calls significantly improves performance
    
    Static RegX As Object
    Dim TheMatches As Object
    Dim StartStr As String
    Dim WorkingStr As String
    Dim Counter As Long
    Dim arr() As String
    Dim StrStart As Long
    Dim StrEnd As Long
    Dim Counter2 As Long
    
    ' Instantiate RegExp object
    
    If RegX Is Nothing Then Set RegX = CreateObject("VBScript.RegExp")
    With RegX
        .Pattern = PatternStr
        
        ' First search needs to find all matches
        
        .Global = True
        .IgnoreCase = Not MatchCase
        .MultiLine = MultiLine
        
        ' Run RegExp to find the matches
        
        Set TheMatches = .Execute(LookIn)
        
        ' If there are no matches, no replacement need to happen
    
        If TheMatches.Count = 0 Then
            RegExpReplaceExpression = LookIn
            GoTo Cleanup
        End If
        
        ' Reset StartAt and EndAt if necessary based on matches actually found.  Escape if StartAt > EndAt
        
        Select Case StartAt
            Case Is > 0: 'no adjustment needed
            Case 0, -1: StartAt = TheMatches.Count
            Case Is < -TheMatches.Count: StartAt = -StartAt
            Case Else: StartAt = TheMatches.Count + StartAt + 1
        End Select
            
        Select Case EndAt
            Case Is > 0: 'no adjustment needed
            Case 0, -1: EndAt = TheMatches.Count
            Case Is < -TheMatches.Count: EndAt = -EndAt
            Case Else: EndAt = TheMatches.Count + EndAt + 1
        End Select
        
        If StartAt > EndAt Then
            RegExpReplaceExpression = LookIn
            GoTo Cleanup
        End If
        
        ' Now create an array for the partial strings.  The elements of the array correspond to...
        ' 0         : text before the 1st match
        ' 1         : the first match
        ' 2 * N - 2 : text between the (N - 1)th and the Nth match (repeat as needed)
        ' 2 * N - 1 : the Nth match (repeat as needed)
        ' X         : text after the last match (X = 2 * number of matches)
        
        ReDim arr(0 To 2 * TheMatches.Count) As String
        
        ' Loop through the matches to populate the array
        
        For Counter = 1 To TheMatches.Count
            
            ' If Counter = 1 then it's the first match, and we need the text before the first match.
            ' If not, then we need the text between the (N - 1)th and the Nth match
            
            If Counter = 1 Then
                arr(0) = Left(LookIn, TheMatches(0).FirstIndex)
            Else
                
                ' Starting character position for text between the (N - 1)th and the Nth match
                
                StrStart = TheMatches(Counter - 2).FirstIndex + TheMatches(Counter - 2).Length + 1
                
                ' Length of text between the (N - 1)th and the Nth match
                
                StrEnd = TheMatches(Counter - 1).FirstIndex - StrStart + 1
                arr(2 * Counter - 2) = Mid(LookIn, StrStart, StrEnd)
            End If
            
            ' Now we process the match.  If the match number is within the replacement range,
            ' then put the replacement value into an Evaluate expression, and place the result
            ' into the array.  If not, put in the match value
            
            If Counter >= StartAt And Counter <= EndAt Then
            
                ' $& stands in for the entire match
                
                Expression = Replace(Expression, "$&", TheMatches(Counter - 1))
                
                ' Now loop through the Submatches, if applicable, and make replacements
                
                For Counter2 = 1 To TheMatches(Counter - 1).Submatches.Count
                    Expression = Replace(Expression, "$" & Counter2, TheMatches(Counter - 1).Submatches(Counter2 - 1))
                Next
                
                ' Evaluate the expression
                
                arr(2 * Counter - 1) = Evaluate(Expression)
            Else
                arr(2 * Counter - 1) = TheMatches(Counter - 1)
            End If
            
            ' If Counter = TheMatches.Count then we need to get the text after the last match
            
            If Counter = TheMatches.Count Then
                StrStart = TheMatches(Counter - 1).FirstIndex + TheMatches(Counter - 1).Length + 1
                arr(UBound(arr)) = Mid(LookIn, StrStart)
            End If
        Next
    End With
    
    ' Use Join to concatenate the elements of the array for our answer
    
    RegExpReplaceExpression = Join(arr, "")
    
Cleanup:
    
    ' Clear object variables
    
    Set TheMatches = Nothing
    Set RegX = Nothing
    
End Function

Open in new window



=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
If you liked this article and want to see more from this author,  please click here.

If you found this article helpful, please click the Yes button near the:

      Was this article helpful?

label that is just below and to the right of this text.   Thanks!
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Regular-Expressions-Examples.xls
78
Comment
38 Comments
 
LVL 47

Expert Comment

by:aikimark
very nice arcticle
0
 
LVL 15

Expert Comment

by:Eric AKA Netminder
Dang... I even understand it... sorta kinda.

Nice work, Patrick!
0
 
LVL 18

Expert Comment

by:WaterStreet
A very useful article.

Intermediate users can use the Excel examples in the attached workbook, and the code contained in its modules, to begin using these new functionalities.

It got my vote above.
0
Ultimate Tool Kit for Technology Solution Provider

Broken down into practical pointers and step-by-step instructions, the IT Service Excellence Tool Kit delivers expert advice for technology solution providers. Get your free copy now.

 
LVL 59

Expert Comment

by:Saurabh Singh Teotia
Patrick,

This something which i was looking forward to learn and there you go, You made my life easy as i dont have to refer to anywhere else to learn it.

Saurabh...
0
 
LVL 50

Expert Comment

by:Dave
It's more then a beautiful article, it's a regexp masterclass.

Bravo Patrick!
0
 
LVL 22

Expert Comment

by:spattewar
absolutely fantastic!!!

Knowledge and your experience make this article number one!

Thanks. Patrick for sharing.
0
 
LVL 21

Expert Comment

by:alainbryden
I actually had to use this today. Someone needed to extract file paths from a vlookup formula to an external workbook and the first thing I did was go to e-e, look up this article, and after 10 minutes and frequent references to the article for refresher, it was done.
0
 
LVL 21

Expert Comment

by:alainbryden
Too bad I already voted the article as helpful a couple days ago, because I would have liked to do so again today! :p

Alain
0
 
LVL 93

Author Comment

by:Patrick Matthews
Alain,

So glad you found it useful!  Which question was it?

Patrick
0
 
LVL 21

Expert Comment

by:alainbryden
Oh it was something in my own office, not on e-e. Even more practical!
0
 
LVL 45

Expert Comment

by:patrickab
Patrick,

That's an excellent and admirable piece of work. Very readable and most useful. I will keep it as a reference document and hopefully get to apply some of it. And you're a Genius to boot in 6 zones. Dunno where you find the time - amazing.

Patrick(A-B)
0
 
LVL 61

Expert Comment

by:Kevin Cross
Nice work, Patrick!

Voted yes above.
0
 
LVL 58

Expert Comment

by:harfang
Hi Patrick,

Since EE has an international audience, I think you should add a warning for non-English users. The sad thing about RegExp it that it has never been updated to work even with 8-bit ASCII or Windows code pages, let alone Unicode; it still believes all text is 7-bit.

    [a-z] doesn't match accented characters
    \b treats any accented character as a word boundary

To extract all words from the French sentence below and similar Western European texts, you need to use a rather indigestible pattern, which is code-page dependant.

    Text: « Ça va être la fête ce Noël à Genève ! »
    pattern: [a-zaS~ßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿ]+

Working with Unicode (e.g. Arabic, Hebrew, Chinese) is a nightmare...

Your article is already excellent, but a section titled "limitations" or "problems" would make it even better.

(°v°)
0
 
LVL 58

Expert Comment

by:harfang
EE Unicode bug bite!
Instead of "aS~" above, read: inverted circumflex s, oe ligature, inverted circumflex z.
(°v°)
0
 
LVL 93

Author Comment

by:Patrick Matthews
Markus,

Thank you so much for your comments!  You are absolutely right about limitations around character sets.

Of course, I couldn't have said it better than you did :)

Patrick
0
 
LVL 58

Expert Comment

by:harfang
Feel free to plagiarize/recycle if you want! Exercise: can you find an expression matching true (i.e. Windows Western) word boundaries? -- (^v°)
0
 

Expert Comment

by:Glenn Stearns
Great article.  I'm going to pore over it a while and try to learn a little bit about regular expressions.  Thanks so much for posting your article...very helpful!
0
 
LVL 21

Expert Comment

by:alainbryden
I wish VB6 had a positive/negative lookbehind right now : (
0
 
LVL 41

Expert Comment

by:ralmada
Excellent article matthew! I've always find a big limitation not being able to work with regular expressions in Access. Not anymore with your RegExFind function !!!
0
 
LVL 50

Expert Comment

by:Dave
>I wish VB6 had a positive/negative lookbehind right now :

It's a fundamental limitation - even a crippling - for vbscript regular expressions  

Regards

Davew
0
 
LVL 1

Expert Comment

by:Alex Campbell
Very impressive to have use of RegEx in Excel.
0
 
LVL 2

Expert Comment

by:billb1057
Patrick,
Many thanks for a great article -- clearly written with lots of detailed explanations.  This is superb, and extremely useful.  I was glad to vote Yes!
Bill B.
0
 

Expert Comment

by:appinities
Thanks for share a nice one article..
0
 
LVL 21

Expert Comment

by:alainbryden
Seriously Patrick, this marks probably the 20th time I've come to this article for a refresher. I wish I could up-vote this a hundred times.
0
 
LVL 42

Expert Comment

by:dlmille
I was so involved reading your article, I forgot the question you recently answered with it!  Well worth the read and I can't wait to find applications for it.

This was the second I've found interesting and useful.  The first was the Dictionary class.  Accolades!

Dave
0
 
LVL 8

Expert Comment

by:lomo74
Patrick,
very nice article, thank you so much for sharing your knowledge with us.
- Lorenzo -
0
 

Expert Comment

by:zhshqzyc
Great artivle.
0
 
LVL 2

Expert Comment

by:Jeffrey Smith
Well, I won't pretend I've read and understood it all yet, but I've used Regular Expressions elsewhere and it's good to know that you have found a way to make that capability available in Excel and shared same.  Thanks again, Patrick, for another great article that will go in my tool bag.

Jeff
0
 
LVL 93

Author Comment

by:Patrick Matthews
Thanks for the kind words, Jeff
0
 
LVL 5

Expert Comment

by:rberke
I once created something similar,  but it is not as robust as yours.

Mine does have one feature you might consider adding.

I have a small library of regular expressions that I use almost every day.  My library allows me to extract integers, real numbers, email addresses, domain names and a few other special purpose items.

I created two macros which made it very easy to do this.

Sub CallPSR   ' < this is assigned to an Alt M hotkey by OnKeys
Command = inputbox ("enter command")
Call PasteSpecialRegularExpression(cmd)
End sub

Sub PasteSpecialRegularExpression(cmd)
Param = split(cmd)
…. Code that interprets the command and pastes the appropriate regular expression to Selection
End sub

For instance, lets say cell A1 contains   "Part 12455 can be found on pages 1, 3 and 5 ".

Lets say I want to get the page numbers but not the part number. Since page numbers are between 1 and 3 digits long, I do the following:

select [b1:d3] then key Alt M "psrdigits 1 3" <enter>

The following formulas get pasted into the selected cells

=RGXEXTRACT($A1,"(?:\D|\b)(\d{1,3})(?=\D|\b)",,,,,COLUMN(B1)-COLUMN($B1))
=RGXEXTRACT($A1,"(?:\D|\b)(\d{1,3})(?=\D|\b)",,,,,COLUMN(C1)-COLUMN($B1))
=RGXEXTRACT($A1,"(?:\D|\b)(\d{1,3})(?=\D|\b)",,,,,COLUMN(D1)-COLUMN($B1))

My original code used a single array formula stretched across all 3 cells, but I now feel an explicit formula in each cell is more flexible because it allows me to easily drag the right hand cell to get more cells.  The performance hit is irrelevant because I usually only have a couple of hundred rows and cells.

Normally I would paste my code here, but it would be too time consuming to convert it to use RegExpExtract instead of my version of it.




0
 
LVL 5

Expert Comment

by:rberke
Great examples in "Regular-Expressions-Examples.xls"  !!!

0
 
LVL 93

Author Comment

by:Patrick Matthews
rberke,

Great suggestion!

Patrick
0
 
LVL 93

Author Comment

by:Patrick Matthews
From http://www.experts-exchange.com/Software/Internet_Email/Web_Browsers/Q_27422790.html



In RegExpReplace, I indicated that you can use $1, $2, $3 etc to incorporate a submatch result in the replacement string.  If you need to use a literal $ and then a digit, simply double up the $.
0
 
LVL 42

Expert Comment

by:dlmille
Patrick - I use several of your articles as a regular resource (and some methods just roll off the fingertips, today :).  In my most recent article, http://www.experts-exchange.com/Software/Office_Productivity/Office_Suites/MS_Office/Excel/A_8933-How-to-quickly-and-accurately-populate-Word-documents-with-Excel-data-charts-and-images-including-Automated-Bookmark-generation.html, I used your code (see comment below):

*Note:  The code for finding bookmarks in Word was achieved leveraging RegEx – specifically the RegExpFind function (see matthewspatrick's most excellent article on the subject: http:\A_1336.html).  My RegEx pattern \[{2}[A-Za-z0-9_]+\]{2} looks for the [[ ]] brackets to define where the bookmark name would be, with alphanumeric and underscore allowed inside the brackets.  Bookmark indicators with any other characters or spaces are included inside the brackets will be ignored (Word doesn't allow spaces in Bookmark names, and through convenience, disallowed any other characters).  Check your template_BM.dot to ensure all your Bookmarks were created, just to be sure you didn’t use any disallowed characters!

I did, however, need to make one fairly minor change to work most effectively in my routines, basically to return NULL as opposed to empty string (example below):
 
' If there are no matches, return empty string
    
    Else
        RegExpFind = Null 'updated by dlmille of E-E for testing of null result if empty or array without generating error
    End If

Open in new window


This seems fairly obvious, and anyone can modify to suit the needs of their calling apps with reuseable code.  The point I'd like to make is that your code was so well documented, it was a pleasure jumping right in and tweaking with literally no trouble at all.  In fact, your code has made me a bit lazy, as I've not put much attention into learning the RegEx object as your routines come to mind, every time, lol ;)

Cheers,

Dave
' If there are no matches, return empty string
    
    Else
        RegExpFind = Null 'updated by dlmille of E-E for testing of null result if empty or array without generating error
    End If

Open in new window

0
 
LVL 93

Author Comment

by:Patrick Matthews
Dave,

I can think of no higher praise for my articles than for colleagues like you to say that you use them and learn from them!

Nice work on amending that to return a null rather than an empty string.  I can see how that would be handy.

You've become quite the accomplished author yourself!

Cheers,

Patrick
0
 
LVL 93

Author Comment

by:Patrick Matthews
This article has been updated to indicate that "lazy" quantifiers--that is, non-greedy--are supported, and that the following additional tokens are available in the Replace method:

$+   Last submatch
$`   Text prior to match
$'   Text after match
$_   Entire input string

Open in new window

0
 
LVL 1

Expert Comment

by:Stephen Byrom
This computer hobby of mine is slowly becoming very time consuming with stuff for work, and other people. It's a good job there are people out there like Patrick who are willing to impart invaluable knowledge for us lesser mortals, who are just scraping at the edge of coding.

Thanks
0
 
LVL 24

Expert Comment

by:Steve
As ever a very concise clear and useful article.
Prety much all I needed to know in one place.
Nice job,
Steve.
0

Featured Post

Cloud Class® Course: Microsoft Office 2010

This course will introduce you to the interfaces and features of Microsoft Office 2010 Word, Excel, PowerPoint, Outlook, and Access. You will learn about the features that are shared between all products in the Office suite, as well as the new features that are product specific.

Join & Write a Comment

Get people started with the process of using Access VBA to control Outlook using automation, Microsoft Access can control other applications. An example is the ability to programmatically talk to Microsoft Outlook. Using automation, an Access applic…
Get people started with the utilization of class modules. Class modules can be a powerful tool in Microsoft Access. They allow you to create self-contained objects that encapsulate functionality. They can easily hide the complexity of a process from…

Keep in touch with Experts Exchange

Tech news and trends delivered to your inbox every month