By Patrick Matthews (matthewspatrick, http://www.experts-exchang
Introduction
Regular Expressions (RegExp) is a class that offers incredibly powerful text parsing capabilities, in particular allowing users to find (and even replace) substrings within larger text entries that fit a particular pattern. This article provides basic information on the VBScript implementation of RegExp, and how you can use it to gain robust text parsing capabilities in Visual Basic 6 (VB6) and Visual Basic for Applications (VBA) projects. With a little knowledge of the rules and some imagination, you can build incredibly flexible and/or complex patterns.
While this article may be most useful to users with intermediate or advanced programming skills, users of all skill levels (beginners included) can benefit from using RegExp by implementing the "wrapper" functions included in this article. (This article includes a sample Excel workbook that provides examples for each function.) By adding these functions to your VB6/VBA projects, you can quickly and easily incorporate RegExp functionality into your applications. If you are a Microsoft Office user, you can even use these functions in worksheet formulas for Microsoft Excel or in queries in Microsoft Access.
These new functions are:
- RegExpFind (Returns information for a single match from the input text, or an array of all the matches. Info returned can be the match Value, starting character position, or Length for a single match, or for all matches)
- RegExpFindExtended (Returns a 2-D array of the Value, starting character position, or Length)
- RegExpFindSubmatch (Returns a 2-D array of submatches for the various matches in the input text)
- RegExpReplace (Returns a string in which either all matches are replaced with the replacement text, or just the first match)
- RegExpReplaceExpression (Excel only, allows you to build an expression using arithmetic operators and/or Excel functions to manipulate the replacement text)
- RegExpReplaceRange (Returns a string in which the Xth to Yth matches are replaced)
Quick Examples Using RegExp
The following two examples briefly illustrate the usefulness of RegExp, and also you might use some of the included wrapper functions to address each problem.
Consider the following text:
"He hit the ball. Then he ran. The crowd was cheering! How did he feel? I felt so energized!"
Suppose you needed to extract the first word from each sentence. You could use a pattern such as:
1: | "(^|([\.!?] +))([a-z]+)" |
That pattern matches the following:
- The beginning of the string OR (period, exclamation point, or question mark followed by 1+ spaces) {This whole item taken together is the first submatch; the period, exclamation point, or question mark followed by 1+ spaces is the 2nd submatch}
- Then, 1+ characters from a-z {3rd submatch}
Using one of the wrapper functions, this would return an array of the first words in each sentence:
1: 2: | RegExpFindSubmatch(MyText, "(^|([\.!?] +))([a-z]+)", , 3, False) Returns "He", "Then", "The", "How", "I" |
If you wanted to extract the last word in each sentence, you could use this pattern instead:
1: | "[a-z]+(?=[\.!\?])" |
That pattern matches the following:
- 1+ characters from a-z
- Then, a "look ahead" to see if it is followed by a period, exclamation point, or question mark. In a look ahead, RegExp checks to see whether the look ahead condition is met, but does not include the look ahead in the match that gets returned
Using a wrapper function, this returns an array of the last word in each sentence:
1: 2: | RegExpFind(MyText, "[a-z]+(?=[\.!\?])", , False) Returns "ball", "ran", "cheering", "feel", "energized" |
For another example (based on EE question http://www.experts-exchang
http://www.abc.com
www.xyz.com
123.mno.net
http://www.abc.com/categor
www.abc.net/category/sub-c
xyz.info/index.php
www.abc.com:8082/
Suppose that, for each entry, you had to extract the site address (e.g., www.abc.com). You could use the following pattern, replacing all text matching that pattern with a zero-length string:
1: | "http://|[:/].*" |
That pattern matches the following:
- http:// OR
- (colon or forward slash) followed by zero or more characters of any type
If you were looping through an array of log entries, you could use this to extract the address:
1: | RegExpReplace(LogArr(i), "http://|[:/].*", "", True, False) |
If instead you needed to extract just the top-level domain (e.g., com, net, info), you could use this pattern instead:
1: | "(http://)?([\w-]+\.)([\w-]+\.)?([\w-]+)([:/].*)?" |
That pattern matches the following:
- http:// appearing zero or one time {1st submatch}
- Then, a word character or dash appearing 1+ times followed by a period {2nd submatch}
- Then, a word character or dash appearing 1+ times followed by a period, the whole block appearing zero or one time {3rd submatch}
- Then, a word character or dash appearing 1+ times {4th submatch}
- Then, a colon or forward slash followed by zero or more characters of any type {5th submatch}
In this case, the 4th submatch will be the top-level domain. To get it, you could use one of two expressions:
1: 2: 3: 4: 5: | RegExpReplace(LogArr(i), "(http://)?([\w-]+\.)([\w-]+\.)?([\w-]+)([:/].*)?", "$4", True, False) or =RegExpFindSubmatch(LogArr(i), "(http://)?([\w-]+\.)([\w-]+\.)?([\w-]+)([:/].*)?", 1, 4, False, False) |
If instead this log were in an Access table, you could use these wrapper functions in a query such as this:
1: 2: 3: 4: 5: 6: |
SELECT LogEntry,
RegExpReplace(LogArr(i), "http://|[:/].*", "", True, False) AS Address,
RegExpReplace(LogArr(i), "(http://)?([\w-]+\.)([\w-]+\.)?([\w-]+)([:/].*)?",
"$4", True, False) AS TopLevel
FROM LogTable |
The VBScript RegExp Class
Regular Expressions may be most familiar to users of the Perl programming language, but many other languages incorporate similar functionality. This article concerns how to use the RegExp class implemented in VBScript 5.5 to enhance your text parsing capabilities in your Visual Basic for Applications (and thus Microsoft Excel, Access, or any other application that supports VBA) and Visual Basic 6 projects. Because this article focuses on the specific VBScript implementation, there may be some differences in syntax and/or features when compared to other Regular Expressions implementations. (For example, the VBScript implementation does not support the "look behind" capabilities in Perl and the .Net framework.)
To use RegExp in your VB6 or VBA projects, you must instantiate a RegExp object. If you are using early binding, then you would add a reference to the "Microsoft VBScript Regular Expressions 5.5" library. (Do not use the version 1.0 library, as it lacks many of the best features of RegExp and is provided solely for backward compatibility.) You would then declare the variable in your code this way:
1: 2: | Dim RegX As VBScript_RegExp_55.RegExp Set RegX = New VBScript_RegExp_55.RegExp |
Using late binding, you would do it this way:
1: 2: | Dim RegX As Object
Set RegX = CreateObject("VBScript.RegExp")
|
Once you have the object instantiated, you will have to set certain properties of the RegExp class:
- Global: This determines whether a RegExp method will try to find only the first match (False), or all matches (True). The default value is False.
- IgnoreCase: This determines whether the pattern match is case-sensitive; the default is False (thus making pattern matching case-sensitive by default). If IgnoreCase is True, then a pattern of [a-z] or [A-Z] is equivalent to a pattern of [a-zA-Z].
- Multiline: Indicates how RegExp should handle input text with line breaks. In particular, it determines how the ^ (start of input) and $ (end of input) behave: if False, then these match only the very beginning and very end of the entire input string. If True, then ^ and $ match the beginning and end of each line in the input string. The default value is False.
- Pattern: This is the pattern that RegExp tries to match. Patterns can be relatively simple, or highly complex. For example, the pattern
1:
circus
will look only for the six consecutive characters "circus" (and, if IgnoreCase = False, then those characters would have to be lower case to make a match). On the other hand:1:
circus(es)?
will match "circus" or "circuses". Further, the following pattern matches circus, circuses, pony, or ponies:1:
circus(es)?|pon(y|ies)
Please see the Quick Primer on Pattern Syntax for a few simple tips on Pattern syntax to get you started. The functionality is incredibly rich, however, and for fuller treatments of the subject you should look in the Further Reading section of this article for web sites and books that provide excellent information on the RegExp Pattern syntax.
After you have set the RegExp properties to appropriate values, you then have to decide which method to use. The RegExp class exposes three methods:
- Execute: This method returns a Matches collection (of Match objects), containing the various matches to the specified pattern found in the input string (which is passed to the Execute method as an argument). The Global property determines whether the collection contains all matches (True) or just the first match (False).
- Replace: This method takes two arguments. The first argument is the string to be searched for pattern matches, and the second is the replacement text to be used if a match is found. The Global property determines whether RegExp replaces all matches (True) or just the first match (False). The Replace method returns a string that includes all applicable replacements. (Note that if there are no matches found, then the output and the original input are identical.) To incorporate the entire match into the replacement text, use $& to stand in for the match, and to use a submatch in the replacement text, use $1, $2, $3, etc. to use the nth submatch.
- Test: This method takes a single argument, the input string, and returns a Boolean True/False to indicate whether there is at least one match in the input string.
If you need to work with the various matches, then you work through the Matches collection and the various Match objects it contains. The Matches collection has the following properties:
- Count: This is the number of Match objects contained in the collection
- Item: This returns the indicated Match object based on an index (the argument for this property). Please note that the first item takes the index number zero, and thus the index for the last item is one less than the Count. The Item property is the default property for the Matches collection, so is equivalent to
1:
MyMatches(0)
.1:
MyMatches.Item(0)
For example, this code enumerates the Matches:
1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11: 12: 13: 14: 15: | Dim RegX As Object, Mats As Object, Counter As Long
Set RegX = CreateObject("VBScript.RegExp")
With RegX
.Global = True
.Pattern = "\d+"
Set Mats = .Execute("1 turtle, 15 rabbits, and 1078 swans")
End With
For Counter = 0 To Mats.Count - 1
MsgBox Mats(Counter)
Next
Set Mats = Nothing
Set RegX = Nothing
|
That code will present three MsgBoxes, showing 1, 15, and 1078 respectively.
To work further with the individual Match objects, you can use the following properties:
- FirstIndex: The starting position of the Match value within the original input string. Please be aware that this property treats the first character in the input string as position zero, whereas VB6 and VBA (as well as Access and Excel) typically treat the first character as position 1. The sample functions I provide here correct for this.
- Length: The number of characters in the Match value
- Submatches: This returns a Submatches collection composed of the various Submatch objects found within the Match.
- Value: The string of the Match itself. Value is the Match object's default property, so in the code example above is equivalent to
1:
Mats(Counter)
.1:
Mats(Counter).Value
Drilling down even further, a Match may have Submatches. A Submatch is a portion of a Match that corresponds to part of the Pattern bracketed by parentheses. For example, you could use a pattern such as this to match IP addresses in an input string:
1: | (\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})
|
Suppose that the input string were:
I thought her IP address was 100.200.300.400, but it was really 99.99.99.99
There would be two matches, 100.200.300.400 and 99.99.99.99. In addition, each match would have four submatches; in the first case these submatches are 100, 200, 300, and 400. These submatches correspond to the portions of the pattern enclosed in parentheses.
The Submatches collection has the following properties:
- Count: This is the number of Submatch objects contained in the collection. There will always be as many Submatch objects as there are groups in the pattern enclosed in parentheses.
- Item: This returns the indicated Submatch object based on an index (the argument for this property). Please note that the first item takes the index number zero, and thus the index for the last item is one less than the Count. The Item property is the default property for the Submatches collection, so is equivalent to
1:
MyMatch.Submatches(0)
.1:
MyMatch.Submatches.Item(0)
The Submatch object has a single property, Value. This Value is simply the text comprising the Submatch. If there is no match found for the indicated group in the pattern, then the Submatch value is a zero-length string.
Quick Primer on Pattern Syntax
This section includes some basic pointers on Pattern syntax to help you get started. It is, however, by no means complete, and for more detailed treatments of the subject you should review the Web sites and books in the Further Reading section of this article.
Keep in mind that in all cases, whether a match is case sensitive or not depends on the setting for RegExp.IgnoreCase.
1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11: 12: 13: 14: 15: 16: 17: 18: 19: 20: 21: 22: 23: 24: 25: 26: 27: 28: 29: 30: 31: 32: 33: 34: 35: 36: 37: 38: 39: 40: 41: 42: 43: 44: 45: 46: 47: 48: 49: 50: 51: 52: 53: 54: 55: 56: 57: 58: 59: |
Symbol Notes
--------------------------------------------------------------------------------------------
abc, ABC Matches "abc" or "ABC"
[abc] Matches any character within the brackets
[^abc] Matches any character not within the brackets
[a-n], [0-5] Matches any character within the range of letters or digits in brackets
[^a-n], [^0-5] Matches any character not within the range of letters or digits in brackets
. Matches any single character except a new line character
A|B Matches "A" or "B"
() Creates a subgroup within the pattern
{x} Matches the preceding element exactly x times
{x,y} Matches the preceding element between x and y times
{x,} Matches the preceding element at least x times
{,y} Matches the preceding element between zero and y times
* Matches preceding element zero or more times
? Matches preceding element zero or one times
+ Matches preceding element one or more times
^ Matches start of input (or start of line if RegExp.Multiline = True)
$ Matches end of input (or end of line if RegExp.Multiline = True)
\d Matches a digit (same as [0-9])
\D Matches a non-digit (same as [^0-9])
\b Matches a word boundary
\B Matches a non-word boundary
\f Form feed
\n New line
\r Carriage return
\t Tab
\v Vertical tab
\s Matches a "white space" character, same as [ \f\n\r\t\v]
\S Matches a non-"white space" character, same as [^ \f\n\r\t\v]
\w Matches a "word" character plus underscore, same as [A-Za-z0-9_]
\W Matches a non-"word" character plus underscore, same as [^A-Za-z0-9_]
\ "Escapes" a special character.
For example:
To match the backslash, use \\
To match the $, use \$
To match the period, use \.
x(?=y) Positive lookahead
X is a pattern you want to match
Y is a pattern you want to see immediately follow the pattern you are
looking for, but is not included in the match
Example: Suppose you have a part number that is always four digits,
with an optional suffix a, b, c, etc. You want to grab all matches
that look like part numbers with suffixes, but you want to exclude
the suffix from the match.
You could use a pattern \d{4}(?=[a-z]).
If the input string is "1234, 2345x, 3456", it matches "2345".
x(?!y) Negative lookahead
X is a pattern you want to match
Y is a pattern you do not want to see immediately follow the pattern
you are looking for, but is not included in the match
Example: Suppose you have a part number that is always four digits,
with an optional suffix a, b, c, etc. You want to grab all matches that
look like part numbers, but you want to exclude part numbers with suffixes.
You could use a pattern \d{4}(?![a-z])
If the input string is "1234, 2345x, 3456", it matches "1234" and "3456". |
Further Reading
For further reading, I recommend the following web sites and books:
For an excellent tutorial on RegExp, including information on the various language-specific implementations, go to: "Using Regular Expressions with Client Side Scripting", http://www.regular-express
This site provides a very complete listing of the syntax for RegExp as implemented in VBScript and Jscript: "Regular Expression Syntax (Scripting)", http://msdn.microsoft.com/
For general information on VBScript RegExp: "Microsoft Beefs Up VBScript with Regular Expressions", http://msdn.microsoft.com/
Another site for general information: "An Introduction to Regular Expression with VBScript", http://www.4guysfromrolla.
I keep the following book at my desk for quick reference on RegExp and other VBScript matters: VBScript in a Nutshell, written by Lomax, Childs, and Petrusha. Its descriptions of the various RegExp-related objects, properties, and methods are useful, but are a bit incomplete. (For example, the book makes no mention of submatches or "look ahead".)
If you really want to master RegExp, then I suggest Mastering Regular Expressions, by Friedl. That book is not for the faint-hearted, but if you really want to know about how RegExp works its magic, and how you can get the most out of RegExp, it is worth the effort.
For additional examples and practical applications, I also recommend the Regular Expressions Cookbook, by Goyvaerts and Levithan.
Acknowledgements
I would like to thank EE Member brettdj (http://www.experts-exchan
Code Samples
Below is source code for six "wrapper" user defined functions you may use and freely distribute, as long as you properly attribute authorship and where you found the code. You will also find these functions in the included sample file "Regular Expressions Examples.xls".
1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11: 12: 13: 14: 15: 16: 17: 18: 19: 20: 21: 22: 23: 24: 25: 26: 27: 28: 29: 30: 31: 32: 33: 34: 35: 36: 37: 38: 39: 40: 41: 42: 43: 44: 45: 46: 47: 48: 49: 50: 51: 52: 53: 54: 55: 56: 57: 58: 59: 60: 61: 62: 63: 64: 65: 66: 67: 68: 69: 70: 71: 72: 73: 74: 75: 76: 77: 78: 79: 80: 81: 82: 83: 84: 85: 86: 87: 88: 89: 90: 91: 92: 93: 94: 95: 96: 97: 98: 99: 100: 101: 102: 103: 104: 105: 106: 107: 108: 109: 110: 111: 112: 113: 114: 115: 116: 117: 118: 119: 120: 121: 122: 123: 124: 125: 126: 127: 128: 129: 130: 131: 132: 133: 134: 135: 136: 137: 138: 139: 140: 141: 142: 143: 144: 145: 146: 147: 148: 149: 150: 151: 152: 153: 154: 155: 156: 157: 158: 159: | Function RegExpFind(LookIn As String, PatternStr As String, Optional Pos, _
Optional MatchCase As Boolean = True, Optional ReturnType As Long = 0, _
Optional MultiLine As Boolean = False)
' Function written by Patrick G. Matthews. You may use and distribute this code freely,
' as long as you properly credit and attribute authorship and the URL of where you
' found the code
' This function relies on the VBScript version of Regular Expressions, and thus some of
' the functionality available in Perl and/or .Net may not be available. The full extent
' of what functionality will be available on any given computer is based on which version
' of the VBScript runtime is installed on that computer
' This function uses Regular Expressions to parse a string (LookIn), and return matches to a
' pattern (PatternStr). Use Pos to indicate which match you want:
' Pos omitted : function returns a zero-based array of all matches
' Pos = 1 : the first match
' Pos = 2 : the second match
' Pos = <positive integer> : the Nth match
' Pos = 0 : the last match
' Pos = -1 : the last match
' Pos = -2 : the 2nd to last match
' Pos = <negative integer> : the Nth to last match
' If Pos is non-numeric, or if the absolute value of Pos is greater than the number of
' matches, the function returns an empty string. If no match is found, the function returns
' an empty string. (Earlier versions of this code used zero for the last match; this is
' retained for backward compatibility)
' If MatchCase is omitted or True (default for RegExp) then the Pattern must match case (and
' thus you may have to use [a-zA-Z] instead of just [a-z] or [A-Z]).
' ReturnType indicates what information you want to return:
' ReturnType = 0 : the matched values
' ReturnType = 1 : the starting character positions for the matched values
' ReturnType = 2 : the lengths of the matched values
' If you use this function in Excel, you can use range references for any of the arguments.
' If you use this in Excel and return the full array, make sure to set up the formula as an
' array formula. If you need the array formula to go down a column, use TRANSPOSE()
' Note: RegExp counts the character positions for the Match.FirstIndex property as starting
' at zero. Since VB6 and VBA has strings starting at position 1, I have added one to make
' the character positions conform to VBA/VB6 expectations
' Normally as an object variable I would set the RegX variable to Nothing; however, in cases
' where a large number of calls to this function are made, making RegX a static variable that
' preserves its state in between calls significantly improves performance
Static RegX As Object
Dim TheMatches As Object
Dim Answer()
Dim Counter As Long
' Evaluate Pos. If it is there, it must be numeric and converted to Long
If Not IsMissing(Pos) Then
If Not IsNumeric(Pos) Then
RegExpFind = ""
Exit Function
Else
Pos = CLng(Pos)
End If
End If
' Evaluate ReturnType
If ReturnType < 0 Or ReturnType > 2 Then
RegExpFind = ""
Exit Function
End If
' Create instance of RegExp object if needed, and set properties
If RegX Is Nothing Then Set RegX = CreateObject("VBScript.RegExp")
With RegX
.Pattern = PatternStr
.Global = True
.IgnoreCase = Not MatchCase
.MultiLine = MultiLine
End With
' Test to see if there are any matches
If RegX.Test(LookIn) Then
' Run RegExp to get the matches, which are returned as a zero-based collection
Set TheMatches = RegX.Execute(LookIn)
' Test to see if Pos is negative, which indicates the user wants the Nth to last
' match. If it is, then based on the number of matches convert Pos to a positive
' number, or zero for the last match
If Not IsMissing(Pos) Then
If Pos < 0 Then
If Pos = -1 Then
Pos = 0
Else
' If Abs(Pos) > number of matches, then the Nth to last match does not
' exist. Return a zero-length string
If Abs(Pos) <= TheMatches.Count Then
Pos = TheMatches.Count + Pos + 1
Else
RegExpFind = ""
GoTo Cleanup
End If
End If
End If
End If
' If Pos is missing, user wants array of all matches. Build it and assign it as the
' function's return value
If IsMissing(Pos) Then
ReDim Answer(0 To TheMatches.Count - 1)
For Counter = 0 To UBound(Answer)
Select Case ReturnType
Case 0: Answer(Counter) = TheMatches(Counter)
Case 1: Answer(Counter) = TheMatches(Counter).FirstIndex + 1
Case 2: Answer(Counter) = TheMatches(Counter).Length
End Select
Next
RegExpFind = Answer
' User wanted the Nth match (or last match, if Pos = 0). Get the Nth value, if possible
Else
Select Case Pos
Case 0 ' Last match
Select Case ReturnType
Case 0: RegExpFind = TheMatches(TheMatches.Count - 1)
Case 1: RegExpFind = TheMatches(TheMatches.Count - 1).FirstIndex + 1
Case 2: RegExpFind = TheMatches(TheMatches.Count - 1).Length
End Select
Case 1 To TheMatches.Count ' Nth match
Select Case ReturnType
Case 0: RegExpFind = TheMatches(Pos - 1)
Case 1: RegExpFind = TheMatches(Pos - 1).FirstIndex + 1
Case 2: RegExpFind = TheMatches(Pos - 1).Length
End Select
Case Else ' Invalid item number
RegExpFind = ""
End Select
End If
' If there are no matches, return empty string
Else
RegExpFind = ""
End If
Cleanup:
' Release object variables
Set TheMatches = Nothing
End Function
|
1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11: 12: 13: 14: 15: 16: 17: 18: 19: 20: 21: 22: 23: 24: 25: 26: 27: 28: 29: 30: 31: 32: 33: 34: 35: 36: 37: 38: 39: 40: 41: 42: 43: 44: 45: 46: 47: 48: 49: 50: 51: 52: 53: 54: 55: 56: 57: 58: 59: 60: 61: 62: 63: 64: 65: 66: 67: 68: 69: 70: 71: 72: 73: 74: 75: 76: 77: 78: 79: 80: 81: 82: 83: 84: 85: 86: 87: 88: 89: 90: 91: 92: 93: 94: 95: 96: 97: 98: 99: 100: 101: 102: 103: 104: 105: 106: 107: 108: 109: 110: 111: 112: 113: 114: 115: 116: 117: 118: 119: 120: 121: 122: 123: 124: 125: 126: 127: 128: 129: 130: 131: 132: 133: 134: 135: 136: 137: 138: 139: 140: 141: 142: 143: 144: | Function RegExpFindExtended(LookIn As String, PatternStr As String, Optional Pos, _
Optional MatchCase As Boolean = True, Optional MultiLine As Boolean = False)
' Function written by Patrick G. Matthews. You may use and distribute this code freely,
' as long as you properly credit and attribute authorship and the URL of where you
' found the code
' This function relies on the VBScript version of Regular Expressions, and thus some of
' the functionality available in Perl and/or .Net may not be available. The full extent
' of what functionality will be available on any given computer is based on which version
' of the VBScript runtime is installed on that computer
' This function uses Regular Expressions to parse a string (LookIn), and returns a 0-(N-1), 0-2
' array of the matched values (position 0 for the 2nd dimension), the starting character
' positions (position 1 for the 2nd dimension), and the length of the matched values (position 2
' for the 2nd dimension)
' Use Pos to indicate which match you want:
' Pos omitted : function returns a zero-based array of all matches
' Pos = 1 : the first match
' Pos = 2 : the second match
' Pos = <positive integer> : the Nth match
' Pos = 0 : the last match
' Pos = -1 : the last match
' Pos = -2 : the 2nd to last match
' Pos = <negative integer> : the Nth to last match
' If Pos is non-numeric, or if the absolute value of Pos is greater than the number of
' matches, the function returns an empty string. If no match is found, the function returns
' an empty string.
' If MatchCase is omitted or True (default for RegExp) then the Pattern must match case (and
' thus you may have to use [a-zA-Z] instead of just [a-z] or [A-Z]).
' If you use this function in Excel, you can use range references for any of the arguments.
' If you use this in Excel and return the full array, make sure to set up the formula as an
' array formula. If you need the array formula to go down a column, use TRANSPOSE()
' Note: RegExp counts the character positions for the Match.FirstIndex property as starting
' at zero. Since VB6 and VBA has strings starting at position 1, I have added one to make
' the character positions conform to VBA/VB6 expectations
' Normally as an object variable I would set the RegX variable to Nothing; however, in cases
' where a large number of calls to this function are made, making RegX a static variable that
' preserves its state in between calls significantly improves performance
Static RegX As Object
Dim TheMatches As Object
Dim Answer()
Dim Counter As Long
' Evaluate Pos. If it is there, it must be numeric and converted to Long
If Not IsMissing(Pos) Then
If Not IsNumeric(Pos) Then
RegExpFindExtended = ""
Exit Function
Else
Pos = CLng(Pos)
End If
End If
' Create instance of RegExp object
If RegX Is Nothing Then Set RegX = CreateObject("VBScript.RegExp")
With RegX
.Pattern = PatternStr
.Global = True
.IgnoreCase = Not MatchCase
.MultiLine = MultiLine
End With
' Test to see if there are any matches
If RegX.Test(LookIn) Then
' Run RegExp to get the matches, which are returned as a zero-based collection
Set TheMatches = RegX.Execute(LookIn)
If Not IsMissing(Pos) Then
If Pos < 0 Then
If Pos = -1 Then
Pos = 0
Else
' If Abs(Pos) > number of matches, then the Nth to last match does not
' exist. Return a zero-length string
If Abs(Pos) <= TheMatches.Count Then
Pos = TheMatches.Count + Pos + 1
Else
RegExpFindExtended = ""
GoTo Cleanup
End If
End If
End If
End If
' If Pos is missing, user wants array of all matches. Build it and assign it as the
' function's return value
If IsMissing(Pos) Then
ReDim Answer(0 To TheMatches.Count - 1, 0 To 2)
For Counter = 0 To UBound(Answer)
Answer(Counter, 0) = TheMatches(Counter)
Answer(Counter, 1) = TheMatches(Counter).FirstIndex + 1
Answer(Counter, 2) = TheMatches(Counter).Length
Next
RegExpFindExtended = Answer
' User wanted the Nth match (or last match, if Pos = 0). Get the Nth value, if possible
Else
Select Case Pos
Case 0 ' Last match
ReDim Answer(0 To 0, 0 To 2)
Answer(0, 0) = TheMatches(TheMatches.Count - 1)
Answer(0, 1) = TheMatches(TheMatches.Count - 1).FirstIndex + 1
Answer(0, 2) = TheMatches(TheMatches.Count - 1).Length
RegExpFindExtended = Answer
Case 1 To TheMatches.Count ' Nth match
ReDim Answer(0 To 0, 0 To 2)
Answer(0, 0) = TheMatches(Pos - 1)
Answer(0, 1) = TheMatches(Pos - 1).FirstIndex + 1
Answer(0, 2) = TheMatches(Pos - 1).Length
RegExpFindExtended = Answer
Case Else ' Invalid item number
RegExpFindExtended = ""
End Select
End If
' If there are no matches, return empty string
Else
RegExpFindExtended = ""
End If
Cleanup:
' Release object variables
Set TheMatches = Nothing
End Function |
1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11: 12: 13: 14: 15: 16: 17: 18: 19: 20: 21: 22: 23: 24: 25: 26: 27: 28: 29: 30: 31: 32: 33: 34: 35: 36: 37: 38: 39: 40: 41: 42: 43: 44: 45: 46: 47: 48: 49: 50: 51: 52: 53: 54: 55: 56: 57: 58: 59: 60: 61: 62: 63: 64: 65: 66: 67: 68: 69: 70: 71: 72: 73: 74: 75: 76: 77: 78: 79: 80: 81: 82: 83: 84: 85: 86: 87: 88: 89: 90: 91: 92: 93: 94: 95: 96: 97: 98: 99: 100: 101: 102: 103: 104: 105: 106: 107: 108: 109: 110: 111: 112: 113: 114: 115: 116: 117: 118: 119: 120: 121: 122: 123: 124: 125: 126: 127: 128: 129: 130: 131: 132: 133: 134: 135: 136: 137: 138: 139: 140: 141: 142: 143: 144: 145: 146: 147: 148: 149: 150: 151: 152: 153: 154: 155: 156: 157: 158: 159: 160: 161: 162: 163: 164: 165: 166: 167: 168: 169: 170: 171: 172: 173: 174: 175: 176: 177: 178: 179: 180: 181: 182: 183: 184: 185: 186: 187: 188: 189: 190: 191: 192: 193: 194: 195: 196: 197: 198: 199: 200: 201: 202: 203: 204: 205: 206: 207: 208: 209: 210: 211: 212: 213: 214: 215: 216: 217: 218: 219: 220: 221: 222: 223: 224: 225: 226: 227: 228: 229: 230: 231: 232: 233: 234: 235: 236: 237: 238: 239: 240: 241: 242: 243: |
Function RegExpFindSubmatch(LookIn As String, PatternStr As String, Optional MatchPos, _
Optional SubmatchPos, Optional MatchCase As Boolean = True, _
Optional MultiLine As Boolean = False)
' Function written by Patrick G. Matthews. You may use and distribute this code freely,
' as long as you properly credit and attribute authorship and the URL of where you
' found the code
' This function relies on the VBScript version of Regular Expressions, and thus some of
' the functionality available in Perl and/or .Net may not be available. The full extent
' of what functionality will be available on any given computer is based on which version
' of the VBScript runtime is installed on that computer
' This function uses Regular Expressions to parse a string (LookIn), and return "submatches"
' from the various matches to a pattern (PatternStr). In RegExp, submatches within a pattern
' are defined by grouping portions of the pattern within parentheses.
' Use MatchPos to indicate which match you want:
' MatchPos omitted : function returns results for all matches
' MatchPos = 1 : the first match
' MatchPos = 2 : the second match
' MatchPos = <positive integer> : the Nth match
' MatchPos = 0 : the last match
' MatchPos = -1 : the last match
' MatchPos = -2 : the 2nd to last match
' MatchPos = <negative integer> : the Nth to last match
' Use SubmatchPos to indicate which match you want:
' SubmatchPos omitted : function returns results for all submatches
' SubmatchPos = 1 : the first submatch
' SubmatchPos = 2 : the second submatch
' SubmatchPos = <positive integer> : the Nth submatch
' SubmatchPos = 0 : the last submatch
' SubmatchPos = -1 : the last submatch
' SubmatchPos = -2 : the 2nd to last submatch
' SubmatchPos = <negative integer> : the Nth to last submatch
' The return type for this function depends on whether your choice for MatchPos is looking for
' a single value or for potentially many. All arrays returned by this function are zero-based.
' When the function returns a 2-D array, the first dimension is for the matches and the second
' dimension is for the submatches
' MatchPos omitted, SubmatchPos omitted: 2-D array of submatches for each match. First dimension
' based on number of matches (0 to N-1), second dimension
' based on number of submatches (0 to N-1)
' MatchPos omitted, SubmatchPos used : 2-D array (0 to N-1, 0 to 0) of the specified submatch
' from each match
' MatchPos used, SubmatchPos omitted : 2-D array (0 to 0, 0 to N-1) of the submatches from the
' specified match
' MatchPos used, SubmatchPos used : String with specified submatch from specified match
' For any submatch that is not found, the function treats the result as a zero-length string
' If MatchCase is omitted or True (default for RegExp) then the Pattern must match case (and
' thus you may have to use [a-zA-Z] instead of just [a-z] or [A-Z]).
' If you use this function in Excel, you can use range references for any of the arguments.
' If you use this in Excel and return the full array, make sure to set up the formula as an
' array formula. If you need the array formula to go down a column, use TRANSPOSE()
' Normally as an object variable I would set the RegX variable to Nothing; however, in cases
' where a large number of calls to this function are made, making RegX a static variable that
' preserves its state in between calls significantly improves performance
Static RegX As Object
Dim TheMatches As Object
Dim Mat As Object
Dim Answer() As String
Dim Counter As Long
Dim SubCounter As Long
' Evaluate MatchPos. If it is there, it must be numeric and converted to Long
If Not IsMissing(MatchPos) Then
If Not IsNumeric(MatchPos) Then
RegExpFindSubmatch = ""
Exit Function
Else
MatchPos = CLng(MatchPos)
End If
End If
' Evaluate SubmatchPos. If it is there, it must be numeric and converted to Long
If Not IsMissing(SubmatchPos) Then
If Not IsNumeric(SubmatchPos) Then
RegExpFindSubmatch = ""
Exit Function
Else
SubmatchPos = CLng(SubmatchPos)
End If
End If
' Create instance of RegExp object
If RegX Is Nothing Then Set RegX = CreateObject("VBScript.RegExp")
With RegX
.Pattern = PatternStr
.Global = True
.IgnoreCase = Not MatchCase
.MultiLine = MultiLine
End With
' Test to see if there are any matches
If RegX.Test(LookIn) Then
' Run RegExp to get the matches, which are returned as a zero-based collection
Set TheMatches = RegX.Execute(LookIn)
' If MatchPos is missing, user either wants array of all the submatches for each match, or an
' array of all the specified submatches for each match. Build it and assign it as the
' function's return value
If IsMissing(MatchPos) Then
' Return value is a 2-D array of all the submatches for each match
If IsMissing(SubmatchPos) Then
For Counter = 0 To TheMatches.Count - 1
Set Mat = TheMatches(Counter)
' To determine how many submatches there are we need to first evaluate a match. That
' is why we redim the array inside the for/next loop
If Counter = 0 Then
ReDim Answer(0 To TheMatches.Count - 1, 0 To Mat.Submatches.Count - 1) As String
End If
' Loop through the submatches and populate the array. If the Nth submatch is not
' found, RegExp returns a zero-length string
For SubCounter = 0 To UBound(Answer, 2)
Answer(Counter, SubCounter) = Mat.Submatches(SubCounter)
Next
Next
' Return value is a 2-D array of the specified submatch for each match.
Else
For Counter = 0 To TheMatches.Count - 1
Set Mat = TheMatches(Counter)
' To determine how many submatches there are we need to first evaluate a match. That
' is why we redim the array inside the for/next loop. If SubmatchPos = 0, then we want
' the last submatch. In that case reset SubmatchPos so it equals the submatch count.
' Negative number indicates Nth to last; convert that to applicable "positive" position
If Counter = 0 Then
ReDim Answer(0 To TheMatches.Count - 1, 0 To 0) As String
Select Case SubmatchPos
Case Is > 0: 'no adjustment needed
Case 0, -1: SubmatchPos = Mat.Submatches.Count
Case Is < -Mat.Submatches.Count: SubmatchPos = -SubmatchPos
Case Else: SubmatchPos = Mat.Submatches.Count + SubmatchPos + 1
End Select
End If
' Populate array with the submatch value. If the submatch value is not found, or if
' SubmatchPos > the count of submatches, populate with a zero-length string
If SubmatchPos <= Mat.Submatches.Count Then
Answer(Counter, 0) = Mat.Submatches(SubmatchPos - 1)
Else
Answer(Counter, 0) = ""
End If
Next
End If
RegExpFindSubmatch = Answer
' User wanted the info associated with the Nth match (or last match, if MatchPos = 0)
Else
' If MatchPos = 0 then make MatchPos equal the match count. If negative (indicates Nth
' to last), convert to equivalent position.
Select Case MatchPos
Case Is > 0: 'no adjustment needed
Case 0, -1: MatchPos = TheMatches.Count
Case Is < -TheMatches.Count: MatchPos = -MatchPos
Case Else: MatchPos = TheMatches.Count + MatchPos + 1
End Select
' As long as MatchPos does not exceed the match count, process the Nth match. If the
' match count is exceeded, return a zero-length string
If MatchPos <= TheMatches.Count Then
Set Mat = TheMatches(MatchPos - 1)
' User wants a 2-D array of all submatches for the specified match; populate array. If
' a particular submatch is not found, RegExp treats it as a zero-length string
If IsMissing(SubmatchPos) Then
ReDim Answer(0 To 0, 0 To Mat.Submatches.Count - 1)
For SubCounter = 0 To UBound(Answer, 2)
Answer(0, SubCounter) = Mat.Submatches(SubCounter)
Next
RegExpFindSubmatch = Answer
' User wants a single value
Else
' If SubmatchPos = 0 then make it equal count of submatches. If negative, this
' indicates Nth to last; convert to equivalent positive position
Select Case SubmatchPos
Case Is > 0: 'no adjustment needed
Case 0, -1: SubmatchPos = Mat.Submatches.Count
Case Is < -Mat.Submatches.Count: SubmatchPos = -SubmatchPos
Case Else: SubmatchPos = Mat.Submatches.Count + SubmatchPos + 1
End Select
' If SubmatchPos <= count of submatches, then get that submatch for the specified
' match. If the submatch value is not found, or if SubmathPos exceeds count of
' submatches, return a zero-length string. In testing, it appeared necessary to
' use CStr to coerce the return to be a zero-length string instead of zero
If SubmatchPos <= Mat.Submatches.Count Then
RegExpFindSubmatch = CStr(Mat.Submatches(SubmatchPos - 1))
Else
RegExpFindSubmatch = ""
End If
End If
Else
RegExpFindSubmatch = ""
End If
End If
' If there are no matches, return empty string
Else
RegExpFindSubmatch = ""
End If
Cleanup:
' Release object variables
Set Mat = Nothing
Set TheMatches = Nothing
End Function |
1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11: 12: 13: 14: 15: 16: 17: 18: 19: 20: 21: 22: 23: 24: 25: 26: 27: 28: 29: 30: 31: 32: 33: 34: 35: 36: 37: 38: 39: 40: 41: 42: 43: 44: |
Function RegExpReplace(LookIn As String, PatternStr As String, Optional ReplaceWith As String = "", _
Optional ReplaceAll As Boolean = True, Optional MatchCase As Boolean = True, _
Optional MultiLine As Boolean = False)
' Function written by Patrick G. Matthews. You may use and distribute this code freely,
' as long as you properly credit and attribute authorship and the URL of where you
' found the code
' This function relies on the VBScript version of Regular Expressions, and thus some of
' the functionality available in Perl and/or .Net may not be available. The full extent
' of what functionality will be available on any given computer is based on which version
' of the VBScript runtime is installed on that computer
' This function uses Regular Expressions to parse a string, and replace parts of the string
' matching the specified pattern with another string. The optional argument ReplaceAll
' controls whether all instances of the matched string are replaced (True) or just the first
' instance (False)
' If you need to replace the Nth match, or a range of matches, then use RegExpReplaceRange
' instead
' By default, RegExp is case-sensitive in pattern-matching. To keep this, omit MatchCase or
' set it to True
' If you use this function from Excel, you may substitute range references for all the arguments
' Normally as an object variable I would set the RegX variable to Nothing; however, in cases
' where a large number of calls to this function are made, making RegX a static variable that
' preserves its state in between calls significantly improves performance
Static RegX As Object
If RegX Is Nothing Then Set RegX = CreateObject("VBScript.RegExp")
With RegX
.Pattern = PatternStr
.Global = ReplaceAll
.IgnoreCase = Not MatchCase
.MultiLine = MultiLine
End With
RegExpReplace = RegX.Replace(LookIn, ReplaceWith)
End Function |
1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11: 12: 13: 14: 15: 16: 17: 18: 19: 20: 21: 22: 23: 24: 25: 26: 27: 28: 29: 30: 31: 32: 33: 34: 35: 36: 37: 38: 39: 40: 41: 42: 43: 44: 45: 46: 47: 48: 49: 50: 51: 52: 53: 54: 55: 56: 57: 58: 59: 60: 61: 62: 63: 64: 65: 66: 67: 68: 69: 70: 71: 72: 73: 74: 75: 76: 77: 78: 79: 80: 81: 82: 83: 84: 85: 86: 87: 88: 89: 90: 91: 92: 93: 94: 95: 96: 97: 98: 99: 100: 101: 102: 103: 104: 105: 106: 107: 108: 109: 110: 111: 112: 113: 114: 115: 116: 117: 118: 119: 120: 121: 122: 123: 124: 125: 126: 127: 128: 129: 130: 131: 132: 133: 134: 135: 136: 137: 138: 139: 140: 141: 142: 143: 144: 145: 146: 147: 148: 149: 150: 151: 152: 153: 154: 155: |
Function RegExpReplaceRange(LookIn As String, PatternStr As String, Optional ReplaceWith As String = "", _
Optional StartAt As Long = 1, Optional EndAt As Long = 0, Optional MatchCase As Boolean = True, _
Optional MultiLine As Boolean = False)
' Function written by Patrick G. Matthews. You may use and distribute this code freely,
' as long as you properly credit and attribute authorship and the URL of where you
' found the code
' This function relies on the VBScript version of Regular Expressions, and thus some of
' the functionality available in Perl and/or .Net may not be available. The full extent
' of what functionality will be available on any given computer is based on which version
' of the VBScript runtime is installed on that computer
' This function uses Regular Expressions to parse a string, and replace parts of the string
' matching the specified pattern with another string. In particular, this function replaces
' the specified range of matched values with the designated replacement string.
' StartAt indicates the start of the range of matches to be replaced. Thus, 2 indicates
' that the second match gets replaced starts the range of matches to be replaced. Use zero
' to specify the last match.
' EndAt indicates the end of the range of matches to be replaced. Thus, a 5 would indicate
' that the 5th match is the last one to be replaced. Use zero to specify the last match.
' Thus, if you use StartAt = 2 and EndAt = 5, then the 2nd through 5th matches will be
' replaced.
' By default, RegExp is case-sensitive in pattern-matching. To keep this, omit MatchCase
' or set it to True
' If you use this function from Excel, you may substitute range references for all the
' arguments
' Note: Match.FirstIndex assumes that the first character position in a string is zero.
' This differs from VBA and VB6, which has the first character at position 1
' Normally as an object variable I would set the RegX variable to Nothing; however, in
' cases where a large number of calls to this function are made, making RegX a static
' variable that preserves its state in between calls significantly improves performance
Static RegX As Object
Dim TheMatches As Object
Dim StartStr As String
Dim WorkingStr As String
Dim Counter As Long
Dim arr() As String
Dim StrStart As Long
Dim StrEnd As Long
' Instantiate RegExp object
If RegX Is Nothing Then Set RegX = CreateObject("VBScript.RegExp")
With RegX
.Pattern = PatternStr
' First search needs to find all matches
.Global = True
.IgnoreCase = Not MatchCase
.MultiLine = MultiLine
' Run RegExp to find the matches
Set TheMatches = .Execute(LookIn)
' If there are no matches, no replacement need to happen
If TheMatches.Count = 0 Then
RegExpReplaceRange = LookIn
GoTo Cleanup
End If
' Reset StartAt and EndAt if necessary based on matches actually found. Escape if StartAt > EndAt
' 0 or -1 indicates last match. Negative number indicates Nth to last
Select Case StartAt
Case Is > 0: 'no adjustment needed
Case 0, -1: StartAt = TheMatches.Count
Case Is < -TheMatches.Count: StartAt = -StartAt
Case Else: StartAt = TheMatches.Count + StartAt + 1
End Select
Select Case EndAt
Case Is > 0: 'no adjustment needed
Case 0, -1: EndAt = TheMatches.Count
Case Is < -TheMatches.Count: EndAt = -EndAt
Case Else: EndAt = TheMatches.Count + EndAt + 1
End Select
If StartAt > EndAt Then
RegExpReplaceRange = LookIn
GoTo Cleanup
End If
' Now create an array for the partial strings. The elements of the array correspond to...
' 0 : text before the 1st match
' 1 : the first match
' 2 * N - 2 : text between the (N - 1)th and the Nth match (repeat as needed)
' 2 * N - 1 : the Nth match (repeat as needed)
' X : text after the last match (X = 2 * number of matches)
ReDim arr(0 To 2 * TheMatches.Count) As String
' Loop through the matches to populate the array
For Counter = 1 To TheMatches.Count
' If Counter = 1 then it's the first match, and we need the text before the first match.
' If not, then we need the text between the (N - 1)th and the Nth match
If Counter = 1 Then
arr(0) = Left(LookIn, TheMatches(0).FirstIndex)
Else
' Starting character position for text between the (N - 1)th and the Nth match
StrStart = TheMatches(Counter - 2).FirstIndex + TheMatches(Counter - 2).Length + 1
' Length of text between the (N - 1)th and the Nth match
StrEnd = TheMatches(Counter - 1).FirstIndex - StrStart + 1
arr(2 * Counter - 2) = Mid(LookIn, StrStart, StrEnd)
End If
' Now we process the match. If the match number is within the replacement range,
' then put the replacement value into the array. If not, put in the match value
If Counter >= StartAt And Counter <= EndAt Then
' arr(2 * Counter - 1) = ReplaceWith
arr(2 * Counter - 1) = .Replace(TheMatches(Counter - 1), ReplaceWith)
Else
arr(2 * Counter - 1) = TheMatches(Counter - 1)
End If
' If Counter = TheMatches.Count then we need to get the text after the last match
If Counter = TheMatches.Count Then
StrStart = TheMatches(Counter - 1).FirstIndex + TheMatches(Counter - 1).Length + 1
arr(UBound(arr)) = Mid(LookIn, StrStart)
End If
Next
End With
' Use Join to concatenate the elements of the array for our answer
RegExpReplaceRange = Join(arr, "")
Cleanup:
' Clear object variables
Set TheMatches = Nothing
End Function |
1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11: 12: 13: 14: 15: 16: 17: 18: 19: 20: 21: 22: 23: 24: 25: 26: 27: 28: 29: 30: 31: 32: 33: 34: 35: 36: 37: 38: 39: 40: 41: 42: 43: 44: 45: 46: 47: 48: 49: 50: 51: 52: 53: 54: 55: 56: 57: 58: 59: 60: 61: 62: 63: 64: 65: 66: 67: 68: 69: 70: 71: 72: 73: 74: 75: 76: 77: 78: 79: 80: 81: 82: 83: 84: 85: 86: 87: 88: 89: 90: 91: 92: 93: 94: 95: 96: 97: 98: 99: 100: 101: 102: 103: 104: 105: 106: 107: 108: 109: 110: 111: 112: 113: 114: 115: 116: 117: 118: 119: 120: 121: 122: 123: 124: 125: 126: 127: 128: 129: 130: 131: 132: 133: 134: 135: 136: 137: 138: 139: 140: 141: 142: 143: 144: 145: 146: 147: 148: 149: 150: 151: 152: 153: 154: 155: 156: 157: 158: 159: 160: 161: 162: 163: 164: 165: 166: 167: 168: 169: 170: 171: 172: |
Function RegExpReplaceExpression(LookIn As String, PatternStr As String, Expression As String, _
Optional StartAt As Long = 1, Optional EndAt As Long = 0, _
Optional MatchCase As Boolean = True, Optional MultiLine As Boolean = False)
' Function written by Patrick G. Matthews. You may use and distribute this code freely,
' as long as you properly credit and attribute authorship and the URL of where you
' found the code
' This function relies on the VBScript version of Regular Expressions, and thus some of
' the functionality available in Perl and/or .Net may not be available. The full extent
' of what functionality will be available on any given computer is based on which version
' of the VBScript runtime is installed on that computer
' This function is intended for use only in Excel-based VBA projects, since it relies on the
' Excel Application.Evaluate method to process the expression. The expression must use only
' normal arithmetic operators and/or native Excel functions. Use $& to indicate where the
' entire match value should go, or $1 through $9 to use submatches 1 through 9
' This function uses Regular Expressions to parse a string, and replace parts of the string
' matching the specified pattern with another string. In particular, this function replaces
' the specified range of matched values with the designated replacement string. In a twist,
' though
' StartAt indicates the start of the range of matches to be replaced. Thus, 2 indicates that
' the second match gets replaced. Use zero to specify the last match. Negative numbers
' indicate the Nth to last: -1 is the last, -2 the 2nd to last, etc
' EndAt indicates the end of the range of matches to be replaced. Thus, a 5 would indicate
' that the 5th match is the last one to be replaced. Use zero to specify the last match.
' Negative numbers indicate the Nth to last: -1 is the last, -2 the 2nd to last, etc
' Thus, if you use StartAt = 2 and EndAt = 5, then the 2nd through 5th matches will be replaced.
' By default, RegExp is case-sensitive in pattern-matching. To keep this, omit MatchCase or
' set it to True
' Note: Match.FirstIndex assumes that the first character position in a string is zero. This
' differs from VBA and VB6, which has the first character at position 1
' Normally as an object variable I would set the RegX variable to Nothing; however, in cases
' where a large number of calls to this function are made, making RegX a static variable that
' preserves its state in between calls significantly improves performance
Static RegX As Object
Dim TheMatches As Object
Dim StartStr As String
Dim WorkingStr As String
Dim Counter As Long
Dim arr() As String
Dim StrStart As Long
Dim StrEnd As Long
Dim Counter2 As Long
' Instantiate RegExp object
If RegX Is Nothing Then Set RegX = CreateObject("VBScript.RegExp")
With RegX
.Pattern = PatternStr
' First search needs to find all matches
.Global = True
.IgnoreCase = Not MatchCase
.MultiLine = MultiLine
' Run RegExp to find the matches
Set TheMatches = .Execute(LookIn)
' If there are no matches, no replacement need to happen
If TheMatches.Count = 0 Then
RegExpReplaceExpression = LookIn
GoTo Cleanup
End If
' Reset StartAt and EndAt if necessary based on matches actually found. Escape if StartAt > EndAt
Select Case StartAt
Case Is > 0: 'no adjustment needed
Case 0, -1: StartAt = TheMatches.Count
Case Is < -TheMatches.Count: StartAt = -StartAt
Case Else: StartAt = TheMatches.Count + StartAt + 1
End Select
Select Case EndAt
Case Is > 0: 'no adjustment needed
Case 0, -1: EndAt = TheMatches.Count
Case Is < -TheMatches.Count: EndAt = -EndAt
Case Else: EndAt = TheMatches.Count + EndAt + 1
End Select
If StartAt > EndAt Then
RegExpReplaceExpression = LookIn
GoTo Cleanup
End If
' Now create an array for the partial strings. The elements of the array correspond to...
' 0 : text before the 1st match
' 1 : the first match
' 2 * N - 2 : text between the (N - 1)th and the Nth match (repeat as needed)
' 2 * N - 1 : the Nth match (repeat as needed)
' X : text after the last match (X = 2 * number of matches)
ReDim arr(0 To 2 * TheMatches.Count) As String
' Loop through the matches to populate the array
For Counter = 1 To TheMatches.Count
' If Counter = 1 then it's the first match, and we need the text before the first match.
' If not, then we need the text between the (N - 1)th and the Nth match
If Counter = 1 Then
arr(0) = Left(LookIn, TheMatches(0).FirstIndex)
Else
' Starting character position for text between the (N - 1)th and the Nth match
StrStart = TheMatches(Counter - 2).FirstIndex + TheMatches(Counter - 2).Length + 1
' Length of text between the (N - 1)th and the Nth match
StrEnd = TheMatches(Counter - 1).FirstIndex - StrStart + 1
arr(2 * Counter - 2) = Mid(LookIn, StrStart, StrEnd)
End If
' Now we process the match. If the match number is within the replacement range,
' then put the replacement value into an Evaluate expression, and place the result
' into the array. If not, put in the match value
If Counter >= StartAt And Counter <= EndAt Then
' $& stands in for the entire match
Expression = Replace(Expression, "$&", TheMatches(Counter - 1))
' Now loop through the Submatches, if applicable, and make replacements
For Counter2 = 1 To TheMatches(Counter - 1).Submatches.Count
Expression = Replace(Expression, "$" & Counter2, TheMatches(Counter - 1).Submatches(Counter2 - 1))
Next
' Evaluate the expression
arr(2 * Counter - 1) = Evaluate(Expression)
Else
arr(2 * Counter - 1) = TheMatches(Counter - 1)
End If
' If Counter = TheMatches.Count then we need to get the text after the last match
If Counter = TheMatches.Count Then
StrStart = TheMatches(Counter - 1).FirstIndex + TheMatches(Counter - 1).Length + 1
arr(UBound(arr)) = Mid(LookIn, StrStart)
End If
Next
End With
' Use Join to concatenate the elements of the array for our answer
RegExpReplaceExpression = Join(arr, "")
Cleanup:
' Clear object variables
Set TheMatches = Nothing
Set RegX = Nothing
End Function |
