  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 821

VBScript: Regular Expressions

Hi there,

I need to understand the meaning (in words) of the following regular expression, in every aspect and dissected in detail:
"((coconut|two)?(?:[^A-Za-z0-9\n\r\t]*))?\(?(\d{3})\)?[- \.]?(\d{3})[- \.]?(\d{4})"

For example:
- ((coconut|two) = look for "coconut" and "two" as literal text due to the parentheses
- ?=Used here for ...

If you have a better and clearer way than my example to explain it, please go ahead.

Thanks for your help,
1 Solution
The job of a regular expression is to match or 'capture' one or more patterns of characters in a string.  The parts of the regex that are in parentheses are 'groups'.  Each group that can capture produces a submatch.  Submatches are labelled $1, $2 etc.  In VBScript regex, the SubMatches collection is zero-based, so $1 is oMatch.SubMatches(0), $2 is oMatch.SubMatches(1), etc.

Capturing groups in your sample regex:

Group1: ((coconut|two)?(?:[^A-Za-z0-9\n\r\t]*))?
Group2: (coconut|two)?
Group3: (\d{3})
Group4: (\d{3})
Group5: (\d{4})

You may have spotted that I did not specify this one (?:[^A-Za-z0-9\n\r\t]*) as a group.  That is because the ?: at the start of the group means match but don't capture (what it matches still ends up inside group 1, because group 1 surrounds it, but it does not get its own group).
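The group numbering can be checked mechanically.  The sketch below uses Python's re module as a stand-in for the VBScript RegExp object (that is my assumption, not part of the question: the pattern only uses syntax that both engines share).  The test string "two - 555 123 4567" is made up for illustration.

```python
import re

# Pattern copied from the question. Python's re is only a stand-in for VBScript's RegExp here.
PATTERN = r"((coconut|two)?(?:[^A-Za-z0-9\n\r\t]*))?\(?(\d{3})\)?[- \.]?(\d{3})[- \.]?(\d{4})"
regex = re.compile(PATTERN)

# Groups are numbered by the order of their opening parentheses.
# The (?:...) group is not counted, so this prints 5.
print(regex.groups)

# Hypothetical test string, not from the thread.
m = regex.search("two - 555 123 4567")
for n in range(1, regex.groups + 1):
    # Python's group(n) is $n, which VBScript exposes as SubMatches(n - 1).
    print(f"${n} / SubMatches({n - 1}): {m.group(n)!r}")
```

Here $1 comes out as "two - " (keyword plus the separator characters) and $2 as just "two", which shows the non-capturing part being swallowed into group 1 without a number of its own.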

So the whole regex goes like this:

Nothing is captured unless the whole regex fits somewhere in the string.  But some parts of the pattern are set as optional.

Find a pattern within the test string that:
1. Starts with coconut OR two.  Place the result in submatch $2.  The pipe means OR.  This match is OPTIONAL because there is a question mark directly after the group, so a matching pattern does not have to start with either.  But if it does start with one, we want to capture it.

2. The next part is to match, but not separately capture, zero or more characters that are NOT A-Z or a-z or numbers or new line or carriage return or tab.  This is done by:
   ?:  means match if there are characters that qualify, but don't give them a submatch of their own.  Why would we want that?  Well, there may be junk characters (spaces, punctuation, symbols) between the coconut/two and the number, and we want the pattern to get past them, but we don't need them as a separate result.  A letter or number here stops this part of the match.
   [^A-Za-z0-9\n\r\t]  the ^ character at the start means match any one character that is NOT listed in these square brackets.  The \n means new line, \r means carriage return and \t means tab.
   *  the asterisk means match any number of those characters, from zero to many.

So the result of 1 plus 2 above combined is placed in Group1 or $1.

3. The question mark at the end of the surrounding parentheses means that whole group is optional.  The match does not have to contain coconut or two or any of the characters allowed by the square brackets.

4. the \(? means match a left bracket, but the bracket is optional.  There is a leading backslash to escape the bracket so that the regex engine does not think we are starting a new group.  Because it is not inside parentheses of its own, it is not assigned to a submatch group.

5. (\d{3})  this means capture 3 numbers together.  This is not optional: if there are not three numbers together at this point, the match fails at this position and the engine tries again further along the string.  The numbers are placed in submatch $3.

6. the  \)?  means match a right bracket, but the bracket is optional.  It is not assigned to a submatch group.

7. the  [- \.]?  means after the 3 numbers match any *one* of the characters in the square brackets, either hyphen, space or dot.  The dot is escaped with a backslash because a dot on its own means any character (inside square brackets the escape is not strictly needed, but it does no harm).  Because this section is not in parentheses, it is not assigned to a submatch group.  The question mark means this character is optional.

8. (\d{3})  as before, this captures 3 numbers together.  The numbers are placed in submatch $4.

9. Another optional character [- \.]?  hyphen or space or dot, optional, not placed in a submatch group.

10. The last part of the pattern that must match,  (\d{4}),  means 4 numbers together.  It is not optional, and it is placed in submatch $5.
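The ten steps can be tried out against a few strings.  This is only a minimal sketch, using Python's re module as a stand-in for VBScript's RegExp (my assumption, the pattern uses syntax common to both); the explain helper and all the sample strings are made up for illustration.

```python
import re

# Pattern copied from the question. Python's re is only a stand-in for VBScript's RegExp here.
PATTERN = r"((coconut|two)?(?:[^A-Za-z0-9\n\r\t]*))?\(?(\d{3})\)?[- \.]?(\d{3})[- \.]?(\d{4})"
regex = re.compile(PATTERN)

def explain(text):
    """Return (whole match, $1, $2, $3, $4, $5) for the first match, or None if nothing matches."""
    m = regex.search(text)
    if m is None:
        return None
    # Python reports an optional group that did not take part as None; show it as "" here.
    return (m.group(0),) + tuple(g or "" for g in m.groups())

# Hypothetical inputs, one per feature of the pattern.
samples = [
    "coconut: 555-123-4567",  # steps 1-3: keyword plus separator characters land in $1
    "555.123.4567",           # the prefix is optional; dots as separators
    "(555) 123-4567",         # optional brackets around the first three digits
    "5551234567",             # steps 7 and 9: the separators are optional too
    "two 55-123-4567",        # step 5 fails: only two digits where three are required
]
for s in samples:
    print(repr(s), "->", explain(s))
```

One thing this shows up: for "(555) 123-4567" the left bracket ends up in $1, because the character class in step 2 also accepts "(" and gets to it before the \(? in step 4 does.  The last sample returns None: the keyword alone is not enough when the compulsory digit groups are not there.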

SO if there was a string like this:
lkfdgi klh lakjfh slakjghdlaskjgh slzkjghlaskjghs lkjhg lazkjdh two £$%7371-11-2222 lasjud
hg lajhdalzudshv lajdhf lakjdshf two£$%^ (767).543-6262 djasdhg 767656.652652 kjhsdg kjsdfyh gaksjf
hy gazkjk gajfsg coconut£$%f(643)-767-1233 kajs gfkauysgf akiuyf gdkahdsf ga


... it would look through the string for a place where the whole pattern fits.  The compulsory parts are 3 numbers together, then 3 numbers together, then 4 numbers together.  If there is a two or coconut followed by some non-letters and non-numbers directly in front of the first 3 numbers, it takes those in too.  Around the first 3 numbers there may be an optional left and right bracket, and between the sets of numbers there may be an optional hyphen, space or dot.  If all these factors are true, then the pattern is captured.

In the sample it would capture "two£$%^ (767).543-6262" AND "767656.6526" (strictly " 767656.6526", because the optional non-letter part also picks up the space in front of it), but NOT "two £$%7371-11-2222", because there are not 3 numbers together, then 3 numbers together, then 4.  It does NOT capture the whole of "coconut£$%f(643)-767-1233" either, because there is a letter between the coconut and the first set of numbers.  But even though the letter excludes the coconut part, that part is optional, remember, so the rest of the number DOES match, as "(643)-767-1233".  Confused?  This is really quite hard to get your head round.
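To see the matches in the sample for yourself, here is a small sketch that runs the pattern over the test string.  It uses Python's re module as a stand-in for VBScript's RegExp (my assumption), and it assumes the sample is three lines separated by new line characters.  In VBScript you would need Global = True on the RegExp object to get more than the first match.

```python
import re

# Pattern copied from the question. Python's re is only a stand-in for VBScript's RegExp here.
PATTERN = r"((coconut|two)?(?:[^A-Za-z0-9\n\r\t]*))?\(?(\d{3})\)?[- \.]?(\d{3})[- \.]?(\d{4})"

# The sample string from the post, assumed to be three lines joined by "\n".
SAMPLE = (
    "lkfdgi klh lakjfh slakjghdlaskjgh slzkjghlaskjghs lkjhg lazkjdh two £$%7371-11-2222 lasjud\n"
    "hg lajhdalzudshv lajdhf lakjdshf two£$%^ (767).543-6262 djasdhg 767656.652652 kjhsdg kjsdfyh gaksjf\n"
    "hy gazkjk gajfsg coconut£$%f(643)-767-1233 kajs gfkauysgf akiuyf gdkahdsf ga"
)

# finditer walks the string left to right and yields every non-overlapping match.
matches = [m.group(0) for m in re.finditer(PATTERN, SAMPLE)]
for text in matches:
    print(repr(text))
# 'two£$%^ (767).543-6262'
# ' 767656.6526'
# '(643)-767-1233'
```

Note the leading space on the second match and the missing "coconut" on the third: both come from the optional prefix group, which takes whatever non-letter, non-number characters sit directly in front of the digits and nothing more.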

Here is a list of settings that can be used in VBScript regular expressions.

More info:

Regular expressions tutorial:

A brilliant web page to instantly test regex:
.. you paste the string to test in the main window, and paste the regex pattern in the bit at the top.  If you paste in the regex string from your question, then paste in the test string I created in the code box above, you should see the three matches I described above.  See screen shot.

Hope this helps a bit - it took me a very long time to get my head around regular expressions, and there is often more than one correct answer to set one up.
ReneGe (Author) commented:
Hey Daz,

I am more than impressed by all the dedication and effort you put into helping me.

I'll go through it this weekend.

Thanks a lot!
