VBScript: Regular Expressions

Posted on 2011-03-18
Medium Priority
Last Modified: 2012-05-11
Hi there,

I need to understand the meaning (in words) of every aspect of the following regular expression, dissected in detail:
"((coconut|two)?(?:[^A-Za-z0-9\n\r\t]*))?\(?(\d{3})\)?[- \.]?(\d{3})[- \.]?(\d{4})"

For example:
- ((coconut|two) = look for "coconut" and "two" as literal text due to the parentheses
- ?=Used here for ...

If you have a better and clearer way than my example to explain it, please go ahead.

Thanks for your help,
Question by:ReneGe
LVL 13

Accepted Solution

Daz_1234 earned 2000 total points
ID: 35169219
The job of a regular expression is to match or 'capture' 1 or more patterns of characters in a string.  The parts of the regex that are in parentheses are 'groups'.  Each group that can capture would be a submatch.  Submatches are labelled $1, $2 etc.  In VBScript regex, submatches are zero-indexed, so $1 is oMatch.SubMatches(0), $2 is oMatch.SubMatches(1), etc.

Capturing groups in your sample regex:

Group1: ((coconut|two)?(?:[^A-Za-z0-9\n\r\t]*))?
Group2: (coconut|two)?
Group3: (\d{3})
Group4: (\d{3})
Group5: (\d{4})

You may have spotted that I did not specify this one (?:[^A-Za-z0-9\n\r\t]*) as a group.  That is because the ?: at the start of the group means match but don't capture (although the way it is here, whatever it matches is still captured as part of group 1; it just does not have its own group).
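To see the five groups in practice, here is a minimal sketch. It is written in Python rather than VBScript, on the assumption that Python's re module accepts this pattern unchanged. One difference to keep in mind: Python reports an optional group that did not take part in the match as None, which is not necessarily what VBScript's SubMatches gives you. The test string is made up for illustration.

```python
import re

# The pattern from the question, unchanged.
PATTERN = r"((coconut|two)?(?:[^A-Za-z0-9\n\r\t]*))?\(?(\d{3})\)?[- \.]?(\d{3})[- \.]?(\d{4})"

# Hypothetical input, chosen so that every group has something to capture.
m = re.search(PATTERN, "two: 555-123-4567")

# group(0) is the whole match; groups() lists $1..$5 in order.
# In VBScript these would be oMatch.SubMatches(0)..oMatch.SubMatches(4).
print(m.group(0))
print(m.groups())
```

Group 1 comes back as "two: " (including the colon and space), which shows that the text matched by the non-capturing (?:...) part still ends up inside the surrounding group 1.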

So the whole regex goes like this:

Unless the whole regex fits a pattern in the string then nothing is captured.  But some matches are also set as optional.

Find a pattern within the test string that:
1. Starts with coconut OR two.  Place the result in submatch $2.  The pipe means OR.  This match is OPTIONAL because there is a question mark directly after the group, so a matching pattern does not have to start with either.  But if it does start with one, we want to capture it.

2. The next part is to match but not capture zero or more characters that are NOT A-Z or a-z or numbers or new line or carriage return or tab.  This is done by:  
   ?:  means match if there are characters that qualify, but don't capture them in a group of their own.  Why would we want that?  Well, if the whole string we want to capture could possibly contain characters like that, we still want to have the whole string, but if there is a letter or number in here then we don't.  Either way we don't want this part as a separate result.
   [^A-Za-z0-9\n\r\t]  The ^ character means anything in these square brackets must NOT be in this part of the string.  The \n means new line, \r means carriage return and \t means tab.
   *  The asterisk means match any number of such characters, from zero to many.

SO the result of 1 plus 2 above combined is placed in Group1 or $1.

3. The question mark at the end of the surrounding parenthesis means the match is optional.  The resulting capture does not necessarily have coconut or two or any of the characters not in the square brackets.

4. The \(? means match a left bracket, but the bracket is optional.  There is a leading backslash to escape the bracket so that the regex engine does not think we are starting a new group.  The bracket is not assigned to a submatch group.

5. (\d{3})  This means capture 3 numbers together.  This is not optional: if there are not three numbers together at this point, the match fails at this position and the engine moves on.  The numbers are placed in submatch $3.

6. The  \)?  means match a right bracket, but the bracket is optional.  The bracket is not assigned to a submatch group.

7. the  [- \.]?  means after the 3 numbers capture any *one* of the characters in the square brackets, either hyphen, space or dot.  To specify a dot literally, you must escape it with a backslash because dots mean any character when they are on their own.  Because this section is not in parentheses, it is not assigned to a submatch group.  The question mark means this character is optional.

8. (\d{3}) as before this matches any 3 numbers.  The numbers are placed in submatch $4

9. Another optional character [- \.]?  hyphen or space or dot, optional, not placed in a submatch group.

10. The last part of the pattern that must match,  (\d{4}),  means any 4 numbers together.  It is not optional, and it is placed in submatch group $5.
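The steps above can be tried out piece by piece. This is only a sketch, again in Python on the assumption that its re module treats these fragments the same way; NUMBER and PREFIX are names I made up for the two halves of the pattern, and the test inputs are invented.

```python
import re

# Steps 4-10 only: the phone-number part, without the optional prefix.
NUMBER = re.compile(r"\(?(\d{3})\)?[- \.]?(\d{3})[- \.]?(\d{4})")

# fullmatch requires the whole input to fit the pattern.
for text in ["(767).543-6262", "767 543 6262", "7675436262", "767-54-6262"]:
    m = NUMBER.fullmatch(text)
    print(text, "->", m.groups() if m else "no match")

# Steps 1-3 only: the optional prefix.
PREFIX = re.compile(r"((coconut|two)?(?:[^A-Za-z0-9\n\r\t]*))?")

# With a keyword present, group 1 holds the keyword plus the punctuation after it.
print(PREFIX.match("coconut: 123").groups())

# With no keyword and no punctuation, the prefix still succeeds by matching nothing.
print(repr(PREFIX.match("123").group(0)))
```

The first three inputs match because brackets and separators are all optional; the fourth fails because the middle block has only two digits.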

SO if there was a string like this:
lkfdgi klh lakjfh slakjghdlaskjgh slzkjghlaskjghs lkjhg lazkjdh two £$%7371-11-2222 lasjud
hg lajhdalzudshv lajdhf lakjdshf two£$%^ (767).543-6262 djasdhg 767656.652652 kjhsdg kjsdfyh gaksjf
hy gazkjk gajfsg coconut£$%f(643)-767-1233 kajs gfkauysgf akiuyf gdkahdsf ga

... it would look through the string for a place where the whole pattern fits.  If there is a two or coconut followed by some characters that are not letters or numbers, and possibly a left bracket, then it takes those in too.  It then needs a compulsory 3 numbers together, checks for an optional right bracket and an optional hyphen, space or dot, and needs another compulsory 3 numbers together if there is to be a pattern capture.  Then it knows there may be an optional space, dot or hyphen, and finally a compulsory 4 numbers together.  If all these factors are true, then the pattern is captured.

In the sample it would capture "two£$%^ (767).543-6262" AND "767656.6526" (along with the space in front of it, which the optional non-alphanumeric part picks up), but NOT "two £$%7371-11-2222", because there are not 3 numbers together, then 3 numbers together, then 4.  And NOT "coconut£$%f(643)-767-1233", because there is a letter between the coconut and the first set of numbers.  But even though the letter excludes the coconut part, that part is optional, remember, so the rest of the number, "(643)-767-1233", DOES match!  Confused?  This is really quite hard to get your head round.
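If you want to check this without the online tester, here is a rough sketch that runs the pattern over the sample string. It is in Python, not VBScript, on the assumption that the two engines agree on this pattern; SAMPLE is just the three-line test string from above joined with new lines.

```python
import re

PATTERN = re.compile(
    r"((coconut|two)?(?:[^A-Za-z0-9\n\r\t]*))?\(?(\d{3})\)?[- \.]?(\d{3})[- \.]?(\d{4})"
)

SAMPLE = (
    "lkfdgi klh lakjfh slakjghdlaskjgh slzkjghlaskjghs lkjhg lazkjdh two £$%7371-11-2222 lasjud\n"
    "hg lajhdalzudshv lajdhf lakjdshf two£$%^ (767).543-6262 djasdhg 767656.652652 kjhsdg kjsdfyh gaksjf\n"
    "hy gazkjk gajfsg coconut£$%f(643)-767-1233 kajs gfkauysgf akiuyf gdkahdsf ga"
)

# finditer walks the string left to right, much like oRegExp.Execute with Global = True.
matches = list(PATTERN.finditer(SAMPLE))
for m in matches:
    # repr() makes leading spaces in the whole match visible.
    print(repr(m.group(0)), "| $1 =", repr(m.group(1)), "| digits =", m.group(3), m.group(4), m.group(5))
```

One thing this makes visible: the [^A-Za-z0-9\n\r\t]* part is greedy and a left bracket is not a letter or number, so in the first match the "(" ends up inside $1 and the \(? after it matches nothing.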

Here is a list of settings that can be used in VBScript regular expressions.

More info:

Regular expressions tutorial:

A brilliant web page to instantly test regex:
... you paste the string to test in the main window, and paste the regex pattern in the bit at the top.  If you paste in the regex string from your question, then paste in the test string I created above, you should see the three matches I described above.  See screen shot.

Hope this helps a bit - it took me a very long time to get my head around regular expressions, and there is often more than one correct answer to set one up.
LVL 10

Author Closing Comment

ID: 35169280
Hey Daz,

I am more than impressed by all the dedication and efforts you put in helping me.

I'll go through it this weekend.

Thanks a lot!
