Regex Explanation

Can anybody please explain what this regex accepts? Please break it down.

("^[\w!#$%&'*+\-/=?\^_`{|}~]+(\.[\w!#$%&'*+\-/=?\^_`{|}~]+)*@((([\-\w]+\.)+[a-zA-Z]{2,4})|(([0-9]{1,3}\.){3}[0-9]{1,3}))$")
Asked by: MacroShadow
 
Angelp1ay Commented:
@samirbhogayta - I think you've just dumped this into an editor and copied an error message.

@ienaxxx - I'm assuming this regex is a parameter for a function and the (" ") bits are just the function call and string quotes wrapping it. The ^ at the start and $ at the end are too convenient to be a coincidence: these are the regex anchors for start of line and end of line.

Let me extend my answer more fully:

^
- start of line (or of the string you're comparing to)

[\w!#$%&'*+\-/=?\^_`{|}~]+
- any of the items between the square brackets, with the plus meaning 1 or more times (i.e. a string at least 1 char long)
- "\w" stands for "word character", usually equivalent to this set of chars [A-Za-z0-9_]

(\.[\w!#$%&'*+\-/=?\^_`{|}~]+)*
- the outer brackets define a group with the * meaning the whole group can repeat 0 or more times
- the first item in this group, "\." is just an escaped "."
- the inner part is exactly as before, basically any string including those symbols

@
- the @ sign, exactly once

((([\-\w]+\.)+[a-zA-Z]{2,4})|(([0-9]{1,3}\.){3}[0-9]{1,3}))
- this one is tricky because of the pipe in the middle: it's essentially 2 patterns inside one outer group, and either one may match, i.e. it means:

   (([\-\w]+\.)+[a-zA-Z]{2,4})
    OR
   (([0-9]{1,3}\.){3}[0-9]{1,3})

- the first is a dash or word char, one or more times, followed by a dot...
- ...with this whole piece being repeated one or more times...
- ...and finally 2-4 alpha chars
(I think this is meant to represent a web domain e.g. abc-123.mydomain.com)

- the second is any numeric digit, repeated 1-3 times (e.g. 1, 12 or 123)...
- ...followed by a dot...
- ...with this whole piece repeated exactly 3 times...
- and one last sequence of digits without a dot
(I think this is meant to represent an IP e.g. 168.10.192.1)

$
- end of line (or of the string you're comparing to)
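
The breakdown above can be reproduced as a minimal Python sketch that rebuilds the pattern from its pieces. The names ATOM, LOCAL, HOSTNAME, IPV4_LIKE and is_match are made up for illustration, and Python's re module is only an assumption about the regex flavor (the question does not say which engine is used). Note that in Python 3, \w also matches non-ASCII letters and digits unless re.ASCII is passed.

```python
import re

# Each piece mirrors one step of the breakdown (names are illustrative only).
ATOM = r"[\w!#$%&'*+\-/=?\^_`{|}~]+"           # one or more allowed chars
LOCAL = ATOM + r"(\." + ATOM + r")*"            # atoms separated by single dots
HOSTNAME = r"(([\-\w]+\.)+[a-zA-Z]{2,4})"       # dotted labels, then 2-4 letters
IPV4_LIKE = r"(([0-9]{1,3}\.){3}[0-9]{1,3})"    # four dot-separated digit groups

# Assemble: ^ local part @ ( hostname | IP-like ) $
EMAIL = re.compile("^" + LOCAL + "@(" + HOSTNAME + "|" + IPV4_LIKE + ")$")

# The pattern from the question, without the surrounding (" and ").
ORIGINAL = r"^[\w!#$%&'*+\-/=?\^_`{|}~]+(\.[\w!#$%&'*+\-/=?\^_`{|}~]+)*@((([\-\w]+\.)+[a-zA-Z]{2,4})|(([0-9]{1,3}\.){3}[0-9]{1,3}))$"


def is_match(text: str) -> bool:
    """Return True if the whole string matches the assembled pattern."""
    return EMAIL.match(text) is not None


print(EMAIL.pattern == ORIGINAL)  # the pieces add up to the original regex
```

Building the pattern from named parts is only a readability choice; the compiled regex is character-for-character the same as the one in the question.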
 
Angelp1ay Commented:
Looks email-related to me, although I'm not convinced it's a correct email validation.

From left to right:
- Any word
- A dot followed by a word (any number of times)
- The @ sign

...and then it gets a bit funky... I think the top right part is essentially:

- Any word followed by a dot (one or more times)
- a word of 2-4 letters

...and the bottom right is an IP address

- 1-3 digits followed by a dot (3 times)
- 1-3 digits
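
A few edge cases illustrate why this is probably not a complete email validation. This is a rough sketch using Python's re module (an assumption about the flavor); the helper name accepts and the sample addresses are invented for illustration.

```python
import re

PATTERN = re.compile(
    r"^[\w!#$%&'*+\-/=?\^_`{|}~]+(\.[\w!#$%&'*+\-/=?\^_`{|}~]+)*"
    r"@((([\-\w]+\.)+[a-zA-Z]{2,4})|(([0-9]{1,3}\.){3}[0-9]{1,3}))$"
)


def accepts(address: str) -> bool:
    """Return True if the pattern matches the address."""
    return PATTERN.match(address) is not None


# Accepted, although no octet of a real IPv4 address can exceed 255.
print(accepts("jim@999.999.999.999"))     # True
# Rejected, because the last label is limited to 2-4 letters.
print(accepts("curator@example.museum"))  # False
# Rejected: the part before the @ cannot start with a dot...
print(accepts(".bob@blah.com"))           # False
# ...or contain two dots in a row.
print(accepts("bob..jones@blah.com"))     # False
```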
 
Angelp1ay Commented:
Examples:
longwordwithnospaces@my-website-name.us
bob.jones@blah.com
jim@168.10.1.1
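
For anyone who wants to check these examples, here is a small sketch, assuming Python's re module (the thread does not name the regex flavor). The variable names are illustrative.

```python
import re

PATTERN = re.compile(
    r"^[\w!#$%&'*+\-/=?\^_`{|}~]+(\.[\w!#$%&'*+\-/=?\^_`{|}~]+)*"
    r"@((([\-\w]+\.)+[a-zA-Z]{2,4})|(([0-9]{1,3}\.){3}[0-9]{1,3}))$"
)

EXAMPLES = [
    "longwordwithnospaces@my-website-name.us",
    "bob.jones@blah.com",
    "jim@168.10.1.1",
]

# Map each example to whether the pattern matches it.
RESULTS = {address: PATTERN.match(address) is not None for address in EXAMPLES}

for address, matched in RESULTS.items():
    print(address, matched)  # each of the three examples prints True
```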
 
Angelp1ay Commented:
This editor allows you to enter test data and visualise the match.
https://www.debuggex.com

...it doesn't seem to understand the line-start and line-end anchors in the regex though (the leading ^ and trailing $). It works well with those stripped, i.e.:
[\w!#$%&'*+\-/=?\^_`{|}~]+(\.[\w!#$%&'*+\-/=?\^_`{|}~]+)*@((([\-\w]+\.)+[a-zA-Z]{2,4})|(([0-9]{1,3}\.){3}[0-9]{1,3}))
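
Stripping the anchors does change what the pattern means, which is worth keeping in mind when moving it back into real code. A minimal sketch of the difference, assuming Python's re module; the sample sentence and variable names are made up:

```python
import re

BODY = (
    r"[\w!#$%&'*+\-/=?\^_`{|}~]+(\.[\w!#$%&'*+\-/=?\^_`{|}~]+)*"
    r"@((([\-\w]+\.)+[a-zA-Z]{2,4})|(([0-9]{1,3}\.){3}[0-9]{1,3}))"
)
UNANCHORED = re.compile(BODY)
ANCHORED = re.compile("^" + BODY + "$")

text = "contact bob.jones@blah.com today"

# Without anchors the pattern finds an address anywhere in the text.
print(UNANCHORED.search(text).group(0))  # bob.jones@blah.com

# With ^ and $ the whole string has to be an address, so this fails.
print(ANCHORED.search(text))             # None

# In Python, fullmatch() on the unanchored pattern also demands a whole-string match.
print(UNANCHORED.fullmatch("bob.jones@blah.com") is not None)  # True
```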

 
ienaxxx Commented:
("^[\w!#$%&'*+\-/=?\^_`{|}~]+(\.[\w!#$%&'*+\-/=?\^_`{|}~]+)*@((([\-\w]+\.)+[a-zA-Z]{2,4})|(([0-9]{1,3}\.){3}[0-9]{1,3}))$")
Anything starting with " (double quotes), immediately followed (^) by a word of one or more chars (the + at the end) that CAN also contain any of the chars following \w.


("^[\w!#$%&'*+\-/=?\^_`{|}~]+(\.[\w!#$%&'*+\-/=?\^_`{|}~]+)*@((([\-\w]+\.)+[a-zA-Z]{2,4})|(([0-9]{1,3}\.){3}[0-9]{1,3}))$")
The first part CAN, any number of times (not MUST; this is given by the * after the closing ")"), be followed by a DOT (it must be escaped with a backslash "\" because it has a special meaning in regexps) and another word as before.

("^[\w!#$%&'*+\-/=?\^_`{|}~]+(\.[\w!#$%&'*+\-/=?\^_`{|}~]+)*@((([\-\w]+\.)+[a-zA-Z]{2,4})|(([0-9]{1,3}\.){3}[0-9]{1,3}))$")
Must be followed by a "@"

 ("^[\w!#$%&'*+\-/=?\^_`{|}~]+(\.[\w!#$%&'*+\-/=?\^_`{|}~]+)*@((([\-\w]+\.)+[a-zA-Z]{2,4})|(([0-9]{1,3}\.){3}[0-9]{1,3}))$")
Must be followed by something that can be:
(([\-\w]+\.)+[a-zA-Z]{2,4}) = one or more words (word chars or dashes), each followed by a dot, then two to four UPPERCASE or lowercase letters (example ".com")
| = OR
(([0-9]{1,3}\.){3}[0-9]{1,3}) = an IP address (any number with one to three digits, followed by a dot, exactly 3 times, and then followed by another number of one to three digits).


 ("^[\w!#$%&'*+\-/=?\^_`{|}~]+(\.[\w!#$%&'*+\-/=?\^_`{|}~]+)*@((([\-\w]+\.)+[a-zA-Z]{2,4})|(([0-9]{1,3}\.){3}[0-9]{1,3}))$")
Must then END with " (double quotes)

Hope this helps
 
SAMIR BHOGAYTA (Freelancer and IT Consultant) Commented:
Hi, here is your answer:

Expected one of *, +, ?, {, {,, (, [, ., \, $, |, ) at line 1, column 3 (byte 3) after ("
 
Derek Jensen Commented:
Ok, so I had to do a little creative interpretation of it, but yes, at first glance it does seem to be email-related. Any regexes that have the @ sign in them are almost always email-related.

^                                 -- Beginning of string
[\w!#$%&'*+\-/=?\^_`{|}~]+        -- Look for one or more chars that are word chars (letters, digits, underscore) or one of the listed symbols
(\.[\w!#$%&'*+\-/=?\^_`{|}~]+)*   -- Look for any number of optional(*) strings consisting of a period followed by one or more(+) of those same chars (the last one found is stored in capture group 1)
@                                 -- Find the @ symbol
(                                 -- Capture Group 2
    (                             -- Capture group 3
        ([\-\w]+\.)+              -- Look for at least one group of one or more word or dash chars, followed by a period (store the last string found in capture group 4)
        [a-zA-Z]{2,4}             -- Find between two and four alpha chars
    )                             -- End group 3
    |                             -- Find the above capture group 3, OR:
    (                             -- Capture group 5
        ([0-9]{1,3}\.){3}         -- Find exactly 3 groups of between one and 3 digits, followed by a period (the last one is stored in capture group 6)
        [0-9]{1,3}                -- Find between one and 3 digits
    )                             -- End group 5 (Note: this group does not appear to have anything to do with validating emails, so I don't immediately see the relevancy in this expression)
)                                 -- End group 2
$                                 -- End of string

I may have confused the order of the groupings inside group 2; if so, I apologize. Some flavors handle the order of encountered parentheses differently.
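
One way to check the group numbering is to print the captured groups for a sample address. The sketch below assumes Python's re module, which numbers groups by the position of their opening parenthesis; other flavors may differ, and the sample addresses and variable names are invented for illustration.

```python
import re

PATTERN = re.compile(
    r"^[\w!#$%&'*+\-/=?\^_`{|}~]+(\.[\w!#$%&'*+\-/=?\^_`{|}~]+)*"
    r"@((([\-\w]+\.)+[a-zA-Z]{2,4})|(([0-9]{1,3}\.){3}[0-9]{1,3}))$"
)

# Domain-name branch: groups 3 and 4 are filled, groups 5 and 6 stay empty.
name_match = PATTERN.match("bob.jones@mail.example.com")
print(name_match.groups())
# ('.jones', 'mail.example.com', 'mail.example.com', 'example.', None, None)

# IP branch: group 5 holds the address, group 6 the last "digits." repetition.
ip_match = PATTERN.match("jim@168.10.1.1")
print(ip_match.group(2), ip_match.group(5), ip_match.group(6))
# 168.10.1.1 168.10.1.1 1.
```

A repeated group such as group 1, 4 or 6 only keeps the text of its last repetition, which is why group 4 shows 'example.' rather than 'mail.'.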
 
Terry Woods (IT Guru) Commented:
@bigdogman, that's an excellent explanation.

It's worth adding that, given that it looks like we're dealing with an email address, the domain of the address can be an IP address (IPv4, not IPv6) or just a standard domain name, though @ienaxxx already mentioned this.
 
MacroShadow (Author) Commented:
Wow! I wasn't expecting so many detailed explanations. Thank you all.
 
Derek Jensen Commented:
@Terry, interesting, I wasn't aware of that. Thanks for the explanation. :-)
Question has a verified solution.
