Solved

Regex Explanation

Posted on 2013-12-09
Last Modified: 2013-12-10
Can anybody please explain what this regex accepts. Please break it down.

("^[\w!#$%&'*+\-/=?\^_`{|}~]+(\.[\w!#$%&'*+\-/=?\^_`{|}~]+)*@((([\-\w]+\.)+[a-zA-Z]{2,4})|(([0-9]{1,3}\.){3}[0-9]{1,3}))$")
Question by:MacroShadow
11 Comments
 
LVL 11

Expert Comment

by:Angelp1ay
 
LVL 11

Expert Comment

by:Angelp1ay
Looks email-related to me, although I'm not convinced it's a correct email validation.

From left to right:
- A word made of word characters and/or the listed symbols
- A dot followed by another such word (any number of times, including none)
- The @ sign

...and then it gets a bit funky... I think the first alternative after the @ is essentially:

- A word (hyphens allowed) followed by a dot (one or more times)
- A 2-4 letter word

...and the second alternative is an IP address

- 1-3 digits followed by a dot (3 times)
- 1-3 digits
 
LVL 11

Expert Comment

by:Angelp1ay
Examples:
longwordwithnospaces@my-website-name.us
bob.jones@blah.com
jim@168.10.1.1
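These examples can be checked mechanically. The sketch below is in Python and assumes the pattern behaves the same in Python's `re` module as in whatever language the question uses (the host language isn't stated). `EMAIL_PATTERN` is just an illustrative name.

```python
import re

# The regex from the question, without the surrounding (" ") of the host-language call.
EMAIL_PATTERN = r"^[\w!#$%&'*+\-/=?\^_`{|}~]+(\.[\w!#$%&'*+\-/=?\^_`{|}~]+)*@((([\-\w]+\.)+[a-zA-Z]{2,4})|(([0-9]{1,3}\.){3}[0-9]{1,3}))$"

examples = [
    "longwordwithnospaces@my-website-name.us",
    "bob.jones@blah.com",
    "jim@168.10.1.1",
]

for address in examples:
    # The pattern's own ^ and $ pin the match to the whole string.
    print(address, "->", re.match(EMAIL_PATTERN, address) is not None)  # True for all three
```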
 
LVL 11

Expert Comment

by:Angelp1ay
This editor allows you to enter test data and visualise the match.
https://www.debuggex.com

...it doesn't seem to understand the line start and line end anchors in the regex, though (the leading ^ and trailing $). It works well with those stripped, i.e.:
[\w!#$%&'*+\-/=?\^_`{|}~]+(\.[\w!#$%&'*+\-/=?\^_`{|}~]+)*@((([\-\w]+\.)+[a-zA-Z]{2,4})|(([0-9]{1,3}\.){3}[0-9]{1,3}))
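Here is a small Python sketch of what the ^ and $ anchors change. It assumes Python's `re`, and `BODY` and `ANCHORED` are illustrative names. Without the anchors, the pattern can find an address inside a longer string. With them, the whole string has to be an address.

```python
import re

# The stripped pattern from above, and the same pattern with ^ and $ put back.
BODY = r"[\w!#$%&'*+\-/=?\^_`{|}~]+(\.[\w!#$%&'*+\-/=?\^_`{|}~]+)*@((([\-\w]+\.)+[a-zA-Z]{2,4})|(([0-9]{1,3}\.){3}[0-9]{1,3}))"
ANCHORED = "^" + BODY + "$"

text = "Contact: bob@blah.com, thanks"

# Without anchors, re.search finds an address embedded anywhere in the text.
found = re.search(BODY, text)
print(found.group(0) if found else None)  # bob@blah.com

# With anchors, the entire string must be an address, so this fails.
print(re.search(ANCHORED, text))  # None
```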

 
LVL 10

Assisted Solution

by:ienaxxx
ienaxxx earned 100 total points
("^[\w!#$%&'*+\-/=?\^_`{|}~]+
Anything starting with " (double quote), immediately followed (^) by a word of at least one character (the + at the end) that CAN also contain any of the characters listed after \w.


(\.[\w!#$%&'*+\-/=?\^_`{|}~]+)*
The first part CAN be followed, any number of times (not MUST; this is given by the * outside the closing ")"), by a DOT (which must be escaped with a backslash "\" because it has a special meaning in regular expressions) and another word as before.

@
Must be followed by a "@"

((([\-\w]+\.)+[a-zA-Z]{2,4})|(([0-9]{1,3}\.){3}[0-9]{1,3}))
Must be followed by something that can be:
(([\-\w]+\.)+[a-zA-Z]{2,4}) = a word without special characters (hyphens allowed) followed by a dot, one or more times, and then two to four uppercase or lowercase letters (example ".com")
| = OR
(([0-9]{1,3}\.){3}[0-9]{1,3}) = an IP address (a number of one to three digits followed by a dot, exactly 3 times, and then another number of one to three digits)


$")
Must then END with " (double quote)

Hope this helps

 
LVL 11

Expert Comment

by:SAMIR BHOGAYTA
Hi, here is your answer:

Expected one of *, +, ?, {, {,, (, [, ., \, $, |, ) at line 1, column 3 (byte 3) after ("
 
LVL 11

Accepted Solution

by:Angelp1ay
Angelp1ay earned 300 total points
@samirbhogayta - I think you've just dumped this into an editor and copied an error message.

@ienaxxx - I'm assuming this regex is a parameter to a function and the (" ") bits are just the function call wrapping it. The ^ at the start and $ at the end are too convenient to be anything else: these are the regex start-of-line and end-of-line anchors.
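One way to test this reading is to treat the (" ") as literal regex text and see what happens. The sketch below is in Python and assumes Python's `re`. `QUOTED` is an illustrative name. The quoted version can never match, because `^` would have to come after a quote character. That result supports reading the quotes and parentheses as host-language syntax.

```python
import re

PATTERN = r"^[\w!#$%&'*+\-/=?\^_`{|}~]+(\.[\w!#$%&'*+\-/=?\^_`{|}~]+)*@((([\-\w]+\.)+[a-zA-Z]{2,4})|(([0-9]{1,3}\.){3}[0-9]{1,3}))$"

# Reading the (" ") as part of the regex: a group that starts and ends with a literal quote.
QUOTED = '("' + PATTERN + '")'

# The intended reading matches a bare address...
print(re.match(PATTERN, "bob@blah.com") is not None)  # True

# ...but the quoted reading cannot match anything. ^ would have to sit after a
# quote character, and without re.MULTILINE ^ only matches at position 0.
print(re.search(QUOTED, '"bob@blah.com"') is not None)  # False
```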

Let me extend my answer more fully:

^
- start of line (or string you're comparing to)

[\w!#$%&'*+\-/=?\^_`{|}~]+
- any of the items between the square brackets, with the plus meaning 1 or more times (i.e. a string at least 1 char long)
- "\w" stands for "word character", usually equivalent to this set of chars [A-Za-z0-9_]
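Whether \w covers non-ASCII letters depends on the engine and its flags. In Python 3, for example, \w in a str pattern is Unicode-aware by default, and it narrows to [A-Za-z0-9_] when `re.ASCII` is set. The following quick sketch assumes Python's `re` and uses a made-up address:

```python
import re

PATTERN = r"^[\w!#$%&'*+\-/=?\^_`{|}~]+(\.[\w!#$%&'*+\-/=?\^_`{|}~]+)*@((([\-\w]+\.)+[a-zA-Z]{2,4})|(([0-9]{1,3}\.){3}[0-9]{1,3}))$"

# Default for str patterns: \w also matches accented letters such as "é".
print(re.match(PATTERN, "josé@example.com") is not None)  # True

# With re.ASCII, \w means only [A-Za-z0-9_], so the accented letter is rejected.
print(re.match(PATTERN, "josé@example.com", flags=re.ASCII) is not None)  # False
```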

(\.[\w!#$%&'*+\-/=?\^_`{|}~]+)*
- the outer brackets define a group with the * meaning the whole group can repeat 0 or more times
- the first item in this group, "\." is just an escaped "."
- the inner part is exactly as before, basically any string including those symbols
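A few test cases show how the repeated group treats dots in the local part (the part before the @). The sketch below is in Python and assumes Python's `re`. The addresses are made up for illustration.

```python
import re

PATTERN = r"^[\w!#$%&'*+\-/=?\^_`{|}~]+(\.[\w!#$%&'*+\-/=?\^_`{|}~]+)*@((([\-\w]+\.)+[a-zA-Z]{2,4})|(([0-9]{1,3}\.){3}[0-9]{1,3}))$"

# Made-up addresses and whether the pattern accepts them.
cases = {
    "bob.jones@blah.com": True,    # a dot between two runs of allowed chars
    "o'neil+tag@blah.com": True,   # ' and + are in the character class
    ".bob@blah.com": False,        # the first run can't start with a dot
    "bob.@blah.com": False,        # each dot must be followed by at least one char
    "bob..jones@blah.com": False,  # consecutive dots are rejected
}

for address, expected in cases.items():
    actual = re.match(PATTERN, address) is not None
    print(f"{address}: {actual} (expected {expected})")
```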

@
- the @ sign, exactly once

((([\-\w]+\.)+[a-zA-Z]{2,4})|(([0-9]{1,3}\.){3}[0-9]{1,3}))
- this one is tricky because of the pipe in the middle: it's essentially 2 patterns, and one or the other must match, i.e. it means:

   (([\-\w]+\.)+[a-zA-Z]{2,4})
    OR
   (([0-9]{1,3}\.){3}[0-9]{1,3})

- the first is a dash or word char, one or more times, followed by a dot...
- ...with this whole piece being repeated one or more times...
- ...and finally 2-4 alpha chars
(I think this is meant to represent a web domain e.g. abc-123.mydomain.com)
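Here are a few made-up addresses for the domain alternative, as a Python sketch that assumes Python's `re`. Because `{2,4}` caps the final label at four letters, a longer top-level domain such as `.museum` is rejected. The pattern also requires at least one dot after the @.

```python
import re

PATTERN = r"^[\w!#$%&'*+\-/=?\^_`{|}~]+(\.[\w!#$%&'*+\-/=?\^_`{|}~]+)*@((([\-\w]+\.)+[a-zA-Z]{2,4})|(([0-9]{1,3}\.){3}[0-9]{1,3}))$"

cases = {
    "x@abc-123.mydomain.com": True,  # hyphens and digits are allowed in the labels
    "x@example.museum": False,       # final label is longer than 4 letters
    "x@localhost": False,            # no dot, so ([\-\w]+\.)+ can't match
}

for address, expected in cases.items():
    actual = re.match(PATTERN, address) is not None
    print(f"{address}: {actual} (expected {expected})")
```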

- the second is any numeric digit, repeated 1-3 times (e.g. 1, 12 or 123)...
- ...followed by a dot...
- ...with this whole piece repeated exactly 3 times...
- and one last sequence of digits without a dot
(I think this is meant to represent an IP e.g. 168.10.192.1)
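Note that `{1,3}` only limits how many digits each number has, not its value, so numbers above 255 still pass. The following Python sketch uses made-up addresses and assumes Python's `re`:

```python
import re

PATTERN = r"^[\w!#$%&'*+\-/=?\^_`{|}~]+(\.[\w!#$%&'*+\-/=?\^_`{|}~]+)*@((([\-\w]+\.)+[a-zA-Z]{2,4})|(([0-9]{1,3}\.){3}[0-9]{1,3}))$"

cases = {
    "jim@168.10.1.1": True,       # a normal dotted IPv4 address
    "jim@999.999.999.999": True,  # digit count is checked, value isn't
    "jim@1.2.3": False,           # only three numbers
}

for address, expected in cases.items():
    actual = re.match(PATTERN, address) is not None
    print(f"{address}: {actual} (expected {expected})")
```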

$
- end of line (or string you're comparing to)
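One engine-specific detail about `$` is worth knowing. In Python, `$` without MULTILINE also matches just before a single trailing newline, so an address with a stray `\n` at the end can still pass. Other engines may behave differently, so check yours. The sketch below assumes Python's `re`. In Python, `re.fullmatch` (or `\Z` in place of `$`) closes that gap.

```python
import re

PATTERN = r"^[\w!#$%&'*+\-/=?\^_`{|}~]+(\.[\w!#$%&'*+\-/=?\^_`{|}~]+)*@((([\-\w]+\.)+[a-zA-Z]{2,4})|(([0-9]{1,3}\.){3}[0-9]{1,3}))$"

address = "bob@blah.com\n"

# $ is satisfied just before the final newline, so this is accepted.
print(re.match(PATTERN, address) is not None)  # True

# fullmatch requires the match to cover the entire string, newline included.
print(re.fullmatch(PATTERN, address) is not None)  # False
```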
 
LVL 9

Assisted Solution

by:Derek Jensen
Derek Jensen earned 100 total points
Ok, so I had to do a little creative interpretation of it, but yes, at first glance it does seem to be email-related. Regexes that contain an @ sign are very often email-related.

^                                 -- Beginning of string
[\w!#$%&'*+\-/=?\^_`{|}~]+        -- One or more word chars (letters, digits, underscore) or any of the listed symbols
(\.[\w!#$%&'*+\-/=?\^_`{|}~]+)*   -- Any number (*) of optional strings, each a period followed by one or more (+) of those same chars (and store it in capture group 1)
@                                 -- Find the @ symbol
(                                 -- Capture Group 2
    (                             -- Capture group 3
        ([\-\w]+\.)+              -- Look for at least one group of one or more word or dash chars, followed by a period (store the last string found in capture group 4)
        [a-zA-Z]{2,4}             -- Find between two and four alpha chars
    )                             -- End group 3
    |                             -- Find the above capture group 3, OR:
    (                             -- Capture group 5
        ([0-9]{1,3}\.){3}         -- Find exactly 3 groups of one to three digits, each followed by a period (store the last one in capture group 6)
        [0-9]{1,3}                -- Find one to three digits
    )                             -- End group 5 (Note: this group does not appear to have anything to do with validating emails, so I don't immediately see the relevancy in this expression)
)                                 -- End group 2
$                                 -- End of string
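This numbering follows the usual rule of counting opening parentheses from left to right, which is also how Python numbers groups. A quick check in Python with made-up addresses, assuming Python's `re`, also shows that a repeated group keeps only its last repetition:

```python
import re

PATTERN = r"^[\w!#$%&'*+\-/=?\^_`{|}~]+(\.[\w!#$%&'*+\-/=?\^_`{|}~]+)*@((([\-\w]+\.)+[a-zA-Z]{2,4})|(([0-9]{1,3}\.){3}[0-9]{1,3}))$"

m = re.match(PATTERN, "bob.jones@mail.example.com")
# Groups are numbered by their opening parenthesis, left to right.
for number, value in enumerate(m.groups(), start=1):
    print(number, repr(value))
# 1 '.jones'            the dotted local-part group
# 2 'mail.example.com'  the whole alternation
# 3 'mail.example.com'  the domain-name branch
# 4 'example.'          last repetition of ([\-\w]+\.)
# 5 None                the IP branch was not used
# 6 None

ip = re.match(PATTERN, "jim@168.10.1.1")
print(ip.group(2), ip.group(5), ip.group(6))  # 168.10.1.1 168.10.1.1 1.
```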


I may have confused the order of the groupings inside group 2; if so, I apologize. Some flavors handle the order of encountered parentheses differently.
 
LVL 35

Expert Comment

by:Terry Woods
@bigdogman, that's an excellent explanation.

It's worth adding that, since we seem to be dealing with an email address, the domain part can be either an IP address (IPv4, not IPv6, in this pattern) or a standard domain name, though @ienaxxx already mentioned this.
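The IPv4-only point can be checked directly. The following Python sketch assumes Python's `re` and uses made-up addresses:

```python
import re

PATTERN = r"^[\w!#$%&'*+\-/=?\^_`{|}~]+(\.[\w!#$%&'*+\-/=?\^_`{|}~]+)*@((([\-\w]+\.)+[a-zA-Z]{2,4})|(([0-9]{1,3}\.){3}[0-9]{1,3}))$"

# Dotted IPv4 domain: accepted by the second alternative.
print(re.match(PATTERN, "bob@192.168.0.1") is not None)  # True

# IPv6 uses colons, which neither alternative allows.
print(re.match(PATTERN, "bob@2001:db8::1") is not None)  # False
```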
 
LVL 26

Author Closing Comment

by:MacroShadow
Wow! I wasn't expecting so many detailed explanations. Thank you all.
 
LVL 9

Expert Comment

by:Derek Jensen
@Terry, interesting, I wasn't aware of that. Thanks for the explanation. :-)
