Can anyone explain this email validation regex

This regex is often used to validate email addresses:

\w+([-+!$%&*/=?{|}.']\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*



Can anyone break it down and explain exactly what it is doing?
purplesoup asked:
 
stergium commented:
Hello. I use this site whenever I have that kind of question: http://regex101.com/#python

    /\w+([-+!$%&*/=?{|}.']\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*/

    \w+ matches one or more word characters [a-zA-Z0-9_]
    1st capturing group ([-+!$%&*/=?{|}.']\w+)*, repeated zero or more times
    @ matches the character @ literally
    \w+ matches one or more word characters [a-zA-Z0-9_]
    2nd capturing group ([-.]\w+)*, repeated zero or more times
    \. matches the character . literally
    \w+ matches one or more word characters [a-zA-Z0-9_]
    3rd capturing group ([-.]\w+)*, repeated zero or more times



That is copied and pasted from the site's explanation.
I hope that helps.
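The breakdown in the answer above can be checked directly. This is a minimal sketch assuming Python 3's standard `re` module. The names `EMAIL_RE` and `looks_like_email` are hypothetical, and the sample addresses are made up for illustration. Note that the pattern has no `^`/`$` anchors, so whether it "validates" a whole string depends on how it is applied:

```python
import re

# The pattern from the question, as a Python raw string (no ^ or $ anchors).
EMAIL_RE = re.compile(r"\w+([-+!$%&*/=?{|}.']\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*")


def looks_like_email(text: str) -> bool:
    # fullmatch() only succeeds if the pattern covers the WHOLE string
    return EMAIL_RE.fullmatch(text) is not None


print(looks_like_email("john.smith@example.com"))  # True
print(looks_like_email("a+b@mail.example.co.uk"))  # True
print(looks_like_email("user@localhost"))          # False: needs at least one "." after the @
print(looks_like_email("bad..dots@example.com"))   # False: "." must be followed by a word character

# search() is unanchored, so it finds a valid-looking piece inside a bad string
print(EMAIL_RE.search("bad..dots@example.com").group(0))  # dots@example.com
```

One caveat on the `[a-zA-Z0-9_]` description: in Python 3, `\w` in a str pattern also matches non-ASCII letters and digits. It is limited to `[a-zA-Z0-9_]` only when the `re.ASCII` flag is passed.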
 
purplesoup (author) commented:
Thanks - I'm not clear about capturing groups - I just tried reading this

http://www.regular-expressions.info/named.html

but I couldn't link it to what you had - can you make it any clearer for me?

Sorry!
 
stergium commented:
What part of this expression is not clear to you? Please explain.
 
purplesoup (author) commented:
I didn't understand what capturing groups were, as I think I mentioned, and the first link I looked up didn't seem to explain them very well. However, I did some more searching and I think I have the hang of it now.

This is what I made of it:

([-+!$%&*/=?{|}.']\w+)*

The bits between ( and ) are the group. The * at the end means the whole group may be repeated zero or more times; if it had + on the end, the group would have to appear one or more times.
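To see what the * applies to, and what "capturing" means in practice, here is a small sketch assuming Python 3's `re` module. `GROUP_STAR` and `GROUP_PLUS` are hypothetical names. The `+` version is only there for comparison and is not part of the original regex:

```python
import re

# Hypothetical names: the group from the question with *, and for comparison with +.
GROUP_STAR = re.compile(r"([-+!$%&*/=?{|}.']\w+)*")  # group may repeat zero or more times
GROUP_PLUS = re.compile(r"([-+!$%&*/=?{|}.']\w+)+")  # group must appear at least once

print(GROUP_STAR.fullmatch("") is not None)  # True: zero repetitions is fine
print(GROUP_PLUS.fullmatch("") is not None)  # False: + needs at least one repetition

# A repeated capturing group keeps only the text from its LAST repetition.
m = GROUP_STAR.fullmatch(".smith+jr")
print(m.group(0))  # .smith+jr  (entire match: two repetitions)
print(m.group(1))  # +jr        (what group 1 captured last)
```

If the parentheses are only needed so the quantifier applies to the whole group, and the captured text is never used, Python also accepts a non-capturing group written `(?:...)`.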

So now looking at the contents of the group, we have

[-+!$%&*/=?{|}.']\w+

Well, the \w+ at the end is easy enough - a word character, one or more times.

So what of

[-+!$%&*/=?{|}.']

?

I believe this means that any one of these characters is OK.
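That reading of the character class is correct, and it is easy to confirm. This is a sketch assuming Python 3's `re` module. `SPECIAL` and the test string are made up for illustration:

```python
import re

# The character class from the question: matches exactly ONE of the listed characters.
# "-" is literal because it comes first; ".", "*", "?", "|", "{", "}" and "$"
# lose their special meaning inside [...].
SPECIAL = re.compile(r"[-+!$%&*/=?{|}.']")

allowed = [c for c in "-+!$%&*/=?{|}.'#@ a" if SPECIAL.fullmatch(c)]
print("".join(allowed))                 # -+!$%&*/=?{|}.'
print(SPECIAL.fullmatch("+&") is None)  # True: a class matches a single character
```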

So valid matches might be

(nothing)

Since the capturing group has * at the end, zero repetitions (an empty string) are a valid match.

+a

The plus (+) character is one of the allowed characters, but it has to be followed by at least one word character (in this case "a").

&abcd

The ampersand (&) character is one of the allowed characters, and it must be followed by one or more word characters, so "abcd" is acceptable.

This wouldn't be allowed:

%&*

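because each repetition of the group allows only one special character, and that character must be immediately followed by one or more word characters. Here "%" is followed by "&", not a word character, so even the first repetition fails. (Several special characters are fine if each is followed by word characters, e.g. "+a&b" is two repetitions.)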
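The examples above can be run through the group directly. This sketch assumes Python 3's `re` module. `GROUP` is a hypothetical name, and "+a&b" and "&" are extra cases added for illustration:

```python
import re

# The group from the question; fullmatch() checks whether a string consists
# entirely of zero or more repetitions of it.
GROUP = re.compile(r"([-+!$%&*/=?{|}.']\w+)*")

examples = ["", "+a", "&abcd", "%&*", "+a&b", "&"]
results = {text: GROUP.fullmatch(text) is not None for text in examples}
print(results)
# {'': True, '+a': True, '&abcd': True, '%&*': False, '+a&b': True, '&': False}
```

"+a&b" passes because the group repeats ("+a" then "&b"). A lone "&" fails because no word character follows it.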

That was the sort of explanation I was looking for.
 
stergium commented:
[-+!$%&*/=?{|}.']    -> one of these characters.  
The link I posted, which I also use as a reference, breaks down and explains every regular expression.
If you are not satisfied with the answer, you can request the help of a moderator.