JavaScript RegEx Idiosyncrasy

I'm using regular expressions in JavaScript to do on-the-fly validation of form input.  As far as I can tell, the following expression should match only valid email addresses, but it's matching an obviously invalid address too.

Expression:
^([\-\.0-9A-Z_a-z]+)@([\-0-9A-Za-z]+\.)+[A-Za-z]{2,4}$

Incorrectly matches:  user@domain OR user@domain.toolongextension

The problem is the validation after the @ sign. The regex there isn't complex: it requires one or more alphanumeric (optionally hyphenated) domain/subdomain names of various lengths, each followed by a period, and then ends with an alphabetic 2-, 3-, or 4-character extension (e.g. .cc, .com, .info). But it's not working. And this RegEx works perfectly fine, as expected, in the .NET framework.
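A hedged guess at what could produce exactly this symptom, since the question doesn't show how the expression reaches JavaScript: if the pattern is typed inside an ordinary string and handed to `new RegExp`, the string literal swallows each lone backslash, so `\.` reaches the regex engine as a bare `.` wildcard. The sketch below compares that (hypothetical) string-built version with the same text written as a regex literal, using the two addresses from the question plus one valid-looking address.

```javascript
// Same pattern text, written two ways.
const asLiteral = /^([\-\.0-9A-Z_a-z]+)@([\-0-9A-Za-z]+\.)+[A-Za-z]{2,4}$/;

// Hypothetical: the pattern supplied as an ordinary string. The string literal
// turns "\-" into "-" and "\." into "." before RegExp ever sees them.
const asString = new RegExp("^([\-\.0-9A-Z_a-z]+)@([\-0-9A-Za-z]+\.)+[A-Za-z]{2,4}$");

// The "\." after the group's "+" has become a bare "." (match any character).
console.log(asString.source);

const samples = ["user@domain", "user@domain.toolongextension", "user@domain.com"];
for (const s of samples) {
  console.log(s, "literal:", asLiteral.test(s), "string:", asString.test(s));
}
// user@domain literal: false string: true
// user@domain.toolongextension literal: false string: true
// user@domain.com literal: true string: true
```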

Points for anyone who shows me, most importantly, a workable fix. Points also for showing me whether I'm doing anything wrong, OR for online documentation of this as a known bug in the JS implementation.
(Asked by Bobaran98)
 
Derek Jensen commented:
No...what??

Okay. Let me clear the air a little bit here...

In my experience, I have never known JS to ignore a backslash in front of a period. JS regex is closer to perl regex than PCRE is.

Looking at the original regex, the period was *not* matching any character. The problem was elsewhere in the regex...but we'll get to that in a minute. ;-)

Regarding the difference between \. and [\.], Leakim said:
> So in the brackets, backslash are considered like a character and not as escape character
That is absolutely incorrect, at least as far as JS/PCRE regex is concerned.

The truth of the matter is,
\. === [.] === [\.]

period.
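A small script to check this equivalence with regex literals, where no string escaping gets in the way. Each form is anchored and run against a period, a letter, and a single backslash; all three should match only the period, which also shows that the backslash inside `[\.]` acts as an escape rather than as a literal character to be matched.

```javascript
// Three spellings of "a literal period", anchored so each must match the whole input.
const forms = [/^\.$/, /^[.]$/, /^[\.]$/];
const inputs = [".", "a", "\\"]; // "\\" is a one-character string: a single backslash

for (const re of forms) {
  // Each line should print the pattern source followed by: true false false
  console.log(re.source, inputs.map((s) => re.test(s)).join(" "));
}
```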

That isn't to say the backslash inside brackets serves no purpose; on the contrary, a backslash inside brackets is still an escape character. A literal backslash has to be escaped with another backslash:
\\ === [\\] !== [\]

[\] will produce an error in every regex tester I know of: the backslash escapes the closing bracket, so the character class is never terminated.
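A sketch of both cases. The patterns are built with `String.raw` so the JavaScript string layer leaves the backslashes exactly as typed; `[\\]` then matches one backslash, and `[\]` is rejected by the `RegExp` constructor.

```javascript
// String.raw keeps backslashes exactly as typed, so these are the regex sources verbatim.
const escapedBackslash = new RegExp(String.raw`^[\\]$`); // class containing one literal backslash
console.log(escapedBackslash.test("\\")); // true: "\\" in a JS string is one backslash

let error = null;
try {
  // "\]" escapes the closing bracket, leaving the class unterminated.
  new RegExp(String.raw`[\]`);
} catch (e) {
  error = e;
}
console.log(error instanceof SyntaxError); // true
```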

For example, try this regex out in the regex tester of your choice, using this post as the haystack:
/[a-z\]-]/

*That* is the purpose a backslash serves inside brackets.
But not just that. ;-)
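The same experiment as a few lines of JavaScript, for anyone without a regex tester handy: inside the class, `\]` stands for a literal `]`, and the `-` just before the closing bracket is a literal hyphen.

```javascript
const re = /[a-z\]-]/; // a lowercase letter, a "]", or a "-"
const chars = ["q", "]", "-", "Q", "7"];
const results = chars.map((ch) => re.test(ch));
console.log(results.join(", ")); // true, true, true, false, false
```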

Now, about this regex...
^([\-\.0-9A-Z_a-z]+)@([\-0-9A-Za-z]+\.)+[A-Za-z]{2,4}$

The problem lies in the last plus. Try this regex out:
^([-.0-9A-Z_a-z]+)@[-.0-9A-Za-z]+\.[A-Za-z]{2,4}$

;-)
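A sketch for comparing the suggested pattern with the original one side by side, both written as regex literals. The first two sample addresses are the failing ones from the question; the last two are made-up, valid-looking addresses.

```javascript
const original = /^([\-\.0-9A-Z_a-z]+)@([\-0-9A-Za-z]+\.)+[A-Za-z]{2,4}$/;
const suggested = /^([-.0-9A-Z_a-z]+)@[-.0-9A-Za-z]+\.[A-Za-z]{2,4}$/;

// First two come from the question; the last two are illustrative.
const samples = [
  "user@domain",
  "user@domain.toolongextension",
  "user@domain.com",
  "user@mail.example.com",
];

for (const s of samples) {
  console.log(s, "original:", original.test(s), "suggested:", suggested.test(s));
}
// Expected verdicts, in order: false/false, false/false, true/true, true/true
```

On these particular inputs the two literals agree; both accept a subdomain, the original through its repeated group and the suggested one through the period in its domain class.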

One last note about brackets (as I noticed you'd also escaped the dash):

Inside brackets, a dash serves as a range indicator *unless it immediately follows the opening bracket, immediately precedes the closing bracket, or is escaped with a backslash*.
[-a-z] === [a-z-] === [a\-b-z]
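A quick check that the three classes above behave identically, each anchored so it must match exactly one character:

```javascript
// Three ways of writing "a lowercase letter or a literal hyphen".
const classes = [/^[-a-z]$/, /^[a-z-]$/, /^[a\-b-z]$/];
const inputs = ["-", "m", "A", "_"];

for (const re of classes) {
  console.log(re.source, inputs.map((s) => re.test(s)).join(" ")); // true true false false
}
```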

Thus, although there was an error in your regex, the \. was not it. :-)
 
Bobaran98 (author) commented:
LOL. After an hour messing with this, I think I just solved my own issue... mere moments after posting it here. Looking at the code again, I wondered if maybe JavaScript was ignoring the escape backslash before the period; if so, that period would match any character, and with the group repeated, pretty much anything to the right of the @ sign would be valid.
I put the \. inside square brackets (a character class), even specifying (unnecessarily) a single repetition with {1}, and it now works as expected. Like so:
^([\-\.0-9A-Z_a-z]+)@([\-0-9A-Za-z]+[\.]{1})+[A-Za-z]{2,4}$
I will still grant points to anyone who can show me why (a) my original code was wrong and this is valid, or (b) online documentation of this as a known issue.  Because my understanding of RegEx says this is silly.
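Assuming the pattern really was being built from a string (the thread never confirms this), the `[\.]{1}` fix works for a mundane reason: the string literal turns `\.` into `.` but turns `[\.]` into `[.]`, and a period inside a character class is already literal. The `{1}` adds nothing. A sketch, with doubling the backslash (`\\.`) shown as the more conventional string-side fix:

```javascript
// What the regex engine actually receives when each fragment is written in a JS string.
console.log("\.");   // prints .    (backslash consumed by the string literal)
console.log("[\.]"); // prints [.]  (still a class containing a literal period)
console.log("\\.");  // prints \.   (doubled backslash survives as one)

// The author's bracketed version and a doubled-backslash version, both built from strings.
const viaClass = new RegExp("^([\-\.0-9A-Z_a-z]+)@([\-0-9A-Za-z]+[\.]{1})+[A-Za-z]{2,4}$");
const viaDoubled = new RegExp("^([-.0-9A-Z_a-z]+)@([-0-9A-Za-z]+\\.)+[A-Za-z]{2,4}$");

for (const s of ["user@domain", "user@domain.toolongextension", "user@domain.com"]) {
  console.log(s, viaClass.test(s), viaDoubled.test(s)); // false false, false false, true true
}
```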
 
leakim971 commented:
A good regex here: http://www.marketingtechblog.com/programming/javascript-regex-emailaddress/

/^([a-zA-Z0-9_.-])+@(([a-zA-Z0-9-])+.)+([a-zA-Z0-9]{2,4})+$/

Bobaran98 (author) commented:
Sorry, let's try this again
^([\-\.0-9A-Z_a-z]+)@([\-0-9A-Za-z]+[\.]{1})+[A-Za-z]{2,4}$

With the operative part being:  [\.]{1}

Instead of simply:  \.

leakim971 commented:
No problem, Bobaran98. If you're satisfied with your regex, just accept your last post as the answer to close the question.
 
Bobaran98 (author) commented:
@leakim, can you answer any of my questions? Your expression works too, obviously. But I notice your period is outside of brackets and has no escape character. In any other framework I've worked with (.NET, PHP, etc.), such a period would be treated as a wildcard, and escaping it would make it a literal period. But in JavaScript the behavior now appears to be the exact opposite.
Can you comment on this?  It would certainly explain my original issue.  My problem is I would like to use the same expression for both JavaScript and .NET validation (the JS check happens on the fly, with the .NET happening again, just to be sure, before any data is written to the DB).
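A sketch of one way to keep a single copy of the pattern text for both the JavaScript and the .NET check. It assumes the .NET side stores the pattern as a C# verbatim string (@"..."), where backslashes are taken literally; `String.raw` gives JavaScript the same behavior, so the text between the backticks can match the .NET pattern character for character. `EMAIL_PATTERN` is a hypothetical name.

```javascript
// EMAIL_PATTERN is a hypothetical shared constant; String.raw keeps "\." and "\-" intact.
const EMAIL_PATTERN = String.raw`^([\-\.0-9A-Z_a-z]+)@([\-0-9A-Za-z]+\.)+[A-Za-z]{2,4}$`;
const emailRegex = new RegExp(EMAIL_PATTERN);

console.log(EMAIL_PATTERN.includes("\\.")); // true: the backslash before the period is still there
console.log(emailRegex.test("user@domain")); // false
console.log(emailRegex.test("user@domain.com")); // true
```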
 
leakim971 commented:
The period works the same way in JS as in .NET and PHP: http://www.w3schools.com/jsref/jsref_regexp_dot.asp
A good link: http://www.w3schools.com/jsref/jsref_obj_regexp.asp
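A minimal check of that point using regex literals: an unescaped `.` matches any single character (other than a line terminator), while `\.` matches only a literal period.

```javascript
const anyChar = /^a.c$/;     // "." is a wildcard
const literalDot = /^a\.c$/; // "\." is a literal period

console.log(anyChar.test("abc"), anyChar.test("a.c"));       // true true
console.log(literalDot.test("abc"), literalDot.test("a.c")); // false true
```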

(([a-zA-Z0-9-])+.)

I read: one or more alphanumeric characters or -, followed by any single character that is not alphanumeric. For example: -
Not only a dot/period.

Something like bad@expert?excff is valid with this regex.
Your regex is the better one.
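A sketch that runs the linked regex (copied from the post above) and the original pattern from the question, both as regex literals, over a few inputs. Besides the unescaped `.` wildcard, the trailing `([a-zA-Z0-9]{2,4})+` repeats, so the extension length isn't capped at four characters either.

```javascript
// Regex from the linked article, as posted above.
const linked = /^([a-zA-Z0-9_.-])+@(([a-zA-Z0-9-])+.)+([a-zA-Z0-9]{2,4})+$/;
// Original pattern from the question, as a regex literal.
const original = /^([\-\.0-9A-Z_a-z]+)@([\-0-9A-Za-z]+\.)+[A-Za-z]{2,4}$/;

for (const s of ["bad@expert?excff", "user@domain.toolongextension", "user@domain.com"]) {
  console.log(s, "linked:", linked.test(s), "original:", original.test(s));
}
// bad@expert?excff linked: true original: false
// user@domain.toolongextension linked: true original: false
// user@domain.com linked: true original: true
```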
 
Bobaran98 (author) commented:
@mods-- I'd like to accept my own post (#33468270) as the solution, but award leakim971's post (#33468782) 50 points.  I've tried doing this using "Accept and Award Points," but it gives me the error message that the minimum point split is 20 points (which is a bogus message).  Thanks!
-----------
@leakim-- Thanks for your willingness to help, but I was really looking for some discussion or links about the differences between JavaScript regex and other implementations.  At least a reason why the escape character gets ignored in front of the period.  Your links were very basic.  No worries-- obviously you're very busy on here; I just wanted you to know why I'm not awarding any more points than this.
Have a good day!
 
Bobaran98 (author) commented:
@mods-- oh, and a grade of A.  Thanks!
 
käµfm³d commented:
>>  (([a-zA-Z0-9-])+.)
>>  I read: one or more alphanumeric characters or -, followed by any single character that is not
>>  alphanumeric. For example: -
>>  Not only a dot/period.

That pattern actually says one-or-more alphanumerics followed by ANY character, not just a character which is "not alphanumeric."
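To make the "ANY character" point concrete, here is the fragment anchored on its own and tested against a few strings. The final `.` accepts punctuation and letters alike; only a string too short for "one or more, then one more" fails.

```javascript
// The fragment under discussion, anchored so it must consume the whole input.
const fragment = /^(([a-zA-Z0-9-])+.)$/;

// The final "." accepts any character, alphanumeric or not.
for (const s of ["abc.", "abc?", "abc-", "abcd", "a"]) {
  console.log(s, fragment.test(s)); // true for all but "a"
}
```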


Can you explain what wasn't working about the pattern? I tested it locally and it appeared to work fine for me. The only change I made is that inside character classes ([]) you do not need to escape hyphens or periods. With regard to literal hyphens, you only need to make sure that the hyphen is at either the beginning or the end of the class (i.e. the first character after the opening bracket or the last character just before the closing bracket). Here is what I tested with:
var x = "user@domain.com"; // sample address (hypothetical); substitute the form field's value
alert(x.match(/^([-.0-9A-Z_a-z]+)@([-0-9A-Za-z]+\.)+[A-Za-z]{2,4}$/));

leakim971 commented:
> That pattern actually says one-or-more alphanumerics followed by ANY character, not just a character which is "not alphanumeric."
That pattern actually says one-or-more alphanumerics or - followed by ANY character, not just a character which is "not alphanumeric."
 
käµfm³d commented:
I'm sorry, but where do you see an "or" in that pattern?
 
leakim971 commented:
> where do you see an "or" in that pattern?

after the 9 in (([a-zA-Z0-9-])+.)
 
leakim971 commented:
Not this "or": |

but "one of these": there's alphanum and -

Bad wording, sorry.
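The distinction being made here, as a sketch: a bracket expression such as `[a-c-]` matches exactly one character from the set, which is the same as an alternation of single characters written with `|`.

```javascript
const asClass = /^[a-c-]$/;            // one of: a, b, c, or "-"
const asAlternation = /^(?:a|b|c|-)$/; // the same set spelled with "|"

for (const s of ["a", "c", "-", "d", "ab"]) {
  console.log(s, asClass.test(s), asAlternation.test(s)); // the two columns agree
}
```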
 
käµfm³d commented:
Ah. I missed the dash in your explanation myself  :)
 
leakim971 commented:
@Bobaran98 said:
>At least a reason why the escape character gets ignored in front of the period

Look again: http://www.w3schools.com/jsref/jsref_obj_regexp.asp
(the Brackets section)

So in the brackets, backslash are considered like a character and not as escape character
 
leakim971 commented:
@kaufmed

I couldn't come up with the word "dash".
Shame on me, brother...

Have fun, see you later.