jaysolomon

Email RegExp

OK, so knightEknight and I were talking about email regular expressions.
Below you will see what he came up with.
We are trying to get this to catch 99.99% of errors.
There are tons and tons of patterns out there, but I have yet to see
one that will catch 99.99% of errors.

So if anyone would like to contribute, that would be fine, but we do not want to have to
use two expressions, i.e. isEmail(str) && isEmail2(str) (I think you know what I mean).

This is a continuation from http:Q_20899186.html
===================
<script type="text/javascript">
<!--
function isEmail(str){
    var objRegExp =  /^((?:(?:(?:(\w|~)[~\.\-\+]?)*)(\w|~))+)\@((?:(?:(?:\w[\.\-\+]?){0,62})\w)+)\.([a-zA-Z]{2,6})$|^$/;
    return objRegExp.test(str);
}    
res1 = "me@here.commacommacom";
res2 = "me.you@here.c_m";
res3 = "me@-here.com";
res4 = "-me@here.com";
res5 = "me.you@here.com";
res6 = "me_here_there.everywhere@here_-jay.com";
res7 = "m_e.you@h__4.com";
res8 = "00000-@0-0000-0000.com";
res9 = "yabba-dabba~doo@scooby_doo.com";
res0 = "_~~~~~~~@_____.com";

document.write(res1 +" = "+ isEmail(res1) +"<br />");
document.write(res2 +" = "+ isEmail(res2) +"<br />");
document.write(res3 +" = "+ isEmail(res3) +"<br />");
document.write(res4 +" = "+ isEmail(res4) +"<br />");
document.write(res5 +" = "+ isEmail(res5) +"<br />");
document.write(res6 +" = "+ isEmail(res6) +"<br />");
document.write(res7 +" = "+ isEmail(res7) +"<br />");
document.write(res8 +" = "+ isEmail(res8) +"<br />");
document.write(res9 +" = "+ isEmail(res9) +"<br />");
document.write(res0 +" = "+ isEmail(res0) +"<br />");
// -->
</script>


jAy
jaysolomon

ASKER

BTW, here is a quote from kEk in the other thread:

>>>
Jay, although technically valid, I think I would like to prevent two identical special characters in a row, so none of this:

 x--x@here.com
 x__x@here.com
 x~~x@here.com
 x..x@here.com  // the expression already handles this well!

but this should be ok because the special chars are not identical:

  x~.x@here.com
<<<
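
One way to express that rule on its own, as a minimal sketch: a backreference matches a special character that is immediately followed by the same character. The function name `hasRepeatedSpecial` and the exact set of special characters (`~ . _ + -`) are assumptions made for illustration; they are not part of the expression under discussion.

```javascript
// Hypothetical helper: true when the same special character appears twice in a row.
// The character set is an assumption; adjust it to the characters you allow.
function hasRepeatedSpecial(str) {
    // ([~._+\-]) captures one special character, \1 requires the identical character next
    return /([~._+\-])\1/.test(str);
}

var samples = ["x--x@here.com", "x__x@here.com", "x~~x@here.com", "x..x@here.com", "x~.x@here.com"];
for (var i = 0; i < samples.length; i++) {
    console.log(samples[i] + " = " + hasRepeatedSpecial(samples[i]));
}
```

Such a check could be run before the main expression, although that is exactly the two-step validation this thread is trying to avoid.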
Zvonko

Here is my version:

<script type="text/javascript">
<!--
function isEmail(str){
    var objRegExp =  /^((?:(?:(?:(\w|~)[~\.\-\+]?)*)(\w|~))+)\@((?:(?:(?:\w[\.\-\+]?){0,62})\w)+)\.([a-zA-Z]{2,6})$|^$/;
    return objRegExp.test(str);
}    
function isEmail2(str){
    var objRegExp =  /^[a-z][a-z0-9]+([\.\-\_\~][a-z]+)*\@([a-z][a-z0-9]+[\.\-\_\~])*[a-z][a-z0-9]+\.([a-z]{2,6})$/i;
    return objRegExp.test(str);
}    
res1 = "me@here.commacommacom";
res2 = "me.you@here.c_m";
res3 = "me@-here.com";
res4 = "-me@here.com";
res5 = "me.you@here.com";
res6 = "me_here_there.everywhere@here_-jay.com";
res7 = "m_e.you@h__4.com";
res8 = "00000-@0-0000-0000.com";
res9 = "yabba-dabba~doo@scooby_doo.com";
res0 = "_~~~~~~~@_____.com";

document.write(res1 +" = "+ isEmail(res1) +"<br />");
document.write(res2 +" = "+ isEmail(res2) +"<br />");
document.write(res3 +" = "+ isEmail(res3) +"<br />");
document.write(res4 +" = "+ isEmail(res4) +"<br />");
document.write(res5 +" = "+ isEmail(res5) +"<br />");
document.write(res6 +" = "+ isEmail(res6) +"<br />");
document.write(res7 +" = "+ isEmail(res7) +"<br />");
document.write(res8 +" = "+ isEmail(res8) +"<br />");
document.write(res9 +" = "+ isEmail(res9) +"<br />");
document.write(res0 +" = "+ isEmail(res0) +"<br />");
document.write("===============================<br>");
document.write(res1 +" = "+ isEmail2(res1) +"<br />");
document.write(res2 +" = "+ isEmail2(res2) +"<br />");
document.write(res3 +" = "+ isEmail2(res3) +"<br />");
document.write(res4 +" = "+ isEmail2(res4) +"<br />");
document.write(res5 +" = "+ isEmail2(res5) +"<br />");
document.write(res6 +" = "+ isEmail2(res6) +"<br />");
document.write(res7 +" = "+ isEmail2(res7) +"<br />");
document.write(res8 +" = "+ isEmail2(res8) +"<br />");
document.write(res9 +" = "+ isEmail2(res9) +"<br />");
document.write(res0 +" = "+ isEmail2(res0) +"<br />");
// -->
</script>



And by the way, I still believe that the one and only really useful RegExp for checking email addresses is this:
http://javascript.internet.com/forms/email-validation---basic.html

if (/^\w+([\.-]?\w+)*@\w+([\.-]?\w+)*(\.\w{2,3})+$/.test(myForm.emailAddr.value)){
  return (true)
}

Of course, IP addresses and nonsense characters are also allowed. But do you really need such customers?
/^\w+([\.-]?\w+)*@\w+([\.-]?\w+)*(\.\w{2,3})+$/

is not a good one in my opinion, as it allows:


me.you@here.c_m = true
me.you@here.com = true // which is valid
me_here_there.everywhere@here_-jay.com = true // questionable
m_e.you@h__4.com = true // not likely but questionable
The range of valid email addresses is nearly endless (LANs generate really odd ones), and no one has come up with the perfect regex yet. The best one around is by Jeffrey Friedl (author of the O'Reilly book Mastering Regular Expressions); it is pages long and is generated by a Perl program linked from his home page (I cannot find the link just yet). To ensure that all valid emails pass, it is inevitable that some illegal ones will get through. All you can do is try to minimise this. I use this:

<script>
function isValidEmail(emailAddress) {
    var re = /^(([^<>()[\]\\.,;:\s@\"]+(\.[^<>()[\]\\.,;:\s@\"]+)*)|(\".+\"))@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\])|(([a-zA-Z\-0-9]+\.)+[a-zA-Z]{2,}))$/
    return re.test(emailAddress);
}

alert(isValidEmail('Dick_R_Gratton/SPM/USDAFS@notes.fs.fed.us'))
</script>


Example valid email addresses:

~Dick_R_Gratton/SPM/USDAFS@notes.fs.fed.us
Dick_R_Gratton/SPM/USDAFS@notes.fs.fed.us
www@bull.bullnet.co.uk
frederick@stat.ucsa.edu
bob.mcalpine@mpr.gov.on.ca
tpjones/psw_rfl@fs.fed.us
17584c.gross/inra@123.03.201.12
*abc/cde@fgh.com
abc%def@123.1.1.1
abcdef@123.1.1.1
17584c.gross.ag.poster/inra@123.03.201.12
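
A quick way to see how the expression above treats each of these addresses, as a sketch. The function is copied from the post; the `addresses` array and the loop are added for illustration. Note that the domain part of this expression accepts an IP address only when it is wrapped in square brackets, so the unbracketed IP examples in the list are not expected to pass it.

```javascript
function isValidEmail(emailAddress) {
    var re = /^(([^<>()[\]\\.,;:\s@\"]+(\.[^<>()[\]\\.,;:\s@\"]+)*)|(\".+\"))@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\])|(([a-zA-Z\-0-9]+\.)+[a-zA-Z]{2,}))$/;
    return re.test(emailAddress);
}

// The example addresses listed above
var addresses = [
    "~Dick_R_Gratton/SPM/USDAFS@notes.fs.fed.us",
    "Dick_R_Gratton/SPM/USDAFS@notes.fs.fed.us",
    "www@bull.bullnet.co.uk",
    "frederick@stat.ucsa.edu",
    "bob.mcalpine@mpr.gov.on.ca",
    "tpjones/psw_rfl@fs.fed.us",
    "17584c.gross/inra@123.03.201.12",
    "*abc/cde@fgh.com",
    "abc%def@123.1.1.1",
    "abcdef@123.1.1.1",
    "17584c.gross.ag.poster/inra@123.03.201.12"
];

// Print each address with the result of the check
for (var i = 0; i < addresses.length; i++) {
    console.log(addresses[i] + " = " + isValidEmail(addresses[i]));
}
```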
Here are some of my test addresses.  I would expect the first group to be valid and the second group to be invalid.

// expecting true;
""
"me@here.com"
"me+you@there.com"
"me-n-you@there.com"
"me1.you2@every.where.com"
"1me.2you3@no-where.com"
"~tilde1@here.com"
"tilde2~@here.com"
"tilde~3@here.com"
"x@y.zz"

// expecting false;
null
"me@here.c_m"
"me@here@com"
"me@.here.com"
"me.@here.com"
"me&you@here.com"
"me..you@here.com"
"me@here..com"
"me@here"
"here.com"
"@here.com"
"me.you@"
"me.you@here."
"me you@there.com"
"meNyou@the re.com"
"me@here.commacommacom"
"xx@yy.z"
sidebar ...

Hi Gwyn, I signed onto ICC late last night and registered my favorite handle "knightEknight" ... it was still available so I took it :)   I played terribly though -- it was after midnight.  What is your husband's handle?  Tell him to check out www.uschesslive.org because the interface is similar but it is free!
The handle is something like knightE_knight or KnightEKnight or some variation on that. I will pass the link on. Hubby is on EE (a troublemaker :-) ) and his handle is not hard to guess if you know who he is. (I'd better ask him first before giving it out.)
He plays mindless bullet chess mostly. The family shares the account to some extent (unbeknownst to him); fortunately he rarely notices when his rating goes down by a couple of hundred.
What, you mean EE has troublemakers?

no you do not say :)
my best guess would be Xxavier but perhaps it is Shekerra, no?
sorry ... once the curiosity bug bit me I couldn't resist outing him.  :)
Here is an updated version of the first expression that will allow / in the first part of the address, which Gwyn pointed out is valid and even common in government circles.

var objRegExp =  /^((?:(?:(?:(\w|~)[~\.\-\+]?)*)(\w|~))+)\@((?:(?:(?:\w[\.\-\+]?){0,62})\w)+)\.([a-zA-Z]{2,6})$|^$/;

Also, I notice that this expression already does not allow two identical special characters in a row (except for _ and ~) which is what I wanted anyway.  I am happy with it as is unless someone can point out a major flaw in it.  Surely anyone who wants to take the time to intentionally create an invalid address that will pass this expression or a valid address that won't pass will be able to do so in time, but as it is this will catch nearly all the possible user typos.
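
As quoted here, the separator class for the local part, `[~\.\-\+]`, does not contain a slash. A minimal sketch of one way to admit `/`, assuming it should behave like the other separators, checked against the test addresses listed earlier in the thread. The name `isEmailWithSlash`, the `mismatches` helper, and the `x//x@here.com` case are illustrative additions, and this variant is not necessarily the exact expression the author used.

```javascript
// Assumed variant: same as the expression above, with \/ added to the
// separator class of the local part only.
function isEmailWithSlash(str) {
    var objRegExp = /^((?:(?:(?:(\w|~)[~\.\-\+\/]?)*)(\w|~))+)\@((?:(?:(?:\w[\.\-\+]?){0,62})\w)+)\.([a-zA-Z]{2,6})$|^$/;
    return objRegExp.test(str);
}

// Test addresses from earlier in the thread
var expectTrue = ["", "me@here.com", "me+you@there.com", "me-n-you@there.com",
    "me1.you2@every.where.com", "1me.2you3@no-where.com", "~tilde1@here.com",
    "tilde2~@here.com", "tilde~3@here.com", "x@y.zz"];
var expectFalse = [null, "me@here.c_m", "me@here@com", "me@.here.com", "me.@here.com",
    "me&you@here.com", "me..you@here.com", "me@here..com", "me@here", "here.com",
    "@here.com", "me.you@", "me.you@here.", "me you@there.com", "meNyou@the re.com",
    "me@here.commacommacom", "xx@yy.z"];

// Slash cases: two addresses from the thread, plus one made-up doubled slash
var slashTrue = ["Dick_R_Gratton/SPM/USDAFS@notes.fs.fed.us", "tpjones/psw_rfl@fs.fed.us"];
var slashFalse = ["x//x@here.com"];

// Returns the inputs whose result differs from the expected value
function mismatches(inputs, expected) {
    var failed = [];
    for (var i = 0; i < inputs.length; i++) {
        if (isEmailWithSlash(inputs[i]) !== expected) failed.push(inputs[i]);
    }
    return failed;
}

console.log("should pass but fail:", mismatches(expectTrue.concat(slashTrue), true));
console.log("should fail but pass:", mismatches(expectFalse.concat(slashFalse), false));
```

Because `/` is added as an optional single separator, a doubled slash is rejected in the same way as a doubled dot or hyphen.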
ASKER CERTIFIED SOLUTION
knightEknight
and I'm working on a different approach:


var replied = false;

function isEmail(str)
{
   location = "mailto://" + str + "&BODY=Please reply to the email asap!";

   setInterval("if(checkForReply(str))replied=true",300000);

   if (replied)
      return true;
   else
      return false;
}
that's hilarious :o)
there has got to be a gajillion different ways you could do this :P

Here's another, more server-side approach, not really a regexp. I don't know if people do this: check that it's all real characters and whatnot, then ping the domain to see whether it's real and up as a final validation.

#!/usr/bin/perl -w
print "Content-type: text/html\n\n";
@email = split(/=/,$ENV{'QUERY_STRING'}); #?email=somewhere@myhome.com
@hostname = split(/@/,$email[1]);
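# only pass plain host names to the shell, so the input cannot inject commands
$ping = ($hostname[1] =~ /^[A-Za-z0-9][A-Za-z0-9.\-]*$/) ? `nmap -sP $hostname[1]` : "";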
if($ping=~/appears to be up/s){
 ...continue with email...
} else {
 ...send back...
}

You can use nmap or whatever utility, I suppose.
here is one that matches the RFC822 standard: http://www.ex-parrot.com/~pdw/Mail-RFC822-Address.html
thanks ren!  that's great!  now if I can just figure out how to deal with the out-of-memory exception  ;)
I wonder if it can be optimised. But wow, what a regex.
i agree

WOW!!!!!!!! what a regexp
Maybe try this one:
function checkemail(a)
{
 // anything, an @, anything, a dot, then 2 or 3 more characters
 var filter=/^.+@.+\..{2,3}$/;
 if (filter.test(a))
   return true;
 alert("Please input a valid email address!");
 return false;
}
me@localhost, mom@moms_g4.local, etc. :P perfectly valid, working emails at my home.
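
Checking those two addresses against the filter from the checkemail function above, as a quick sketch. Only the regular expression is reused; the alert call is left out so the snippet also runs outside a browser, and the `localAddresses` array is added for illustration.

```javascript
// Filter copied from the checkemail function above
var filter = /^.+@.+\..{2,3}$/;

// Addresses that work on a local network but have no 2-3 character suffix
var localAddresses = ["me@localhost", "mom@moms_g4.local"];

for (var i = 0; i < localAddresses.length; i++) {
    console.log(localAddresses[i] + " = " + filter.test(localAddresses[i]));
}
```

Neither address passes: the first has no dot after the @, and the second ends in a five-character suffix.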
kEk

Have you come to a conclusion on which one you think we should go with?
They are all good, Jay.  Split them with everyone (except me, of course)
... but I'm going with the last one I posted.  It does what I need it to do and is short enough to maintain if necessary.
c'mon Jay, split these points up.  You already had my expression before you posted the question (save for a few minor updates:)