tim_carter

How do I validate the length of the entire regex match?

I am trying to build a regex to check an email address:

/\w+([-+.']\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*$/

It works fine, but I only want it to accept up to 256 characters.

Something like:

/\w+([-+.']\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*${3,254}/

But of course it does not work. How do I do this?
ASKER CERTIFIED SOLUTION
kaufmed

(This solution is only available to Experts Exchange members.)

You might find this function useful.
<?php // RAY_email_validation.php
error_reporting(E_ALL);


// A FUNCTION TO TEST FOR A VALID EMAIL ADDRESS, RETURN TRUE OR FALSE
// SEE MAN PAGE: http://php.net/manual/en/intro.filter.php
function check_valid_email($email)
{
    // FROM THE POST AT EE, REJECT ANYTHING LONGER THAN 256 CHARACTERS
    $email = trim($email);
    if (strlen($email) > 256) return FALSE;
    // LIST OF BLOCKED DOMAINS
    $bogus = array
    ( '@unknown.com'
    , '@example.com'
    , '@gooseball.org'
    )
    ;

    // IF PHP 5.2 OR ABOVE, WE CAN USE THE FILTER
    if (version_compare(PHP_VERSION, '5.2.0', '>='))
    {
        if(filter_var($email, FILTER_VALIDATE_EMAIL) === FALSE) return FALSE;
    }

    // IF LOWER-LEVEL PHP, WE CAN CONSTRUCT A REGULAR EXPRESSION
    else
    {
        $regex
        = '/'                        // START REGEX DELIMITER
        . '^'                        // START STRING
        . '[A-Z0-9_-]'               // AN EMAIL - SOME CHARACTER(S)
        . '[A-Z0-9._-]*'             // AN EMAIL - SOME CHARACTER(S) PERMITS DOT
        . '@'                        // A SINGLE AT-SIGN
        . '([A-Z0-9][A-Z0-9-]*\.)+'  // A DOMAIN NAME PERMITS DOT, ENDS DOT
        . '[A-Z\.]'                  // A TOP-LEVEL DOMAIN PERMITS DOT
        . '{2,6}'                    // TLD LENGTH >= 2 AND =< 6
        . '$'                        // ENDOF STRING
        . '/'                        // ENDOF REGEX DELIMITER
        . 'i'                        // CASE INSENSITIVE
        ;
        // TEST THE STRING FORMAT
        if (!preg_match($regex, $email)) return FALSE;
    }

    // TEST TO SEE IF THE DOMAIN IS IN OUR BLOCKED LIST
    foreach ($bogus as $badguy)
    {
        if (stripos($email, $badguy) !== FALSE) return FALSE;
    }

    // FILTER_VAR OR PREG_MATCH DOES NOT TEST IF THE DOMAIN IS ROUTABLE
    $domain = explode('@', $email);

    // MAN PAGE: http://php.net/manual/en/function.checkdnsrr.php
    if ( checkdnsrr($domain[1], "MX") || checkdnsrr($domain[1], "A") ) return TRUE;

    // EMAIL IS NOT ROUTABLE
    return FALSE;
}



// DEMONSTRATE THE FUNCTION IN ACTION
$e = NULL;
$safe = '';
if (!empty($_GET["e"]))
{
    $e = $_GET["e"];

    // ESCAPE THE INPUT BEFORE ECHOING IT BACK INTO HTML
    $safe = htmlspecialchars($e, ENT_QUOTES, 'UTF-8');
    if (check_valid_email($e))
    {
        echo "<br/>VALID: $safe \n";
    }
    else
    {
        echo "<br/>BOGUS: $safe \n";
    }
}


// END OF PROCESSING - CREATE THE FORM USING HEREDOC NOTATION
$form = <<<ENDFORM
<form>
TEST A STRING FOR A VALID EMAIL ADDRESS:
<input name="e" value="$safe" />
<input type="submit" />
</form>
ENDFORM;

echo $form;

If you are willing to use two separate regexes, you can first test the length with
/.{257}/

and then test the structure with your initial working regex
/\w+([-+.']\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*$/

If the first expression matches, the address is too long.
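
Here is a rough PHP sketch of that two-step idea, assuming preg_match() with PCRE. The function name passes_two_step_check is made up for illustration, and the "s" modifier on the length test is my own addition so that "." also counts newline characters. The structural regex is the asker's original, unchanged.

```php
<?php
// Hypothetical helper: two-step check, length first, then structure.
function passes_two_step_check($email)
{
    // Step 1: if 257 consecutive characters can be matched, the string is too long.
    // The "s" modifier (an addition here) lets "." match newlines as well.
    if (preg_match('/.{257}/s', $email) === 1) {
        return false;
    }

    // Step 2: the asker's original structural regex, unchanged.
    $structure = '/\w+([-+.\']\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*$/';
    return preg_match($structure, $email) === 1;
}

var_dump(passes_two_step_check('tim@example.com'));                        // bool(true)
var_dump(passes_two_step_check(str_repeat('a', 250) . '@example.com'));   // bool(false), 262 characters
```

If a regex is not required for the length test, comparing strlen($email) against the limit does the same job and is easier to read.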
fred&barney@stonehenge.com is a valid email address.
tim_carter

ASKER

/^(?=.{3,254}$)\w+([-+&.']\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*$/

Modified with & because that is legal in the local part.
This regex is perfect.
Thanks.
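
For anyone reading along, the length limit comes from the lookahead at the front of the pattern. A small PHP sketch of how it behaves, assuming preg_match() with PCRE; the wrapper name matches_length_limited_regex is hypothetical, the regex is the one posted above.

```php
<?php
// Hypothetical wrapper around the regex from the post above.
function matches_length_limited_regex($email)
{
    // (?=.{3,254}$) is a lookahead: it asserts that 3 to 254 characters
    // lie between the start and the end of the string, consuming nothing.
    // The rest of the pattern then checks the structure from the same start.
    $regex = '/^(?=.{3,254}$)\w+([-+&.\']\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*$/';
    return preg_match($regex, $email) === 1;
}

var_dump(matches_length_limited_regex('fred&barney@stonehenge.com'));            // bool(true)
var_dump(matches_length_limited_regex(str_repeat('a', 250) . '@example.com'));   // bool(false), 262 characters
```

One caveat: in PCRE, "$" also matches just before a trailing newline, so an address followed by "\n" still passes. Adding the D modifier after the closing delimiter is one way to tighten that, if it matters for your input.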
"This regex is perfect"
I am always amused when I see something like that in writing!  The internet is littered with regular expressions that do not quite work right.  Many of the worst offenders are regular expressions designed to tell if an email address is valid.  That is probably the most important reason for development of this function:
http://php.net/manual/en/function.filter-var.php

filter_var() has been available since PHP 5.2, which was released years ago.  See:
http://php.net/manual/en/filter.filters.validate.php

PHP 5.2 is now obsolete and is not supported any more. PHP 5.2 is so old it is not even given security releases.  So this is a good time to reconsider whether writing your own regex to validate emails still makes sense.  It's kind of like adjusting a carburetor.  It doesn't really matter how good you are with a carburetor.  Carburetor adjustment is an obsolete skill.  We don't have to do that any more because we have fuel injected engines.  And we don't have to tinker with email validation because we have filter_var().
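
A minimal sketch of what that looks like in practice, with an explicit length cap added since that was the original question. The function name is_valid_email_syntax and the 254 default are assumptions for illustration (254 mirrors the limit in the regex posted in this thread); filter_var() and FILTER_VALIDATE_EMAIL are the real PHP names.

```php
<?php
// Hypothetical helper: syntax check with filter_var() plus an explicit length cap.
// The 254-character default mirrors the limit used in the regex in this thread.
function is_valid_email_syntax($email, $maxLength = 254)
{
    $email = trim($email);
    if (strlen($email) > $maxLength) {
        return false;
    }

    // filter_var() returns the filtered string on success and FALSE on failure.
    return filter_var($email, FILTER_VALIDATE_EMAIL) !== false;
}

var_dump(is_valid_email_syntax('tim@example.com'));
var_dump(is_valid_email_syntax('no-at-sign.example.com'));
```

This only checks the format of the string. It says nothing about whether the mailbox or the domain exists.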

filter_var() does not check whether the email is routable.  I chose to include that test in the function posted at ID: 37749502, on line 56 (the checkdnsrr() call).  If you're sending email to yourself at localhost, you might or might not want to include that functionality.
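
If you want the routability test on its own, something along these lines might do. This is only a sketch: the function name email_domain_is_routable and the $lookup parameter are hypothetical, added so the DNS call can be swapped out for testing or for localhost setups. By default it calls PHP's checkdnsrr(), the same function used in the posted code.

```php
<?php
// Hypothetical helper: decide whether the domain of an address looks routable.
// $lookup is an assumption added for testability: any callable taking
// (host, record type) and returning bool. It defaults to PHP's checkdnsrr().
function email_domain_is_routable($email, $lookup = 'checkdnsrr')
{
    // Take everything after the last "@" as the domain.
    $at = strrpos($email, '@');
    if ($at === false) {
        return false;
    }
    $domain = substr($email, $at + 1);
    if ($domain === false || $domain === '') {
        return false;
    }

    // Same test as the posted function: an MX record, or failing that an A record.
    return $lookup($domain, 'MX') || $lookup($domain, 'A');
}

// Demonstration with a stand-in lookup, so no real DNS query is made here.
$fakeLookup = function ($host, $type) {
    return $host === 'stonehenge.com' && $type === 'MX';
};
var_dump(email_domain_is_routable('fred&barney@stonehenge.com', $fakeLookup)); // bool(true)
var_dump(email_domain_is_routable('fred@nowhere.invalid', $fakeLookup));       // bool(false)
```

Calling it with one argument uses the real DNS lookup, which needs network access and can be slow, so it may be worth running only after the syntax check has passed.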
Hi Ray,

I actually use filter_var in PHP to check email addresses.

I am just practicing regex. I do not mean that it is perfect for validating an email address, because there are still emails it will not validate.
Filter_var() is the right tool.  

You might enjoy this article if you're learning about regular expressions.
https://www.experts-exchange.com/Web_Development/Web_Languages-Standards/PHP/A_7830-A-Quick-Tour-of-Test-Driven-Development.html
Thanks Ray, I will look at that article.