Solved

Need a wise strategy for validating names with special characters: preg_match()

Posted on 2012-04-05
Last Modified: 2012-04-06
Hello,

php beginner here again...

I want to validate names of people and cities, as user input. Some names will have special characters. (Jónsdóttir, Québec, etc.)

I realize I really cannot stop someone from inputting:
Mickey Mouse, Orlando, Florida
in the first-name, last-name, city, and state fields...

But, it would be nice to keep out
JJ#*#H, 1=1, DROP TABLE CUSTOMER--

In addition to using mysql_real_escape_string() on each field, what else makes sense to try to stop some nonsense input?

(The member does have to input a real email address, and a validation code is sent there. But, as we all know, someone can have as many email addresses as they want.)

Is this strategy (along with mysql_real_escape_string) enough?
preg_match("/^[A-Za-z-'.àáâãäåæçèéêëìíîïðñò]{2,50}$/u", $string);

That is just a partial list. Should I add in all the special characters that I want to allow - that is, special characters that could be in names of cities and people around the world?

Obviously, that list doesn't include Chinese, Japanese, Vietnamese, etc. characters... but I have not really seen a US-based website where people's names were shown in Mandarin Chinese, for example. I would think it is typical Americentric bad etiquette to force people to anglicize their names... but again, it is a US-based website.

Thanks for any ideas on how to handle this both from a (server-side php) security standpoint, and for a friendly way for the site to take and display the names of people using the special characters they normally use.

Dennis
0
Question by:dtleahy
9 Comments
 
LVL 108

Assisted Solution

by:Ray Paseur
Ray Paseur earned 500 total points
ID: 37811659
You might want to learn about this function.
http://us.php.net/manual/en/function.filter-var.php

As far as regular expressions go, I think you might want a character class that looks like this: [^A-Z] -- that says match anything that is NOT a member of the range from A to Z.  Obviously you would want to put more characters into the class, including blanks, commas, dots, apostrophe, etc. (eg: Winston O'Churchill, Esq.).  Put every character you want to keep inside the brackets, and use preg_replace() to remove all the characters you do not want to keep.

After sanitizing the input values, use MySQL_Real_Escape_String() and your data base will be safe.
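
A minimal sketch of the keep-list idea, assuming a plain ASCII list (letters, blank, comma, dot, apostrophe, hyphen) and a hypothetical helper name, strip_disallowed(). Accented letters are deliberately left out here to keep the example short.

```php
<?php
// Hypothetical helper: remove every character that is NOT in the keep-list.
function strip_disallowed($input)
{
    // [^...] is a negated class; the hyphen comes last so it is a literal.
    return preg_replace("/[^A-Za-z ,.'-]/", '', $input);
}

echo strip_disallowed("Winston O'Churchill, Esq."), "\n"; // nothing to remove
echo strip_disallowed("JJ#*#H"), "\n";                    // # and * are dropped
echo strip_disallowed("1=1"), "\n";                       // digits and = are dropped
```

Note that a string such as DROP TABLE CUSTOMER-- contains only allowed characters and passes through untouched, so the escaping step is still what protects the query.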

If you're dealing with human client input from strangers, you might want to consider having a "report inappropriate content" button, too.
0
 

Author Comment

by:dtleahy
ID: 37812411
Hi Ray, and thanks for the reply.

So, email would be handled like this:
$emailadr = trim($_POST['email']);
if(!filter_var($emailadr, FILTER_VALIDATE_EMAIL))
  {
  ## return an error message "E-mail is not valid";
  }
else
  {  
	$emailadr=mysql_real_escape_string($emailadr);
  }



I'm not quite sure what you meant by this:
As far as regular expressions go, I think you might want a character class that looks like this: [^A-Z] -- that says match anything that is NOT a member of the range from A to Z.  Obviously you would want to put more characters into the class, including blanks, commas, dots, apostrophe, etc. (eg: Winston O'Churchill, Esq.).  Put every character you want to keep inside the brackets, and use preg_replace() to remove all the characters you do not want to keep.

Do you think it's a good idea to replace characters, or is it better to provide an error that says illegal characters were entered?

Rather than using preg_match, should I be using FILTER_VALIDATE_REGEXP ?

(The following pattern is an attempt to only allow a-z, A-Z, a bunch of accented characters, space, apostrophe, and hyphen.)

$pattern= "/^[a-zA-ZÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿŒœŠšŸŽžƒ'.\s-]*$/u";

$fname = trim($_POST['firstname']);

if(filter_var($fname, FILTER_VALIDATE_REGEXP, array("options"=>array("regexp"=>$pattern))) === false)
  {
	  ## return an error message "First Name has invalid characters";
	  ## echo "First Name has invalid characters";
  }
else
  {  
		$fname=mysql_real_escape_string($fname);
  }



Am I on the right track, or maybe a better question is, do I have it?

Thanks!

Dennis
0
 
LVL 108

Accepted Solution

by:Ray Paseur
Ray Paseur earned 500 total points
ID: 37812593
Here is my standard email validation example.
<?php // RAY_email_validation.php
error_reporting(E_ALL);


// A FUNCTION TO TEST FOR A VALID EMAIL ADDRESS, RETURN TRUE OR FALSE
// SEE MAN PAGE: http://php.net/manual/en/intro.filter.php
function check_valid_email($email, $rout=TRUE)
{
    // LIST OF BLOCKED DOMAINS
    $bogus = array
    ( '@unknown.com'
    , '@example.com'
    , '@gooseball.org'
    )
    ;

    // IF PHP 5.2 OR ABOVE, WE CAN USE THE FILTER
    if (strnatcmp(phpversion(),'5.2') >= 0)
    {
        if(filter_var($email, FILTER_VALIDATE_EMAIL) === FALSE) return FALSE;
    }

    // IF LOWER-LEVEL PHP, WE CAN CONSTRUCT A REGULAR EXPRESSION
    else
    {
        $regex
        = '/'                        // START REGEX DELIMITER
        . '^'                        // START STRING
        . '[A-Z0-9_-]'               // AN EMAIL - SOME CHARACTER(S)
        . '[A-Z0-9._-]*'             // AN EMAIL - SOME CHARACTER(S) PERMITS DOT
        . '@'                        // A SINGLE AT-SIGN
        . '([A-Z0-9][A-Z0-9-]*\.)+'  // A DOMAIN NAME PERMITS DOT, ENDS DOT
        . '[A-Z\.]'                  // A TOP-LEVEL DOMAIN PERMITS DOT
        . '{2,6}'                    // TLD LENGTH >= 2 AND =< 6
        . '$'                        // ENDOF STRING
        . '/'                        // ENDOF REGEX DELIMITER
        . 'i'                        // CASE INSENSITIVE
        ;
        // TEST THE STRING FORMAT
        if (!preg_match($regex, $email)) return FALSE;
    }

    // TEST TO SEE IF THE DOMAIN IS IN OUR BLOCKED LIST
    foreach ($bogus as $badguy)
    {
        if (stripos($email, $badguy) !== FALSE) return FALSE;
    }

    // FILTER_VAR OR PREG_MATCH DOES NOT TEST IF THE DOMAIN IS ROUTABLE
    if ($rout)
    {
        $domain = explode('@', $email);

        // MAN PAGE: http://php.net/manual/en/function.checkdnsrr.php
        if ( checkdnsrr($domain[1], "MX") || checkdnsrr($domain[1], "A") ) return TRUE;

        // EMAIL IS NOT ROUTABLE
        return FALSE;
    }
    return TRUE;
}



// DEMONSTRATE THE FUNCTION IN ACTION
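$e = NULL;
if (!empty($_GET["e"]))
{
    $raw = $_GET["e"];

    // ESCAPE THE VALUE BEFORE IT IS ECHOED INTO THE PAGE OR THE FORM
    $e = htmlspecialchars($raw, ENT_QUOTES);
    if (check_valid_email($raw))
    {
        echo "<br/>VALID: $e \n";
    }
    else
    {
        echo "<br/>BOGUS: $e \n";
    }
}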


// END OF PROCESSING - CREATE THE FORM USING HEREDOC NOTATION
$form = <<<ENDFORM
<form>
TEST A STRING FOR A VALID EMAIL ADDRESS:
<input name="e" value="$e" />
<input type="submit" />
</form>
ENDFORM;

echo $form;
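
A shorter sketch of two of the checks above (syntax and blocked domains), leaving out the DNS lookup so it runs without a network. The helper name is_blocked_domain() is hypothetical, and unlike the stripos() test above it compares only the end of the address.

```php
<?php
// Hypothetical helper: TRUE when the address ends with a blocked domain.
function is_blocked_domain($email, array $blocked)
{
    foreach ($blocked as $domain) {
        // Compare the tail of the address with the domain, ignoring case.
        if (strcasecmp(substr($email, -strlen($domain)), $domain) === 0) {
            return true;
        }
    }
    return false;
}

$blocked = array('@unknown.com', '@example.com', '@gooseball.org');

foreach (array('ray@php.net', 'nobody@EXAMPLE.com', 'not an email') as $candidate) {
    // filter_var() returns the address when it is valid and FALSE otherwise.
    $ok = filter_var($candidate, FILTER_VALIDATE_EMAIL) !== false
        && !is_blocked_domain($candidate, $blocked);
    echo $candidate, ' => ', ($ok ? 'VALID' : 'BOGUS'), "\n";
}
```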


As far as the illegal characters in the names go, I would just replace them.  Nobody is named ?/* and those characters can just be dropped out (replaced with NULL or blank).  Why bother with an error message?  Who cares?

I think this might be more on point for the pattern (not 100% sure, but it would be easy to test)
[$pattern
= "/"    // REGEX DELIMITER
. '['    // START CHARACTER CLASS
. '^'    // NEGATION (MATCH NONE OF THESE)
. 'a-zA-ZÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõöøùú ûüýþÿŒœŠšŸŽžƒ\-'
. "'"    // APOSTROPHE
. '\.'   // PERIOD
. '\s'   // WHITESPACE
. ']'    // END CHARACTER CLASS
. "/"    // REGEX DELIMITER
;
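
As an alternative to listing each accented letter, PCRE's Unicode letter property may be able to stand in for the list. This is only a sketch: it assumes the input is UTF-8 and that PCRE was built with Unicode support, and the helper name clean_name() is hypothetical.

```php
<?php
// Hypothetical helper: keep letters of any script, combining marks,
// apostrophe, period, hyphen and whitespace; drop everything else.
function clean_name($name)
{
    // \p{L} = any letter, \p{M} = combining mark, /u = treat input as UTF-8.
    $cleaned = preg_replace("/[^\\p{L}\\p{M}'.\\s-]/u", '', $name);

    // With /u, preg_replace() returns NULL when the input is not valid UTF-8.
    return $cleaned === null ? '' : trim($cleaned);
}

echo clean_name("Jónsdóttir"), "\n"; // accented letters are kept
echo clean_name("Québec"), "\n";
echo clean_name("JJ#*#H"), "\n";     // # and * are dropped
```

Because \p{L} covers every script, this would also keep Chinese or Japanese names; narrowing it to particular scripts is a separate design decision.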


0
 

Author Comment

by:dtleahy
ID: 37813134
Thank you, thank you, Ray!

While I am working out the details, I'm such a newbie at php that I assume the leading opening bracket ( [ ) before $pattern is a typo... but if not, please school me on just what that is for.

so, is this correct?

$pattern
= "/"    // REGEX DELIMITER
. '['    // START CHARACTER CLASS
. '^'    // NEGATION (MATCH NONE OF THESE)
. 'a-zA-ZÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿŒœŠšŸŽžƒ\-'
. "'"    // APOSTROPHE
. '\.'   // PERIOD
. '\s'   // WHITESPACE
. ']'    // END CHARACTER CLASS
. "/"    // REGEX DELIMITER
;  
 
$replace = "";
$new_string = preg_replace($pattern, $replace, $string);
$new_string=mysql_real_escape_string($new_string);



Thanks!

Dennis

{edited: removed something about smart quotes}
0

 
LVL 108

Expert Comment

by:Ray Paseur
ID: 37813351
The left bracket is the "start of character class" indicator.  The ^ means start of string when it is outside the character class, and means negation when it is inside the character class.  A couple of examples may show what is going on here.

/RAY/ will match the three letters R,A,Y if and only if they are adjacent, spelling RAY.  It does not have to be a word, so StingRAY would match.
/^RAY/ will match the three letters RAY if they are adjacent and at the beginning of the string.  StingRAY would not match.
/[^RAY]/ will match any character that is not one of R,A,Y.  So preg_replace('/[^RAY]/', '', 'StingRAY') would eliminate the matched characters (everything except R, A, and Y), making StingRAY into RAY.

Make sense?
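
The three patterns can be tried directly; a small sketch using only preg_match() and preg_replace():

```php
<?php
// /RAY/ matches the adjacent letters anywhere in the string.
var_dump(preg_match('/RAY/', 'StingRAY'));          // int(1)

// /^RAY/ only matches at the start of the string.
var_dump(preg_match('/^RAY/', 'StingRAY'));         // int(0)

// /[^RAY]/ matches each character that is not R, A or Y;
// replacing those matches with '' leaves only R, A and Y.
var_dump(preg_replace('/[^RAY]/', '', 'StingRAY')); // string(3) "RAY"
```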

Here's the PHP man page.
http://us2.php.net/manual/en/reference.pcre.pattern.syntax.php

Here is an article that is only tangentially about regular expressions (it's more about ways of approaching a problem) but it has some examples that might be helpful.
http://www.experts-exchange.com/Web_Development/Web_Languages-Standards/PHP/A_7830-A-Quick-Tour-of-Test-Driven-Development.html

And in case you're interested, this article covers "magic quotes" - one of the worst ideas to get baked into PHP.
http://www.experts-exchange.com/Web_Development/Web_Languages-Standards/PHP/A_6630-Magic-Quotes-a-bad-idea-from-day-one.html

Best regards, ~Ray
0
 

Author Comment

by:dtleahy
ID: 37813710
I stumbled across some info dealing with "magic quotes." Not pretty.

Thank you very much for all of your help. You are a very helpful person! I very much appreciate that you're not just supplying answers, but connecting me with good resources.

The opening bracket I was referring to was in your second reply (ID: 37812593), in the beginning of the second block of code.
[$pattern


, not the normal start of class indicator.

I'm going to do a little bit more reading, then plug the code in and start testing.

Thanks!

Dennis
0
 
LVL 108

Expert Comment

by:Ray Paseur
ID: 37813752
Yeah, the bracket in [$pattern is a typo.
0
 

Author Comment

by:dtleahy
ID: 37815079
Thanks again, Ray, for all the help and the resources that you pointed me to!

Dennis
0
 
LVL 108

Expert Comment

by:Ray Paseur
ID: 37816276
Thanks for the points - great question! ~Ray
0
