Solved

Need a wise strategy for validating names with special characters: preg_match()

Posted on 2012-04-05
Last Modified: 2012-04-06
Hello,

php beginner here again...

I want to validate names of people and cities, as user input. Some names will have special characters. (Jónsdóttir, Québec, etc.)

I realize I really cannot stop someone from inputting:
Mickey Mouse, Orlando, Florida
in the first-name, last-name, city, and state fields...

But, it would be nice to keep out
JJ#*#H, 1=1, DROP TABLE CUSTOMER--

In addition to using mysql_real_escape_string() on each field, what else makes sense to try to stop some nonsense input?

(The member does have to input a real email address, and a validation code is sent there. But, as we all know, someone can have as many email addresses as they want.)

Is this strategy (along with mysql_real_escape_string()) enough?
preg_match('/^[A-Za-z\'.àáâãäåæçèéêëìíîïðñò-]{2,50}$/', $string);

That is just a partial list. Should I add in all the special characters that I want to allow - that is, special characters that could be in names of cities and people around the world?

Obviously, that list doesn't include Chinese, Japanese, Vietnamese, etc. characters... but I have not really seen a US-based website where people's names were shown in Mandarin Chinese, for example. I would think it is typical Americentric bad etiquette to force people to anglicize their names... but again, it is a US-based website.

Thanks for any ideas on how to handle this both from a (server-side php) security standpoint, and for a friendly way for the site to take and display the names of people using the special characters they normally use.

Dennis
Question by:dtleahy
 

Assisted Solution

by:Ray Paseur
Ray Paseur earned 500 total points
ID: 37811659
You might want to learn about this function.
http://us.php.net/manual/en/function.filter-var.php

As far as regular expressions go, I think you might want a character class that looks like this: [^A-Z] -- that says match anything that is NOT a member of the range from A to Z.  Obviously you would want to put more characters into the class, including blanks, commas, dots, apostrophe, etc. (eg: Winston O'Churchill, Esq.).  Put every character you want to keep inside the brackets, and use preg_replace() to remove all the characters you do not want to keep.

After sanitizing the input values, use MySQL_Real_Escape_String() and your database will be safe.

If you're dealing with human client input from strangers, you might want to consider having a "report inappropriate content" button, too.
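
The keep-list idea above can be sketched roughly as follows. This is only an illustration: the function name strip_disallowed() and the exact set of kept characters are assumptions made for the example, and the class here covers plain ASCII letters only.

```php
<?php
// Hypothetical helper: drop every character that is NOT in the keep-list.
// The keep-list (ASCII letters, blank, comma, dot, apostrophe, hyphen) is an assumption.
function strip_disallowed($input)
{
    // [^...] is a negated class; the hyphen is placed last so it is taken literally
    return preg_replace("/[^A-Za-z ,.'-]/", '', $input);
}

echo strip_disallowed("Winston O'Churchill, Esq."), "\n"; // prints the name unchanged
echo strip_disallowed("JJ#*#H"), "\n";                    // prints JJH
echo strip_disallowed("DROP TABLE CUSTOMER--"), "\n";     // unchanged: only letters, blanks and hyphens
```

Note that the third string passes through untouched, which is why the escaping step is still needed after the cleanup.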
 

Author Comment

by:dtleahy
ID: 37812411
Hi Ray, and thanks for the reply.

So, email would be handled like this:
$emailadr = trim($_POST['email']);
if (!filter_var($emailadr, FILTER_VALIDATE_EMAIL))
  {
  ## return an error message "E-mail is not valid";
  }
else
  {
  // escape the trimmed value that was just validated, not the raw POST value
  $emailadr = mysql_real_escape_string($emailadr);
  }



I'm not quite sure what you meant by this:
As far as regular expressions go, I think you might want a character class that looks like this: [^A-Z] -- that says match anything that is NOT a member of the range from A to Z.  Obviously you would want to put more characters into the class, including blanks, commas, dots, apostrophe, etc. (eg: Winston O'Churchill, Esq.).  Put every character you want to keep inside the brackets, and use preg_replace() to remove all the characters you do not want to keep.

Do you think it's a good idea to replace characters, or is it better to provide an error that says illegal characters were entered?

Rather than using preg_match, should I be using FILTER_VALIDATE_REGEXP ?

(The following pattern is an attempt to only allow a-z, A-Z, a bunch of accented characters, space, apostrophe, period, and hyphen.)

// the hyphen goes last in the class so it is a literal, not the start of a range;
// the + and the closing $ mean the whole, non-empty string has to match
$pattern= "/^[a-zA-ZÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿŒœŠšŸŽžƒ'.\s-]+$/";

$fname = trim($_POST['firstname']);

if(filter_var($fname, FILTER_VALIDATE_REGEXP, array("options"=>array("regexp"=>$pattern))) === false)
  {
	  ## echo "First Name has invalid characters";
  }
else
  {
		$fname=mysql_real_escape_string($fname);
  }



Am I on the right track, or maybe a better question is, do I have it?

Thanks!

Dennis

Accepted Solution

by:Ray Paseur
Ray Paseur earned 500 total points
ID: 37812593
Here is my standard email validation example.
<?php // RAY_email_validation.php
error_reporting(E_ALL);


// A FUNCTION TO TEST FOR A VALID EMAIL ADDRESS, RETURN TRUE OR FALSE
// SEE MAN PAGE: http://php.net/manual/en/intro.filter.php
function check_valid_email($email, $rout=TRUE)
{
    // LIST OF BLOCKED DOMAINS
    $bogus = array
    ( '@unknown.com'
    , '@example.com'
    , '@gooseball.org'
    )
    ;

    // IF PHP 5.2 OR ABOVE, WE CAN USE THE FILTER
    if (strnatcmp(phpversion(),'5.2') >= 0)
    {
        if(filter_var($email, FILTER_VALIDATE_EMAIL) === FALSE) return FALSE;
    }

    // IF LOWER-LEVEL PHP, WE CAN CONSTRUCT A REGULAR EXPRESSION
    else
    {
        $regex
        = '/'                        // START REGEX DELIMITER
        . '^'                        // START STRING
        . '[A-Z0-9_-]'               // AN EMAIL - SOME CHARACTER(S)
        . '[A-Z0-9._-]*'             // AN EMAIL - SOME CHARACTER(S) PERMITS DOT
        . '@'                        // A SINGLE AT-SIGN
        . '([A-Z0-9][A-Z0-9-]*\.)+'  // A DOMAIN NAME PERMITS DOT, ENDS DOT
        . '[A-Z\.]'                  // A TOP-LEVEL DOMAIN PERMITS DOT
        . '{2,6}'                    // TLD LENGTH >= 2 AND =< 6
        . '$'                        // ENDOF STRING
        . '/'                        // ENDOF REGEX DELIMITER
        . 'i'                        // CASE INSENSITIVE
        ;
        // TEST THE STRING FORMAT
        if (!preg_match($regex, $email)) return FALSE;
    }

    // TEST TO SEE IF THE DOMAIN IS IN OUR BLOCKED LIST
    foreach ($bogus as $badguy)
    {
        if (stripos($email, $badguy) !== FALSE) return FALSE;
    }

    // FILTER_VAR OR PREG_MATCH DOES NOT TEST IF THE DOMAIN IS ROUTABLE
    if ($rout)
    {
        $domain = explode('@', $email);

        // MAN PAGE: http://php.net/manual/en/function.checkdnsrr.php
        if ( checkdnsrr($domain[1], "MX") || checkdnsrr($domain[1], "A") ) return TRUE;

        // EMAIL IS NOT ROUTABLE
        return FALSE;
    }
    return TRUE;
}



// DEMONSTRATE THE FUNCTION IN ACTION
$e = NULL;
$safe = NULL;
if (!empty($_GET["e"]))
{
    $e = $_GET["e"];

    // ESCAPE THE VALUE BEFORE IT IS ECHOED BACK INTO THE PAGE
    $safe = htmlspecialchars($e, ENT_QUOTES);
    if (check_valid_email($e))
    {
        echo "<br/>VALID: $safe \n";
    }
    else
    {
        echo "<br/>BOGUS: $safe \n";
    }
}


// END OF PROCESSING - CREATE THE FORM USING HEREDOC NOTATION
$form = <<<ENDFORM
<form>
TEST A STRING FOR A VALID EMAIL ADDRESS:
<input name="e" value="$safe" />
<input type="submit" />
</form>
ENDFORM;

echo $form;


As far as the illegal characters in the names go, I would just replace them.  Nobody is named ?/* and those characters can just be dropped out (replaced with NULL or blank).  Why bother with an error message?  Who cares?

I think this might be more on point for the pattern (not 100% sure, but it would be easy to test)
[$pattern
= "/"    // REGEX DELIMITER
. '['    // START CHARACTER CLASS
. '^'    // NEGATION (MATCH NONE OF THESE)
. 'a-zA-ZÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõöøùú ûüýþÿŒœŠšŸŽžƒ\-'
. "'"    // APOSTROPHE
. '\.'   // PERIOD
. '\s'   // WHITESPACE
. ']'    // END CHARACTER CLASS
. "/"    // REGEX DELIMITER
;
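
As an alternative to listing every accented letter by hand, PCRE's Unicode property classes can stand in for the list. The following is a minimal sketch of the validating (reject rather than replace) variant, assuming the input is UTF-8 and PCRE was built with Unicode support; the function name is_plausible_name() and the 2 to 50 length limit are assumptions for illustration.

```php
<?php
// Hypothetical helper; assumes $name is UTF-8 and PCRE has Unicode support.
function is_plausible_name($name)
{
    // \p{L} = any letter in any script, \p{M} = combining marks
    // Also allowed: apostrophe, period, blank, hyphen (last, so it is literal)
    // The u modifier makes PCRE treat pattern and subject as UTF-8 characters, not bytes
    return preg_match('/^[\p{L}\p{M}\'. -]{2,50}$/u', $name) === 1;
}

var_dump(is_plausible_name("Jónsdóttir")); // bool(true)
var_dump(is_plausible_name("Québec"));     // bool(true)
var_dump(is_plausible_name("JJ#*#H"));     // bool(false)
var_dump(is_plausible_name("1=1"));        // bool(false)
```

A pattern like this still accepts DROP TABLE CUSTOMER-- (letters, blanks, and hyphens only), so it complements the escaping step and does not replace it.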



 

Author Comment

by:dtleahy
ID: 37813134
Thank you, thank you, Ray!

While I am working out the details, I'm such a newbie at php that I assume the leading opening bracket ( [ ) before $pattern is a typo... but if not, please school me on just what that is for.

so, is this correct?

$pattern
= "/"    // REGEX DELIMITER
. '['    // START CHARACTER CLASS
. '^'    // NEGATION (MATCH NONE OF THESE)
. 'a-zA-ZÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿŒœŠšŸŽžƒ\-'
. "'"    // APOSTROPHE
. '\.'   // PERIOD
. '\s'   // WHITESPACE
. ']'    // END CHARACTER CLASS
. "/"    // REGEX DELIMITER
;  
 
$replace = "";
$new_string = preg_replace($pattern, $replace, $string);
$new_string=mysql_real_escape_string($new_string);



Thanks!

Dennis

{edited: removed something about smart quotes}

Expert Comment

by:Ray Paseur
ID: 37813351
The left bracket is the "start of character class" indicator.  The ^ means start of string when it is outside the character class, and means negation when it is inside the character class.  A couple of examples may show what is going on here.

/RAY/ will match the three letters R,A,Y if and only if they are adjacent, spelling RAY.  It does not have to be a word, so StingRAY would match.
/^RAY/ will match the three letters RAY if they are adjacent and at the beginning of the string.  StingRAY would not match.
/[^RAY]/ will match any character that is not one of R,A,Y.  So preg_replace('/[^RAY]/', NULL, 'StingRAY') would eliminate the matched characters, leaving only R, A, and Y and making StingRAY into RAY.

Make sense?
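
For anyone who wants to try the three patterns, here is a small sketch that runs them against the same subject string. The variable name $subject is just for illustration, and an empty string is used as the replacement.

```php
<?php
// The three example patterns, run against the same subject.
$subject = 'StingRAY';

var_dump(preg_match('/RAY/', $subject));          // int(1): RAY appears somewhere in the string
var_dump(preg_match('/^RAY/', $subject));         // int(0): RAY is not at the start
var_dump(preg_replace('/[^RAY]/', '', $subject)); // string(3) "RAY"
```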

Here's the PHP man page.
http://us2.php.net/manual/en/reference.pcre.pattern.syntax.php

Here is an article that is only tangentially about regular expressions (it's more about ways of approaching a problem) but it has some examples that might be helpful.
http://www.experts-exchange.com/Web_Development/Web_Languages-Standards/PHP/A_7830-A-Quick-Tour-of-Test-Driven-Development.html

And in case you're interested, this article covers "magic quotes" - one of the worst ideas to get baked into PHP.
http://www.experts-exchange.com/Web_Development/Web_Languages-Standards/PHP/A_6630-Magic-Quotes-a-bad-idea-from-day-one.html

Best regards, ~Ray
 

Author Comment

by:dtleahy
ID: 37813710
I stumbled across some info dealing with "magic quotes." Not pretty.

Thank you very much for all of your help. You are a very helpful person! I very much appreciate that you're not just supplying answers, but connecting me with good resources.

The opening bracket I was referring to was in your second reply (ID: 37812593), at the beginning of the second block of code: [$pattern, not the normal start of class indicator.

I'm going to do a little bit more reading, then plug the code in and start testing.

Thanks!

Dennis

Expert Comment

by:Ray Paseur
ID: 37813752
Yeah, the bracket in [$pattern is a typo.
 

Author Comment

by:dtleahy
ID: 37815079
Thanks again, Ray, for all the help and the resources that you pointed me to!

Dennis

Expert Comment

by:Ray Paseur
ID: 37816276
Thanks for the points - great question! ~Ray