Solved

Perl Regular Expressions in PHP

Posted on 2011-09-08
Last Modified: 2012-05-12
I have a PHP script that uses preg_match to test for certain inputs.  I had something like this:

elseif(preg_match("/^[^a-zA-Z,-_0-9\. ]+$/D", $string))
 return FALSE;

Meaning anything but those characters will return false, but it wasn't behaving as I would expect.

So for this question I'll break it down to the simplest example:
preg_match('|^[^0-9]{1,}$|', $string);

I understand this to mean that anything that does NOT start or end with a number will be matched.
So if the string is "gggg" it is matched.  If the string is "9999" it is not matched (due to the caret in the bracket).

But if the string is "g0g" it is not matched.  The string begins and ends with a letter, so my thought is that it would be matched.  Why does adding a number between the two letters cause this to not match?  To me it seems that the beginning and end of line anchors are not respected.

Even passing characters like ^^^^ gets matched, but as soon as a number is added somewhere it is not matched.

I assume it's working correctly, but I'd like an explanation as to why it behaves like this.  I am assuming that the anchors (^$) in this case do not actually mean begins with and ends with?
Question by:credog
LVL 75

Expert Comment

by:käµfm³d 👽
ID: 36504611
The way your patterns are constructed, the target string must be a series of characters that are:

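Pattern 1: Not a letter, not a number, not a period, not a space, and not any character in the range from comma to underscore (the unescaped hyphen in ,-_ creates a range)
    Examples:
        !#$%&
        (*)(*)
        +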

Pattern 2: Not a number:
    Examples:
        !@#$%^&
        (*)(*)
        -
        hello world

I'm not entirely sure what the goal was for your first pattern. Perhaps you can elaborate.

In reading the description of what you'd like to achieve in the second pattern, it sounds like you want alternation, using the vertical bar ( | ). I would suggest, however, changing your pattern delimiters since the bar is a special character in regex. Try this change:

preg_match('#^\D|\D$#', $string);

which means:
#      -  Pattern delimiter
^      -  Beginning of string
\D     -  Any character NOT a digit
|      -  OR (alternation)
\D     -  Any character NOT a digit
$      -  End of string
#      -  Pattern delimiter
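
As a rough sketch of what that alternation accepts, the pattern can be run against a few strings. The sample strings and the $alternation variable name are assumptions made up for illustration; only the pattern comes from the comment above.

```php
<?php
// Pattern from the comment: starts with a non-digit OR ends with a non-digit.
$pattern = '#^\D|\D$#';

// Hypothetical sample inputs, chosen only to illustrate the behaviour.
$samples = array('g5g', '5gg', 'gg5', '5g5');

$alternation = array();
foreach ($samples as $s) {
    // preg_match returns 1 when the pattern matches and 0 when it does not.
    $alternation[$s] = preg_match($pattern, $s);
    echo $s . ' => ' . $alternation[$s] . PHP_EOL;
}
// g5g => 1, 5gg => 1, gg5 => 1, 5g5 => 0
```

Only the string that both starts and ends with a digit is rejected, because either side of the alternation is enough on its own.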

 
LVL 75

Expert Comment

by:käµfm³d 👽
ID: 36504635
Actually, I think I misinterpreted the 2nd pattern's intent. I think this is what you are after:

preg_match('#^\D.*\D$#', $string);

and its meaning:
#      -  Pattern delimiter
^      -  Beginning of string
\D     -  Any character NOT a digit
.*     -  Zero-or-more ( * ) of any character ( . )
\D     -  Any character NOT a digit
$      -  End of string
#      -  Pattern delimiter
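
A small sketch of how this pattern behaves, assuming a handful of made-up test strings (the strings and the $bothEnds variable name are illustrative only, not from the question):

```php
<?php
// Pattern from the comment: first AND last character must be non-digits.
$pattern = '#^\D.*\D$#';

// Hypothetical sample inputs.
$samples = array('g5g', '5gg', 'gg5', 'gg', 'g');

$bothEnds = array();
foreach ($samples as $s) {
    // 1 means matched, 0 means not matched.
    $bothEnds[$s] = preg_match($pattern, $s);
    echo $s . ' => ' . $bothEnds[$s] . PHP_EOL;
}
// g5g => 1, 5gg => 0, gg5 => 0, gg => 1, g => 0
```

Note that a single character such as "g" does not match, since the pattern needs one non-digit at the start and a separate non-digit at the end.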

 

Author Comment

by:credog
ID: 36504778
Good explanation, but I'm still confused about what the following does:

preg_match('#^[^0-9]{1,}$#', $string);

The caret inside the bracket says NOT a number. I get that.
The caret and the dollar outside the brackets I thought were anchors that would say:
Anything that does not begin or end in a number is matched.  So the string g5g should be matched because the beginning and ending do not contain a number, but it is not matched.  Obviously I'm confused by what the ^ and $ are actually doing outside the brackets.

It appears that if a number exists anywhere in the string then the pattern is not matched.
 
LVL 75

Accepted Solution

by:
käµfm³d   👽 earned 400 total points
ID: 36504855
^ means start of string (or start of line if you turn on the appropriate modifier); $ means end of string (or end of line if you turn on the appropriate modifier). Combining the two (sans modifiers) essentially says, "match the entire string". For example, given the pattern:

^hello world$

and the string variable:

$value = "hello world";

your preg_match call would succeed. However, if you change the string variable to:

$value = "hello joe";

your preg_match call would fail because the pattern expects the entire string to be "hello world". Now if we keep the latter string variable:

$value = "hello joe";

but we change the pattern:

^hello|world$

now "hello joe" would match because our pattern says, "any string that starts with hello ( ^hello ) or ( | ) ends with world ( world$ )".
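
To tie this back to the pattern in the question, here is a minimal sketch. The test strings are the ones mentioned in the thread plus "goodbye joe", which is a made-up addition; the variable names are illustrative.

```php
<?php
// ^ and $ with nothing but the negated class between them mean:
// "every character from start to end must be a non-digit".
$wholeString = '#^[^0-9]{1,}$#';

$anchored = array();
foreach (array('gggg', '9999', 'g0g', '^^^^') as $s) {
    $anchored[$s] = preg_match($wholeString, $s);
    echo $s . ' => ' . $anchored[$s] . PHP_EOL;
}
// gggg => 1, 9999 => 0, g0g => 0, ^^^^ => 1

// With alternation, each anchor only applies to its own side of the bar.
$either = '#^hello|world$#';

$alternated = array();
foreach (array('hello joe', 'hello world', 'goodbye joe') as $s) {
    $alternated[$s] = preg_match($either, $s);
    echo $s . ' => ' . $alternated[$s] . PHP_EOL;
}
// hello joe => 1, hello world => 1, goodbye joe => 0
```

"g0g" fails the first pattern because the 0 in the middle is not allowed by [^0-9], and the anchors leave no room for any character outside that class.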
 
LVL 75

Expert Comment

by:käµfm³d 👽
ID: 36504906
P.S.

"hello world" would also match with the last pattern. If you used a preg_match_all, you would actually see two matches: one for "hello" and one for "world".
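
A short sketch of the preg_match_all behaviour described in the P.S., using the "hello world" string from the earlier example (the variable names are illustrative):

```php
<?php
// preg_match_all returns the number of full-pattern matches and fills
// $found[0] with the matched substrings.
$count = preg_match_all('#^hello|world$#', 'hello world', $found);

echo $count . PHP_EOL;   // 2
print_r($found[0]);      // "hello" and "world"
```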
 
LVL 110

Assisted Solution

by:Ray Paseur
Ray Paseur earned 100 total points
ID: 36516415
Looking at this:

preg_match("/^[^a-zA-Z,-_0-9\. ]+$/D", $string)

I don't know about that trailing "D" - that is usually the location of a pattern modifier.

Let's assume for a moment that you want to disqualify any string that contains anything other than letters, numbers, the space, the underscore, the comma, the dash, and the dot.  You could put these elements together into a character class that is wrapped in brackets.  Using the caret ^ metacharacter as the first character inside a character class means negation - the regular expression will match any character that is not part of the class.

To make it more confusing, the caret ^ metacharacter, when used at the beginning of a regular expression, does not mean negation - it anchors the match to the start of the string.  If you omit the caret, the regex engine still starts looking at the first character of the string, but it is free to find its match further along.  And whenever you want a metacharacter to be treated as a literal character inside a regular expression, you need an escape (backslash).  The dash, though not technically a metacharacter, means "from this to that" inside a character class, so it needs to be escaped too if it is to mean the literal character hyphen (unless it is the first or last character in the class).  Who thought this sort of syntax up?  Oh, I guess it must have been a 1950's mathematician ;-)
http://en.wikipedia.org/wiki/Stephen_Cole_Kleene
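
To illustrate the point about the dash, here is a small sketch comparing the character class from the question with a version where the dash is escaped. The "@@@" test string is an assumption picked for illustration, because @ sits between the comma and the underscore in ASCII.

```php
<?php
// Unescaped: ,-_ is a range from comma (0x2C) to underscore (0x5F),
// which silently includes characters such as @ and ^.
$inRange   = preg_match('/[,-_]/', '@');    // 1
// Escaped: only the comma, the hyphen and the underscore are in the class.
$notInList = preg_match('/[,\-_]/', '@');   // 0

// The question's pattern treats @ as an allowed character, so the
// negated class cannot match it.
$original = preg_match('/^[^a-zA-Z,-_0-9\. ]+$/D', '@@@');    // 0
// With the dash escaped, @ is outside the class and the negation matches.
$escaped  = preg_match('/^[^a-zA-Z,\-_0-9\. ]+$/D', '@@@');   // 1

echo "$inRange $notInList $original $escaped" . PHP_EOL;      // 1 0 0 1
```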

See http://www.laprbass.com/RAY_temp_credog.php
Outputs something like:
This ought to work.
But this will fail! HAS BAD CHARACTER(S)
SOS ... --- ...
Pi or maybe Pie 3.14159
Pi or maybe Pie? 3.14159 HAS BAD CHARACTER(S)
<?php // RAY_temp_credog.php
error_reporting(E_ALL);

// A REGULAR EXPRESSION
$rgx
= '/'          // A REGEX DELIMITER
. '['          // START A CHARACTER CLASS
. '^'          // NONE OF THE FOLLOWING MATCH
. 'A-Z'        // LETTERS
. '0-9'        // NUMBERS
. ' _,\-\.'    // SPACE, UNDERSCORE, COMMA, (ESCAPED) DASH, (ESCAPED) DOT
. ']'          // END A CHARACTER CLASS
. '/'          // END REGEX DELIMITER
. 'i'          // MODIFIER FOR CASE-INSENSITIVE
;

// SOME TEST DATA
$dat = array
( 'This ought to work.'
, 'But this will fail!'
, 'SOS ... --- ...'
, 'Pi or maybe Pie 3.14159'
, 'Pi or maybe Pie? 3.14159'
)
;

// TEST THE DATA WITH THE REGEX TO FIND BAD STRINGS
echo "<pre>";
foreach ($dat as $str)
{
    echo PHP_EOL . $str;
    if (preg_match($rgx, $str))
    {
        echo " HAS BAD CHARACTER(S)";
    }
}

// SHOW THE REGEX WE USED
echo PHP_EOL . "THE REGEX CONTAINS: ";
echo htmlentities($rgx);

Grab yourself a copy of this.  Very helpful.
http://www.addedbytes.com/cheat-sheets/regular-expressions-cheat-sheet/

Best regards to all, ~Ray
 
LVL 75

Expert Comment

by:käµfm³d 👽
ID: 36517795
@Ray_Paseur
"I don't know about that trailing 'D' - that is usually the location of a pattern modifier."
Well with regard to PHP, it actually is a pattern modifier--though I can't recall if it affected this situation or not.

http://www.php.net/manual/en/reference.pcre.pattern.modifiers.php
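
For what it's worth, a minimal sketch of what the D modifier changes. The "abc" strings are made up for illustration; the last line reuses the simplified pattern from the question.

```php
<?php
// Without D, $ also matches just before a trailing newline.
$withoutD = preg_match('/^abc$/', "abc\n");    // 1
// With D (PCRE_DOLLAR_ENDONLY), $ matches only at the very end.
$withD    = preg_match('/^abc$/D', "abc\n");   // 0

// D has no bearing on a digit in the middle of the string.
$g0g = preg_match('/^[^0-9]+$/D', 'g0g');      // 0

echo "$withoutD $withD $g0g" . PHP_EOL;        // 1 0 0
```

So the modifier tightens what $ accepts at the end of the string, but it does not explain why "g0g" fails.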
 
LVL 110

Expert Comment

by:Ray Paseur
ID: 36518666
PCRE_DOLLAR_ENDONLY - up until now I had remained blissfully ignorant of that modifier!  But then, I tend to remain pedestrian when it comes to programming.  Thanks for the link!