credog

asked on

Perl Regular Expressions in PHP

I have a php script that uses preg_match to test for certain inputs.  I had something like this,

elseif(preg_match("/^[^a-zA-Z,-_0-9\. ]+$/D", $string))
 return FALSE;

Meaning anything but those characters will return false, but it wasn't behaving as I would expect.

 So for this question I'll break it down to the most simple example:
preg_match('|^[^0-9]{1,}$|', $string);

I understand this to mean that anything that does NOT start or end with a number will be matched.
So if string is "gggg" it is matched.  If string is "9999" it is not matched (due to the caret in the bracket).

But if string is "g0g" it is not matched.  The string begins and ends with a letter, so my thought is that it would be matched.  Why does adding a number between the two letters cause this to not match?  To me it seems that the beginning and end of line anchors are not respected.

Even passing characters like ^^^^ gets matched, but as soon as a number is added somewhere it is not matched.

I assume it's working correctly, but I'd like an explanation as to why it behaves like this.  I am assuming that the anchors (^$) in this case do not actually mean begins with and ends with?
kaufmed

The way your patterns are constructed, the target string must be a series of characters that are:

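Pattern 1: Not a letter, not a number, not a period, not a space, and not any character in the range from comma to underscore (inside the brackets, ,-_ is read as a range, which covers - / : ; < = > ? @ [ \ ] ^ and _)
    Examples:
        !#$%&
        (*)(*)
        +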

Pattern 2: Not a digit:
    Examples:
        !@#$%^&
        (*)(*)
        -
        hello world
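
The two patterns from the question can be checked against a few sample strings. This is only a rough sketch: the helper name `matchesPattern` and the constants are made up for illustration, and pattern 2 is written with `/` delimiters instead of `|`. Note that `-` and `!@#$%^&` are rejected by pattern 1 because `,-_` inside the class acts as a range.

```php
<?php
// Pattern 1 from the question. Inside the class, `,-_` is a range
// (comma through underscore), so it also excludes - @ ^ and others.
const PATTERN_1 = '/^[^a-zA-Z,-_0-9\. ]+$/D';
// Pattern 2 from the question, rewritten with / delimiters.
const PATTERN_2 = '/^[^0-9]{1,}$/';

// Hypothetical helper: true when the whole string satisfies the pattern.
function matchesPattern(string $pattern, string $subject): bool
{
    return preg_match($pattern, $subject) === 1;
}

$samples = ['(*)(*)', '+', '-', '!@#$%^&', 'hello world', 'g0g'];
foreach ($samples as $sample) {
    printf(
        "%-12s pattern1=%-5s pattern2=%s\n",
        $sample,
        var_export(matchesPattern(PATTERN_1, $sample), true),
        var_export(matchesPattern(PATTERN_2, $sample), true)
    );
}
```

Only `(*)(*)` and `+` pass pattern 1 here, while every sample except `g0g` passes pattern 2.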

I'm not entirely sure what the goal was for your first pattern. Perhaps you can elaborate.

In reading the description of what you'd like to achieve in the second pattern, it sounds like you want alternation, using the vertical bar ( | ). I would suggest, however, changing your pattern delimiters since the bar is a special character in regex. Try this change:

preg_match('#^\D|\D$#', $string);

which means:
#      -  Pattern delimiter
^      -  Beginning of string
\D     -  Any character NOT a digit
|      -  OR (alternation)
\D     -  Any character NOT a digit
$      -  End of string
#      -  Pattern delimiter
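
As a quick check of how the alternation reads, the sketch below runs the pattern on a few inputs. The helper name `startsOrEndsWithNonDigit` is hypothetical. The pattern succeeds when either side of the bar matches, so a string only fails when it both starts and ends with a digit (or is empty).

```php
<?php
// The alternation pattern from above: "^\D" OR "\D$".
const ALTERNATION = '#^\D|\D$#';

// Hypothetical helper for this sketch.
function startsOrEndsWithNonDigit(string $subject): bool
{
    return preg_match(ALTERNATION, $subject) === 1;
}

var_dump(startsOrEndsWithNonDigit('g0g')); // bool(true): first character is a non-digit
var_dump(startsOrEndsWithNonDigit('9g'));  // bool(true): last character is a non-digit
var_dump(startsOrEndsWithNonDigit('g9'));  // bool(true): first character is a non-digit
var_dump(startsOrEndsWithNonDigit('9g9')); // bool(false): digit at both ends
var_dump(startsOrEndsWithNonDigit('99'));  // bool(false)
```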

Actually, I think I misinterpreted the 2nd pattern's intent. I think this is what you are after:

preg_match('#^\D.*\D$#', $string);

and its meaning:
#      -  Pattern delimiter
^      -  Beginning of string
\D     -  Any character NOT a digit
.*     -  Zero-or-more ( * ) of any character ( . )
\D     -  Any character NOT a digit
$      -  End of string
#      -  Pattern delimiter
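
A small sketch of how this pattern behaves, with hypothetical helper names. One thing it makes visible: because the pattern contains two separate `\D` tokens, a one-character string such as "g" cannot match. The second function is an assumed variant (not from the thread) that makes the tail optional, in case single characters should be accepted.

```php
<?php
// The pattern from above: non-digit first AND non-digit last.
function startsAndEndsWithNonDigit(string $subject): bool
{
    return preg_match('#^\D.*\D$#', $subject) === 1;
}

// Assumed variant for illustration: the ".*\D" tail is optional,
// so a single non-digit character is accepted as well.
function startsAndEndsWithNonDigitLoose(string $subject): bool
{
    return preg_match('#^\D(?:.*\D)?$#', $subject) === 1;
}

var_dump(startsAndEndsWithNonDigit('g0g'));    // bool(true)
var_dump(startsAndEndsWithNonDigit('g9'));     // bool(false): ends with a digit
var_dump(startsAndEndsWithNonDigit('9999'));   // bool(false)
var_dump(startsAndEndsWithNonDigit('g'));      // bool(false): needs two characters
var_dump(startsAndEndsWithNonDigitLoose('g')); // bool(true)
```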

credog

ASKER

Good explanation, but I'm still confused on what the following does:

preg_match('#^[^0-9]{1,}$#', $string);

The caret inside the bracket says NOT a number. I get that.
The caret and the dollar outside the brackets I thought were anchors that would say:
Anything that does not begin or end in a number is matched.  So the string g5g should be matched because the beginning and ending do not contain a number, but it is not matched.  Obviously I'm confused by what the ^ and $ are actually doing outside the brackets.

It appears that if a number exists anywhere in the string then the pattern is not matched.
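
That observation can be checked with a short sketch (the function names are hypothetical). The only thing between `^` and `$` is the repeated class `[^0-9]{1,}`, so the class has to account for every character from the start of the string to the end. With the anchors removed, the same class just finds the first run of non-digits.

```php
<?php
// Anchored: the non-digit class must cover the whole string.
function isAllNonDigits(string $subject): bool
{
    return preg_match('#^[^0-9]{1,}$#', $subject) === 1;
}

// Unanchored: returns the first run of non-digits, or null if there is none.
function firstNonDigitRun(string $subject): ?string
{
    return preg_match('#[^0-9]{1,}#', $subject, $m) === 1 ? $m[0] : null;
}

var_dump(isAllNonDigits('gggg'));   // bool(true)
var_dump(isAllNonDigits('g5g'));    // bool(false): the 5 breaks the run
var_dump(firstNonDigitRun('g5g'));  // string(1) "g"
var_dump(firstNonDigitRun('9999')); // NULL
```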
ASKER CERTIFIED SOLUTION
kaufmed
P.S.

"hello world" would also match with the last pattern. If you used a preg_match_all, you would actually see two matches: one for "hello" and one for "world".
SOLUTION
@Ray_Paseur
I don't know about that trailing "D" - that is usually the location of a pattern modifier.
Well, with regard to PHP, it actually is a pattern modifier--though I can't recall if it affected this situation or not.

http://www.php.net/manual/en/reference.pcre.pattern.modifiers.php
PCRE_DOLLAR_ENDONLY - up until now I had remained blissfully ignorant of that modifier!  But then, I tend to remain pedestrian when it comes to programming.  Thanks for the link!
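
For reference, a minimal sketch of what the D modifier changes, using a simple digits-only pattern chosen for illustration (the function names are hypothetical). Without D, `$` also matches just before a final newline; with D, `$` matches only at the very end of the subject.

```php
<?php
// Without D: "$" may match before a trailing newline.
function isDigitsLoose(string $subject): bool
{
    return preg_match('/^[0-9]+$/', $subject) === 1;
}

// With D (PCRE_DOLLAR_ENDONLY): "$" matches only at the very end.
function isDigitsStrict(string $subject): bool
{
    return preg_match('/^[0-9]+$/D', $subject) === 1;
}

var_dump(isDigitsLoose("123\n"));  // bool(true)
var_dump(isDigitsStrict("123\n")); // bool(false)
var_dump(isDigitsStrict("123"));   // bool(true)
```

In the original pattern from the question the negated class itself matches a newline, so D makes little practical difference there; it matters for patterns whose last token cannot consume the newline.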