christamcc

asked on

PHP Regular Expression

Hello,
Can someone code the regular expression for these field restrictions:

Must be between 4 and 20 characters long and should contain only letters, numbers, and the underscore symbol.

Thanks!
DustinKikuchi

preg_match('/^[\w\d_]{4,20}$/',$field);


Case-insensitive match.  Remove the "i" if you want only uppercase.
preg_match('/^[A-Z0-9_]{4,20}$/i',$field);


HTH, ~Ray
Ray's is the one you'll want to use.  Used \w in my haste which will match spaces, tabs, etc.
kaufmed
"Used \w in my haste which will match spaces, tabs, etc."
No, "\w" matches [a-zA-Z0-9_]. "\s" matches spaces, tabs, and other whitespace  = )
This is what I get for answering a regex question when I haven't used them in a while :(

preg_match('/^[\w]{4,20}$/',$field);



Does appear to work as expected when testing.
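
For anyone who wants to repeat that kind of test, here is a minimal sketch. The helper name isValidField and the sample inputs are made up for illustration; only the pattern comes from this thread.

```php
<?php
// Hypothetical helper (the name is illustrative): true when $field is
// 4 to 20 characters long and contains only letters, digits, and underscores.
function isValidField(string $field): bool
{
    return preg_match('/^\w{4,20}$/', $field) === 1;
}

// Sample inputs mapped to the result the requirements call for.
$samples = [
    'abc'               => false, // too short
    'abcd'              => true,  // lower length bound
    'User_Name_42'      => true,
    str_repeat('a', 20) => true,  // upper length bound
    str_repeat('a', 21) => false, // too long
    'has space'         => false,
    "tab\there"         => false,
    'bad-char'          => false,
];

foreach ($samples as $input => $expected) {
    $actual = isValidField((string) $input);
    printf("%-24s %s\n", var_export($input, true), $actual === $expected ? 'ok' : 'MISMATCH');
}
```

Comparing the return value against 1 matters a little here, because preg_match() returns false on error rather than 0.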
ASKER CERTIFIED SOLUTION
Ray Paseur
This solution is only available to members.
My understanding is that in most regex flavours, \w is identical to [a-zA-Z0-9_].

My understanding of \b is that it matches the (zero-length) gap between a \w character and a non-\w character (\W if you prefer), and also the start or end of the string when the adjacent character is a \w character, which is exactly equivalent to this:
(?:(?<!\w)(?=\w)|(?<=\w)(?!\w))
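
One way to check that equivalence is to list the offsets where each pattern matches and compare them. This is only a sketch: the helper name boundaryOffsets and the sample string are my own, not something from the thread.

```php
<?php
// Hypothetical helper: offsets of every (zero-length) match of $pattern in $subject.
function boundaryOffsets(string $pattern, string $subject): array
{
    preg_match_all($pattern, $subject, $m, PREG_OFFSET_CAPTURE);
    // Each entry of $m[0] is [matched text, offset]; keep only the offsets.
    return array_column($m[0], 1);
}

$subject = 'foo_bar baz-9';

$viaB          = boundaryOffsets('/\b/', $subject);
$viaLookaround = boundaryOffsets('/(?:(?<!\w)(?=\w)|(?<=\w)(?!\w))/', $subject);

// Both lists should be identical if the two patterns are equivalent.
var_dump($viaB === $viaLookaround);
echo implode(',', $viaB), "\n";
```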


It's unfortunate in a lot of situations that the ' character isn't included, because \b\w+\b fails to treat cases like "don't" and "O'Connor" as single words.
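
A short sketch of that behaviour is below. The second pattern is just one possible workaround, offered as an assumption for illustration; it allows a single apostrophe between runs of word characters.

```php
<?php
$text = "O'Connor said don't";

// \b\w+\b breaks at each apostrophe because ' is not a \w character.
preg_match_all('/\b\w+\b/', $text, $plain);

// Illustrative alternative: word characters, optionally joined by apostrophes.
preg_match_all("/\w+(?:'\w+)*/", $text, $withApostrophes);

print_r($plain[0]);           // pieces: O, Connor, said, don, t
print_r($withApostrophes[0]); // pieces: O'Connor, said, don't
```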

Anyway, the most concise way to meet the author's requirements is:
preg_match('/^\w{4,20}$/',$field);
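
One caveat worth checking when validating input: in PCRE, $ also matches just before a newline at the very end of the string. If that matters for your field, the D modifier or \z should close the gap. A minimal sketch, with a made-up sample value:

```php
<?php
$withNewline = "user_01\n";

// `$` matches before a final newline, so the trailing "\n" slips through.
$loose = preg_match('/^\w{4,20}$/', $withNewline);

// The D modifier (PCRE_DOLLAR_ENDONLY) makes `$` match only at the true end.
$strictD = preg_match('/^\w{4,20}$/D', $withNewline);

// \z always means the very end of the subject.
$strictZ = preg_match('/^\w{4,20}\z/', $withNewline);

var_dump($loose, $strictD, $strictZ); // int(1), int(0), int(0)
```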


christamcc

ASKER

Thanks for your due diligence!
According to my notes, \w matches any "word"
Ray, I agree with Terry: "\w" matches any word character, not word. You could get a "word" by using "\w+". See "Shorthand Character Classes" for more information.

The reality is that all three approaches (DustinKikuchi, Ray's, and Terry's) are exactly equivalent. While I acknowledge that Ray took the extra time to document the regex, I argue that if one takes the time to understand exactly what "\w" means, then that extra documentation is unnecessary. (Admittedly it is helpful for persons unfamiliar with regex coming behind the original author.)
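
If anyone wants to convince themselves, a rough comparison like the one below runs the three patterns from this thread against the same inputs. The sample inputs are my own, and the sketch assumes plain ASCII input with the default locale and no u modifier.

```php
<?php
// The three patterns posted in this thread, keyed by who posted them.
$patterns = [
    'DustinKikuchi' => '/^[\w\d_]{4,20}$/',
    'Ray'           => '/^[A-Z0-9_]{4,20}$/i',
    'Terry'         => '/^\w{4,20}$/',
];

// Illustrative inputs covering length bounds and disallowed characters.
$inputs = ['abc', 'abcd', 'ABC_123', 'with space', 'dash-ed', str_repeat('x', 21)];

$allAgree = true;
foreach ($inputs as $input) {
    $results = [];
    foreach ($patterns as $author => $pattern) {
        $results[$author] = preg_match($pattern, $input);
    }
    // A single distinct value means all three patterns agreed on this input.
    $agree = count(array_unique($results)) === 1;
    $allAgree = $allAgree && $agree;
    printf("%-26s %s\n", var_export($input, true), $agree ? 'agree' : 'DIFFER');
}

var_dump($allAgree);
```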
@Kaufmed: That is a good link for the Shorthand Character Classes.  I forget about this site sometimes.  It deserves to be remembered!
http://www.regular-expressions.info/charclass.html

And you're right about \w versus \w+, with the latter matching one or more word characters.  But in a character class expression that also includes explicit lengths I did not see any difference in my tests.