Richard Lloyd
asked on
Help with REGEX with 2 different strings
Hi.
I am having difficulty validating a text input in a form.
The entry needs to be either LYBLA000000X (LYBLA fixed, 000000 any combination of numbers less than 200000, and a fixed X) or TMP00000000000000 (fixed TMP, followed by 14 numbers).
I have the first part cracked with
/LYBLA[0-9]{6}X/i
But not sure how to combine them both.
Thanks!
Don't combine them both! Instead write a little validation function that makes two separate tests. Much easier to write and much easier to test :-)
Here's some of the background thinking.
https://www.experts-exchange.com/articles/7830/A-Quick-Tour-of-Test-Driven-Development.html
Please see: https://iconoun.com/demo/temp_rwlloyd71.php
<?php // demo/temp_rwlloyd71.php
/**
* https://www.experts-exchange.com/questions/28964110/Help-with-REGEX-with-2-different-strings.html
*
* https://www.experts-exchange.com/articles/7830/A-Quick-Tour-of-Test-Driven-Development.html
*/
error_reporting(E_ALL);
echo '<pre>';
// VALIDATION FUNCTION
function rwl_validation($str)
{
// ADD REGULAR EXPRESSIONS AS NEEDED HERE
$regexes = array
( '/LYBLA[0-9]{6}X/i'
, '/TMP[0-9]{14}/i'
)
;
// TEST THE INPUT WITH EACH REGEX
foreach ($regexes as $rgx)
{
if (preg_match($rgx, $str)) return TRUE;
}
return FALSE;
}
// TEST CASES
$tests = array
( 'LYBLA000000X'
, 'TMP00000000000000'
, 'Gooseball'
)
;
// RUN THE AUTOMATED TESTS
foreach ($tests as $test)
{
echo PHP_EOL . $test;
if (rwl_validation($test)) echo ' PASSED';
}
Output shows both success and failure.
LYBLA000000X PASSED
TMP00000000000000 PASSED
Gooseball
The part about "less than 200000" might require an additional bit of testing, but hopefully this gives you a good starting point.
Here's a little more sophisticated example. Normally, we would not want multiple return statements from a function, but in the case of very short functions like this, it's OK to overlook that detail.
<?php // demo/temp_rwlloyd71.php
/**
* https://www.experts-exchange.com/questions/28964110/Help-with-REGEX-with-2-different-strings.html
*
* https://www.experts-exchange.com/articles/7830/A-Quick-Tour-of-Test-Driven-Development.html
*/
error_reporting(E_ALL);
echo '<pre>';
// VALIDATION FUNCTION
function rwl_validation($str)
{
// STRAIGHT REGEX VALIDATION - ADD REGULAR EXPRESSIONS AS NEEDED HERE
$regexes = array
( '/TMP[0-9]{14}/i'
)
;
// TEST THE INPUT WITH EACH REGEX
foreach ($regexes as $rgx)
{
if (preg_match($rgx, $str)) return TRUE;
}
// MORE COMPLEX REGEX + VALUES VALIDATION
$rgx = '/LYBLA([0-9]{6})X/i';
if (!preg_match($rgx, $str, $mat)) return FALSE;
if ($mat[1] <= 200000) return TRUE;
return FALSE;
}
// TEST CASES
$tests = array
( 'LYBLA000000X'
, 'lybla200000X'
, 'lybla200001X'
, 'TMP00000000000000'
, 'Gooseball'
)
;
// RUN THE AUTOMATED TESTS
foreach ($tests as $test)
{
echo PHP_EOL . $test;
if (rwl_validation($test)) echo ' PASSED';
}
Personally I am in favour of doing the match in as few lines as possible - making use of the full power of the regular expression format
Try this
/^LYBLA[0|1|2]\d{5}X$|^TMP\d{14}$/
Code
<?php
$regex = "/^LYBLA[0|1|2]\d{5}X$|^TMP\d{14}$/";
$input = array(
'LYBLA000000X',
'xLYBLA000000X',
'LYBLA00C000X',
'LYBLA00000X',
'LYBLA0000000X',
'LYBLA100000X',
'LYBLA10000VX',
'LYBLA10000X',
'LYBLA1000000X',
'LYBLA200000X',
'LYBLA20000CX',
'LYBLA20000X',
'LYBLA2000000X',
'LYBLA300000X',
'TMP00000000000000',
'yTMP00000000000000',
'TMP00000000000000abc',
'TMP000000000000000',
'TMP000000000000000000',
'TMP0000000000000'
);
foreach($input as $in) {
$result = preg_match($regex, $in, $matches);
if ($result) {
echo "{$in} <b>MATCHES</b><br/>";
}
else {
echo "{$in} does <b>NOT</b> match<br/>";
}
}
Output
LYBLA000000X matches
xLYBLA000000X does not match
LYBLA00C000X does not match
LYBLA00000X does not match
LYBLA0000000X does not match
LYBLA100000X matches
LYBLA10000VX does not match
LYBLA10000X does not match
LYBLA1000000X does not match
LYBLA200000X matches
LYBLA20000CX does not match
LYBLA20000X does not match
LYBLA2000000X does not match
LYBLA300000X does not match
TMP00000000000000 matches
yTMP00000000000000 does not match
TMP00000000000000abc does not match
TMP000000000000000 does not match
TMP000000000000000000 does not match
TMP0000000000000 does not match
Working sample here
@Dan you mean like this?
$regex = "/^LYBLA[0|1|2]\d{5}X$|^TMP\d{14}$/";
- ^ -
Yup. Exactly like that.
Regular expressions are hard to get right, and in general, string processing is hard because of the free-form nature of the inputs. For example, consider this regex from the initial post:
/LYBLA[0-9]{6}X/i
I "bolded" the pattern modifier "i" to call attention to it, because it makes the regular expression case-insensitive, so both LYBLA and LyBLa would be acceptable substrings. We do not really know whether case-sensitive matching is important, and some of the solutions proposed here will be case-sensitive.
What would we make of something like this, with a trailing blank?
"LYBLA000000X "
Should that string fail? Or should we use trim() to remove the trailing whitespace and accept the result?
It's these kinds of questions that are easy to overlook, but that can produce false negatives or false positives. That's why I prefer to look a little deeper and not limit myself to a single regular expression. YMMV, but a good set of test cases is always useful. If you want to use a formal testing system, you need to be able to mock the inputs and compare the outputs. In my experience, this is easier if you've got your validation routines packaged in a class method, or at least in a function.
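One way to settle the trailing-blank question is to write the decision into a small function and pin it down with test cases that carry their expected results. This is only a sketch. The function name rwl_normalize_and_validate() is made up for illustration, and it assumes that surrounding whitespace should be forgiven and that case does not matter, neither of which the original question confirms.

```php
<?php
// HYPOTHETICAL HELPER - ASSUMES WHITESPACE IS FORGIVEN AND CASE IS IGNORED
function rwl_normalize_and_validate($str)
{
    // REMOVE LEADING AND TRAILING WHITESPACE, THEN FOLD TO UPPER CASE
    $str = strtoupper(trim($str));

    // ANCHORED PATTERNS: ^ IS THE START, \z IS THE VERY END OF THE STRING
    $regexes = array
    ( '/^LYBLA[0-9]{6}X\z/'
    , '/^TMP[0-9]{14}\z/'
    )
    ;
    foreach ($regexes as $rgx)
    {
        if (preg_match($rgx, $str)) return TRUE;
    }
    return FALSE;
}

// EACH TEST CASE CARRIES ITS EXPECTED RESULT
$tests = array
( 'LYBLA000000X'      => TRUE
, 'LYBLA000000X '     => TRUE   // TRAILING BLANK IS TRIMMED
, 'lybla000000x'      => TRUE   // CASE IS FOLDED
, 'xLYBLA000000X'     => FALSE  // EXTRA LEADING CHARACTER
, 'TMP00000000000000' => TRUE
, 'Gooseball'         => FALSE
)
;
foreach ($tests as $input => $expected)
{
    $actual = rwl_normalize_and_validate($input);
    echo PHP_EOL . '"' . $input . '" ' . ($actual === $expected ? 'OK' : 'UNEXPECTED');
}
```

If the decision goes the other way and a trailing blank should be rejected, dropping the trim() call and flipping that one expected value is the only change needed.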
Here's another code sample. You might want to run it and see if it meets your needs, or needs to be tweaked some more.
<?php // demo/temp_rwlloyd71_julian.php
/**
* https://www.experts-exchange.com/questions/28964110/Help-with-REGEX-with-2-different-strings.html
*
* https://www.experts-exchange.com/articles/7830/A-Quick-Tour-of-Test-Driven-Development.html
*/
error_reporting(E_ALL);
echo '<pre>';
// COMPLICATED REGEX - DOES IT REALLY WORK WELL?
$regex = "/^LYBLA[0|1|2]\d{5}X$|^TMP\d{14}$/";
// TEST CASES
$tests = array
( 'LYBLA000000X'
, 'lybla200000X'
, 'lybla200001X'
, 'LYBLA200001X' // THIS PASSES, BUT SHOULD IT PASS?
, 'TMP00000000000000'
, 'Gooseball'
)
;
// RUN THE AUTOMATED TESTS
foreach ($tests as $test)
{
echo PHP_EOL . $test;
if (preg_match($regex, $test)) echo ' PASSED';
}
/^LYBLA[0|1|2]\d{5}X$|^TMP\d{14}$/
This will give a good starting point though - and will address the requirement as stated. If it turns out there are other factors then I don't believe it is difficult to change.
There is merit in splitting the expressions out in some cases - but when it is a simple case like the one we have now, I see no problem with combining them. RegEx also supports more complicated features such as look ahead and look behind, and with those things can get extremely complicated - especially from a support perspective - but in this case the matching is relatively straightforward.
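Three details of the combined pattern are worth checking before relying on it. Inside a character class the pipe is a literal character, so [0|1|2] accepts "|" as well as 0, 1 and 2. A leading 2 lets values up to 299999 through. And without the D modifier, $ also matches just before a final newline. The sketch below compares the pattern from this thread with a stricter variant. The name $strict is made up for illustration, and it assumes "less than 200000" means 000000 through 199999; if 200000 itself must be accepted, the pattern needs one more alternative.

```php
<?php
// THE PATTERN FROM THE THREAD, IN SINGLE QUOTES SO PHP LEAVES THE BACKSLASHES ALONE
$loose  = '/^LYBLA[0|1|2]\d{5}X$|^TMP\d{14}$/';

// A STRICTER SKETCH - ASSUMES THE NUMBER MUST BE 000000 THROUGH 199999
// [01] ALLOWS ONLY 0 OR 1 AS THE FIRST DIGIT, \z IS THE VERY END OF THE STRING
$strict = '/^(?:LYBLA[01]\d{5}X|TMP\d{14})\z/';

// TEST CASES
$tests = array
( 'LYBLA199999X'
, 'LYBLA200001X'      // OVER THE LIMIT
, 'LYBLA|00000X'      // PIPE IN PLACE OF THE FIRST DIGIT
, "LYBLA000000X\n"    // TRAILING NEWLINE
, 'TMP00000000000000'
)
;
// RUN BOTH PATTERNS AGAINST EACH TEST CASE
foreach ($tests as $test)
{
    echo PHP_EOL . json_encode($test);
    echo ' LOOSE: '  . (preg_match($loose,  $test) ? 'PASSED' : 'FAILED');
    echo ' STRICT: ' . (preg_match($strict, $test) ? 'PASSED' : 'FAILED');
}
```

The "i" modifier is left off both patterns here; it can be added back if mixed-case input should be accepted.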
ASKER
Thanks all for your comments - very helpful.
I am going to go with the
/^LYBLA[0|1|2]\d{5}X$|^TMP\d{14}$/i
example from Dan and Julian as it suits me best. I have other validation on the form, such as min and max length and "strtoupper" in the code so all angles are covered.
I'll report back when I have it working.
ASKER
All working. Thanks
ASKER
Thank you!