Help with REGEX with 2 different strings

Hi.

I am having difficulty validating a text input in a form.

The entry needs to be either LYBLA000000X (LYBLA fixed, 000000 being any six-digit number less than 200000, and a fixed X) or TMP00000000000000 (TMP fixed, followed by 14 digits).

I have the first part cracked with

/LYBLA[0-9]{6}X/i

But I am not sure how to combine the two.

Thanks!
Asked by: rwlloyd71

Ray Paseur Commented:
Here is an example of why it is important to state the problem clearly and create good test cases.  Professional programmers who understand automated testing usually try to avoid writing complicated rules for string validation.  We prefer to get closer to the original problems in the data and isolate the issues before they become complicated string validation rules.  This helps us avoid regular expressions with holes that let unwanted data fall through into the soup!
"The entry needs to be either LYBLA000000X (LYBLA Fixed, 000000 any combination of number less than 200000, and a fixed X) ..."
https://iconoun.com/demo/temp_rwlloyd71_julian.php
<?php // demo/temp_rwlloyd71_julian.php
/**
 * https://www.experts-exchange.com/questions/28964110/Help-with-REGEX-with-2-different-strings.html#a41761231
 *
 * https://www.experts-exchange.com/articles/7830/A-Quick-Tour-of-Test-Driven-Development.html
 */
error_reporting(E_ALL);
echo '<pre>';

// COMPLICATED REGEX - DOES IT REALLY WORK WELL?
$regex = "/^LYBLA[0|1|2]\d{5}X$|^TMP\d{14}$/i";

// TEST CASES
$tests = array
( 'LYBLA000000X'
, 'LYBLA200001X' // THIS PASSES, BUT SHOULD IT PASS ("...less than 200000")?
, 'LYBLA244444X' // THIS PASSES, BUT SHOULD IT PASS ("...less than 200000")?
, 'LYBLA299999X' // THIS PASSES, BUT SHOULD IT PASS ("...less than 200000")?
, 'TMP00000000000000'
, 'Gooseball'
)
;

// RUN THE AUTOMATED TESTS
foreach ($tests as $test)
{
    echo PHP_EOL . $test;
    if (preg_match($regex, $test)) echo ' PASSED';
}

Outputs:
LYBLA000000X PASSED
LYBLA200001X PASSED
LYBLA244444X PASSED
LYBLA299999X PASSED
TMP00000000000000 PASSED
Gooseball

 
Ray Paseur Commented:
Don't combine them both!  Instead write a little validation function that makes two separate tests.  Much easier to write and much easier to test :-)
 
Ray Paseur Commented:
Here's some of the background thinking.
https://www.experts-exchange.com/articles/7830/A-Quick-Tour-of-Test-Driven-Development.html

Please see: https://iconoun.com/demo/temp_rwlloyd71.php
<?php // demo/temp_rwlloyd71.php
/**
 * https://www.experts-exchange.com/questions/28964110/Help-with-REGEX-with-2-different-strings.html
 *
 * https://www.experts-exchange.com/articles/7830/A-Quick-Tour-of-Test-Driven-Development.html
 */
error_reporting(E_ALL);
echo '<pre>';

// VALIDATION FUNCTION
function rwl_validation($str)
{
    // ADD REGULAR EXPRESSIONS AS NEEDED HERE
    $regexes = array
    ( '/LYBLA[0-9]{6}X/i'
    , '/TMP[0-9]{14}/i'
    )
    ;

    // TEST THE INPUT WITH EACH REGEX
    foreach ($regexes as $rgx)
    {
        if (preg_match($rgx, $str)) return TRUE;
    }
    return FALSE;
}

// TEST CASES
$tests = array
( 'LYBLA000000X'
, 'TMP00000000000000'
, 'Gooseball'
)
;

// RUN THE AUTOMATED TESTS
foreach ($tests as $test)
{
    echo PHP_EOL . $test;
    if (rwl_validation($test)) echo ' PASSED';
}

Output shows both success and failure.
LYBLA000000X PASSED
TMP00000000000000 PASSED
Gooseball

The part about "less than 200000" might require an additional bit of testing, but hopefully this gives you a good starting point.
 
Ray Paseur Commented:
Here's a slightly more sophisticated example.  Normally, we would not want multiple return statements in a function, but in very short functions like this one, it's OK to overlook that detail.
<?php // demo/temp_rwlloyd71.php
/**
 * https://www.experts-exchange.com/questions/28964110/Help-with-REGEX-with-2-different-strings.html
 *
 * https://www.experts-exchange.com/articles/7830/A-Quick-Tour-of-Test-Driven-Development.html
 */
error_reporting(E_ALL);
echo '<pre>';

// VALIDATION FUNCTION
function rwl_validation($str)
{
    // STRAIGHT REGEX VALIDATION - ADD REGULAR EXPRESSIONS AS NEEDED HERE
    $regexes = array
    ( '/TMP[0-9]{14}/i'
    )
    ;

    // TEST THE INPUT WITH EACH REGEX
    foreach ($regexes as $rgx)
    {
        if (preg_match($rgx, $str)) return TRUE;
    }

    // MORE COMPLEX REGEX + VALUES VALIDATION
    $rgx = '/LYBLA([0-9]{6})X/i';
    if (!preg_match($rgx, $str, $mat)) return FALSE;
    if ($mat[1] <= 200000) return TRUE;

    return FALSE;
}

// TEST CASES
$tests = array
( 'LYBLA000000X'
, 'lybla200000X'
, 'lybla200001X'
, 'TMP00000000000000'
, 'Gooseball'
)
;

// RUN THE AUTOMATED TESTS
foreach ($tests as $test)
{
    echo PHP_EOL . $test;
    if (rwl_validation($test)) echo ' PASSED';
}
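
The patterns in the function above have no anchors, so they also accept input with extra characters around the code, and the `<=` comparison treats 200000 itself as valid. A minimal sketch of an anchored variant follows. The function name `validate_entry` and the `$limit` and `$inclusive` parameters are hypothetical, introduced only because the thread leaves open whether 200000 should pass.

```php
<?php
// Hypothetical helper, not from the thread: validates either entry format.
// $limit and $inclusive are assumptions, since it is unclear whether
// 200000 itself is an acceptable value.
function validate_entry(string $str, int $limit = 200000, bool $inclusive = false): bool
{
    // Fixed TMP prefix followed by exactly 14 digits, nothing before or after.
    if (preg_match('/\ATMP[0-9]{14}\z/i', $str)) {
        return true;
    }

    // Fixed LYBLA prefix, six digits captured for the range check, fixed X.
    if (!preg_match('/\ALYBLA([0-9]{6})X\z/i', $str, $mat)) {
        return false;
    }

    // Compare the captured digits as an integer, not as a string.
    $number = (int) $mat[1];
    return $inclusive ? $number <= $limit : $number < $limit;
}

var_dump(validate_entry('LYBLA199999X'));                 // bool(true)
var_dump(validate_entry('LYBLA200000X'));                 // bool(false)
var_dump(validate_entry('LYBLA200000X', 200000, true));   // bool(true)
var_dump(validate_entry('xLYBLA000000X'));                // bool(false)
var_dump(validate_entry('TMP' . str_repeat('0', 14)));    // bool(true)
```

Keeping the numeric limit out of the regular expression means the boundary can change without touching the pattern.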

 
Julian Hansen Commented:
Personally, I am in favour of doing the match in as few lines as possible, making use of the full power of the regular expression format.

Try this:

/^LYBLA[0|1|2]\d{5}X$|^TMP\d{14}$/

Code
<?php
$regex = "/^LYBLA[0|1|2]\d{5}X$|^TMP\d{14}$/";
$input = array(
  'LYBLA000000X',
  'xLYBLA000000X',
  'LYBLA00C000X',
  'LYBLA00000X',
  'LYBLA0000000X',
  'LYBLA100000X',
  'LYBLA10000VX',
  'LYBLA10000X',
  'LYBLA1000000X',
  'LYBLA200000X',
  'LYBLA20000CX',
  'LYBLA20000X',
  'LYBLA2000000X',
  'LYBLA300000X',
  'TMP00000000000000',
  'yTMP00000000000000',
  'TMP00000000000000abc',
  'TMP000000000000000',
  'TMP000000000000000000',
  'TMP0000000000000'
);

foreach($input as $in) {
  $result = preg_match($regex, $in, $matches);

  if ($result) {
    echo "{$in} <b>MATCHES</b><br/>";
  }
  else {
    echo "{$in} does <b>NOT</b> match<br/>";
  }
}

Output
LYBLA000000X matches
xLYBLA000000X does not match
LYBLA00C000X does not match
LYBLA00000X does not match
LYBLA0000000X does not match
LYBLA100000X matches
LYBLA10000VX does not match
LYBLA10000X does not match
LYBLA1000000X does not match
LYBLA200000X matches
LYBLA20000CX does not match
LYBLA20000X does not match
LYBLA2000000X does not match
LYBLA300000X does not match
TMP00000000000000 matches
yTMP00000000000000 does not match
TMP00000000000000abc does not match
TMP000000000000000 does not match
TMP000000000000000000 does not match
TMP0000000000000 does not match
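
One detail in the pattern above may be worth checking: inside a character class, `|` is a literal character rather than an alternation operator, so `[0|1|2]` accepts a pipe as well as the digits 0, 1 and 2. The sketch below compares the pattern as posted with a variant that uses `[0-2]`. The test input is made up for illustration.

```php
<?php
// Pattern as posted in the thread, and a variant with a plain digit range.
$posted  = '/^LYBLA[0|1|2]\d{5}X$|^TMP\d{14}$/';
$variant = '/^LYBLA[0-2]\d{5}X$|^TMP\d{14}$/';

// Made-up input with a pipe where the first digit should be.
$input = 'LYBLA|00000X';

var_dump(preg_match($posted, $input));   // int(1)
var_dump(preg_match($variant, $input));  // int(0)
```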

 
Dan Craciun (IT Consultant) Commented:
When you have multiple possible expressions, use an alternation. Think of it as an "OR".

HTH,
Dan
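
A short sketch of how alternation interacts with anchors may help here. Alternation has the lowest precedence in a pattern, so an anchor applies only to the branch it is written in unless the branches are grouped. The patterns below are simplified for illustration and ignore the "less than 200000" rule.

```php
<?php
// ^ covers only the first branch and $ only the second.
$loose   = '/^LYBLA\d{6}X|TMP\d{14}$/';
// A non-capturing group makes both anchors apply to both branches.
$grouped = '/^(?:LYBLA\d{6}X|TMP\d{14})$/';

$input = 'LYBLA000000Xjunk';

var_dump(preg_match($loose, $input));                         // int(1)
var_dump(preg_match($grouped, $input));                       // int(0)
var_dump(preg_match($grouped, 'TMP' . str_repeat('0', 14)));  // int(1)
```

Writing the anchors on every branch, as in `^A$|^B$`, is equivalent to the grouped form.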
 
Julian Hansen Commented:
@Dan you mean like this?
$regex = "/^LYBLA[0|1|2]\d{5}X$|^TMP\d{14}$/";
                             - ^ -

 
Dan Craciun (IT Consultant) Commented:
Yup. Exactly like that.
 
Ray Paseur Commented:
Regular expressions are hard to get right, and in general, string processing is hard because of the free-form nature of the inputs.  For example, consider this regex from the initial post:

/LYBLA[0-9]{6}X/i

I "bolded" the pattern modifier "i" to call attention to it, because it makes the regular expression case-insensitive, so both LYBLA and LyBLa would be acceptable substrings.  We do not really know whether case-sensitive matching is important, and some of the solutions proposed here will be case-sensitive.

What would we make of something like this, with a trailing blank?

"LYBLA000000X "

Should that string fail?  Or should we use trim() to remove the trailing whitespace and accept the result?
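
To make the whitespace question concrete, here is a small sketch using an anchored version of the pattern. The anchors are an addition for illustration. Note that in PCRE, `$` also matches just before a final newline unless the `D` modifier is set, whereas `\z` matches only at the very end of the subject.

```php
<?php
// Anchored variant of the pattern from the initial post (anchors added here).
$pattern = '/^LYBLA[0-9]{6}X$/i';

var_dump(preg_match($pattern, "LYBLA000000X "));        // int(0): trailing blank is rejected
var_dump(preg_match($pattern, trim("LYBLA000000X ")));  // int(1): accepted after trim()
var_dump(preg_match($pattern, "LYBLA000000X\n"));       // int(1): $ matches before a final newline

// \z matches only at the very end of the subject string.
var_dump(preg_match('/^LYBLA[0-9]{6}X\z/i', "LYBLA000000X\n"));  // int(0)
```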

It's these kinds of questions that are easy to overlook, but that can produce false negatives or false positives.  That's why I prefer to look a little deeper and not limit myself to a single regular expression.  YMMV, but a good set of test cases is always useful.  If you want to use a formal testing system, you need to be able to mock the inputs and compare the outputs.  In my experience, this is easier if you've got your validation routines packaged in a class method, or at least in a function.

Here's another code sample.  You might want to run it and see if it meets your needs, or needs to be tweaked some more.
<?php // demo/temp_rwlloyd71_julian.php
/**
 * https://www.experts-exchange.com/questions/28964110/Help-with-REGEX-with-2-different-strings.html
 *
 * https://www.experts-exchange.com/articles/7830/A-Quick-Tour-of-Test-Driven-Development.html
 */
error_reporting(E_ALL);
echo '<pre>';

// COMPLICATED REGEX - DOES IT REALLY WORK WELL?
$regex = "/^LYBLA[0|1|2]\d{5}X$|^TMP\d{14}$/";

// TEST CASES
$tests = array
( 'LYBLA000000X'
, 'lybla200000X'
, 'lybla200001X'
, 'LYBLA200001X' // THIS PASSES, BUT SHOULD IT PASS?
, 'TMP00000000000000'
, 'Gooseball'
)
;

// RUN THE AUTOMATED TESTS
foreach ($tests as $test)
{
    echo PHP_EOL . $test;
    if (preg_match($regex, $test)) echo ' PASSED';
}

 
Julian Hansen Commented:
/^LYBLA[0|1|2]\d{5}X$|^TMP\d{14}$/
This will give a good starting point though - and will address the requirement as stated. If it turns out there are other factors then I don't believe it is difficult to change.

There is merit in splitting the expressions out in some cases, but in a simple case like this one I see no problem with combining them. Regular expressions also support more advanced features such as lookahead and lookbehind, and patterns that use them can get extremely complicated, especially from a support perspective, but in this case the matching is relatively straightforward.
 
rwlloyd71 (Author) Commented:
Thanks all for your comments - very helpful.

I am going to go with the

/^LYBLA[0|1|2]\d{5}X$|^TMP\d{14}$/i

example from Dan and Julian as it suits me best. I have other validation on the form, such as min and max length and "strtoupper" in the code so all angles are covered.

I'll report back when I have it working.
 
rwlloyd71 (Author) Commented:
All working.  Thanks
 
Julian Hansen Commented:
Nice catch, Ray.
Try this in your code:
$regex = "/^LYBLA[01]\d{5}X$|^LYBLA20{5}X$|^TMP\d{14}$/i";
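
A pattern like this is easy to check against boundary cases before it goes into the form. The sketch below uses an anchored variant written for illustration, not the exact pattern above, and it assumes that 200000 itself is acceptable. The test inputs are made up.

```php
<?php
// Anchored variant written for illustration; accepting 200000 is an assumption.
$regex = '/^(?:LYBLA(?:[01]\d{5}|200000)X|TMP\d{14})$/i';

// Input => expected preg_match() result.
$cases = array(
    'LYBLA000000X'              => 1,
    'LYBLA199999X'              => 1,
    'LYBLA200000X'              => 1,
    'LYBLA200001X'              => 0,
    'LYBLA299999X'              => 0,
    'xLYBLA200000X'             => 0,
    'LYBLA200000Xy'             => 0,
    'LYBLA|00000X'              => 0,
    'lybla100000x'              => 1,
    'TMP' . str_repeat('0', 14) => 1,
    'TMP' . str_repeat('0', 15) => 0,
    'Gooseball'                 => 0,
);

// Collect every input whose result differs from the expectation.
$failures = array();
foreach ($cases as $input => $expected) {
    $actual = preg_match($regex, $input);
    if ($actual !== $expected) {
        $failures[] = $input;
    }
    echo $input . ($actual ? ' matches' : ' does not match') . PHP_EOL;
}

echo count($failures) . ' unexpected result(s)' . PHP_EOL;
```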

 
rwlloyd71 (Author) Commented:
Thank you!