Richard Lloyd

Help with REGEX with 2 different strings

Hi.

I am having difficulty validating a text input in a form.

The entry needs to be either LYBLA000000X (a fixed LYBLA, then 000000 standing for any six-digit number less than 200000, then a fixed X) or TMP00000000000000 (a fixed TMP followed by 14 digits).

I have the first part cracked with

/LYBLA[0-9]{6}X/i

But not sure how to combine them both.

Thanks!
Ray Paseur

Don't combine them both!  Instead write a little validation function that makes two separate tests.  Much easier to write and much easier to test :-)
Here's some of the background thinking.
https://www.experts-exchange.com/articles/7830/A-Quick-Tour-of-Test-Driven-Development.html

Please see: https://iconoun.com/demo/temp_rwlloyd71.php
<?php // demo/temp_rwlloyd71.php
/**
 * https://www.experts-exchange.com/questions/28964110/Help-with-REGEX-with-2-different-strings.html
 *
 * https://www.experts-exchange.com/articles/7830/A-Quick-Tour-of-Test-Driven-Development.html
 */
error_reporting(E_ALL);
echo '<pre>';

// VALIDATION FUNCTION
function rwl_validation($str)
{
    // ADD REGULAR EXPRESSIONS AS NEEDED HERE
    $regexes = array
    ( '/LYBLA[0-9]{6}X/i'
    , '/TMP[0-9]{14}/i'
    )
    ;

    // TEST THE INPUT WITH EACH REGEX
    foreach ($regexes as $rgx)
    {
        if (preg_match($rgx, $str)) return TRUE;
    }
    return FALSE;
}

// TEST CASES
$tests = array
( 'LYBLA000000X'
, 'TMP00000000000000'
, 'Gooseball'
)
;

// RUN THE AUTOMATED TESTS
foreach ($tests as $test)
{
    echo PHP_EOL . $test;
    if (rwl_validation($test)) echo ' PASSED';
}


Output shows both success and failure.
LYBLA000000X PASSED
TMP00000000000000 PASSED
Gooseball


The part about "less than 200000" might require an additional bit of testing, but hopefully this gives you a good starting point.
Here's a slightly more sophisticated example.  Normally, we would not want multiple return statements in a function, but in very short functions like this, it's OK to overlook that detail.
<?php // demo/temp_rwlloyd71.php
/**
 * https://www.experts-exchange.com/questions/28964110/Help-with-REGEX-with-2-different-strings.html
 *
 * https://www.experts-exchange.com/articles/7830/A-Quick-Tour-of-Test-Driven-Development.html
 */
error_reporting(E_ALL);
echo '<pre>';

// VALIDATION FUNCTION
function rwl_validation($str)
{
    // STRAIGHT REGEX VALIDATION - ADD REGULAR EXPRESSIONS AS NEEDED HERE
    $regexes = array
    ( '/TMP[0-9]{14}/i'
    )
    ;

    // TEST THE INPUT WITH EACH REGEX
    foreach ($regexes as $rgx)
    {
        if (preg_match($rgx, $str)) return TRUE;
    }

    // MORE COMPLEX REGEX + VALUES VALIDATION
    $rgx = '/LYBLA([0-9]{6})X/i';
    if (!preg_match($rgx, $str, $mat)) return FALSE;
    if ($mat[1] < 200000) return TRUE; // "LESS THAN 200000" PER THE QUESTION

    return FALSE;
}

// TEST CASES
$tests = array
( 'LYBLA000000X'
, 'lybla200000X'
, 'lybla200001X'
, 'TMP00000000000000'
, 'Gooseball'
)
;

// RUN THE AUTOMATED TESTS
foreach ($tests as $test)
{
    echo PHP_EOL . $test;
    if (rwl_validation($test)) echo ' PASSED';
}
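
One thing worth noting about the patterns above: without `^` and `$` anchors, preg_match() succeeds if the pattern occurs anywhere in the string, so input with extra leading or trailing characters is accepted. Here is a minimal sketch of the same idea with anchored patterns. The name `rwl_validation_anchored` is hypothetical, and the strict comparison assumes that "less than 200000" in the question is meant literally.

```php
<?php
// Hypothetical variant of rwl_validation() with anchored patterns.
function rwl_validation_anchored($str)
{
    // TMP followed by exactly 14 digits, nothing before or after
    if (preg_match('/^TMP[0-9]{14}$/i', $str)) return TRUE;

    // LYBLA, six digits, X; capture the digits for the range check
    if (!preg_match('/^LYBLA([0-9]{6})X$/i', $str, $mat)) return FALSE;

    // Assumption: "less than 200000" is a strict upper bound
    return (int) $mat[1] < 200000;
}

$tests = array('LYBLA000000X', 'xLYBLA000000Xabc', 'lybla199999X', 'lybla200000X');
foreach ($tests as $test)
{
    echo $test . (rwl_validation_anchored($test) ? ' PASSED' : ' FAILED') . PHP_EOL;
}
```

With these inputs, only 'LYBLA000000X' and 'lybla199999X' pass; the padded string and the value 200000 are rejected.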


Julian Hansen
Personally I am in favour of doing the match in as few lines as possible - making use of the full power of the regular expression format

Try this

/^LYBLA[0|1|2]\d{5}X$|^TMP\d{14}$/


Code
<?php
$regex = "/^LYBLA[0|1|2]\d{5}X$|^TMP\d{14}$/";
$input = array(
  'LYBLA000000X',
  'xLYBLA000000X',
  'LYBLA00C000X',
  'LYBLA00000X',
  'LYBLA0000000X',
  'LYBLA100000X',
  'LYBLA10000VX',
  'LYBLA10000X',
  'LYBLA1000000X',
  'LYBLA200000X',
  'LYBLA20000CX',
  'LYBLA20000X',
  'LYBLA2000000X',
  'LYBLA300000X',
  'TMP00000000000000',
  'yTMP00000000000000',
  'TMP00000000000000abc',
  'TMP000000000000000',
  'TMP000000000000000000',
  'TMP0000000000000'
);

foreach($input as $in) {
  $result = preg_match($regex, $in, $matches);

  if ($result) {
    echo "{$in} <b>MATCHES</b><br/>";
  }
  else {
    echo "{$in} does <b>NOT</b> match<br/>";
  }
}


Output
LYBLA000000X matches
xLYBLA000000X does not match
LYBLA00C000X does not match
LYBLA00000X does not match
LYBLA0000000X does not match
LYBLA100000X matches
LYBLA10000VX does not match
LYBLA10000X does not match
LYBLA1000000X does not match
LYBLA200000X matches
LYBLA20000CX does not match
LYBLA20000X does not match
LYBLA2000000X does not match
LYBLA300000X does not match
TMP00000000000000 matches
yTMP00000000000000 does not match
TMP00000000000000abc does not match
TMP000000000000000 does not match
TMP000000000000000000 does not match
TMP0000000000000 does not match


Working sample here
When you have multiple possible expressions, use an alternation. Think of it as an "OR".

HTH,
Dan
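
To illustrate the "OR" idea, here is a small sketch that assumes each format must match the whole input. Wrapping the alternatives in a non-capturing group `(?:...)` lets one pair of anchors apply to both branches. The variable names are illustrative only, and the "less than 200000" range check is deliberately left out.

```php
<?php
// Two alternatives inside one non-capturing group, anchored once.
// [0-9]{6} is taken from the original question; no range check here.
$pattern = '/^(?:LYBLA[0-9]{6}X|TMP[0-9]{14})$/';

$samples = array('LYBLA000000X', 'TMP00000000000000', 'LYBLA000000XTMP', 'Gooseball');
foreach ($samples as $sample) {
    $ok = preg_match($pattern, $sample) === 1;
    echo $sample . ($ok ? ' matches' : ' does not match') . PHP_EOL;
}
```

The first two samples match; the last two do not, because neither alternative covers the whole string.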
@Dan you mean like this?
$regex = "/^LYBLA[0|1|2]\d{5}X$|^TMP\d{14}$/";
                             - ^ -


Yup. Exactly like that.
Regular expressions are hard to get right, and in general, string processing is hard because of the free-form nature of the inputs.  For example, consider this regex from the initial post:

/LYBLA[0-9]{6}X/i

I "bolded" the pattern modifier "i" to call attention to it, because it makes the regular expression case-insensitive, so both LYBLA and LyBLa would be acceptable substrings.  We do not really know whether case-sensitive matching is important, and some of the solutions proposed here will be case-sensitive.

What would we make of something like this, with a trailing blank?

"LYBLA000000X "

Should that string fail?  Or should we use trim() to remove the trailing whitespace and accept the result?
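
One way to make that decision explicit is to normalise the input before testing it. The sketch below assumes that surrounding whitespace should be tolerated and that lowercase input is acceptable; `normalise_code` is a hypothetical helper. It also shows a related detail: in PCRE, `$` also matches just before a final newline, so a pattern ending in `$` accepts "LYBLA000000X\n" unless the `D` modifier (or `\z`) is used.

```php
<?php
// Hypothetical helper: strip surrounding whitespace and upper-case the value.
function normalise_code($str)
{
    return strtoupper(trim($str));
}

$pattern = '/^LYBLA[0-9]{6}X$/D'; // D: "$" matches only at the very end

var_dump(preg_match($pattern, 'LYBLA000000X ') === 1);                 // trailing blank, not trimmed
var_dump(preg_match($pattern, normalise_code('lybla000000X ')) === 1); // trimmed and upper-cased
var_dump(preg_match('/^LYBLA[0-9]{6}X$/', "LYBLA000000X\n") === 1);    // no D modifier
var_dump(preg_match($pattern, "LYBLA000000X\n") === 1);                // with D modifier
```

The four checks evaluate to false, true, true and false respectively: the untrimmed string is rejected, the normalised one is accepted, and the trailing newline only slips through when the `D` modifier is absent.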

It's these kinds of questions that are easy to overlook, but that can produce false negatives or false positives.  That's why I prefer to look a little deeper and not limit myself to a single regular expression.  YMMV, but a good set of test cases is always useful.  If you want to use a formal testing system, you need to be able to mock the inputs and compare the outputs.  In my experience, this is easier if you've got your validation routines packaged in a class method, or at least in a function.
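
As a sketch of what "compare the outputs" can look like without a formal framework, each test case can carry its expected result, and the loop reports any mismatch. The function name `is_valid_code` and the expected values are assumptions, based on a literal reading of the requirement (digits below 200000, case-insensitive).

```php
<?php
// Hypothetical validator; swap in whichever implementation is under test.
// [01] restricts the first digit to 0 or 1, i.e. values below 200000.
function is_valid_code($str)
{
    return preg_match('/^(?:LYBLA[01][0-9]{5}X|TMP[0-9]{14})$/i', $str) === 1;
}

// Input => expected result
$cases = array(
    'LYBLA000000X'      => true,
    'lybla199999X'      => true,
    'LYBLA200000X'      => false,
    'TMP00000000000000' => true,
    'TMP0000000000000'  => false, // too few digits
    'Gooseball'         => false,
);

$failures = 0;
foreach ($cases as $input => $expected) {
    if (is_valid_code($input) !== $expected) {
        $failures++;
        echo 'FAIL: ' . $input . PHP_EOL;
    }
}
echo ($failures === 0 ? 'All tests passed' : $failures . ' test(s) failed') . PHP_EOL;
```

Because the expectations live next to the inputs, a changed requirement (for example, allowing 200000) shows up as an edit to one line of the table rather than to the test loop.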

Here's another code sample.  You might want to run it and see if it meets your needs, or needs to be tweaked some more.
<?php // demo/temp_rwlloyd71_julian.php
/**
 * https://www.experts-exchange.com/questions/28964110/Help-with-REGEX-with-2-different-strings.html
 *
 * https://www.experts-exchange.com/articles/7830/A-Quick-Tour-of-Test-Driven-Development.html
 */
error_reporting(E_ALL);
echo '<pre>';

// COMPLICATED REGEX - DOES IT REALLY WORK WELL?
$regex = "/^LYBLA[0|1|2]\d{5}X$|^TMP\d{14}$/";

// TEST CASES
$tests = array
( 'LYBLA000000X'
, 'lybla200000X'
, 'lybla200001X'
, 'LYBLA200001X' // THIS PASSES, BUT SHOULD IT PASS?
, 'TMP00000000000000'
, 'Gooseball'
)
;

// RUN THE AUTOMATED TESTS
foreach ($tests as $test)
{
    echo PHP_EOL . $test;
    if (preg_match($regex, $test)) echo ' PASSED';
}


/^LYBLA[0|1|2]\d{5}X$|^TMP\d{14}$/
This will give a good starting point though - and will address the requirement as stated. If it turns out there are other factors then I don't believe it is difficult to change.

There is merit in splitting the expressions out in some cases, but in a simple case like this one I see no problem with combining them. Regular expressions that rely on more advanced features such as lookahead and lookbehind can get extremely complicated, especially from a support perspective, but in this case the matching is relatively straightforward.
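
Two details of this pattern may be worth checking against the requirement. Inside a character class, `|` is a literal character rather than an alternation, so `[0|1|2]` also accepts a pipe in that position. Allowing a leading 2 also admits values up to 299999. The sketch below compares the pattern as posted with a variant using `[01]`, which assumes "less than 200000" is meant strictly; the variable names are illustrative.

```php
<?php
$posted  = '/^LYBLA[0|1|2]\d{5}X$|^TMP\d{14}$/';
$variant = '/^LYBLA[01]\d{5}X$|^TMP\d{14}$/'; // assumption: 000000-199999 only

$samples = array('LYBLA199999X', 'LYBLA200001X', 'LYBLA|00000X', 'TMP00000000000000');
foreach ($samples as $sample) {
    printf(
        "%-18s posted: %-8s  variant: %s\n",
        $sample,
        preg_match($posted, $sample) ? 'match' : 'no match',
        preg_match($variant, $sample) ? 'match' : 'no match'
    );
}
```

The posted pattern matches all four samples, while the variant rejects 'LYBLA200001X' and 'LYBLA|00000X'. If values from 200000 to 299999 are in fact acceptable, `[012]` would be the equivalent class without the stray pipe.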
Richard Lloyd

ASKER

Thanks all for your comments - very helpful.

I am going to go with the

/^LYBLA[0|1|2]\d{5}X$|^TMP\d{14}$/i

example from Dan and Julian as it suits me best. I have other validation on the form, such as min and max length and "strtoupper" in the code so all angles are covered.

I'll report back when I have it working.
All working.  Thanks
ASKER CERTIFIED SOLUTION: Ray Paseur
Thank you!