Solved

Help with REGEX with 2 different strings

Posted on 2016-08-18
Last Modified: 2016-09-06
Hi.

I am having difficulty validating a text input in a form.

The entry needs to be either LYBLA000000X (fixed LYBLA, then six digits forming a number less than 200000, then a fixed X) or TMP00000000000000 (fixed TMP, followed by 14 digits).

I have the first part cracked with

/LYBLA[0-9]{6}X/i

But not sure how to combine them both.

Thanks!
Question by:rwlloyd71
 
LVL 110

Expert Comment

by:Ray Paseur
ID: 41760973
Don't combine them both!  Instead write a little validation function that makes two separate tests.  Much easier to write and much easier to test :-)
 
LVL 110

Expert Comment

by:Ray Paseur
ID: 41760992
Here's some of the background thinking.
https://www.experts-exchange.com/articles/7830/A-Quick-Tour-of-Test-Driven-Development.html

Please see: https://iconoun.com/demo/temp_rwlloyd71.php
<?php // demo/temp_rwlloyd71.php
/**
 * https://www.experts-exchange.com/questions/28964110/Help-with-REGEX-with-2-different-strings.html
 *
 * https://www.experts-exchange.com/articles/7830/A-Quick-Tour-of-Test-Driven-Development.html
 */
error_reporting(E_ALL);
echo '<pre>';

// VALIDATION FUNCTION
function rwl_validation($str)
{
    // ADD REGULAR EXPRESSIONS AS NEEDED HERE
    $regexes = array
    ( '/LYBLA[0-9]{6}X/i'
    , '/TMP[0-9]{14}/i'
    )
    ;

    // TEST THE INPUT WITH EACH REGEX
    foreach ($regexes as $rgx)
    {
        if (preg_match($rgx, $str)) return TRUE;
    }
    return FALSE;
}

// TEST CASES
$tests = array
( 'LYBLA000000X'
, 'TMP00000000000000'
, 'Gooseball'
)
;

// RUN THE AUTOMATED TESTS
foreach ($tests as $test)
{
    echo PHP_EOL . $test;
    if (rwl_validation($test)) echo ' PASSED';
}


Output shows both success and failure.
LYBLA000000X PASSED
TMP00000000000000 PASSED
Gooseball


The part about "less than 200000" might require an additional bit of testing, but hopefully this gives you a good starting point.
 
LVL 110

Expert Comment

by:Ray Paseur
ID: 41761012
Here's a slightly more sophisticated example.  Normally, we would not want multiple return statements in a function, but in very short functions like this, it's OK to overlook that detail.
<?php // demo/temp_rwlloyd71.php
/**
 * https://www.experts-exchange.com/questions/28964110/Help-with-REGEX-with-2-different-strings.html
 *
 * https://www.experts-exchange.com/articles/7830/A-Quick-Tour-of-Test-Driven-Development.html
 */
error_reporting(E_ALL);
echo '<pre>';

// VALIDATION FUNCTION
function rwl_validation($str)
{
    // STRAIGHT REGEX VALIDATION - ADD REGULAR EXPRESSIONS AS NEEDED HERE
    $regexes = array
    ( '/TMP[0-9]{14}/i'
    )
    ;

    // TEST THE INPUT WITH EACH REGEX
    foreach ($regexes as $rgx)
    {
        if (preg_match($rgx, $str)) return TRUE;
    }

    // MORE COMPLEX REGEX + VALUES VALIDATION
    $rgx = '/LYBLA([0-9]{6})X/i';
    if (!preg_match($rgx, $str, $mat)) return FALSE;
    if ($mat[1] <= 200000) return TRUE;

    return FALSE;
}

// TEST CASES
$tests = array
( 'LYBLA000000X'
, 'lybla200000X'
, 'lybla200001X'
, 'TMP00000000000000'
, 'Gooseball'
)
;

// RUN THE AUTOMATED TESTS
foreach ($tests as $test)
{
    echo PHP_EOL . $test;
    if (rwl_validation($test)) echo ' PASSED';
}


 
LVL 56

Expert Comment

by:Julian Hansen
ID: 41761024
Personally I am in favour of doing the match in as few lines as possible - making use of the full power of the regular expression format

Try this

/^LYBLA[0|1|2]\d{5}X$|^TMP\d{14}$/


Code
<?php
$regex = "/^LYBLA[0|1|2]\d{5}X$|^TMP\d{14}$/";
$input = array(
  'LYBLA000000X',
  'xLYBLA000000X',
  'LYBLA00C000X',
  'LYBLA00000X',
  'LYBLA0000000X',
  'LYBLA100000X',
  'LYBLA10000VX',
  'LYBLA10000X',
  'LYBLA1000000X',
  'LYBLA200000X',
  'LYBLA20000CX',
  'LYBLA20000X',
  'LYBLA2000000X',
  'LYBLA300000X',
  'TMP00000000000000',
  'yTMP00000000000000',
  'TMP00000000000000abc',
  'TMP000000000000000',
  'TMP000000000000000000',
  'TMP0000000000000'
);

foreach($input as $in) {
  $result = preg_match($regex, $in, $matches);

  if ($result) {
    echo "{$in} <b>MATCHES</b><br/>";
  }
  else {
    echo "{$in} does <b>NOT</b> match<br/>";
  }
}


Output
LYBLA000000X matches
xLYBLA000000X does not match
LYBLA00C000X does not match
LYBLA00000X does not match
LYBLA0000000X does not match
LYBLA100000X matches
LYBLA10000VX does not match
LYBLA10000X does not match
LYBLA1000000X does not match
LYBLA200000X matches
LYBLA20000CX does not match
LYBLA20000X does not match
LYBLA2000000X does not match
LYBLA300000X does not match
TMP00000000000000 matches
yTMP00000000000000 does not match
TMP00000000000000abc does not match
TMP000000000000000 does not match
TMP000000000000000000 does not match
TMP0000000000000 does not match


Working sample here
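
One detail worth checking in the pattern above: inside square brackets, `|` is not alternation but a literal character, so `[0|1|2]` accepts `0`, `1`, `2` and also `|`. Here is a minimal sketch of the difference, assuming the intent was "first digit is 0, 1 or 2" (the variable names are illustrative only):

```php
<?php
// Pattern as posted: the character class [0|1|2] also contains a literal "|".
$posted  = '/^LYBLA[0|1|2]\d{5}X$|^TMP\d{14}$/';
// Assumed intent: first digit 0, 1 or 2, written as a plain range.
$tighter = '/^LYBLA[0-2]\d{5}X$|^TMP\d{14}$/';

$odd = 'LYBLA|00000X'; // a pipe where the first digit should be

var_dump(preg_match($posted, $odd));            // int(1): the pipe is accepted
var_dump(preg_match($tighter, $odd));           // int(0): rejected
var_dump(preg_match($tighter, 'LYBLA100000X')); // int(1): normal input still matches
```

Note that the range check is unchanged by this: both patterns still accept any six-digit value starting with 0, 1 or 2.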
 
LVL 35

Expert Comment

by:Dan Craciun
ID: 41761045
When you have multiple possible expressions, use an alternation. Think of it as an "OR".

HTH,
Dan
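
As a minimal sketch of that idea, the two formats can be written as alternatives inside one non-capturing group, so that a single pair of anchors covers both branches. The digit rule here is deliberately loose (any six digits); the "less than 200000" check is a separate matter, and `$pattern` is just an illustrative name.

```php
<?php
// Alternation: (?:A|B) reads as "A or B"; ^ and $ then apply to both branches.
$pattern = '/^(?:LYBLA\d{6}X|TMP\d{14})$/i';

foreach (array('LYBLA000000X', 'tmp00000000000000', 'Gooseball') as $candidate) {
    echo $candidate, preg_match($pattern, $candidate) ? ' matches' : ' does not match', PHP_EOL;
}
```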
 
LVL 56

Expert Comment

by:Julian Hansen
ID: 41761073
@Dan you mean like this?
$regex = "/^LYBLA[0|1|2]\d{5}X$|^TMP\d{14}$/";
                             - ^ -

 
LVL 35

Expert Comment

by:Dan Craciun
ID: 41761104
Yup. Exactly like that.
 
LVL 110

Expert Comment

by:Ray Paseur
ID: 41761113
Regular expressions are hard to get right, and in general, string processing is hard because of the free-form nature of the inputs.  For example, consider this regex from the initial post:

/LYBLA[0-9]{6}X/i

I "bolded" the pattern modifier "i" to call attention to it, because it makes the regular expression case-insensitive, so both LYBLA and LyBLa would be acceptable substrings.  We do not really know whether case-sensitive matching is important, and some of the solutions proposed here will be case-sensitive.

What would we make of something like this, with a trailing blank?

"LYBLA000000X "

Should that string fail?  Or should we use trim() to remove the trailing whitespace and accept the result?
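
To make that choice explicit, here is a small sketch under the assumption that surrounding whitespace should be forgiven; if it should be rejected instead, drop the trim() call. It also shows a related PCRE detail: by default `$` also matches just before a final newline, so `\z` (or the `D` modifier) is needed for a strict end of string. The function name is hypothetical.

```php
<?php
// Hypothetical helper: trims first, then requires the whole string to match.
function lybla_format_ok($raw)
{
    $str = trim($raw); // assumption: stray whitespace is forgiven
    return preg_match('/\ALYBLA[0-9]{6}X\z/i', $str) === 1;
}

var_dump(lybla_format_ok('LYBLA000000X '));                     // bool(true) because of trim()
var_dump(preg_match('/^LYBLA[0-9]{6}X$/i', "LYBLA000000X\n"));  // int(1): $ tolerates a final newline
var_dump(preg_match('/^LYBLA[0-9]{6}X\z/i', "LYBLA000000X\n")); // int(0): \z does not
```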

It's these kinds of questions that are easy to overlook, but that can produce false negatives or false positives.  That's why I prefer to look a little deeper and not limit myself to a single regular expression.  YMMV, but a good set of test cases is always useful.  If you want to use a formal testing system, you need to be able to mock the inputs and compare the outputs.  In my experience, this is easier if you've got your validation routines packaged in a class method, or at least in a function.
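
A lightweight way to do that without a full testing framework is a table of inputs paired with the result you expect, so that a wrong answer in either direction gets reported. This is only a sketch: `validate_entry()` is a hypothetical stand-in for whatever validation function you settle on, and the expected values assume "less than 200000" is meant strictly.

```php
<?php
// Hypothetical validator under test; swap in your own rules.
function validate_entry($str)
{
    if (preg_match('/^TMP[0-9]{14}\z/i', $str)) return TRUE;
    if (!preg_match('/^LYBLA([0-9]{6})X\z/i', $str, $mat)) return FALSE;
    return (int) $mat[1] < 200000; // assumption: strictly less than 200000
}

// INPUT => EXPECTED RESULT
$cases = array
( 'LYBLA000000X'      => TRUE
, 'LYBLA199999X'      => TRUE
, 'LYBLA200000X'      => FALSE
, 'TMP00000000000000' => TRUE
, 'TMP0000000000000'  => FALSE // one digit short
, 'Gooseball'         => FALSE
)
;

// COMPARE ACTUAL AGAINST EXPECTED FOR EVERY CASE
$failures = 0;
foreach ($cases as $input => $expected)
{
    $actual = validate_entry($input);
    if ($actual !== $expected) $failures++;
    echo PHP_EOL . $input . ($actual === $expected ? ' ok' : ' UNEXPECTED RESULT');
}
echo PHP_EOL . $failures . ' unexpected result(s)';
```

Because each case carries its expected result, an input that passes when it should fail shows up just as clearly as one that fails when it should pass.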

Here's another code sample.  You might want to run it and see if it meets your needs, or needs to be tweaked some more.
<?php // demo/temp_rwlloyd71_julian.php
/**
 * https://www.experts-exchange.com/questions/28964110/Help-with-REGEX-with-2-different-strings.html
 *
 * https://www.experts-exchange.com/articles/7830/A-Quick-Tour-of-Test-Driven-Development.html
 */
error_reporting(E_ALL);
echo '<pre>';

// COMPLICATED REGEX - DOES IT REALLY WORK WELL?
$regex = "/^LYBLA[0|1|2]\d{5}X$|^TMP\d{14}$/";

// TEST CASES
$tests = array
( 'LYBLA000000X'
, 'lybla200000X'
, 'lybla200001X'
, 'LYBLA200001X' // THIS PASSES, BUT SHOULD IT PASS?
, 'TMP00000000000000'
, 'Gooseball'
)
;

// RUN THE AUTOMATED TESTS
foreach ($tests as $test)
{
    echo PHP_EOL . $test;
    if (preg_match($regex, $test)) echo ' PASSED';
}

 
LVL 56

Expert Comment

by:Julian Hansen
ID: 41761122
/^LYBLA[0|1|2]\d{5}X$|^TMP\d{14}$/
This will give a good starting point though - and will address the requirement as stated. If it turns out there are other factors then I don't believe it is difficult to change.

There is merit in splitting the expressions out in some cases, but when the case is as simple as this one I see no problem with combining them. Regular expressions also support more advanced features such as lookahead and lookbehind, and things can get extremely complicated, especially from a support perspective, but in this case the matching is relatively straightforward.
 

Author Comment

by:rwlloyd71
ID: 41761231
Thanks all for your comments - very helpful.

I am going to go with the

/^LYBLA[0|1|2]\d{5}X$|^TMP\d{14}$/i

example from Dan and Julian as it suits me best. I have other validation on the form, such as min and max length and "strtoupper" in the code so all angles are covered.

I'll report back when I have it working.
 

Author Comment

by:rwlloyd71
ID: 41761346
All working.  Thanks
 
LVL 110

Accepted Solution

by:
Ray Paseur earned 250 total points
ID: 41766159
Here is an example of why it is important to state the problem clearly and create good test cases.  Professional programmers who understand automated testing usually try to avoid writing complicated rules for string validation.  We prefer to get closer to the original problems in the data and isolate the issues before they become complicated string validation rules.  This helps us avoid regular expressions with holes that let unwanted data fall through into the soup!
"The entry needs to be either LYBLA000000X (LYBLA Fixed, 000000 any combination of number less than 200000, and a fixed X) ..."
https://iconoun.com/demo/temp_rwlloyd71_julian.php
<?php // demo/temp_rwlloyd71_julian.php
/**
 * https://www.experts-exchange.com/questions/28964110/Help-with-REGEX-with-2-different-strings.html#a41761231
 *
 * https://www.experts-exchange.com/articles/7830/A-Quick-Tour-of-Test-Driven-Development.html
 */
error_reporting(E_ALL);
echo '<pre>';

// COMPLICATED REGEX - DOES IT REALLY WORK WELL?
$regex = "/^LYBLA[0|1|2]\d{5}X$|^TMP\d{14}$/i";

// TEST CASES
$tests = array
( 'LYBLA000000X'
, 'LYBLA200001X' // THIS PASSES, BUT SHOULD IT PASS ("...less than 200000")?
, 'LYBLA244444X' // THIS PASSES, BUT SHOULD IT PASS ("...less than 200000")?
, 'LYBLA299999X' // THIS PASSES, BUT SHOULD IT PASS ("...less than 200000")?
, 'TMP00000000000000'
, 'Gooseball'
)
;

// RUN THE AUTOMATED TESTS
foreach ($tests as $test)
{
    echo PHP_EOL . $test;
    if (preg_match($regex, $test)) echo ' PASSED';
}


Outputs:
LYBLA000000X PASSED
LYBLA200001X PASSED
LYBLA244444X PASSED
LYBLA299999X PASSED
TMP00000000000000 PASSED
Gooseball

 
LVL 56

Assisted Solution

by:Julian Hansen
Julian Hansen earned 250 total points
ID: 41766413
Nice catch, Ray.
Try this in your code:
$regex = "/^LYBLA[0|1]\d{5}X$|LYBLA20{5}X|^TMP\d{14}$/i";
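
Two things may be worth checking in this version before relying on it. The middle alternative `LYBLA20{5}X` has no `^` or `$`, so it can match anywhere inside a longer string, and whether 200000 itself should pass depends on how strictly "less than 200000" is read. Here is a sketch that anchors every branch, assuming 200000 is still meant to be accepted as in the pattern above (`$posted` and `$anchored` are illustrative names):

```php
<?php
// As posted: the middle branch is unanchored.
$posted   = '/^LYBLA[0|1]\d{5}X$|LYBLA20{5}X|^TMP\d{14}$/i';
// Sketch: one group, one pair of anchors. Drop "|LYBLA200000X" if 200000 must be rejected.
$anchored = '/^(?:LYBLA[01]\d{5}X|LYBLA200000X|TMP\d{14})$/i';

$tests = array('LYBLA199999X', 'LYBLA200000X', 'LYBLA200001X', 'xxLYBLA200000Xyy');
foreach ($tests as $test)
{
    echo PHP_EOL . $test
       . ' posted='   . preg_match($posted, $test)
       . ' anchored=' . preg_match($anchored, $test);
}
```

The two patterns agree on the first three inputs; they differ only on the last one, which the posted pattern accepts and the anchored one rejects.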

 

Author Closing Comment

by:rwlloyd71
ID: 41786757
Thank you!