Solved

Help with REGEX with 2 different strings

Posted on 2016-08-18
Last Modified: 2016-09-06
Hi.

I am having difficulty validating a text input in a form.

The entry needs to be either LYBLA000000X (fixed LYBLA, then 000000 standing for any six-digit number less than 200000, then a fixed X) or TMP00000000000000 (fixed TMP, followed by 14 digits).

I have the first part cracked with

/LYBLA[0-9]{6}X/i

But not sure how to combine them both.

Thanks!
Question by:rwlloyd71
 
LVL 108

Expert Comment

by:Ray Paseur
ID: 41760973
Don't combine them both!  Instead write a little validation function that makes two separate tests.  Much easier to write and much easier to test :-)
 
LVL 108

Expert Comment

by:Ray Paseur
ID: 41760992
Here's some of the background thinking.
https://www.experts-exchange.com/articles/7830/A-Quick-Tour-of-Test-Driven-Development.html

Please see: https://iconoun.com/demo/temp_rwlloyd71.php
<?php // demo/temp_rwlloyd71.php
/**
 * https://www.experts-exchange.com/questions/28964110/Help-with-REGEX-with-2-different-strings.html
 *
 * https://www.experts-exchange.com/articles/7830/A-Quick-Tour-of-Test-Driven-Development.html
 */
error_reporting(E_ALL);
echo '<pre>';

// VALIDATION FUNCTION
function rwl_validation($str)
{
    // ADD REGULAR EXPRESSIONS AS NEEDED HERE
    $regexes = array
    ( '/LYBLA[0-9]{6}X/i'
    , '/TMP[0-9]{14}/i'
    )
    ;

    // TEST THE INPUT WITH EACH REGEX
    foreach ($regexes as $rgx)
    {
        if (preg_match($rgx, $str)) return TRUE;
    }
    return FALSE;
}

// TEST CASES
$tests = array
( 'LYBLA000000X'
, 'TMP00000000000000'
, 'Gooseball'
)
;

// RUN THE AUTOMATED TESTS
foreach ($tests as $test)
{
    echo PHP_EOL . $test;
    if (rwl_validation($test)) echo ' PASSED';
}


Output shows both success and failure.
LYBLA000000X PASSED
TMP00000000000000 PASSED
Gooseball


The part about "less than 200000" might require an additional bit of testing, but hopefully this gives you a good starting point.
 
LVL 108

Expert Comment

by:Ray Paseur
ID: 41761012
Here's a slightly more sophisticated example.  Normally, we would not want multiple return statements in a function, but in very short functions like this it's OK to overlook that detail.
<?php // demo/temp_rwlloyd71.php
/**
 * https://www.experts-exchange.com/questions/28964110/Help-with-REGEX-with-2-different-strings.html
 *
 * https://www.experts-exchange.com/articles/7830/A-Quick-Tour-of-Test-Driven-Development.html
 */
error_reporting(E_ALL);
echo '<pre>';

// VALIDATION FUNCTION
function rwl_validation($str)
{
    // STRAIGHT REGEX VALIDATION - ADD REGULAR EXPRESSIONS AS NEEDED HERE
    $regexes = array
    ( '/TMP[0-9]{14}/i'
    )
    ;

    // TEST THE INPUT WITH EACH REGEX
    foreach ($regexes as $rgx)
    {
        if (preg_match($rgx, $str)) return TRUE;
    }

    // MORE COMPLEX REGEX + VALUES VALIDATION
    $rgx = '/LYBLA([0-9]{6})X/i';
    if (!preg_match($rgx, $str, $mat)) return FALSE;
    if ($mat[1] <= 200000) return TRUE;

    return FALSE;
}

// TEST CASES
$tests = array
( 'LYBLA000000X'
, 'lybla200000X'
, 'lybla200001X'
, 'TMP00000000000000'
, 'Gooseball'
)
;

// RUN THE AUTOMATED TESTS
foreach ($tests as $test)
{
    echo PHP_EOL . $test;
    if (rwl_validation($test)) echo ' PASSED';
}

 
LVL 51

Expert Comment

by:Julian Hansen
ID: 41761024
Personally I am in favour of doing the match in as few lines as possible - making use of the full power of the regular expression format

Try this

/^LYBLA[0|1|2]\d{5}X$|^TMP\d{14}$/


Code
<?php
$regex = "/^LYBLA[0|1|2]\d{5}X$|^TMP\d{14}$/";
$input = array(
  'LYBLA000000X',
  'xLYBLA000000X',
  'LYBLA00C000X',
  'LYBLA00000X',
  'LYBLA0000000X',
  'LYBLA100000X',
  'LYBLA10000VX',
  'LYBLA10000X',
  'LYBLA1000000X',
  'LYBLA200000X',
  'LYBLA20000CX',
  'LYBLA20000X',
  'LYBLA2000000X',
  'LYBLA300000X',
  'TMP00000000000000',
  'yTMP00000000000000',
  'TMP00000000000000abc',
  'TMP000000000000000',
  'TMP000000000000000000',
  'TMP0000000000000'
);

foreach($input as $in) {
  $result = preg_match($regex, $in, $matches);

  if ($result) {
    echo "{$in} <b>MATCHES</b><br/>";
  }
  else {
    echo "{$in} does <b>NOT</b> match<br/>";
  }
}


Output
LYBLA000000X matches
xLYBLA000000X does not match
LYBLA00C000X does not match
LYBLA00000X does not match
LYBLA0000000X does not match
LYBLA100000X matches
LYBLA10000VX does not match
LYBLA10000X does not match
LYBLA1000000X does not match
LYBLA200000X matches
LYBLA20000CX does not match
LYBLA20000X does not match
LYBLA2000000X does not match
LYBLA300000X does not match
TMP00000000000000 matches
yTMP00000000000000 does not match
TMP00000000000000abc does not match
TMP000000000000000 does not match
TMP000000000000000000 does not match
TMP0000000000000 does not match


Working sample here
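One detail in the pattern above is worth checking. Inside a character class the pipe is a literal character rather than an "or", so `[0|1|2]` also accepts `|` in the first digit position. The short sketch below compares the posted LYBLA branch with a hypothetical tightened variant (`$tightened` is not from the thread) on an input that contains a pipe.

```php
<?php
// Inside [...] the pipe is a literal character, not an alternation.
$original  = '/^LYBLA[0|1|2]\d{5}X$/';  // LYBLA branch as posted
$tightened = '/^LYBLA[012]\d{5}X$/';    // hypothetical variant, for comparison only

$input = 'LYBLA|00000X';

var_dump(preg_match($original, $input));   // int(1): the pipe is accepted as the first "digit"
var_dump(preg_match($tightened, $input));  // int(0): only 0, 1 or 2 are accepted
```

Whether this matters depends on where the input comes from, but it is a cheap test case to add to the list.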
 
LVL 34

Expert Comment

by:Dan Craciun
ID: 41761045
When you have multiple possible expressions, use an alternation. Think of it as an "OR".

HTH,
Dan
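As a minimal sketch of that idea, the two patterns from the question can be joined with `|` inside a non-capturing group, so that a single pair of anchors applies to both alternatives. The function name `matches_either` is hypothetical, and the sketch only checks the format; it does not enforce the "less than 200000" rule.

```php
<?php
// Hypothetical helper; the two sub-patterns are the ones from the question.
function matches_either(string $str): bool
{
    // (?: ... ) groups the alternatives so that ^ and $ apply to both of them.
    return preg_match('/^(?:LYBLA[0-9]{6}X|TMP[0-9]{14})$/i', $str) === 1;
}

var_dump(matches_either('LYBLA000000X'));       // bool(true)
var_dump(matches_either('tmp00000000000000'));  // bool(true), because of the i modifier
var_dump(matches_either('xLYBLA000000X'));      // bool(false), the anchors reject the prefix
```

Without the group, `^A|B$` would anchor the start of only the first alternative and the end of only the second.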
 
LVL 51

Expert Comment

by:Julian Hansen
ID: 41761073
@Dan you mean like this?
$regex = "/^LYBLA[0|1|2]\d{5}X$|^TMP\d{14}$/";
                             - ^ -

 
LVL 34

Expert Comment

by:Dan Craciun
ID: 41761104
Yup. Exactly like that.
 
LVL 108

Expert Comment

by:Ray Paseur
ID: 41761113
Regular expressions are hard to get right, and in general, string processing is hard because of the free-form nature of the inputs.  For example, consider this regex from the initial post:

/LYBLA[0-9]{6}X/i

I "bolded" the pattern modifier "i" to call attention to it, because it makes the regular expression case-insensitive, so both LYBLA and LyBLa would be acceptable substrings.  We do not really know whether case-sensitive matching is important, and some of the solutions proposed here will be case-sensitive.

What would we make of something like this, with a trailing blank?

"LYBLA000000X "

Should that string fail?  Or should we use trim() to remove the trailing whitespace and accept the result?
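To make the whitespace question concrete, here is a small sketch using an anchored form of the original pattern (the anchors are an assumption; the pattern in the initial post has none). It also shows a related PCRE detail: without the `D` modifier, `$` matches just before a final newline as well as at the very end, whereas `\z` matches only at the very end.

```php
<?php
// Anchored form of the pattern from the initial post (anchors added for this sketch).
$regex = '/^LYBLA[0-9]{6}X$/i';

var_dump(preg_match($regex, 'LYBLA000000X '));        // int(0): trailing blank is rejected
var_dump(preg_match($regex, trim('LYBLA000000X ')));  // int(1): accepted once trimmed
var_dump(preg_match($regex, "LYBLA000000X\n"));       // int(1): $ also matches before a final newline

// \z matches only at the very end of the subject string.
var_dump(preg_match('/^LYBLA[0-9]{6}X\z/i', "LYBLA000000X\n"));  // int(0)
```

Either choice can be right; the point is to decide deliberately and cover it with a test case.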

It's these kinds of questions that are easy to overlook, but that can produce false negatives or false positives.  That's why I prefer to look a little deeper and not limit myself to a single regular expression.  YMMV, but a good set of test cases is always useful.  If you want to use a formal testing system, you need to be able to mock the inputs and compare the outputs.  In my experience, this is easier if you've got your validation routines packaged in a class method, or at least in a function.

Here's another code sample.  You might want to run it and see if it meets your needs, or needs to be tweaked some more.
<?php // demo/temp_rwlloyd71_julian.php
/**
 * https://www.experts-exchange.com/questions/28964110/Help-with-REGEX-with-2-different-strings.html
 *
 * https://www.experts-exchange.com/articles/7830/A-Quick-Tour-of-Test-Driven-Development.html
 */
error_reporting(E_ALL);
echo '<pre>';

// COMPLICATED REGEX - DOES IT REALLY WORK WELL?
$regex = "/^LYBLA[0|1|2]\d{5}X$|^TMP\d{14}$/";

// TEST CASES
$tests = array
( 'LYBLA000000X'
, 'lybla200000X'
, 'lybla200001X'
, 'LYBLA200001X' // THIS PASSES, BUT SHOULD IT PASS?
, 'TMP00000000000000'
, 'Gooseball'
)
;

// RUN THE AUTOMATED TESTS
foreach ($tests as $test)
{
    echo PHP_EOL . $test;
    if (preg_match($regex, $test)) echo ' PASSED';
}

 
LVL 51

Expert Comment

by:Julian Hansen
ID: 41761122
/^LYBLA[0|1|2]\d{5}X$|^TMP\d{14}$/
This will give a good starting point though - and will address the requirement as stated. If it turns out there are other factors then I don't believe it is difficult to change.

There is merit in splitting the expressions out in some cases, but in a simple case like this one I see no problem with combining them. RegEx also supports more complicated features such as lookahead and lookbehind, and with those things can get extremely complicated, especially from a support perspective, but in this case the matching is relatively straightforward.
 

Author Comment

by:rwlloyd71
ID: 41761231
Thanks all for your comments - very helpful.

I am going to go with the

/^LYBLA[0|1|2]\d{5}X$|^TMP\d{14}$/i

example from Dan and Julian as it suits me best. I have other validation on the form, such as min and max length and "strtoupper" in the code so all angles are covered.

I'll report back when I have it working.
 

Author Comment

by:rwlloyd71
ID: 41761346
All working.  Thanks
 
LVL 108

Accepted Solution

by:
Ray Paseur earned 250 total points
ID: 41766159
Here is an example of why it is important to state the problem clearly and create good test cases.  Professional programmers who understand automated testing usually try to avoid writing complicated rules for string validation.  We prefer to get closer to the original problems in the data and isolate the issues before they become complicated string validation rules.  This helps us avoid regular expressions with holes that let unwanted data fall through into the soup!
"The entry needs to be either LYBLA000000X (LYBLA Fixed, 000000 any combination of number less than 200000, and a fixed X) ..."
https://iconoun.com/demo/temp_rwlloyd71_julian.php
<?php // demo/temp_rwlloyd71_julian.php
/**
 * https://www.experts-exchange.com/questions/28964110/Help-with-REGEX-with-2-different-strings.html#a41761231
 *
 * https://www.experts-exchange.com/articles/7830/A-Quick-Tour-of-Test-Driven-Development.html
 */
error_reporting(E_ALL);
echo '<pre>';

// COMPLICATED REGEX - DOES IT REALLY WORK WELL?
$regex = "/^LYBLA[0|1|2]\d{5}X$|^TMP\d{14}$/i";

// TEST CASES
$tests = array
( 'LYBLA000000X'
, 'LYBLA200001X' // THIS PASSES, BUT SHOULD IT PASS ("...less than 200000")?
, 'LYBLA244444X' // THIS PASSES, BUT SHOULD IT PASS ("...less than 200000")?
, 'LYBLA299999X' // THIS PASSES, BUT SHOULD IT PASS ("...less than 200000")?
, 'TMP00000000000000'
, 'Gooseball'
)
;

// RUN THE AUTOMATED TESTS
foreach ($tests as $test)
{
    echo PHP_EOL . $test;
    if (preg_match($regex, $test)) echo ' PASSED';
}

Open in new window

Outputs:
LYBLA000000X PASSED
LYBLA200001X PASSED
LYBLA244444X PASSED
LYBLA299999X PASSED
TMP00000000000000 PASSED
Gooseball
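The same kind of test can be written so that a broken rule reports itself instead of relying on someone reading the output. This is only a sketch: the expected values are an assumption based on reading "less than 200000" literally, and the `$cases` and `$failures` names are illustrative.

```php
<?php
// The regex the author chose, unchanged.
$regex = "/^LYBLA[0|1|2]\d{5}X$|^TMP\d{14}$/i";

// input => expected result (assumed from the stated requirement)
$cases = array
( 'LYBLA000000X'      => true
, 'LYBLA199999X'      => true
, 'LYBLA200001X'      => false
, 'LYBLA299999X'      => false
, 'TMP00000000000000' => true
, 'Gooseball'         => false
);

// Collect every input where the regex disagrees with the expectation.
$failures = array();
foreach ($cases as $input => $expected)
{
    $actual = (preg_match($regex, $input) === 1);
    if ($actual !== $expected) $failures[] = $input;
}

print_r($failures);
```

With this regex the list of failures contains LYBLA200001X and LYBLA299999X, which is the hole shown in the output above.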

 
LVL 51

Assisted Solution

by:Julian Hansen
Julian Hansen earned 250 total points
ID: 41766413
Nice catch, Ray.
Try this in your code:
$regex = "/^LYBLA[0|1]\d{5}X$|LYBLA20{5}X|^TMP\d{14}$/i";
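Two things may still be worth testing in this pattern: the middle alternative has no anchors, and the character class still contains a literal pipe. The sketch below compares it with a hypothetical tightened variant (`$variant` is an assumption, not something posted in the thread) that wraps all alternatives in one group with a single pair of anchors. Like the posted pattern, the variant accepts 200000 itself; if the limit is strictly "less than 200000", the `|200000` part would be dropped.

```php
<?php
$posted = "/^LYBLA[0|1]\d{5}X$|LYBLA20{5}X|^TMP\d{14}$/i";

// Hypothetical variant: one group, one pair of anchors, no pipe inside the class.
// \z matches only at the very end of the string.
$variant = '/^(?:LYBLA(?:[01]\d{5}|200000)X|TMP\d{14})\z/i';

$inputs = array
( 'LYBLA199999X'      // in range
, 'LYBLA200000X'      // the boundary value
, 'LYBLA200001X'      // out of range
, 'xxLYBLA200000Xyy'  // extra characters around a valid-looking code
, 'LYBLA|00000X'      // pipe in the first digit position
);

foreach ($inputs as $in)
{
    printf("%-18s posted=%d variant=%d\n", $in, preg_match($posted, $in), preg_match($variant, $in));
}
```

The two patterns agree on the first three inputs; the posted one also accepts the last two, while the variant rejects them.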

 

Author Closing Comment

by:rwlloyd71
ID: 41786757
Thank you!
