split a short string into three parts

Hi,


I've got strings like so:
example 1: $str="JP4000A";
example 2: $str="AUS123";
example 3: $str="12B";
example 4: $str="333";
What I want is:
1. recognize the structure of a string, in other words determine whether its format is:
     a. [letters][numbers][letters] (as in example 1 above)
     b. [letters][numbers] (as in example 2 above)
     c. [numbers][letters] (as in example 3 above)
     d. [numbers] (as in example 4 above)
2. create variables:
     first part into $var1;
     second part (if it exists) into $var2;
     third part (if it exists) into $var3;
So the final output for 'example 1' above would be: $var1="JP"; $var2="4000"; $var3="A";
example 2: $var1="AUS"; $var2="123"; $var3="";
example 3: $var1="12"; $var2="B"; $var3="";
example 4: $var1="333"; $var2=""; $var3="";


Thanks for any help.
Asked by: Zado
Ray Paseur Commented:
Here is the version that assigns the parts to individual variables.
http://www.laprbass.com/RAY_temp_zado_regex.php
Outputs:
/([A-Z]{0,})([0-9]{0,})([A-Z]{0,})/
JP4000A YIELDS
var1 = JP var2 = 4000 var3 = A
AUS123 YIELDS
var1 = AUS var2 = 123 var3 =
12B YIELDS
var1 = 12 var2 = B var3 =
333 YIELDS
var1 = 333 var2 =  var3 =

Best regards, ~Ray
<?php // RAY_temp_zado_regex.php
error_reporting(E_ALL);
echo "<pre>";

// THE TEST DATA (ADD OTHER ELEMENTS TO FINE-TUNE THE REGULAR EXPRESSION)
$strings = array
( 'JP4000A'
, 'AUS123'
, '12B'
, '333'
)
;

// THE REGEX TO ISOLATE PARTS OF THE STRINGS
$regex
= '/'                // REGEX DELIMITER
. '('                // PARENTH = START OF A GROUP
. '['                // BRACKET = START OF A CHARACTER CLASS
. 'A-Z'              // RANGE OF ALPHABETIC CHARACTERS
. ']'                // END OF THE ALPHABET CLASS
. '{0,}'             // ZERO OR MORE UP TO StrLen() OF INPUT
. ')'                // END OF THE GROUP
. '('                // START OF NEXT GROUP
. '[0-9]'            // NUMERIC CHARACTER CLASS
. '{0,}'             // ZERO OR MORE UP TO StrLen() OF INPUT
. ')'                // END OF NUMERIC CLASS GROUP
. '([A-Z]'           // ANOTHER GROUP OF ALPHABETIC CLASS
. '{0,}'             // ZERO OR MORE UP TO StrLen() OF INPUT
. ')'                // END OF THE LAST ALPHABETIC CLASS
. '/'                // REGEX DELIMITER
;

// SHOW THE REGEX
print_r($regex);

// ASSIGN VARIABLES BY NAME
foreach($strings as $string)
{
    preg_match_all($regex, $string, $match);

    // INITIALIZE OUR VARIABLES
    $new  = array();
    $var1 = NULL;
    $var2 = NULL;
    $var3 = NULL;

    // COLLAPSE THE ARRAY FROM PREG-MATCH
    foreach ($match as $thing)
    {
        $new[] = $thing[0];
    }

    // THIS HAS THE ENTIRE ORIGINAL STRING
    $var0 = $new[0];
    unset($new[0]);

    // ELIMINATE ANY EMPTY POSITIONS
    foreach ($new as $key => $val)
    {
        // COMPARE TO '' BECAUSE empty() WOULD ALSO DROP A PART THAT IS "0"
        if ($val === '') unset($new[$key]);
    }

    // RESET THE ARRAY AND ASSIGN THE NEW KEYS
    $new = array_values($new);
    $out = array();
    foreach ($new as $key => $val)
    {
        $n = $key+1;
        $out['var' . "$n"] = $val;
    }

    // INJECT THE NEW VARIABLES INTO THE CURRENT SCOPE
    extract($out);

    // SHOW THE WORK PRODUCT
    echo PHP_EOL . "$var0 YIELDS <br/>var1 = $var1 var2 = $var2 var3 = $var3";

}
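
The script above splits the string but does not report which of the four formats (a to d) from the question it found. A minimal sketch of that step follows. The function name classify_parts and its return shape are hypothetical, not part of Ray's script, and the pattern is a variant that is anchored and requires at least one digit, on the assumption that every valid input contains numbers.

```php
<?php
// Hypothetical helper (not from the thread). Returns the format letter
// from the question (a, b, c or d) plus the parts, or null if the string
// fits none of the four formats.
function classify_parts(string $str): ?array
{
    // Anchored with ^ and $ so the whole string must match; [0-9]+ makes
    // the digits mandatory because all four formats contain numbers.
    if (!preg_match('/^([A-Z]*)([0-9]+)([A-Z]*)$/', $str, $m)) {
        return null;
    }
    list(, $head, $digits, $tail) = $m;

    if ($head !== '' && $tail !== '') {
        $format = 'a';      // [letters][numbers][letters]
    } elseif ($head !== '') {
        $format = 'b';      // [letters][numbers]
    } elseif ($tail !== '') {
        $format = 'c';      // [numbers][letters]
    } else {
        $format = 'd';      // [numbers]
    }

    // Drop empty parts, then pad back to three slots with ''.
    // 'strlen' keeps "0" (length 1), which empty() would discard.
    $kept  = array_values(array_filter(array($head, $digits, $tail), 'strlen'));
    $parts = array_pad($kept, 3, '');

    return array('format' => $format, 'parts' => $parts);
}

foreach (array('JP4000A', 'AUS123', '12B', '333') as $s) {
    $r = classify_parts($s);
    echo $s, ' => ', $r['format'], ': ', implode(' | ', $r['parts']), PHP_EOL;
}
```

Returning null for anything else (for example a letters-only string) is a design choice made for this sketch; the original script would accept such input silently.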

 
Ray Paseur Commented:
I think this can be done with regular expressions that use groups and character classes.  This is a really well-crafted question with clear inputs and outputs.  I'll try to write up an example for you.
 
Ray Paseur Commented:
While I am working on the demonstration script, have a look at this article on TDD:
http://www.experts-exchange.com/Web_Development/Web_Languages-Standards/PHP/A_7830-A-Quick-Tour-of-Test-Driven-Development.html
 
Zado (Author) Commented:
Nice article, thanks for the link! I still have quite a lot to learn about PHP.
 
Ray Paseur Commented:
http://www.laprbass.com/RAY_temp_zado_regex.php

Explanation: We created three groups of substrings.  The first group is an alphabetic character class whose length can range from zero to 7, the length of the longest test string.  The second group is numeric, and the third group is alphabetic again.

The regular expression breaks the string apart and loads the matched groups into positions of the array $match.

Next I will add the part that assigns these to $var1, $var2, $var3.  However, I am always a little suspicious of a design pattern that proliferates variables.  I think I might want to keep the arrays.
<?php // RAY_temp_zado_regex.php
error_reporting(E_ALL);
echo "<pre>";

// THE TEST DATA (ADD OTHER ELEMENTS TO FINE-TUNE THE REGULAR EXPRESSION)
$strings = array
( 'JP4000A'
, 'AUS123'
, '12B'
, '333'
)
;

// THE REGEX TO ISOLATE PARTS OF THE STRINGS
$regex
= '/'                // REGEX DELIMITER
. '('                // PARENTH = START OF A GROUP
. '['                // BRACKET = START OF A CHARACTER CLASS
. 'A-Z'              // RANGE OF ALPHABETIC CHARACTERS
. ']'                // END OF THE ALPHABET CLASS
. '{0,7}'            // ZERO TO SEVEN CHARACTERS (7 = LENGTH OF LONGEST TEST STRING)
. ')'                // END OF THE GROUP
. '('                // START OF NEXT GROUP
. '[0-9]'            // NUMERIC CHARACTER CLASS
. '{0,7}'            // ZERO TO SEVEN CHARACTERS
. ')'                // END OF NUMERIC CLASS GROUP
. '([A-Z]'           // ANOTHER GROUP OF ALPHABETIC CLASS
. '{0,7}'            // ZERO TO SEVEN CHARACTERS
. ')'                // END OF THE LAST ALPHABETIC CLASS
. '/'                // REGEX DELIMITER
;

// SHOW THE WORKING SETS
print_r($regex);
print_r($strings);

// TEST THE REGEX
foreach($strings as $string)
{
    preg_match_all($regex, $string, $match);
    print_r($match);
}
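
For readers who prefer to keep the arrays, here is a minimal sketch of a variant. It assumes one string is checked at a time, so it uses preg_match (one flat result array) rather than preg_match_all (one array per group), and it anchors the pattern so stray characters cause a failed match. The helper name split_parts is hypothetical and does not appear in the thread.

```php
<?php
// Hypothetical helper, not part of the thread's script.
// Returns the three groups in their original positions, or an empty
// array when the string is empty or contains anything unexpected.
function split_parts(string $str): array
{
    $regex = '/^([A-Z]*)([0-9]*)([A-Z]*)$/';
    if ($str === '' || !preg_match($regex, $str, $match)) {
        return array();
    }
    // $match[0] is the whole string; $match[1..3] are the three groups.
    return array_slice($match, 1);
}

print_r(split_parts('JP4000A'));
print_r(split_parts('333'));
```

Unlike the version that assigns $var1 to $var3, this sketch leaves empty groups in place, so '333' comes back as an empty string, '333', and another empty string. Whether to collapse them is a separate decision.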

 
Zado (Author) Commented:
Nice one! Thanks Ray.
 
Ray Paseur Commented:
Just to put the TDD article in context, I keep a pitch-count clicker nearby because my son plays baseball.  I tested this script 32 times while I was developing it.  If you eliminate the comments and the debugging code, that works out to 32 tests for 52 lines of code.  In my experience that is a good ratio of tests.

It's an interesting problem, thanks for posting it!  
 
Zado (Author) Commented:
Excellent, thanks again :-)
 
Zado (Author) Commented:
A tip here: the regex provided by Ray doesn't match lowercase characters in the string, so I ran my string through the 'strtoupper' function first, and then it worked perfectly! :-)
 
Zado (Author) Commented:
...forget my last comment, I just added the following line to the end of the regex, after the closing delimiter:
. 'i'                // CASE-INSENSITIVE
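
A minimal sketch of the difference between the two approaches, assuming the same three-group pattern Ray posted. The variable names are illustrative only.

```php
<?php
$plain = '/([A-Z]{0,})([0-9]{0,})([A-Z]{0,})/';
$nocase = $plain . 'i';   // modifier goes after the closing delimiter

// Without the modifier, every group matches zero characters at the
// start of a lowercase string, so all parts come back empty.
preg_match($plain, 'jp4000a', $before);

// With the modifier, the letters match and keep their original case.
preg_match($nocase, 'jp4000a', $after);

// strtoupper() also works, but it changes the case of the parts.
preg_match($plain, strtoupper('jp4000a'), $upper);

echo "plain:  [{$before[1]}] [{$before[2]}] [{$before[3]}]", PHP_EOL;
echo "nocase: [{$after[1]}] [{$after[2]}] [{$after[3]}]", PHP_EOL;
echo "upper:  [{$upper[1]}] [{$upper[2]}] [{$upper[3]}]", PHP_EOL;
```

The 'i' modifier preserves the input as typed, while strtoupper normalizes it. Which is preferable depends on how the parts are used afterwards.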

 
Ray Paseur Commented:
Ahh, yes... The value of test data again!  The test data posted with this question did not have any lower-case letters, so the question of case-sensitivity was never in play.
Question has a verified solution.
