Zado asked:

split a short string into three parts

Hi,


I've got strings like so:
example 1: $str="JP4000A";
example 2: $str="AUS123";
example 3: $str="12B";
example 4: $str="333";
What I want is:
1. recognize the structure of a string, in other words recognize whether the format of the string is:
     a. [letters][numbers][letters] (as in example 1 above)
     b. [letters][numbers] (as in example 2 above)
     c. [numbers][letters] (as in example 3 above)
     d. [numbers] (as in example 4 above)
2. create variables:
     first part into $var1;
     second part (if exists) into $var2;
     third part (if exists) into $var3;
So the final output for 'example 1' above would be: $var1="JP"; $var2="4000"; $var3="A";
example 2: $var1="AUS"; $var2="123"; $var3="";
example 3: $var1="12"; $var2="B"; $var3="";
example 4: $var1="333"; $var2=""; $var3="";


Thanks for any help.
Ray Paseur

I think this can be done with regular expressions that use groups and character classes.  This is a really well-crafted question with clear inputs and outputs.  I'll try to write up an example for you.
Zado (ASKER)

Nice article, thanks for link! I still have to learn quite a lot about PHP.
http://www.laprbass.com/RAY_temp_zado_regex.php

Explanation: We created three groups of substrings.  The first group is an alphabetic class and has a length that can be from zero to the maximum length of the input.  The second group is numeric, the third group is alphabetic again.

The regular expression breaks the string apart and loads the matched groups into positions of the array $match.
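
To make the array layout concrete, here is a minimal sketch that runs preg_match() on a single input. The ^ and $ anchors and the * quantifiers are assumptions added for illustration; they are not part of the script posted in this thread.

```php
<?php
// Sketch only: an anchored variant of the three-group pattern.
$regex = '/^([A-Z]*)([0-9]*)([A-Z]*)$/';

if (preg_match($regex, 'JP4000A', $match)) {
    // $match[0] holds the whole match; $match[1] to $match[3] hold the groups.
    print_r($match);
}
```

For this input, $match[1] is "JP", $match[2] is "4000" and $match[3] is "A". Note that preg_match_all() nests the results one level deeper, so the same groups would be found at $match[1][0], $match[2][0] and $match[3][0].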

Next I will add the part to assign these to $var1, $var2, $var3.  However I am always a little suspicious of a design pattern that proliferates variables.  I think I might want to keep the arrays.
<?php // RAY_temp_zado_regex.php
error_reporting(E_ALL);
echo "<pre>";

// THE TEST DATA (ADD OTHER ELEMENTS TO FINE-TUNE THE REGULAR EXPRESSION)
$strings = array
( 'JP4000A'
, 'AUS123'
, '12B'
, '333'
)
;

// THE REGEX TO ISOLATE PARTS OF THE STRINGS
$regex
= '/'                // REGEX DELIMITER
. '('                // PARENTH = START OF A GROUP
. '['                // BRACKET = START OF A CHARACTER CLASS
. 'A-Z'              // RANGE OF ALPHABETIC CHARACTERS
. ']'                // END OF THE ALPHABET CLASS
. '{0,7}'            // ZERO OR MORE UP TO StrLen() OF INPUT
. ')'                // END OF THE GROUP
. '('                // START OF NEXT GROUP
. '[0-9]'            // NUMERIC CHARACTER CLASS
. '{0,7}'            // ZERO OR MORE UP TO StrLen() OF INPUT
. ')'                // END OF NUMERIC CLASS GROUP
. '([A-Z]'           // ANOTHER GROUP OF ALPHABETIC CLASS
. '{0,7}'            // ZERO OR MORE UP TO StrLen() OF INPUT
. ')'                // END OF THE LAST ALPHABETIC CLASS
. '/'                // REGEX DELIMITER
;

// SHOW THE WORKING SETS
print_r($regex);
print_r($strings);

// TEST THE REGEX
foreach($strings as $string)
{
    preg_match_all($regex, $string, $match);
    print_r($match);
}
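
One possible way to finish the job (recognising the format and filling $var1, $var2, $var3) is sketched here. The helper name splitParts(), the anchored pattern, and the decision to reject inputs with no digits are all assumptions made for this sketch; it is not the certified solution from this thread.

```php
<?php
// Hypothetical helper (not from the thread): split a code into its
// letter/number runs and report which of the four formats it matched.
function splitParts(string $str): ?array
{
    // Anchored so that stray characters make the whole match fail.
    // The numeric group uses + because all four formats contain digits.
    if (!preg_match('/^([A-Z]*)([0-9]+)([A-Z]*)$/', $str, $m)) {
        return null; // not one of the four expected formats
    }

    // Label the shape according to which optional groups are non-empty.
    $format = ($m[1] !== '' ? '[letters]' : '')
            . '[numbers]'
            . (($m[3] ?? '') !== '' ? '[letters]' : '');

    // Drop empty groups and pack the rest to the left, then pad to three.
    // The explicit callback matters: a bare array_filter() would also
    // discard the string "0".
    $parts = array_values(array_filter(
        array_slice($m, 1),
        function ($p) { return $p !== ''; }
    ));
    $parts = array_pad($parts, 3, '');

    return array('format' => $format, 'parts' => $parts);
}

foreach (array('JP4000A', 'AUS123', '12B', '333') as $str) {
    $result = splitParts($str);
    list($var1, $var2, $var3) = $result['parts'];
    echo "$str: {$result['format']} var1=\"$var1\" var2=\"$var2\" var3=\"$var3\"\n";
}
```

Packing the non-empty groups to the left is what turns '12B' into $var1="12", $var2="B" rather than leaving $var1 empty, which is what the question asks for. Keeping the parts in an array and only unpacking them with list() at the point of use also avoids spreading numbered variables through the rest of the code.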


Zado (ASKER)

Nice one! Thanks Ray.
ASKER CERTIFIED SOLUTION
Ray Paseur
Just to put the TDD article in context, I keep a pitch-count clicker nearby because my son plays baseball.  I tested this script 32 times while I was developing it.  If you eliminate the comments and the debugging code, that works out to 32 tests for 52 lines of code.  In my experience that is a good ratio of tests.

It's an interesting problem, thanks for posting it!  
Zado (ASKER)

Excellent, thanks again :-)
Zado (ASKER)

A tip here: the regex provided by Ray doesn't match lowercase characters in the string, so I ran my string through the 'strtoupper' function first, and then it worked perfectly! :-)
Zado (ASKER)

...forget my last comment, I just added the following line to the end of the regex, after the closing delimiter:
. 'i'                // CASE-INSENSITIVE MODIFIER


Ahh, yes... The value of test data again!  The test data posted with this question did not have any lower-case letters, so the question of case-sensitivity was never in play.
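
The two fixes mentioned above are not quite equivalent, and a lowercase test string shows the difference. This is a minimal sketch; the anchored pattern and the variable names are assumptions chosen for illustration.

```php
<?php
// Sketch: compare the two fixes on a lowercase input.
$input = 'jp4000a';

// Option 1: normalise the input first; the captured parts come back upper-cased.
preg_match('/^([A-Z]*)([0-9]*)([A-Z]*)$/', strtoupper($input), $upper);

// Option 2: add the i modifier after the closing delimiter; the original case is kept.
preg_match('/^([A-Z]*)([0-9]*)([A-Z]*)$/i', $input, $kept);

echo $upper[1], ' ', $kept[1], "\n"; // first group from each approach
```

This prints "JP jp". If the original casing of the input matters later on, the i modifier is the safer choice; if the parts should be normalised anyway, strtoupper() does both jobs at once.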