Zado
asked on
split a short string into three parts
Hi,
I've got strings like so:
example 1: $str="JP4000A";
example 2: $str="AUS123";
example 3: $str="12B";
example 4: $str="333";
What I want is:
1. recognize the structure of a string, in other words recognize whether the format of the string is:
a. [letters][numbers][letters] (as in example 1 above)
b. [letters][numbers] (as in example 2 above)
c. [numbers][letters] (as in example 3 above)
d. [numbers] (as in example 4 above)
2. create variables:
first part into $var1;
second part (if exists) into $var2;
third part (if exists) into $var3;
So the final output for 'example 1' above would be: $var1="JP"; $var2="4000"; $var3="A";
example 2: $var1="AUS"; $var2="123"; $var3="";
example 3: $var1="12"; $var2="B"; $var3="";
example 4: $var1="333"; $var2=""; $var3="";
Thanks for any help.
I think this can be done with regular expressions that use groups and character classes. This is a really well-crafted question with clear inputs and outputs. I'll try to write up an example for you.
While I am working on the demonstration script, have a look at this article on TDD:
https://www.experts-exchange.com/Web_Development/Web_Languages-Standards/PHP/A_7830-A-Quick-Tour-of-Test-Driven-Development.html
ASKER
Nice article, thanks for the link! I still have quite a lot to learn about PHP.
http://www.laprbass.com/RAY_temp_zado_regex.php
Explanation: We created three groups of substrings. The first group is an alphabetic class and has a length that can be from zero to the maximum length of the input. The second group is numeric, the third group is alphabetic again.
The regular expression breaks the string apart and loads the matched groups into positions of the array $match.
Next I will add the part to assign these to $var1, $var2, $var3. However I am always a little suspicious of a design pattern that proliferates variables. I think I might want to keep the arrays.
<?php // RAY_temp_zado_regex.php
error_reporting(E_ALL);
echo "<pre>";
// THE TEST DATA (ADD OTHER ELEMENTS TO FINE-TUNE THE REGULAR EXPRESSION)
$strings = array
( 'JP4000A'
, 'AUS123'
, '12B'
, '333'
)
;
// THE REGEX TO ISOLATE PARTS OF THE STRINGS
$regex
= '/' // REGEX DELIMITER
. '(' // PARENTH = START OF A GROUP
. '[' // BRACKET = START OF A CHARACTER CLASS
. 'A-Z' // RANGE OF UPPER-CASE ALPHABETIC CHARACTERS
. ']' // END OF THE ALPHABET CLASS
. '{0,7}' // ZERO TO SEVEN CHARACTERS (7 = LENGTH OF THE LONGEST TEST STRING)
. ')' // END OF THE GROUP
. '(' // START OF NEXT GROUP
. '[0-9]' // NUMERIC CHARACTER CLASS
. '{0,7}' // ZERO TO SEVEN CHARACTERS
. ')' // END OF THE NUMERIC GROUP
. '([A-Z]' // ANOTHER GROUP OF ALPHABETIC CLASS
. '{0,7}' // ZERO TO SEVEN CHARACTERS
. ')' // END OF THE LAST ALPHABETIC GROUP
. '/' // REGEX DELIMITER
;
// SHOW THE WORKING SETS
print_r($regex);
print_r($strings);
// TEST THE REGEX
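foreach ($strings as $string)
{
    // EVERY GROUP MAY BE EMPTY, SO preg_match_all() ALSO REPORTS AN EMPTY MATCH AT THE END OF EACH STRING
    preg_match_all($regex, $string, $match);
    print_r($match);
}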
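The assignment step mentioned above (getting the pieces into $var1, $var2, $var3) might look something like the sketch below. This is not the certified solution, just one possible approach under a few assumptions: the function name split_parts() is made up for illustration, the pattern is anchored with ^ and \z (the demonstration regex is not anchored), and empty groups are dropped so that an input like '12B' puts '12' into the first slot, as the question asks.

```php
<?php
// Hypothetical helper, not from the original post: split a string of the
// form [letters][numbers][letters] into its non-empty parts.
function split_parts($str)
{
    // Anchored so the whole string must fit the pattern; every group is optional.
    $regex = '/^([A-Z]*)([0-9]*)([A-Z]*)\z/';

    if ($str === '' || !preg_match($regex, $str, $match)) {
        return null; // Empty input, or a structure this pattern does not cover
    }

    // Drop the full match in $match[0] and keep the three groups
    $groups = array_slice($match, 1);

    // Discard empty groups so the remaining parts shift to the front.
    // 'strlen' is used as the callback so that a part like '0' is kept.
    $parts = array_values(array_filter($groups, 'strlen'));

    // Pad with empty strings so there are always three elements
    return array_pad($parts, 3, '');
}

// Usage: the three variables requested in the question
list($var1, $var2, $var3) = split_parts('JP4000A');
```

Returning an array and unpacking it with list() keeps the function reusable while still producing the three variables; keeping the array itself would work just as well.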
ASKER
Nice one! Thanks Ray.
ASKER CERTIFIED SOLUTION
Just to put the TDD article in context, I keep a pitch-count clicker nearby because my son plays baseball. I tested this script 32 times while I was developing it. If you eliminate the comments and the debugging code, that works out to 32 tests for 52 lines of code. In my experience that is a good ratio of tests.
It's an interesting problem, thanks for posting it!
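In the spirit of the test-first habit described above, a small table of inputs and expected results can drive the checks. The sketch below applies that idea to the first part of the question, recognizing which of the four formats a string has. The function name classify_format() and the format labels are invented for this illustration and do not come from the original script.

```php
<?php
// Hypothetical helper: report which of the four formats from the question
// a string has, or null when it fits none of them.
function classify_format($str)
{
    $formats = array(
        'letters-numbers-letters' => '/^[A-Z]+[0-9]+[A-Z]+\z/',
        'letters-numbers'         => '/^[A-Z]+[0-9]+\z/',
        'numbers-letters'         => '/^[0-9]+[A-Z]+\z/',
        'numbers'                 => '/^[0-9]+\z/',
    );

    foreach ($formats as $label => $regex) {
        if (preg_match($regex, $str)) {
            return $label;
        }
    }
    return null;
}

// THE TEST TABLE: EACH ROW IS (INPUT, EXPECTED LABEL)
$cases = array(
    array('JP4000A', 'letters-numbers-letters'),
    array('AUS123',  'letters-numbers'),
    array('12B',     'numbers-letters'),
    array('333',     'numbers'),
);

foreach ($cases as $case) {
    list($input, $expected) = $case;
    $actual = classify_format($input);
    echo $input, ' => ', $actual, ($actual === $expected ? ' (ok)' : ' (FAIL)'), "\n";
}
```

Adding a row to the table is all it takes to try a new input, which makes it cheap to rerun the checks after every change to the patterns.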
ASKER
Excellent, thanks again :-)
ASKER
A tip here: the regex provided by Ray doesn't match lowercase characters in the string, so I ran my string through the 'strtoupper' function first, and then it worked perfectly! :-)
ASKER
...forget my last comment, I just added the following line to the end of the regex (after the closing delimiter):
. 'i' // CASE-INSENSITIVE
Ahh, yes... The value of test data again! The test data posted with this question did not have any lower-case letters, so the question of case-sensitivity was never in play.
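For anyone following along, here is a minimal sketch of what the case-insensitive modifier changes. It assumes a compact, anchored version of the three-group pattern (the ^ and \z anchors are an addition for this illustration, not part of the script above) and a lower-case test string that was not in the original test data.

```php
<?php
// The trailing "i" after the closing delimiter makes [A-Z] match
// lower-case letters as well.
$regex = '/^([A-Z]*)([0-9]*)([A-Z]*)\z/i';

// WITH THE MODIFIER: THE LOWER-CASE INPUT MATCHES
$matched = preg_match($regex, 'jp4000a', $match);
// $match[1] is 'jp', $match[2] is '4000', $match[3] is 'a'

// WITHOUT THE MODIFIER: THE ANCHORED PATTERN REJECTS THE SAME INPUT
$strict = preg_match('/^([A-Z]*)([0-9]*)([A-Z]*)\z/', 'jp4000a');

var_dump($matched, $strict);
```

Note that the modifier leaves the captured text in its original case, whereas the strtoupper() approach changes the data itself. Which one is preferable depends on whether the parts should be normalized to upper case.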