Regex Matching a Block of Text Using PHP

I have a page full of text that repeats in blocks, anywhere from 1 to 500 times. Each text block looks like this:

12-12345 Name
Some Text
Some Text
More Text

The next block starts directly under the above block in the same format:
12-12346 Name
Some Text
More Text
More Text
More Text

Each line ends with a newline character.

Between the ID numbers (12-12345, etc.) there can be anywhere from 1 to 10 lines of text.

I need a way to grab the first ID number and the rest of the text up to the second ID number using regular expressions and PHP. If possible, I would like the first ID number and each line after it to be in an array.

I hope I explained this correctly.  Any help would be GREATLY APPRECIATED!

Asked by biffsmith

Terry Woods (IT Guru) commented:
<?php
$sourcestring="your source string";
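// Group 1: the ID; group 2: the rest of the ID line, including its newline;
// group 3: the following lines, up to (not including) the newline before the next ID.
preg_match_all('/^(\d\d-\d{5})([^\n]*\n)((?:.(?!\d\d-\d{5}))*)/ms',$sourcestring,$matches);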
echo "<pre>".print_r($matches,true);
?>

(Thanks to ddrudik for the use of his website www.myregextester.com to generate the code.)
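For the per-block layout asked for in the question (the ID plus each following line in its own array), here is a minimal sketch that runs the same pattern on the two sample blocks from the question and passes PREG_SET_ORDER, so each match (block) arrives as one array. The $blocks structure with "id" and "lines" keys is just an illustrative choice, not something from the thread.

```php
<?php
// Sample data: the two blocks from the question, with explicit \n line endings.
$sourcestring = "12-12345 Name\nSome Text\nSome Text\nMore Text\n"
              . "12-12346 Name\nSome Text\nMore Text\nMore Text\nMore Text";

// Same pattern as above; PREG_SET_ORDER returns one array per match
// (i.e. per block) instead of one array per capture group.
preg_match_all('/^(\d\d-\d{5})([^\n]*\n)((?:.(?!\d\d-\d{5}))*)/ms',
    $sourcestring, $matches, PREG_SET_ORDER);

$blocks = [];
foreach ($matches as $m) {
    // $m[1] = ID, $m[2] = rest of the ID line, $m[3] = the lines that follow.
    $lines = array_merge([trim($m[2])], explode("\n", trim($m[3])));
    $blocks[] = ['id' => $m[1], 'lines' => $lines];
}

print_r($blocks);
```

Each entry then holds the ID and the lines below it, so $blocks[0]['lines'][0] is the name on the ID line.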
Terry Woods (IT Guru) commented:
You can also add:

unset($matches[0]);

to drop the full matches stored in $matches[0], leaving just the captured groups you want in the array.
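In case it helps to see what that unset() removes: with the default PREG_PATTERN_ORDER, $matches[0] holds the full text of each match and $matches[1] onwards hold the capture groups. A small sketch on a hypothetical two-block string; note that unset() does not renumber the remaining keys.

```php
<?php
// Hypothetical two-block sample.
$sourcestring = "12-12345 Name\nSome Text\n12-12346 Name2\nMore Text";

preg_match_all('/^(\d\d-\d{5})([^\n]*\n)((?:.(?!\d\d-\d{5}))*)/ms', $sourcestring, $matches);

// $matches[0] = full matches; $matches[1], [2], [3] = the three capture groups.
unset($matches[0]);

// unset() keeps the original keys, so the array now has keys 1, 2 and 3.
print_r(array_keys($matches));
print_r($matches[1]); // the IDs, one per block
```

If zero-based keys are needed afterwards, array_values($matches) renumbers them.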
biffsmith (Author) commented:
Thanks for the code and the help, but I don't get the desired results.  I get this:
Array
(
    [0] => Array
        (
        )

    [1] => Array
        (
        )

    [2] => Array
        (
        )

    [3] => Array
        (
        )

)

Terry Woods (IT Guru) commented:
Can you post the data you tried with? Your sample worked fine.
biffsmith (Author) commented:
I'm sorry, the problem was with my data. OK, so should I be seeing ALL of the arrays or just the first one? I'm only seeing the first block of text split into an array. (So excited this is working, though. Thanks!)
Terry Woods (IT Guru) commented:
The print_r should show all matches.
biffsmith (Author) commented:
OK, then it's only grabbing the first block and disregarding the rest. I wish I could post my data, but I can't because of what it contains. Argh! Frustrating!
Terry Woods (IT Guru) commented:
Maybe the ID format isn't consistent? The pattern currently looks for 2 digits, a dash, then 5 digits.
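Since the data can't be posted, one way to test that theory locally is to list every line that starts like an ID (digits, then a dash) but does not match the strict 2-digits-dash-5-digits format. A rough sketch on made-up lines; the loose check is an assumption about what an ID line looks like, and json_encode() is used only to make tabs and other hidden characters visible in the output.

```php
<?php
// Hypothetical sample lines; only the first ID matches the strict format.
$sourcestring = "12-12345 Name\nSome Text\n12-  12346 Name2\nMore Text\n"
              . "\t12-12347 Name3\nMore Text\n123-4567 Name4\nLast line";

$suspicious = [];
foreach (explode("\n", $sourcestring) as $i => $line) {
    $looksLikeId = preg_match('/^\s*\d+\s*-/', $line) === 1; // loose: digits, then a dash
    $strictId    = preg_match('/^\d\d-\d{5}/', $line) === 1;  // the ID format the pattern expects
    if ($looksLikeId && !$strictId) {
        $suspicious[] = $i + 1; // 1-based line number
        // json_encode() shows tabs and \r as escapes (and non-ASCII UTF-8 as \uXXXX).
        echo 'line ', $i + 1, ': ', json_encode($line), "\n";
    }
}
```

Lines reported this way show which variation (extra spaces, leading tabs, a different digit count) the pattern needs to allow, without sharing the actual content.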
Terry Woods (IT Guru) commented:
We could make it look for, say, 2 sets of digits separated by a - character, but only at the start of a line?

<?php
$sourcestring="your source string";
preg_match_all('/^(\d+-\d+)([^\n]*\n)((?:.(?!\d\d-\d{5}))*)/ms',$sourcestring,$matches);
unset($matches[0]);
echo "<pre>".print_r($matches,true);
?>
biffsmith (Author) commented:
PROGRESS! It appears that there can be zero to several spaces before the second ID number. Removing those spaces makes it work!! Now I need to figure out how to get around that! :)
Terry Woods (IT Guru) commented:
<?php
$sourcestring="your source string";
preg_match_all('/^(\d+- *\d+)([^\n]*\n)((?:.(?!\d\d- *\d{5}))*)/ms',$sourcestring,$matches);
unset($matches[0]);
echo "<pre>".print_r($matches,true);
?>
Terry Woods (IT Guru) commented:
Apologies, I forgot to update the lookahead to match the previous change:

<?php
$sourcestring="your source string";
preg_match_all('/^(\d+- *\d+)([^\n]*\n)((?:.(?!\d+- *\d+))*)/ms',$sourcestring,$matches);
unset($matches[0]);
echo "<pre>".print_r($matches,true);
?>

or:

<?php
$sourcestring="your source string";
preg_match_all('/^(\d\d- *\d{5})([^\n]*\n)((?:.(?!\d\d- *\d{5}))*)/ms',$sourcestring,$matches);
unset($matches[0]);
echo "<pre>".print_r($matches,true);
?>
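To see why the earlier version returned only the first block, here is a small comparison on a hypothetical string whose second ID has two spaces after the dash. The strict pattern never recognises that second ID, so its one match runs on to the end of the text; the first relaxed pattern above finds both blocks.

```php
<?php
// Hypothetical sample: the second ID has two spaces after the dash.
$sourcestring = "12-12345 Name\nSome Text\n12-  12346 Name2\nMore Text";

// Original strict pattern vs. the first relaxed pattern above.
$strict  = '/^(\d\d-\d{5})([^\n]*\n)((?:.(?!\d\d-\d{5}))*)/ms';
$relaxed = '/^(\d+- *\d+)([^\n]*\n)((?:.(?!\d+- *\d+))*)/ms';

$strictCount  = preg_match_all($strict, $sourcestring, $strictMatches);
$relaxedCount = preg_match_all($relaxed, $sourcestring, $relaxedMatches);

// The strict pattern treats "12-  12346 Name2" as ordinary text inside block 1.
echo "strict: $strictCount, relaxed: $relaxedCount\n"; // strict: 1, relaxed: 2
print_r($relaxedMatches[1]);
```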
biffsmith (Author) commented:
Works great if there are no spaces or tabs before the ID number.  :)
Terry Woods (IT Guru) commented:
<?php
$sourcestring="your source string";
preg_match_all('/^\s*?(\d+- *\d+)([^\n]*\n)((?:.(?!\d+- *\d+))*)/ms',$sourcestring,$matches);
unset($matches[0]);
echo "<pre>".print_r($matches,true);
?>
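One caveat worth testing before relying on this: in the third group the lookahead is checked only after each character has been consumed, so when a later ID line is indented by two or more whitespace characters, the group swallows part of that indentation and the next ^ no longer sits at the start of a line. In a quick check on the hypothetical string below, the pattern above finds one block instead of two. A possible variation (an assumption of mine, not from the thread) checks for a line that begins with an ID before consuming each character:

```php
<?php
// Hypothetical sample: the second ID line is indented with a tab and two spaces.
$sourcestring = "12-12345 Name\nSome Text\n\t  12-12346 Name2\nMore Text";

// Pattern from the comment above.
$current = '/^\s*?(\d+- *\d+)([^\n]*\n)((?:.(?!\d+- *\d+))*)/ms';

// Variation: before consuming each character, make sure we are not at the
// start of a line that begins (after optional spaces/tabs) with an ID.
$anchored = '/^\s*?(\d+- *\d+)([^\n]*\n)((?:(?!^[ \t]*\d+- *\d+).)*)/ms';

$currentCount  = preg_match_all($current, $sourcestring, $currentMatches);
$anchoredCount = preg_match_all($anchored, $sourcestring, $anchoredMatches);

echo "current: $currentCount, anchored: $anchoredCount\n"; // current: 1, anchored: 2
print_r($anchoredMatches[1]);
```

The variation leaves a trailing newline on the third group (trim() removes it), and it still assumes no ordinary text line begins with digits, a dash and digits.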
biffsmith (Author) commented:
Thank you SO much!  I am ALMOST there.  I'm getting the data back like this:

array[1] - ID number
array[2] - name
array[3] - everything else (this could be 1 line or 10 lines)

What I really need to isolate is:

array[1] - ID number
array[2] - name
array[3] - only the LAST LINE before the next ID number

Did I explain that correctly?  
Terry Woods (IT Guru) commented:
Try this:

<?php
$sourcestring="your source string";
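// Group 1: the ID; group 2: the rest of the ID line; the lazy (?:.*?) skips the
// lines in between; group 3: the last line before the next ID or the end of the text.
preg_match_all('/^\s*?(\d+- *\d+)([^\n]*\n)(?:.*?)([^\n]*)(?:(?=\n*\s*\d+- *\d+)|(?!.))/ms',$sourcestring,$matches);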
unset($matches[0]);
echo "<pre>".print_r($matches,true);
?>
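If the single pattern keeps misbehaving on the real data, a different technique (not the one used in this thread) may be easier to debug: normalise line endings, split the text just before every line that starts with an ID, then take the first and last lines of each piece with ordinary PHP. A sketch assuming that no text line other than an ID line begins with digits, a dash and digits; the "id", "name" and "lastLine" keys are illustrative names.

```php
<?php
// Hypothetical sample: Windows line endings, an indented ID with spaces after the dash.
$sourcestring = "12-12345 Name\r\nSome Text\r\nSome Text\r\nMore Text\r\n"
              . "  12-  12346 Name2\r\nSome Text\r\nLast line of block 2\r\n";

// Normalise line endings so "\n" is the only line break.
$text = str_replace(["\r\n", "\r"], "\n", $sourcestring);

// Zero-width split just before each line that starts with an ID.
$chunks = preg_split('/^(?=[ \t]*\d+-[ \t]*\d+)/m', $text, -1, PREG_SPLIT_NO_EMPTY);

$records = [];
foreach ($chunks as $chunk) {
    // Trimmed, non-empty lines of this block.
    $lines = array_values(array_filter(
        array_map('trim', explode("\n", $chunk)),
        fn (string $line): bool => $line !== ''
    ));
    // Skip an empty chunk or any text before the first ID.
    if ($lines === [] || !preg_match('/^(\d+-\s*\d+)\s*(.*)$/', $lines[0], $m)) {
        continue;
    }
    $records[] = [
        'id'       => preg_replace('/\s+/', '', $m[1]), // "12-  12346" -> "12-12346"
        'name'     => $m[2],
        'lastLine' => count($lines) > 1 ? $lines[count($lines) - 1] : '',
    ];
}

print_r($records);
```

Splitting first keeps each regex short, and a block that misbehaves can be printed on its own to see what is different about it.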
biffsmith (Author) commented:
The new code produces an empty array. :(
biffsmith (Author) commented:
OK, I removed 1 of the spaces here between \d+- and *\d+ (there were 2 spaces in between, and in your last code there was only 1), and now the first 2 come back PERFECTLY, but nothing more. There are 111 blocks in this list.

(?:(?=\n*\s*\d+-  *\d+)|(?!.))/ms',$sourcestring,$matches);
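In case it helps with the space question: inside the pattern a space is a literal character, so "-  *" (two spaces) means a dash, one required space, then any number of further spaces, while "- *" (one space) allows zero or more. A tiny check on hypothetical IDs:

```php
<?php
$oneSpace  = '/^\d+- *\d+/';   // dash, then zero or more spaces
$twoSpaces = '/^\d+-  *\d+/';  // dash, one required space, then zero or more spaces

$results = [];
foreach (['12-12345', '12- 12346', '12-  12347'] as $id) {
    $results[$id] = [preg_match($oneSpace, $id), preg_match($twoSpaces, $id)];
    printf("%-11s one space: %d  two spaces: %d\n", $id, $results[$id][0], $results[$id][1]);
}
```

So if the regex has two spaces there, IDs written with no space after the dash are silently skipped.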



Terry Woods (IT Guru) commented:
Hmm, this seems to work perfectly; I don't think the pattern is any different from what you're using:

<?php
$sourcestring="12-  12345 Name
Some Text
Some Text
More Text
The next block starts directly under the above block in the same format:
12-  12346 Name2
Some Text
More Text
More Text
More Text
12-  12347 Name3
Some Text
Some Textasdf
More Text
The next basdfsadlock starts directly under the above block in the same format:
12-  12348 Name4
Some Text
More Text
More Text
Last line";
preg_match_all('/^\s*?(\d+- *\d+)([^\n]*\n)(?:.*?)([^\n]*)(?:(?=\n*\s*\d+- *\d+)|(?!.))/ms',$sourcestring,$matches);
echo "<pre>".print_r($matches,true);
?>
biffsmith (Author) commented:
It must be in the data.  I will take another look at it tomorrow and post a note here. Thanks so much for your help!