Regex Matching Block of Text Using PHP

I have a page full of text made up of repeating blocks, anywhere from 1 to 500 of them. Each block looks like this:

12-12345 Name
Some Text
Some Text
More Text

The next block starts directly under the above block in the same format:
12-12346 Name
Some Text
More Text
More Text
More Text

Each line ends with a newline character.

Between one ID number (12-12345, etc.) and the next there could be 1 line of text or as many as 10.

I need a way to grab the first ID number and the rest of the text up to the second ID number, using regular expressions in PHP. If possible, I would like the ID number and each line after it to be in an array.

I hope I explained this correctly.  Any help would be GREATLY APPRECIATED!

biffsmith asked:
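For the goal stated in the question (each ID plus the lines after it, in an array), splitting the text may be simpler than one large pattern. A minimal sketch, assuming an ID line starts with digits, a dash and digits (spaces or tabs allowed before the ID and after the dash) and that no body line starts that way; parseBlocks and the id/name/lines keys are hypothetical names:

```php
<?php
// Minimal sketch, not from the thread: split the text at every line that
// starts with an ID such as "12-12345", then turn each block into an array.
// Assumption: body lines never start with digits-dash-digits.
function parseBlocks(string $text): array
{
    // Zero-width split point: the start of a line followed by an ID.
    $pieces = preg_split('/^(?=[ \t]*\d+-[ \t]*\d+)/m', $text, -1, PREG_SPLIT_NO_EMPTY);

    $blocks = [];
    foreach ($pieces as $piece) {
        // \R matches \n, \r\n or \r, so Windows line endings work too.
        $lines = preg_split('/\R/', $piece);
        $first = array_shift($lines);

        // Skip any text that appears before the first ID line.
        if (!preg_match('/^[ \t]*(\d+-[ \t]*\d+)[ \t]*(.*)$/', $first, $m)) {
            continue;
        }

        // Remaining lines, trimmed, with blank lines dropped.
        $rest = array_values(array_filter(array_map('trim', $lines), 'strlen'));

        $blocks[] = ['id' => $m[1], 'name' => trim($m[2]), 'lines' => $rest];
    }
    return $blocks;
}

$text = "12-12345 Name\nSome Text\nSome Text\nMore Text\n"
      . "12-12346 Name\nSome Text\nMore Text\nMore Text\nMore Text\n";
print_r(parseBlocks($text));
```

Each element of the result holds one block, so 500 blocks give 500 elements. If body lines can legitimately begin with something like 2012-10, the stricter \d\d-[ \t]*\d{5} form would be a safer ID test.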
 
Terry Woods (IT Guru) commented:
Hmm, this seems to work perfectly. I don't think the pattern is any different from the one you're using:

<?php
$sourcestring="12-  12345 Name
Some Text
Some Text
More Text
The next block starts directly under the above block in the same format:
12-  12346 Name2
Some Text
More Text
More Text
More Text
12-  12347 Name3
Some Text
Some Textasdf
More Text
The next basdfsadlock starts directly under the above block in the same format:
12-  12348 Name4
Some Text
More Text
More Text
Last line";
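// Group 1: the ID (digits, a dash, optional spaces, digits), optionally indented.
// Group 2: the rest of the ID line (the name) plus its newline.
// (?:.*?) lazily skips the middle lines; group 3 is the last line of the block,
// which must be followed by the next ID or, via (?!.), by the end of the string.
preg_match_all('/^\s*?(\d+- *\d+)([^\n]*\n)(?:.*?)([^\n]*)(?:(?=\n*\s*\d+- *\d+)|(?!.))/ms',$sourcestring,$matches);
echo "<pre>".print_r($matches,true)."</pre>";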
?>
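The accepted pattern can be made a little more defensive. A sketch with two assumed changes: the next ID has to begin its own line, so an ID-shaped string such as 555-1234 in the middle of a text line no longer ends a block early, and the end-of-input check allows trailing whitespace. The second change matters because with the original (?!.) check, input that ends in a newline (as most files do) makes group 3 of the last block come back empty. Using [^\r\n]* for group 3 also keeps a Windows \r out of the captured line.

```php
<?php
// Sketch of a variant of the accepted pattern (not the thread's code).
$pattern = '/^\s*?(\d+- *\d+)'        // group 1: the ID, e.g. "12-  12345"
         . '([^\n]*\n)'               // group 2: rest of the ID line (the name)
         . '(?:.*?)'                  // lazily skip the block's middle lines
         . '([^\r\n]*)'               // group 3: the block's last line
         . '(?=\r?\n\s*\d+- *\d+'     // ...followed by a line that starts with the next ID
         . '|\s*\z)'                  // ...or by optional whitespace and the end
         . '/ms';

$sourcestring = "12-  12345 Name\nSome Text\nSome Text\nMore Text\n"
              . "12-  12346 Name2\nSome Text\nMore Text\nLast line\n";

preg_match_all($pattern, $sourcestring, $matches);
print_r($matches[3]); // last line of each block
```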
 

Terry Woods (IT Guru) commented:
<?php
$sourcestring="your source string";
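// Group 1: an ID such as 12-12345; group 2: the rest of that line (the name).
// Group 3: the text that follows, up to (but not including) the line break
// before the next ID; the loop stops as soon as the lookahead sees an ID ahead.
preg_match_all('/^(\d\d-\d{5})([^\n]*\n)((?:.(?!\d\d-\d{5}))*)/ms',$sourcestring,$matches);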
echo "<pre>".print_r($matches,true);
?>

(Thanks to ddrudik for the use of his website www.myregextester.com to generate the code.)
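One thing worth knowing about the (?:.(?!\d\d-\d{5}))* loop in this pattern: the lookahead is not tied to the start of a line, so an ID-shaped string inside a text line (say, a reference like 55-12345) also ends the block, and the rest of that block is silently skipped. A sketch of one possible fix, assuming real IDs always begin a line: put ^ inside the lookahead (it still means "start of line" under the m flag) and trim group 3 before splitting it into lines.

```php
<?php
// Sketch: same idea as the pattern above, but the lookahead only stops at an
// ID that starts a line (^ inside the lookahead, with the m flag).
$sourcestring = "12-12345 Name\nRef 55-12345 here\nMore Text\n12-12346 Name2\nOther Text\n";

$loose    = '/^(\d\d-\d{5})([^\n]*\n)((?:.(?!\d\d-\d{5}))*)/ms';
$anchored = '/^(\d\d-\d{5})([^\n]*\n)((?:.(?!^\d\d-\d{5}))*)/ms';

preg_match_all($loose, $sourcestring, $a);
preg_match_all($anchored, $sourcestring, $b);

// Group 3 may or may not end with a newline (the last block keeps it),
// so trim before splitting into lines.
$firstBlockLines = preg_split('/\R/', trim($b[3][0]));
print_r($a[3]);            // the loose pattern cuts block 1 short at "Ref"
print_r($firstBlockLines); // both body lines of block 1
```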
 
Terry Woods (IT Guru) commented:
You can also add:

unset($matches[0]);

to leave you with just the results you want in the array.
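An alternative to unset($matches[0]), sketched here: passing the PREG_SET_ORDER flag to preg_match_all() groups the results per match instead of per capture group, which gets close to the one-array-per-block shape asked for in the question. The id/name/lines keys are just illustrative names.

```php
<?php
// Sketch: PREG_SET_ORDER gives one array per match (one per block);
// $set[0] is the whole match and $set[1..3] are the capture groups.
$sourcestring = "12-12345 Name\nSome Text\nMore Text\n12-12346 Name2\nOther Text\n";
preg_match_all('/^(\d\d-\d{5})([^\n]*\n)((?:.(?!\d\d-\d{5}))*)/ms', $sourcestring, $sets, PREG_SET_ORDER);

$blocks = [];
foreach ($sets as $set) {
    $blocks[] = [
        'id'    => $set[1],
        'name'  => trim($set[2]),                       // drop the leading space and newline
        'lines' => preg_split('/\R/', trim($set[3])),   // body text as an array of lines
    ];
}
print_r($blocks);
```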
 
biffsmith (Author) commented:
Thanks for the code and the help, but I don't get the desired results.  I get this:
Array
(
    [0] => Array
        (
        )

    [1] => Array
        (
        )

    [2] => Array
        (
        )

    [3] => Array
        (
        )

)
 
Terry Woods (IT Guru) commented:
Can you post the data you tried with? Your sample worked fine.
 
biffsmith (Author) commented:
I'm sorry, the problem was with my data. OK, so should I be seeing ALL of the arrays or just the first one? I'm only seeing the first block of text split into an array. (So excited this is working, though. Thanks!)
 
Terry Woods (IT Guru) commented:
The print_r should show all matches.
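To check whether every block is being picked up, it can help to compare the number of matches with the number of ID lines. A sketch: preg_match_all() returns the match count (or false if the regex engine reports an error, in which case preg_last_error() gives the error code), and the second, looser pattern, an assumption, counts lines that begin with digits-dash-digits.

```php
<?php
// Sketch: compare the number of full matches with a rough count of ID lines.
$sourcestring = "12-12345 Name\nSome Text\n12-12346 Name2\nMore Text\n";

$found = preg_match_all('/^(\d\d-\d{5})([^\n]*\n)((?:.(?!\d\d-\d{5}))*)/ms', $sourcestring, $matches);
if ($found === false) {
    // e.g. PREG_BACKTRACK_LIMIT_ERROR on very large input
    echo "preg error code: " . preg_last_error() . "\n";
}

// Lines that begin (after optional spaces/tabs) with digits-dash-digits.
$idLines = preg_match_all('/^[ \t]*\d+-[ \t]*\d+/m', $sourcestring);

echo "$found blocks matched, $idLines ID-like lines found\n";
```

If the two numbers differ on the real data, the ID lines that were not matched are the place to look.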
 
biffsmith (Author) commented:
OK, then it's only grabbing the first block and ignoring the rest. I wish I could post my data, but I can't because of what it contains. Argh! Frustrating!
 
Terry Woods (IT Guru) commented:
Maybe the id format isn't consistent? It's currently looking for 2 digits, a dash, then 5 digits.
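Since the data can't be posted, a small diagnostic may help find inconsistent IDs locally. A sketch, with findOddIdLines as a hypothetical helper: it lists each line that roughly looks like an ID but fails the strict "2 digits, dash, 5 digits at the very start of the line" test, with tabs, carriage returns and other control or non-ASCII bytes shown as escapes via addcslashes().

```php
<?php
// Diagnostic sketch (hypothetical helper, not from the thread).
function findOddIdLines(string $text): array
{
    $odd = [];
    foreach (explode("\n", $text) as $i => $line) {
        $loose  = preg_match('/^\s*\d+\s*-\s*\d+/', $line) === 1; // roughly ID-like
        $strict = preg_match('/^\d\d-\d{5}/', $line) === 1;       // what the pattern expects
        if ($loose && !$strict) {
            // Escape control characters and bytes above 126 so they become visible.
            $shown = addcslashes($line, "\0..\37\177..\377");
            $odd[] = sprintf('line %d: "%s"', $i + 1, $shown);
        }
    }
    return $odd;
}

print_r(findOddIdLines("12-12345 Name\n  12-12346 Name2\n12- 12347 Name3\n\t12-12348 Name4\n"));
```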
 
Terry Woods (IT Guru) commented:
We could make it look for, say, two sets of digits separated by a - character, but only at the start of a line:

<?php
$sourcestring="your source string";
preg_match_all('/^(\d+-\d+)([^\n]*\n)((?:.(?!\d\d-\d{5}))*)/ms',$sourcestring,$matches);
unset($matches[0]);
echo "<pre>".print_r($matches,true);
?>
 
biffsmith (Author) commented:
PROGRESS! It appears that there can be anywhere from zero to several spaces before the second part of the ID number (after the dash). Removing those spaces makes it work!! Now I need to figure out how to get around that! :)
 
Terry Woods (IT Guru) commented:
<?php
$sourcestring="your source string";
preg_match_all('/^(\d+- *\d+)([^\n]*\n)((?:.(?!\d\d- *\d{5}))*)/ms',$sourcestring,$matches);
unset($matches[0]);
echo "<pre>".print_r($matches,true);
?>
 
Terry Woods (IT Guru) commented:
Apologies, I missed changing the lookahead to match the previous change:

<?php
$sourcestring="your source string";
preg_match_all('/^(\d+- *\d+)([^\n]*\n)((?:.(?!\d+- *\d+))*)/ms',$sourcestring,$matches);
unset($matches[0]);
echo "<pre>".print_r($matches,true);
?>

or, to keep the stricter format of 2 digits, a dash, optional spaces and 5 digits:

<?php
$sourcestring="your source string";
preg_match_all('/^(\d\d- *\d{5})([^\n]*\n)((?:.(?!\d\d- *\d{5}))*)/ms',$sourcestring,$matches);
unset($matches[0]);
echo "<pre>".print_r($matches,true);
?>
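The missed lookahead above is an easy mistake to repeat, since the ID pattern appears twice. A sketch of one way to avoid it: keep the ID sub-pattern in one variable and build both places from it. The [ \t]* after the dash is an assumed loosening that also accepts tabs.

```php
<?php
// Sketch: one ID sub-pattern, used for the start of a block and the lookahead,
// so the two can't drift apart.
$id = '\d+-[ \t]*\d+';
$pattern = '/^(' . $id . ')([^\n]*\n)((?:.(?!' . $id . '))*)/ms';

$sourcestring = "12-\t12345 Name\nSome Text\n12- 12346 Name2\nMore Text";
preg_match_all($pattern, $sourcestring, $matches);
unset($matches[0]);
print_r($matches);
```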
 
biffsmith (Author) commented:
Works great, but only if there are no spaces or tabs before the ID number. :)
 
Terry Woods (IT Guru) commented:
<?php
$sourcestring="your source string";
preg_match_all('/^\s*?(\d+- *\d+)([^\n]*\n)((?:.(?!\d+- *\d+))*)/ms',$sourcestring,$matches);
unset($matches[0]);
echo "<pre>".print_r($matches,true);
?>
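A caveat with this version that may be worth checking against the real data: because the lookahead (?!\d+- *\d+) can match in the middle of a line, the previous block's group 3 swallows part of the next ID line's indentation, the next match can then no longer start at ^, and a block whose ID line is indented by two or more spaces or tabs is skipped entirely. One possible fix, sketched here with a made-up sample, is to anchor the lookahead at the start of a line and let it step over the indentation:

```php
<?php
// Sketch comparing the pattern above with a line-anchored variant (assumption).
$before = '/^\s*?(\d+- *\d+)([^\n]*\n)((?:.(?!\d+- *\d+))*)/ms';
$after  = '/^[ \t]*(\d+-[ \t]*\d+)([^\n]*\n)((?:.(?!^[ \t]*\d+-[ \t]*\d+))*)/ms';

// The second and third ID lines are indented by two spaces.
$text = "12-12345 Name\nSome Text\n  12-12346 Name2\nMore Text\n  12-12347 Name3\nLast Text";

preg_match_all($before, $text, $m1);
preg_match_all($after, $text, $m2);
print_r($m1[1]); // IDs found by the original pattern (12-12346 is missing)
print_r($m2[1]); // IDs found by the anchored pattern
```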
 
biffsmith (Author) commented:
Thank you SO much!  I am ALMOST there.  I'm getting the data back like this:

array[1] - id number
array[2] - name
array[3] - everything else (this could be 1 line or 10 lines)

What I really need to isolate is:

array[1] - id number
array[2] - name
array[3] - the LAST LINE before the next id number

Did I explain that correctly?  
 
Terry Woods (IT Guru) commented:
Try this:

<?php
$sourcestring="your source string";
preg_match_all('/^\s*?(\d+- *\d+)([^\n]*\n)(?:.*?)([^\n]*)(?:(?=\n*\s*\d+- *\d+)|(?!.))/ms',$sourcestring,$matches);
unset($matches[0]);
echo "<pre>".print_r($matches,true);
?>
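If the single regex keeps fighting the data, a line-by-line approach instead of one big pattern may be easier to debug: walk the text and remember the last non-blank line seen before each new ID. A sketch, with lastLines as a hypothetical helper name and the same assumption as above that body lines never start with digits-dash-digits:

```php
<?php
// Sketch: collect ID, name and last non-blank line of each block.
function lastLines(string $text): array
{
    $result = [];
    $current = null;
    foreach (preg_split('/\R/', $text) as $line) {
        if (preg_match('/^\s*(\d+-\s*\d+)\s*(.*)$/', $line, $m)) {
            // A new ID line: store the previous block and start a new one.
            if ($current !== null) {
                $result[] = $current;
            }
            $current = ['id' => $m[1], 'name' => trim($m[2]), 'last' => null];
        } elseif ($current !== null && trim($line) !== '') {
            // Any later non-blank line overwrites the "last line" so far.
            $current['last'] = trim($line);
        }
    }
    if ($current !== null) {
        $result[] = $current;
    }
    return $result;
}

print_r(lastLines("12-  12345 Name\nSome Text\nMore Text\n   12-12346 Name2\nSome Text\nLast line\n"));
```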
 
biffsmith (Author) commented:
The new code produces an empty array. :(
 
biffsmith (Author) commented:
OK, I removed 1 of the spaces here between \d+- and *\d+ (there were 2 spaces in between, and in your last code there was only 1), and now the first 2 are returned PERFECTLY, but nothing more. There are 111 in this list.

(?:(?=\n*\s*\d+-  *\d+)|(?!.))/ms',$sourcestring,$matches);
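Why a single space in the pattern makes such a difference: without the x modifier, a space in a PCRE pattern is a literal character, and * repeats only the item right before it. So "- *" means "a dash, then zero or more spaces", while "-  *" (two spaces) means "a dash, one required space, then zero or more further spaces". A quick check; the [ \t]* variant is an assumed alternative that also allows tabs:

```php
<?php
// A space in the pattern is literal; * repeats only the space right before it.
$oneSpace   = '/^\d+- *\d+/';     // dash, then zero or more spaces
$twoSpaces  = '/^\d+-  *\d+/';    // dash, one required space, then zero or more
$tabOrSpace = '/^\d+-[ \t]*\d+/'; // dash, then any mix of spaces and tabs

foreach (["12-12345", "12- 12345", "12-  12345", "12-\t12345"] as $id) {
    printf("%-14s one:%d two:%d tab:%d\n", json_encode($id),
        preg_match($oneSpace, $id), preg_match($twoSpaces, $id), preg_match($tabOrSpace, $id));
}
```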
 
biffsmith (Author) commented:
It must be in the data.  I will take another look at it tomorrow and post a note here. Thanks so much for your help!
Question has a verified solution.
