• Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 406

regular expression to match all occurrences in a string

Hello Experts,
I have a requirement where I read a file into a variable (in PHP, using the file_get_contents function).
The file contains multiple tags like
<h3 attr="e"><target act="filename"></target></h3> as well as other tags (it is not strict XML).

I need a regular expression (in PHP) that matches the tag above and returns "filename", or an array of such tags.

I tried using preg_match_all, but in vain.
Please help.
Asked by: ansudhindra
 
Beverley Portlock Commented:
Try this

<?php

$test = '<h3 attr="e"><target act="filename">test</target></h3><h3 attr="e"><target act="filename">myfile.ext</target></h3>';

preg_match_all( '#<target.*?"filename">([^<]*?)</target#s', $test, $matches );

echo "<pre>";
print_r( $matches[1] );
echo "</pre>";


Which, using the test data above, generates

Array
(
    [0] => test
    [1] => myfile.ext
)
 
ansudhindra (Author) Commented:
Hi bportlock, thanks for your reply.
Your code is close to what I need.
What I need is the value of the "act" attribute of the "target" tag, not the tag contents, and the "target" tag should come right after an "h3" tag.
 
Beverley Portlock Commented:
OK, how is this?

<?php

$test = '<h3 attr="e"><target act="filename1">test</target></h3>
                      <target act="filename2">myfile.ext</target>
         <h3><target act="filename3">myfile.ext</target></h3>';

preg_match_all( '#<h3.*?><target.*?act="([^"]*?)">[^<]*?</target>#s', $test, $matches );

echo "<pre>";
print_r( $matches[1] );
echo "</pre>";


Note that the middle <target> has no enclosing h3 tag, so it is skipped in the output:

Array
(
    [0] => filename1
    [1] => filename3
)
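
One caveat with the pattern above: with the s modifier, `.*?` after `<h3` is allowed to run past the end of the h3 tag, so an h3 that appears somewhere earlier in the text can "adopt" a later target. The following is a minimal sketch of a tighter variant, not a replacement endorsed by anyone in this thread. The test string and the variable names `$loose` and `$strict` are made up for illustration; the idea is simply to use `[^>]*`, which cannot cross a closing `>`.

```php
<?php
// Hypothetical test data: the second <target> sits inside a <p>, after an unrelated <h3>.
$test = '<h3 attr="e"><target act="filename1">test</target></h3>'
      . '<h3>Heading</h3><p><target act="filename2">myfile.ext</target></p>'
      . '<h3><target act="filename3">myfile.ext</target></h3>';

// The pattern from the answer above: ".*?" may stretch from "<h3" across other tags.
preg_match_all('#<h3.*?><target.*?act="([^"]*?)">[^<]*?</target>#s', $test, $loose);

// Tighter sketch: [^>]* stays inside one tag, so <target> must directly follow
// the opening <h3 ...> tag (only whitespace is allowed in between).
preg_match_all('#<h3\b[^>]*>\s*<target\b[^>]*\bact="([^"]*)"[^>]*>#i', $test, $strict);

print_r($loose[1]);  // filename1, filename2, filename3
print_r($strict[1]); // filename1, filename3
```

This sketch only handles double-quoted attribute values, like the pattern it is derived from.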
 
Ray Paseur Commented:
Personally, I find it easier to understand REGEX if I write it out on several lines with comments, like this.  Have a close look at the variants on the act= attribute between lines 7 and 12.  This should be permissive enough to work for almost anything including HTML5 notation.
http://www.laprbass.com/RAY_temp_ansudhindra.php
Outputs:
Array
(
    [filename1] => myfile1.php
    [filename3] => myfile3.PNG
    [filename4] => this is myfile4
)

Best regards, ~Ray
<?php // RAY_temp_ansudhindra.php
error_reporting(E_ALL);
echo "<pre>";


// TEST DATA FOR THE QUESTION AT EE
$test = <<<ENDSTRING
<h3 attr="e"><target act="filename1">myfile1.php</target></h3>
             <target act="filename2">myfile2.html will not be found because the <h3> is in the wrong place</target>
<h3><target act='filename3'>myfile3.PNG</target></h3>
<h3><target act=filename4 term="foo">this is myfile4</target ></h3>
ENDSTRING;

// CONSTRUCT A REGEX
$regex
= '#'                // REGEX DELIMITER
. '\<h3.*?\>'        // THE <h3> TAG WITH WICKETS ESCAPED
. '<target.*?'       // THE target TAG WITH OPTIONAL ATTRIBUTES
. ' act='            // THE act= ATTRIBUTE
. '["\']{0,1}'       // THE QUOTE OR APOSTROPHE - OPTIONAL
. '(.*?)'            // GROUP: THE CONTENTS OF THE act ATTRIBUTE
. '["\' ]{1}'        // THE END OF THE act ATTRIBUTE WITH DOUBLE, SINGLE OR NO QUOTES
. '(.*?)'            // GROUP: WHATEVER FOLLOWS THE act ATTRIBUTE TO THE END OF THE target TAG, IF ANY
. '[>]{1}'           // THE END OF THE target TAG WITH EXACTLY ONE WICKET
. '(.*?)'            // GROUP: THE TEXT MARKED UP BY THE target TAG
. '</target\>??'     // THE CLOSING TARGET TAG
. '#'                // REGEX DELIMITER
. 's'                // TREAT THE STRING AS A SINGLE LINE
. 'i'                // TREAT THE STRING AS CASE-INSENSITIVE
;

// USE THE REGEX
preg_match_all($regex, $test, $matches);

// ACTIVATE THIS TO SEE ALL OF THE MATCHED INFORMATION
// var_dump($matches);

// MAKE AN ARRAY OF KEY => VALUE PAIRS USING THE FIRST AND THIRD GROUPS
$arr = array();     // INITIALIZE SO print_r() BELOW IS SAFE WHEN NOTHING MATCHES
foreach ($matches[1] as $num => $filename)
{
    $arr[$filename] = $matches[3][$num];
}

// SHOW THE WORK PRODUCT (EXPECTED TO FIND filename1, filename3 and filename4)
print_r($arr);
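
A similar one-piece-per-line layout can also be kept inside a single pattern string by using PCRE's x (extended) modifier, which ignores unescaped whitespace in the pattern and treats `#` as the start of a comment. The sketch below is an assumption-laden variant, not Ray's regex: it uses `~` as the delimiter (because `#` now starts comments), replaces `.*?` inside tags with `[^>]*`, and uses a backreference so the closing quote must match the opening one. Note that the value of act ends up in group 2 here, not group 1.

```php
<?php
// Test data modelled on the script above (shortened).
$test = <<<'ENDSTRING'
<h3 attr="e"><target act="filename1">myfile1.php</target></h3>
             <target act="filename2">myfile2.html</target>
<h3><target act='filename3'>myfile3.PNG</target></h3>
<h3><target act=filename4 term="foo">this is myfile4</target ></h3>
ENDSTRING;

// Nowdoc keeps backslashes and quotes literal; the x modifier allows the layout.
$regex = <<<'REGEX'
~
<h3\b[^>]*>          # the opening h3 tag
\s*
<target\b[^>]*?      # the target tag and any attributes before act
\bact=
(["']?)              # group 1: optional opening quote
([^"'\s>]+)          # group 2: the value of the act attribute
\1                   # the same quote again (or nothing)
[^>]*>               # the rest of the opening target tag
(.*?)                # group 3: the text marked up by the target tag
</target\s*>
~xis
REGEX;

// PREG_SET_ORDER returns one array per match, which keeps the groups together.
preg_match_all($regex, $test, $matches, PREG_SET_ORDER);

$arr = array();
foreach ($matches as $m) {
    $arr[$m[2]] = $m[3];
}
print_r($arr); // filename1, filename3 and filename4 with their text
```

Because whitespace is ignored under x, a literal space in the pattern has to be written as `\s` or escaped, and the comments must not contain the delimiter character.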

 
Beverley Portlock Commented:
Ray said: "Personally,l I find it easier to understand REGEX if I write it out on several lines with comments"

I find that just makes it even less comprehensible - something that I never thought was possible with regexes...

;-)

 
Ray Paseur Commented:
@bportlock: Yes, REGEX is an excursion through the looking glass into a land where the entire language is made up of almost nothing but punctuation.  Who would think of such a thing?  Oh, a 1950's mathematician.  Figures.
http://en.wikipedia.org/wiki/Regular_expression

Best to all, over and out, ~Ray
 
tel2 Commented:
Nice work, Ray.  Well laid out (even if your comments are SHOUTING at me).

BTW, do you know why people (including you, I see), generally use "//", as opposed to the shorter "#", for comments in PHP?
 
Beverley Portlock Commented:
"BTW, do you know why people (including you, I see), generally use "//", as opposed to the shorter "#", for comments in PHP?"

Speaking for myself, I came to PHP via C++ and Java and just carried the habit of using //

 
ansudhindra (Author) Commented:
awesome answers... thanks guys.....
 
Ray Paseur Commented:
Thanks for the points.  @tel2: No real reason for // vs # except habit.  The value of having SHOUTING COMMENTS is twofold:  It makes them easier to see when I glance at my code (and I can search with case-sensitive inspections).  And it tells novice programmers how IMPORTANT COMMENTS CAN BE!
 
tel2 Commented:
Thanks Ray.
Well you're not alone in your habit, coz I don't recall ever seeing PHP code with "#" for comments, and I wonder how that habit started, if "#" is a valid (and more concise) alternative.  Unless people find "//" easier to spot, of course.  Or was "#" a more recent addition to PHP, than "//"?
I don't know much PHP, but Perl and shell scripts use "#", so that's what I tend to use in PHP.
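
For reference, here is a small sketch of the comment styles being discussed; the variable `$count` is just a placeholder. One practical difference worth knowing: since PHP 8.0, a line beginning with `#[` is parsed as an attribute and not as a comment, so `#` comments should not start with a square bracket.

```php
<?php
// A C++-style comment runs to the end of the line.
# A shell/Perl-style comment does the same.
/* A C-style comment can
   span several lines. */

$count = 1; // trailing comments work with either marker
$count++;   # like this

echo $count, "\n"; // prints 2
```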