Solved

regular expression to match all occurrences in a string

Posted on 2011-09-06
Last Modified: 2012-05-12
Hello Experts,
I have a requirement where I read a file into a variable (in PHP, using the file_get_contents function).
The file contains multiple tags like
<h3 attr="e"><target act="filename"></target></h3> along with other tags (it is not strict XML).

I need a regular expression (in PHP) that matches the tag above and returns "filename", or an array of all such tags.

I tried using preg_match_all, but in vain.
Please help.
Question by:ansudhindra

Assisted Solution

by:Beverley Portlock
Beverley Portlock earned 1000 total points
ID: 36487712
Try this

<?php

$test = '<h3 attr="e"><target act="filename">test</target></h3><h3 attr="e"><target act="filename">myfile.ext</target></h3>';

preg_match_all( '#<target.*?"filename">([^<]*?)</target#s', $test, $matches );

echo "<pre>";
print_r( $matches[1] );
echo "</pre>";



Which, using the test data above, generates

Array
(
    [0] => test
    [1] => myfile.ext
)

Author Comment

by:ansudhindra
ID: 36487757
Hi bportlock, thanks for your reply.
Your code is close to what I need.
What I need is the value of the "act" attribute of the "target" tag, not the tag contents, and this "target" tag should come after an "h3" tag.

Expert Comment

by:Beverley Portlock
ID: 36487900
OK, how is this?

<?php

$test = '<h3 attr="e"><target act="filename1">test</target></h3>
                      <target act="filename2">myfile.ext</target>
         <h3><target act="filename3">myfile.ext</target></h3>';

preg_match_all( '#<h3.*?><target.*?act="([^"]*?)">[^<]*?</target>#s', $test, $matches );

echo "<pre>";
print_r( $matches[1] );
echo "</pre>";



Note that the middle test has no h3 tags and is skipped in the output like so

Array
(
    [0] => filename1
    [1] => filename3
)
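
One caveat with the pattern above: under the s modifier, .*? can cross tag boundaries, so <h3.*?> may also accept a closed <h3>heading</h3> that merely sits in front of a <target>. Below is a minimal sketch of a stricter variant, assuming the <target> must start immediately after the opening <h3 ...> tag. The fourth test line and the names $loose and $strict are made up for illustration.

```php
<?php
// Sketch only: the test data from above plus one hypothetical extra line
// where a closed <h3> merely precedes a <target>.
$test = '<h3 attr="e"><target act="filename1">test</target></h3>
         <target act="filename2">myfile.ext</target>
         <h3><target act="filename3">myfile.ext</target></h3>
         <h3>heading</h3><target act="filename5">loose.ext</target>';

// The pattern from the post above, for comparison.
$loose = '#<h3.*?><target.*?act="([^"]*?)">[^<]*?</target>#s';

// [^>]* cannot cross a ">", so every tag is matched on its own and
// <target ...> has to begin right after the opening <h3 ...>.
$strict = '#<h3\b[^>]*><target\b[^>]*\bact="([^"]*)"[^>]*>#i';

preg_match_all($loose, $test, $looseMatches);
preg_match_all($strict, $test, $strictMatches);

print_r($looseMatches[1]);   // filename1, filename3, filename5
print_r($strictMatches[1]);  // filename1, filename3
```

Whether filename5 should count depends on what "comes after h3" means for the real file, so treat the stricter pattern as one possible reading rather than the required one.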

Accepted Solution

by:
Ray Paseur earned 1000 total points
ID: 36489120
Personally, I find it easier to understand REGEX if I write it out on several lines with comments, like this.  Have a close look at the variants on the act= attribute in the test data (lines 7 to 12 of the script).  This should be permissive enough to work for almost anything, including HTML5 notation.
http://www.laprbass.com/RAY_temp_ansudhindra.php
Outputs:
Array
(
    [filename1] => myfile1.php
    [filename3] => myfile3.PNG
    [filename4] => this is myfile4
)

Best regards, ~Ray
<?php // RAY_temp_ansudhindra.php
error_reporting(E_ALL);
echo "<pre>";


// TEST DATA FOR THE QUESTION AT EE
$test = <<<ENDSTRING
<h3 attr="e"><target act="filename1">myfile1.php</target></h3>
             <target act="filename2">myfile2.html will not be found because the <h3> is in the wrong place</target>
<h3><target act='filename3'>myfile3.PNG</target></h3>
<h3><target act=filename4 term="foo">this is myfile4</target ></h3>
ENDSTRING;

// CONSTRUCT A REGEX
$regex
= '#'                // REGEX DELIMITER
. '\<h3.*?\>'        // THE <h3> TAG WITH WICKETS ESCAPED
. '<target.*?'       // THE target TAG WITH OPTIONAL ATTRIBUTES
. ' act='            // THE act= ATTRIBUTE
. '["\']{0,1}'       // THE QUOTE OR APOSTROPHE - OPTIONAL
. '(.*?)'            // GROUP: THE CONTENTS OF THE act ATTRIBUTE
. '["\' ]{1}'        // THE END OF THE act ATTRIBUTE WITH DOUBLE, SINGLE OR NO QUOTES
. '(.*?)'            // GROUP: WHATEVER FOLLOWS THE act ATTRIBUTE TO THE END OF THE target TAG, IF ANY
. '[>]{1}'           // THE END OF THE target TAG WITH EXACTLY ONE WICKET
. '(.*?)'            // GROUP: THE TEXT MARKED UP BY THE target TAG
. '</target\>??'     // THE CLOSING TARGET TAG
. '#'                // REGEX DELIMITER
. 's'                // TREAT THE STRING AS A SINGLE LINE
. 'i'                // TREAT THE STRING AS CASE-INSENSITIVE
;

// USE THE REGEX
preg_match_all($regex, $test, $matches);

// ACTIVATE THIS TO SEE ALL OF THE MATCHED INFORMATION
// var_dump($matches);

// MAKE AN ARRAY OF KEY => VALUE PAIRS USING THE FIRST AND THIRD GROUPS
// (INITIALIZED FIRST SO print_r() STILL WORKS IF NOTHING MATCHED)
$arr = array();
foreach ($matches[1] as $num => $filename)
{
    $arr[$filename] = $matches[3][$num];
}

// SHOW THE WORK PRODUCT (EXPECTED TO FIND filename1, filename3 and filename4)
print_r($arr);
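
The same "several lines with comments" idea can also live inside the pattern itself by using PCRE's extended (x) modifier. What follows is a rough sketch under a few assumptions, not a drop-in replacement: the group names act and text are invented here, whitespace is allowed between the h3 and target tags, and an act value may not contain spaces or quotes (which is narrower than the script above).

```php
<?php
// Sketch only: same test data as the script above.
$test = <<<'ENDSTRING'
<h3 attr="e"><target act="filename1">myfile1.php</target></h3>
             <target act="filename2">myfile2.html will not be found because the <h3> is in the wrong place</target>
<h3><target act='filename3'>myfile3.PNG</target></h3>
<h3><target act=filename4 term="foo">this is myfile4</target ></h3>
ENDSTRING;

// In x mode, unescaped whitespace is ignored and "#" starts a comment,
// so a different delimiter is used for the pattern.
$regex = <<<'REGEX'
~
<h3\b[^>]*>            # opening h3 tag with optional attributes
\s*                    # optional whitespace between the tags (an assumption)
<target\b[^>]*?        # opening target tag, up to the act attribute
\s+act=
(["']?)                # group 1: optional opening quote
(?<act>[^"'\s>]+)      # value of the act attribute
\1                     # the same quote again, or nothing
[^>]*>                 # rest of the opening target tag
(?<text>.*?)           # text marked up by the target tag
</target\s*>           # closing tag, tolerating a space before the wicket
~xsi
REGEX;

preg_match_all($regex, $test, $matches, PREG_SET_ORDER);

// Build act => text pairs from the named groups.
$arr = array();
foreach ($matches as $m) {
    $arr[$m['act']] = $m['text'];
}
print_r($arr);
```

With the test data shown, this should again yield filename1, filename3 and filename4. Because [^>]* cannot run past the end of a tag, the stray <h3> inside the filename2 text no longer takes part in a match.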



Expert Comment

by:Beverley Portlock
ID: 36489240
Ray said: "Personally, I find it easier to understand REGEX if I write it out on several lines with comments"

I find that just makes it even less comprehensible - something that I never thought was possible with regexes...

;-)


Expert Comment

by:Ray Paseur
ID: 36489280
@bportlock: Yes, REGEX is an excursion through the looking glass into a land where the entire language is made up of almost nothing but punctuation.  Who would think of such a thing?  Oh, a 1950s mathematician.  Figures.
http://en.wikipedia.org/wiki/Regular_expression

Best to all, over and out, ~Ray

Expert Comment

by:tel2
ID: 36499561
Nice work, Ray.  Well laid out (even if your comments are SHOUTING at me).

BTW, do you know why people (including you, I see), generally use "//", as opposed to the shorter "#", for comments in PHP?

Expert Comment

by:Beverley Portlock
ID: 36501407
"BTW, do you know why people (including you, I see), generally use "//", as opposed to the shorter "#", for comments in PHP?"

Speaking for myself, I came to PHP via C++ and Java and just carried over the habit of using //.


Author Closing Comment

by:ansudhindra
ID: 36501428
awesome answers... thanks guys.....

Expert Comment

by:Ray Paseur
ID: 36503314
Thanks for the points.  @tel2: No real reason for // vs # except habit.  The value of having SHOUTING COMMENTS is twofold:  It makes them easier to see when I glance at my code (and I can search with case-sensitive inspections).  And it tells novice programmers how IMPORTANT COMMENTS CAN BE!

Expert Comment

by:tel2
ID: 36506974
Thanks Ray.
Well, you're not alone in your habit, coz I don't recall ever seeing PHP code with "#" for comments, and I wonder how that habit started, if "#" is a valid (and more concise) alternative.  Unless people find "//" easier to spot, of course.  Or was "#" a more recent addition to PHP than "//"?
I don't know much PHP, but Perl and shell scripts use "#", so that's what I tend to use in PHP.
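
For reference, a tiny sketch of the comment styles being discussed. The variable name $styles is only for illustration. One thing worth knowing on newer versions: from PHP 8.0, a line starting with #[ is read as an attribute rather than a comment.

```php
<?php
// One-line comment, C++/Java style.
# One-line comment, shell/Perl style.
/* Block comment,
   C style. */
$styles = array('//', '#', '/* */');  // all three are accepted by PHP
echo implode(' ', $styles), "\n";
```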