
regular expression to match all occurrences in a string

Posted on 2011-09-06
Last Modified: 2012-05-12
Hello Experts,
I have a requirement where I read a file into a variable (in PHP, using the file_get_contents function).
This file contains multiple tags like
<h3 attr="e"><target act="filename"></target></h3>, along with other tags (the file is not strict XML).

I need a regular expression (in PHP) that matches the tag above and returns "filename", or returns an array of all such tags.

I tried using preg_match_all, but without success.
Please help.
Question by:ansudhindra

Assisted Solution

by:Beverley Portlock
Beverley Portlock earned 250 total points
Try this

<?php

$test = '<h3 attr="e"><target act="filename">test</target></h3><h3 attr="e"><target act="filename">myfile.ext</target></h3>';

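// Capture the text between <target ... "filename"> and </target>; $matches[1]
// then holds group 1 from every match. The s modifier lets . match newlines.
preg_match_all( '#<target.*?"filename">([^<]*?)</target#s', $test, $matches );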

echo "<pre>";
print_r( $matches[1] );
echo "</pre>";



Which, using the test data above, generates

Array
(
    [0] => test
    [1] => myfile.ext
)
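For readers new to preg_match_all, here is a minimal sketch of how it arranges the $matches array. It uses a simplified pattern (not the one above) on the same test data; the variable name $sets is only illustrative.

```php
<?php
// Sketch: the shape of the $matches array that preg_match_all() fills in.
$test = '<h3 attr="e"><target act="filename">test</target></h3>'
      . '<h3 attr="e"><target act="filename">myfile.ext</target></h3>';

// Default order (PREG_PATTERN_ORDER): $matches[0] holds every full match,
// $matches[1] holds every capture of group 1, and so on.
$count = preg_match_all('#<target[^>]*>([^<]*)</target>#', $test, $matches);

echo $count, "\n";          // number of full matches found
print_r($matches[1]);       // the group-1 captures: test, myfile.ext

// PREG_SET_ORDER groups the results per match instead: $sets[0] is the first
// match with its groups, $sets[1] the second, and so on.
preg_match_all('#<target[^>]*>([^<]*)</target>#', $test, $sets, PREG_SET_ORDER);
foreach ($sets as $set) {
    echo $set[1], "\n";     // group 1 of this particular match
}
```

PREG_PATTERN_ORDER suits "give me every filename"; PREG_SET_ORDER is handier when several groups belong together per match.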

Author Comment

by:ansudhindra
Hi bportlock, thanks for your reply.
Your code is close to my solution.
What I need is the value of the "act" attribute of the "target" tag, not the tag contents, and this "target" tag should come after an "h3" tag.
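Since the file is not strict XML, one alternative to a regular expression is PHP's DOM extension, whose HTML parser (libxml) is lenient. This is a different technique from the regex answers in this thread, offered as a sketch that assumes the real file is close enough to HTML for that parser. For the actual file, loadHTML(file_get_contents($path)) would replace the inline string ($path being a hypothetical variable).

```php
<?php
// Sketch: read the act attribute of every <target> that is a direct child
// of an <h3>, using DOMDocument + DOMXPath instead of a regular expression.
$html = '<h3 attr="e"><target act="filename1">test</target></h3>'
      . '<target act="filename2">myfile.ext</target>'
      . '<h3><target act="filename3">myfile.ext</target></h3>';

// The HTML parser warns about unknown tags such as <target>; collect those
// warnings internally instead of printing them, then restore the old setting.
$previous = libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();
libxml_use_internal_errors($previous);

// XPath: act attributes of <target> elements whose parent is an <h3>.
$xpath = new DOMXPath($dom);
$acts = array();
foreach ($xpath->query('//h3/target/@act') as $attr) {
    $acts[] = $attr->value;
}
print_r($acts);
```

The parent/child relationship is expressed in the XPath query rather than in the pattern, so attribute order and quoting style do not matter here.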

Expert Comment

by:Beverley Portlock
OK, how is this?

<?php

$test = '<h3 attr="e"><target act="filename1">test</target></h3>
                      <target act="filename2">myfile.ext</target>
         <h3><target act="filename3">myfile.ext</target></h3>';

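// Match an <h3 ...> tag followed directly by a <target ...> tag and capture
// the value of its act attribute (group 1). The s modifier lets . match newlines.
preg_match_all( '#<h3.*?><target.*?act="([^"]*?)">[^<]*?</target>#s', $test, $matches );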

echo "<pre>";
print_r( $matches[1] );
echo "</pre>";



Note that the middle target has no h3 tag, so it is skipped in the output:

Array
(
    [0] => filename1
    [1] => filename3
)
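One caveat with the pattern above: because `<h3.*?>` runs under the s modifier, the lazy match can stretch past the end of the h3 start tag when it needs to. With input such as `<h3>Title</h3><target act="filename4">...</target>` it would still report filename4, even though that target is not inside the h3. A minimal sketch of a stricter variant, assuming (as in this test data) that the act value is always double-quoted, uses `[^>]*` inside tags so a match cannot leave the tag it started in:

```php
<?php
// Sketch: [^>]* cannot run past a '>', so the <h3 ...> part of the match
// cannot swallow text between unrelated tags. Assumes double-quoted act values.
$test = '<h3 attr="e"><target act="filename1">test</target></h3>
                      <target act="filename2">myfile.ext</target>
         <h3><target act="filename3">myfile.ext</target></h3>
         <h3>Title</h3><target act="filename4">not inside an h3</target>';

// <h3 + attributes, optional whitespace, then <target ... act="VALUE".
$pattern = '#<h3\b[^>]*>\s*<target\b[^>]*\bact="([^"]*)"#i';
preg_match_all($pattern, $test, $matches);
print_r($matches[1]);
```

Only filename1 and filename3 are reported; the fourth target follows a closed h3 rather than sitting inside one.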

Accepted Solution

by:Ray Paseur
Ray Paseur earned 250 total points
Personally, I find it easier to understand REGEX if I write it out on several lines with comments, like this.  Have a close look at the variants on the act= attribute in the test data (the heredoc string).  The pattern accepts double quotes, single quotes, or no quotes around the act value, so it should be permissive enough for almost anything, including HTML5 notation.
http://www.laprbass.com/RAY_temp_ansudhindra.php
Outputs:
Array
(
    [filename1] => myfile1.php
    [filename3] => myfile3.PNG
    [filename4] => this is myfile4
)

Best regards, ~Ray
<?php // RAY_temp_ansudhindra.php
error_reporting(E_ALL);
echo "<pre>";


// TEST DATA FOR THE QUESTION AT EE
$test = <<<ENDSTRING
<h3 attr="e"><target act="filename1">myfile1.php</target></h3>
             <target act="filename2">myfile2.html will not be found because the <h3> is in the wrong place</target>
<h3><target act='filename3'>myfile3.PNG</target></h3>
<h3><target act=filename4 term="foo">this is myfile4</target ></h3>
ENDSTRING;

// CONSTRUCT A REGEX
$regex
= '#'                // REGEX DELIMITER
. '\<h3.*?\>'        // THE <h3> TAG WITH WICKETS ESCAPED
. '<target.*?'       // THE target TAG WITH OPTIONAL ATTRIBUTES
. ' act='            // THE act= ATTRIBUTE
. '["\']{0,1}'       // THE QUOTE OR APOSTROPHE - OPTIONAL
. '(.*?)'            // GROUP: THE CONTENTS OF THE act ATTRIBUTE
. '["\' ]{1}'        // THE END OF THE act ATTRIBUTE WITH DOUBLE, SINGLE OR NO QUOTES
. '(.*?)'            // GROUP: WHATEVER FOLLOWS THE act ATTRIBUTE TO THE END OF THE target TAG, IF ANY
. '[>]{1}'           // THE END OF THE target TAG WITH EXACTLY ONE WICKET
. '(.*?)'            // GROUP: THE TEXT MARKED UP BY THE target TAG
. '</target\>??'     // THE CLOSING target TAG (THE FINAL WICKET IS OPTIONAL)
. '#'                // REGEX DELIMITER
. 's'                // LET THE DOT MATCH NEWLINES TOO ("SINGLE LINE" MODE)
. 'i'                // TREAT THE STRING AS CASE-INSENSITIVE
;

// USE THE REGEX
preg_match_all($regex, $test, $matches);

// ACTIVATE THIS TO SEE ALL OF THE MATCHED INFORMATION
// var_dump($matches);

// MAKE AN ARRAY OF KEY => VALUE PAIRS USING THE FIRST AND THIRD GROUPS
$arr = array();
foreach ($matches[1] as $num => $filename)
{
    $arr[$filename] = $matches[3][$num];
}

// SHOW THE WORK PRODUCT (EXPECTED TO FIND filename1, filename3 and filename4)
print_r($arr);

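PCRE also offers a way to keep the comments inside the pattern itself: the x (extended) modifier ignores unescaped whitespace and treats text after an unescaped # as a comment. Because # then starts comments, this minimal sketch switches the delimiter to ~. It is written for this thread's test data rather than as a drop-in replacement for the regex above: it uses a back-reference so the closing quote must match the opening one, it does not accept spaces inside an act value, and it combines the groups with array_combine() instead of a loop.

```php
<?php
// Sketch: a commented regex using the x modifier (delimiter ~, because
// under x an unescaped # begins a comment).
$test = <<<'ENDSTRING'
<h3 attr="e"><target act="filename1">myfile1.php</target></h3>
             <target act="filename2">myfile2.html will not be found because the <h3> is in the wrong place</target>
<h3><target act='filename3'>myfile3.PNG</target></h3>
<h3><target act=filename4 term="foo">this is myfile4</target ></h3>
ENDSTRING;

$regex = <<<'REGEX'
~
<h3 [^>]* >             # the <h3> tag and any attributes
\s* <target \b [^>]*?   # the <target> tag, any attributes before act=
\s act=                 # the act= attribute (preceded by whitespace)
(["']?)                 # group 1: opening quote or apostrophe, if any
([^"'\s>]+)             # group 2: the value of act
\1                      # the same quote character again (or nothing)
[^>]* >                 # anything else up to the end of the <target> tag
(.*?)                   # group 3: the text marked up by <target>
</target \s* >          # the closing tag, optional space before >
~six
REGEX;

preg_match_all($regex, $test, $matches);

// Pair each act value (group 2) with its marked-up text (group 3).
$arr = array_combine($matches[2], $matches[3]);
print_r($arr);
```

Whether this reads better than the concatenated version is, as the next comments show, a matter of taste.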


Expert Comment

by:Beverley Portlock
Ray said: "Personally, I find it easier to understand REGEX if I write it out on several lines with comments"

I find that just makes it even less comprehensible - something that I never thought was possible with regexes...

;-)


Expert Comment

by:Ray Paseur
@bportlock: Yes, REGEX is an excursion through the looking glass into a land where the entire language is made up of almost nothing but punctuation.  Who would think of such a thing?  Oh, a 1950s mathematician.  Figures.
http://en.wikipedia.org/wiki/Regular_expression

Best to all, over and out, ~Ray

Expert Comment

by:tel2
Nice work, Ray.  Well laid out (even if your comments are SHOUTING at me).

BTW, do you know why people (including you, I see), generally use "//", as opposed to the shorter "#", for comments in PHP?

Expert Comment

by:Beverley Portlock
"BTW, do you know why people (including you, I see), generally use "//", as opposed to the shorter "#", for comments in PHP?"

Speaking for myself, I came to PHP via C++ and Java and just carried the habit of using //


Author Closing Comment

by:ansudhindra
Awesome answers... thanks, guys!

Expert Comment

by:Ray Paseur
Thanks for the points.  @tel2: No real reason for // vs # except habit.  The value of having SHOUTING COMMENTS is twofold:  It makes them easier to see when I glance at my code (and I can search with case-sensitive inspections).  And it tells novice programmers how IMPORTANT COMMENTS CAN BE!

Expert Comment

by:tel2
Thanks Ray.
Well you're not alone in your habit, coz I don't recall ever seeing PHP code with "#" for comments, and I wonder how that habit started, if "#" is a valid (and more concise) alternative.  Unless people find "//" easier to spot, of course.  Or was "#" a more recent addition to PHP, than "//"?
I don't know much PHP, but Perl and shell scripts use "#", so that's what I tend to use in PHP.
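For reference, PHP accepts three comment styles, so # is indeed a valid alternative to //. A minimal sketch is below. One caveat worth knowing: since PHP 8.0, #[ begins an attribute, so a # comment whose text starts immediately with [ is no longer treated as a comment.

```php
<?php
// Sketch: PHP's three comment styles; the three comment lines below are ignored.
# shell/Perl-style single-line comment
// C++-style single-line comment
/* C-style block comment */

$total = 1 + 2; # a trailing # comment behaves like a trailing // comment
echo $total, "\n";
```

Both single-line forms end at the end of the line (or at a closing ?> tag), so the choice between them really is a matter of habit.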