
Matching HTML for replacing with regexp.

Hello,

I'm having some trouble doing some specific matching on a string containing a complete HTML page.

The case is as follows: Given specific headings, I am to find those headings and remove them and the text below them.

So far I've got the matching of the headings working nicely. The problem comes when I'm looking to match the text below them. I'm having trouble making it stop as it were.

My idea is to look for the next <h#> tag and match to it. However, it doesn't stop at the *next* tag, it stops at the *last* one, and thus the script removes a lot more than it should. How do I prevent this?
$needle = '/<h'.$overskrift['Level'].'> <span class="mw-headline">'.str_replace('/', '\/', $overskrift['Heading']).'<\/span><\/h'.$overskrift['Level'].'>.*(<h\d>)/s';
 
// Example value of $needle: /<h2> <span class="mw-headline">Heading<\/span><\/h2>.*(<h\d>)/s
// Works nicely up till the dot.
 
$res = preg_replace($needle, "$1", $res);


Asked by: Elisas
 
Hube02 commented:
The problem is that quantifiers in the preg functions are greedy by default and will match as much as they can. You can turn off this greediness by adding a ? after the quantifier:

.*?(<h\d>)
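
To make the difference concrete, here is a minimal sketch that runs both variants against a small made-up page. The sample HTML and the variable names are assumptions for illustration only; the heading markup and the /s modifier follow the question.

```php
<?php
// Hypothetical sample page; headings and text are invented for this demo.
$html = '<h2> <span class="mw-headline">Intro</span></h2><p>Keep me.</p>'
      . '<h2> <span class="mw-headline">Heading</span></h2><p>Remove me.</p>'
      . '<h2> <span class="mw-headline">Next</span></h2><p>Keep me too.</p>'
      . '<h2> <span class="mw-headline">Last</span></h2><p>And me.</p>';

// Same start of the pattern as in the question's example needle.
$start = '<h2> <span class="mw-headline">Heading<\/span><\/h2>';

// Greedy: .* runs to the end of the string, then backtracks to the LAST <h\d>.
$greedy = preg_replace('/' . $start . '.*(<h\d>)/s', '$1', $html);

// Lazy: .*? grows one character at a time and stops at the FIRST <h\d>.
$lazy = preg_replace('/' . $start . '.*?(<h\d>)/s', '$1', $html);

echo $greedy, "\n"; // the "Next" section is gone as well
echo $lazy, "\n";   // only the "Heading" section is gone
```

Note that the closing tag </h2> never matches <h\d> because of the slash, so only opening heading tags end the match.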

Let me know if this works, if not then we will try a lookahead here.
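
For reference, the lookahead variant mentioned above could look roughly like the sketch below. The function name removeSection and the sample page are hypothetical and not from the thread. It also swaps the manual str_replace escaping for preg_quote(), which escapes the other regex metacharacters in the heading text as well as the delimiter.

```php
<?php
// Hypothetical helper, not from the thread: removes one heading and the text
// below it, up to (but not including) the next <h#> tag.
function removeSection(string $html, int $level, string $heading): string
{
    // preg_quote() escapes regex metacharacters and the "/" delimiter.
    $needle = '/<h' . $level . '> <span class="mw-headline">'
            . preg_quote($heading, '/')
            . '<\/span><\/h' . $level . '>'
            . '.*?(?=<h\d>)/s'; // lazy match; the lookahead leaves the next tag in place

    return preg_replace($needle, '', $html);
}

// Invented sample page with a heading that contains regex metacharacters.
$page = '<h2> <span class="mw-headline">A/B (test)</span></h2><p>Remove me.</p>'
      . '<h3> <span class="mw-headline">Next</span></h3><p>Keep me.</p>';

echo removeSection($page, 2, 'A/B (test)'), "\n";
```

Because the lookahead does not consume the next tag, no capture group or "$1" replacement is needed. As with the original pattern, a heading that is the last one on the page has no following <h#> tag, so this sketch leaves it untouched.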

Elisas (author) commented:
Superb. That did the trick.
Question has a verified solution.
