Solved

Matching HTML for replacing with regexp.

Posted on 2009-04-07
Last Modified: 2012-05-06
Hello,

I'm having some trouble doing some specific matching on a string containing a complete HTML page.

The case is as follows: Given specific headings, I am to find those headings and remove them and the text below them.

So far I've got the matching of the headings working nicely. The problem comes when I try to match the text below them: I can't get the match to stop in the right place.

My idea is to look for the next <h#> tag and match up to it. However, the match doesn't stop at the *next* tag; it stops at the *last* one, so the script removes a lot more than it should. How do I prevent this?
$needle = '/<h'.$overskrift['Level'].'> <span class="mw-headline">'.str_replace('/', '\/', $overskrift['Heading']).'<\/span><\/h'.$overskrift['Level'].'>.*(<h\d>)/s';
 
// Example value of $needle: /<h2> <span class="mw-headline">Heading<\/span><\/h2>.*(<h\d>)/s
// Works nicely up till the dot.
 
$res = preg_replace($needle, "$1", $res);

Question by:Elisas

Accepted Solution

by:
Hube02 earned 250 total points
ID: 24086517
The problem is that the preg functions are greedy: a quantifier such as .* will match as much as it can. You can turn off this greediness by adding a ? after the quantifier:

.*?(<h\d>)

Let me know if this works; if not, we will try a lookahead here.
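
To make the difference concrete, here is a minimal sketch comparing the greedy pattern from the question, the lazy version, and a lookahead variant. The sample HTML and the heading names are made up for illustration, and using preg_quote() in place of the str_replace() call is my own substitution, not something the original code does.

```php
<?php
// Hypothetical sample page; the headings and body text are invented for this example.
$html = '<h2> <span class="mw-headline">Remove me</span></h2><p>Unwanted text</p>'
      . '<h2> <span class="mw-headline">Keep me</span></h2><p>Wanted text</p>'
      . '<h2> <span class="mw-headline">Also keep</span></h2><p>More text</p>';

// Same shape as $overskrift in the question; the values are assumptions.
$overskrift = array('Level' => 2, 'Heading' => 'Remove me');

// preg_quote() escapes regex metacharacters in the heading, including the "/" delimiter.
$start = '<h' . $overskrift['Level'] . '> <span class="mw-headline">'
       . preg_quote($overskrift['Heading'], '/')
       . '<\/span><\/h' . $overskrift['Level'] . '>';

// Greedy: .* runs to the end of the string, then backtracks to the LAST <h\d>.
$greedy = preg_replace('/' . $start . '.*(<h\d>)/s', '$1', $html);

// Lazy: .*? expands one character at a time and stops at the FIRST <h\d>.
$lazy = preg_replace('/' . $start . '.*?(<h\d>)/s', '$1', $html);

// Lookahead: (?=<h\d>) checks for the next heading without consuming it,
// so the replacement can simply be the empty string.
$lookahead = preg_replace('/' . $start . '.*?(?=<h\d>)/s', '', $html);

echo $greedy, "\n", $lazy, "\n", $lookahead, "\n";
```

With this input the greedy version also throws away the "Keep me" section, while the lazy and lookahead versions remove only the "Remove me" heading and its paragraph. Note that none of these patterns match if the heading to remove is the last one on the page, since there is no following <h\d> to stop at.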

Author Closing Comment

by:Elisas
ID: 31567458
Superb. That did the trick.