Regular Expression to find empty <p> tags using PHP

Hello,

I've never been very good with regular expressions, so maybe someone can help. I need to find all instances of empty <p></p> tags in my content and replace them with <p>&nbsp;</p> so they display correctly. The style in the CSS file sets margin:0; padding:0 for p, so with nothing in them they don't even produce a line break.

In that easy case, I'm using str_ireplace("<p></p>","<p>&nbsp;</p>",$data);

The problem comes in when there is an empty <p> tag that includes some styling info that's coming from the TinyMCE editor, such as

<p style="text-align: center;"></p>

I still need to be able to find these. The other problem is that there may be whitespace between the tags, such as:

<p style="text-align: center;">    </p>

which also don't display anything in the browser.

So my goal is to find every <p tag with any number of additional attributes, followed by a closing bracket >, then any number of whitespace characters (or at least spaces), then a closing </p> tag, and to replace it with the same opening tag followed by &nbsp;</p>.

For example:
<p></p> ==> <p>&nbsp;</p>
<p>  </p> ==> <p>&nbsp;</p>
<p style="text-align: center;"></p> ==> <p style="text-align: center;">&nbsp;</p>

There may be any number ("n") of these in the data, dispersed throughout, since this is the content portion of a webpage and a full page may contain lots of paragraphs.
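
For reference, here is a minimal sketch of how str_ireplace() behaves on a page with several paragraphs (the sample HTML is made up for illustration). It already replaces every occurrence in the string, matches case-insensitively, and returns a new string rather than changing $data in place; the optional fourth argument reports how many replacements were made. The styled and whitespace-only paragraphs, however, pass through untouched.

```php
<?php
// Hypothetical page content with several empty paragraphs scattered through it.
$data = '<p>Intro</p><p></p><P></P>'
      . '<p style="text-align: center;"></p><p>   </p>';

// str_ireplace() returns a new string, so the result must be assigned back.
// The optional fourth argument receives the number of replacements made.
$data = str_ireplace('<p></p>', '<p>&nbsp;</p>', $data, $count);

echo $count, "\n"; // 2: both the lower- and upper-case empty pairs were replaced
echo $data, "\n";  // the styled and whitespace-only paragraphs are unchanged
```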

Can anybody help with this?

Asked by: garyhoffmann
 
Dave Baldwin (Fixer of Problems) commented:
You don't need regex. You have two conditions and neither requires you to locate the opening tag. Instead, do a replace for '></p>' and '> </p>'. In the second one, even if it is part of a longer string, it doesn't matter because you are replacing a space with a different space.
$data = str_ireplace("></p>",">&nbsp;</p>",$data);
$data = str_ireplace("> </p>",">&nbsp;</p>",$data);

That should take care of both conditions.
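
A quick check of the two calls above, assuming they are combined into one hypothetical helper, replace_empty_endings(), with the search strings passed as arrays (str_ireplace() then applies them in order). The sample inputs are illustrative: the styled paragraph is handled, but &nbsp; is also added after a non-empty tag such as <img>, and a paragraph holding three spaces is missed because only a single space is searched for.

```php
<?php
// Hypothetical helper: both replacements from the answer, run in one call.
// With array arguments, str_ireplace() applies each search/replace pair in order.
function replace_empty_endings(string $data): string
{
    return str_ireplace(['></p>', '> </p>'], ['>&nbsp;</p>', '>&nbsp;</p>'], $data);
}

$samples = [
    '<p style="text-align: center;"></p>', // handled
    '<p><img src="foo.jpg"></p>',          // false positive: &nbsp; added after <img>
    '<p>   </p>',                          // missed: three spaces, not one
];

foreach ($samples as $html) {
    echo $html, ' => ', replace_empty_endings($html), "\n";
}
```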
 
Terry Woods (IT Guru) commented:
@DaveBaldwin, that would also add non-breaking spaces after other tags, like this:

<p><img src="foo.jpg"></p>

My view is that a regex is justified. I'd suggest:

$data = preg_replace("#(<p[^>]*>)\s*(</p>)#i","$1&nbsp;$2",$data);
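
As a sketch, the pattern can be wrapped in a hypothetical helper, fill_empty_paragraphs(), and checked against the examples from the question. Single quotes are used here so that $1, $2 and \s reach PCRE exactly as written without depending on PHP's double-quoted string rules.

```php
<?php
// Hypothetical helper: give every empty (or whitespace-only) <p>...</p> an &nbsp;.
// $1 is the opening tag with any attributes, $2 is the closing </p>.
function fill_empty_paragraphs(string $data): string
{
    return preg_replace('#(<p[^>]*>)\s*(</p>)#i', '$1&nbsp;$2', $data);
}

$examples = [
    '<p></p>',
    '<p>  </p>',
    '<p style="text-align: center;"></p>',
    "<p>\n\t</p>",                // \s also covers tabs and newlines
    '<p><img src="foo.jpg"></p>', // not empty, so left alone
];

foreach ($examples as $html) {
    echo $html, ' => ', fill_empty_paragraphs($html), "\n";
}
```

Because \s does not match the text &nbsp;, running the helper a second time leaves already-filled paragraphs unchanged.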

 
garyhoffmann (Author) commented:
@TerryAtOpus, this seems to be exactly the solution and it is working perfectly. If you have time, could you do one more thing for me and explain the regex? I'm desperately trying to understand regex.

If I understand this correctly, you are using # as the delimiter. Then you are testing for <p followed by any number of characters other than > (by the use of [^>]*), followed by a closing bracket >, and by surrounding this in parentheses you get it back as $1.

Then you are looking for any number of whitespace characters, denoted by \s*, followed by the closing p tag (</p>), which the parentheses also capture into $2.

Finally, the "i" after the closing delimiter makes the match case-insensitive.

Do I have this correct?
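
One way to check that reading is to run the same pattern through preg_match() and print the captured groups: index 0 holds the whole match, and indexes 1 and 2 hold what the replacement string calls $1 and $2. The input is an upper-case variant of one of the question's examples, chosen to show the effect of the i flag.

```php
<?php
$html = '<P STYLE="text-align: center;">   </P>';

// The i flag lets <p and </p> match regardless of case.
if (preg_match('#(<p[^>]*>)\s*(</p>)#i', $html, $m)) {
    echo 'whole match: ', $m[0], "\n";
    echo '$1: ', $m[1], "\n"; // the opening tag, attributes included
    echo '$2: ', $m[2], "\n"; // the closing tag; the whitespace is not captured
}
```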
 
Terry Woods (IT Guru) commented:
You've got it exactly right. A more common pattern delimiter is the / character, but it is less convenient for patterns that contain /, because each one then has to be escaped.
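
To illustrate the escaping point: with / as the delimiter, the slash inside </p> has to be written as <\/p>, otherwise PCRE would take it as the end of the pattern. The two spellings below describe the same regex and, on this sample input, give the same result.

```php
<?php
$html = '<p style="text-align: center;"></p>';

// # delimiter: the / in </p> needs no escaping.
$hash  = preg_replace('#(<p[^>]*>)\s*(</p>)#i', '$1&nbsp;$2', $html);
// / delimiter: the / inside </p> must be escaped as <\/p>.
$slash = preg_replace('/(<p[^>]*>)\s*(<\/p>)/i', '$1&nbsp;$2', $html);

var_dump($hash === $slash); // bool(true)
```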

The * in [^>]* can also match zero characters, but it is greedy, so it will match as many as possible. There's a good cheat sheet here: http://download-my-brain.wikispaces.com/Computing+-+Regular+Expressions
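
Greediness is harmless in [^>]* because the class can never step past the > that closes the opening tag. The sketch below contrasts it with .*, a tempting but wrong substitute: given two empty paragraphs in a row, the greedy .* lets $1 swallow the first paragraph whole, so only the second one receives its &nbsp;.

```php
<?php
$html = '<p></p><p></p>';

// [^>]* stops at the first >, so each empty paragraph is matched on its own.
$bounded = preg_replace('#(<p[^>]*>)\s*(</p>)#i', '$1&nbsp;$2', $html);

// .* is greedy and may cross >, so $1 becomes "<p></p><p>" (both tags).
$greedy = preg_replace('#(<p.*>)\s*(</p>)#i', '$1&nbsp;$2', $html);

echo $bounded, "\n"; // <p>&nbsp;</p><p>&nbsp;</p>
echo $greedy, "\n";  // <p></p><p>&nbsp;</p>
```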

Thanks for the points