• Status: Solved

Perl RegExp to convert linebreaks to HTML P tags

Here's a doozy for the Perl experts. Please read carefully, because there are some nuances.

I would like to go through a multiline field and wrap each line in <p>...</p> tags, so that the newlines separating the lines are replaced by paragraph markup.

Here's a starting regular expression that does the basics:
$field =~ s/
    (.+?)
    \s*
    (?:\r|\n|$)+
/<p>$1<\/p>\n/xgis;

However, this breaks down in the following cases:
- if a line already contains <p> tags, we shouldn't nest them twice.
- if an existing <p> tag contains linebreaks, we should just remove the linebreaks rather than add <p> tags around each line.
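
To make the first failure case concrete, here is a minimal sketch that runs the starting regex on a two-line input. The helper name wrap_lines_naive and the sample string are made up for illustration; the regex itself is the one above, with the backreference written as $1.

```perl
use strict;
use warnings;

# Hypothetical helper (name invented for this sketch) around the starting regex.
sub wrap_lines_naive {
    my ($field) = @_;
    $field =~ s/
        (.+?)            # the text of one line, as short as possible
        \s*              # trailing whitespace
        (?:\r|\n|$)+     # one or more line endings, or the end of the string
    /<p>$1<\/p>\n/xgis;
    return $field;
}

# The first line is already a paragraph, the second is bare text.
my $out = wrap_lines_naive("<p>Hi</p>\nplain\n");
print $out;
```

The bare line is wrapped as intended, but the line that already had tags comes back as <p><p>Hi</p></p>, which is the nesting problem described in the first bullet.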

Below is some sample text and the correctly formatted text:


Well, for one I am a <b>very</b> diligent person who constantly pursues justice wherever it may be found.
I'm also good at linebreaks, as you can see here.

Finally, this is also on its own line, but shouldn't be be any different from the previous line. We shall see how it goes.
<p>This is its own paragraph.</p> We need to be careful here so we don't create nested paragraphs.
Here's another example. <p>Be sure to avoid the nested paragraphs here.</p>
Here's yet another example. <p>Again, be sure to avoid the nested paragraphs here.</p>
<p>This paragraph
for some reason has whitespace
that we don't need.</p>

<p>Standalone Paragraph</p>

================== converted to:

<p>Well, for one I am a <b>very</b> diligent person who constantly pursues justice wherever it may be found.
I'm also good at linebreaks, as you can see here.</p>
<p>Finally, this is also on its own line, but shouldn't be be any different from the previous line. We shall see how it goes.</p>
<p>This is its own paragraph.</p>
<p>We need to be careful here so we don't create nested paragraphs.</p>
<p>Here's another example.</p>
<p>Be sure to avoid the nested paragraphs here.</p>
<p>Here's yet another example.</p>
<p>Again, be sure to avoid the nested paragraphs here.</p>
<p>This paragraph for some reason has whitespace that we don't need.</p>
<p>Standalone Paragraph</p>

Asked by: tomaugerdotcom

Superdave commented:
$field =~ s#(?:<p>(.+?)</p>(?:\r|\n|$)*)|(?:
                  (.+?)
                  \s*
                  (?:\r|\n|$|(?=<p>))+
            )
            #<p>$1$2<\/p>\n#xgis;

That does most of it. It leaves the blank lines at lines 11 and 13 in your test, which you could remove with another regular expression. I don't think it would be possible to do that with a single regex. And thanks for a good start; it would have been hard for me to do that from scratch.
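
As a rough illustration of how the two alternatives divide the work, the sketch below runs the same pattern on a short string. The helper name wrap_paragraphs and the input are invented for the example, and the "no warnings" line is an addition of mine: only one of $1 and $2 is set for any given match, so the other would otherwise trigger an "uninitialized" warning under "use warnings".

```perl
use strict;
use warnings;

# Hypothetical helper (name invented for this sketch) around the
# alternation-based regex from the answer above.
sub wrap_paragraphs {
    my ($field) = @_;
    no warnings 'uninitialized';   # only one of $1 and $2 is set per match
    $field =~ s/
        (?: <p> (.+?) <\/p> (?:\r|\n|$)* )       # an existing paragraph
        |
        (?: (.+?) \s* (?:\r|\n|$|(?=<p>))+ )     # bare text up to a line end or <p>
    /<p>$1$2<\/p>\n/xgis;
    return $field;
}

# Bare text followed by an existing paragraph, then another bare line.
print wrap_paragraphs("Intro. <p>Inner</p>\nplain\n");
```

Here the lookahead (?=<p>) stops the bare-text branch just before the existing paragraph, so "Intro." and "Inner" each end up in their own pair of tags with no nesting.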

tomaugerdotcom (author) commented:
Superdave, you're a fricken genius. I was barking up the wrong tree trying to figure out negative look-ahead assertions that were, well, asserting diddly-squat.

Appreciate the help. Stay tuned - I have a follow-up question.

tomaugerdotcom (author) commented:
For the sake of posterity, I've added the extra newline stripping and commented the regular expression.
$field =~ s/
    (?:                     # EITHER...
        <p>
        (.+?)               # anything inside an existing pair of <p> tags
        <\/p>               # (the tags themselves are matched but not captured)
        (?:\r|\n|$)*        # plus any newlines that follow, or the end of the string
    )
    |                       # OR...
    (?:
        (.+?)               # anything (not starting with a <p> tag)
        \s*                 # (eating whitespace)
        (?:
            \r|\n           # to the next newline
            |$              # or the end of the string
            |(?=<p>)        # or the start of the next <p> tag
        )+
    )
/<p>$1$2<\/p>\n/xgis;       # and then wrap whichever one matched in <p> tags

# Replace any remaining run of newlines with a single space,
# unless it directly follows a closing </p> tag.
$field =~ s/
    (?<!<\/p>)
    (?:\r|\n)+
/ /xgis;
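
To show what the second substitution does on its own, here is a small sketch that applies only the newline-stripping pass to text that has already been wrapped. The helper name collapse_inner_newlines is invented for the example; the input reuses the multi-line paragraph from the sample text in the question.

```perl
use strict;
use warnings;

# Hypothetical helper (name invented for this sketch) for the second pass only.
sub collapse_inner_newlines {
    my ($field) = @_;
    $field =~ s/
        (?<!<\/p>)     # not directly after a closing p tag
        (?:\r|\n)+     # a run of line-ending characters
    / /xg;
    return $field;
}

# Text as it might look after the first pass: newlines survive inside a paragraph.
my $wrapped = "<p>This paragraph\nfor some reason has whitespace\n"
            . "that we don't need.</p>\n<p>Standalone Paragraph</p>\n";
print collapse_inner_newlines($wrapped);
```

The negative lookbehind leaves the newline after each closing tag alone, so paragraphs stay on separate lines while the line breaks inside a paragraph become single spaces.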
