Solved

JavaScript RegExp - Match HTML Tag Containing Line Breaks

Posted on 2007-12-07
11,184 Views
Last Modified: 2008-02-07
I need a regular expression that matches HTML tags whose contents can include line breaks / carriage returns. This is easy in Perl or PHP because the dot (any) token can be made to match line breaks (with the s modifier), but unfortunately it does not in JavaScript. The attached code demonstrates the problem. Change the regular expression to make it match the <p> tag and its contents to win the prize.
<html>
	<head>
		<title>Regular Expression Test</title>
	</head>
	<body>
		<script language="javascript" type="text/javascript">
			var s = ""; // start from an empty string so += does not prepend "undefined"
			s += "<html>\n";
			s += "	<body>\n";
			s += "		<p>\n";
			s += "			Match this paragraph\n";
			s += "			even though it contains\n";
			s += "			line breaks\n";
			s += "		</p>\n";
			s += "	</body>\n";
			s += "</html>";
			var re = new RegExp('<p[^>]*>(.*)</p>', 'gim');
			var matches = s.match(re);
			if (matches && matches.length) {
				for (var x = 0; x < matches.length; x++) {
					document.write('<h3>Match found:</h3><xmp>' + matches[x] + '</xmp>');
				}
			} else {
				document.write('<h3>No matches found!</h3>');
			}
		</script>
	</body>
</html>

Question by:wwarby
8 Comments
 
LVL 11

Expert Comment

by:saleek
ID: 20426624
Hi,

Not sure what exactly you need here... but putting the \n (newline) regex will match the beginning of the paragraph.

<p[^>]*>(.*)\n</p>


regards,

KS
 
LVL 27

Expert Comment

by:ddrudik
ID: 20426998
Instead of ".*" in your pattern I would recommend you use "[\S\s]*?"
Note that the use of "?" is required to stop at the first enclosing "</p>" in case you have multiple "<p></p>" blocks in your HTML source.
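To illustrate the difference, here is a minimal sketch that can be run in any JavaScript engine. The sample string `html` and the variable names are made up for this illustration; only the patterns come from the thread.

```javascript
// Hypothetical sample input: two paragraphs, each spanning several lines.
var html = "<p>\n  first\n</p>\n<p>\n  second\n</p>";

// The dot does not cross line breaks, so this pattern finds nothing.
var dotMatches = html.match(/<p[^>]*>(.*)<\/p>/gi);

// [\S\s] matches any character, line breaks included. The greedy form
// runs from the first <p> all the way to the last </p>.
var greedyMatches = html.match(/<p[^>]*>[\S\s]*<\/p>/gi);

// The lazy form (*?) stops at the first closing </p>,
// so each paragraph is matched separately.
var lazyMatches = html.match(/<p[^>]*>[\S\s]*?<\/p>/gi);

console.log(dotMatches);           // null
console.log(greedyMatches.length); // 1
console.log(lazyMatches.length);   // 2
```

With a single paragraph the greedy and lazy forms give the same result; the "?" only matters once there is more than one closing tag in the input.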
 
LVL 1

Author Comment

by:wwarby
ID: 20427246
Guys, thanks for your efforts so far but I'm afraid neither of these suggestions has worked in this particular example. Using the source code I provided, I substituted the following regular expressions:

<p[^>]*>(.*)\n</p> (from saleek)
<p[^>]*>([\S\s]*?)</p> (from ddrudik)

...but neither of them matches the paragraph tag, at least not in the SpiderMonkey JS engine in Firefox 2.

The example I've given is just a simplification of the real problem I'm working on, which is a script that strips XML tags from text objects in a proprietary format in our web content management system. I have RegexBuddy, and the expression I started with works fine until you emulate JavaScript, because the dot can't cross line breaks. I've tried a lot of variations and so far drawn a blank.

I'm pretty desperate to get this script working, so I'm increasing the points to 500. Can I suggest saving the code snippet as an HTML file and testing in a browser? If you get the right regular expression in there, it will write the matched code back to the screen.

Thanks

-William
 
LVL 27

Accepted Solution

by:
ddrudik earned 500 total points
ID: 20427421
var re = /<p[^>]*>([\S\s]*?)<\/p>/ig;

Note that the pattern contains a group (), so you can reference the captured content as well as the whole match. If that's not required, you can remove the () above.
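As a rough sketch of how the group can be used (the sample string `html` and the variable names here are assumptions for illustration, not part of the original question), `exec` exposes the whole match at index 0 and the captured content at index 1, and `replace` can refer to the group as `$1`:

```javascript
// Hypothetical sample input with a multi-line paragraph.
var html = "<body>\n<p class=\"intro\">\n  Match this paragraph\n  even though it contains\n  line breaks\n</p>\n</body>";

// exec returns an array: index 0 is the whole match, index 1 is the group.
var m = /<p[^>]*>([\S\s]*?)<\/p>/ig.exec(html);
var whole = m[0]; // the <p ...> tag, its contents and the closing </p>
var inner = m[1]; // only the text between the tags

// In the replacement string, $1 stands for the captured group, so this
// keeps the paragraph contents and drops the tags themselves.
var stripped = html.replace(/<p[^>]*>([\S\s]*?)<\/p>/ig, "$1");

console.log(inner);
console.log(stripped);
```

If the group is removed from the pattern, `m[1]` and `$1` are no longer available and only the whole match can be used.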
 
LVL 1

Author Comment

by:wwarby
ID: 20427620
DDrudik,

Thank you very much - that's exactly what I was after. The group () is in there because the script where I will be using this expression strips tags - I need to capture the content of the tags and remove the tags themselves. Where I was going wrong was in using the RegExp object constructor rather than the inline (literal) syntax you have used. I tried the exact same regular expression constructed both ways - yours works, mine doesn't. I'll be accepting your solution with full marks either way, but I'd be very grateful for your thoughts on why one of these statements works when the other doesn't:

var re = /<p[^>]*>([\S\s]*?)<\/p>/ig;
var re = new RegExp('<p[^>]*>([\S\s]*?)<\/p>', 'ig');

Thanks,

-William
 
LVL 27

Expert Comment

by:ddrudik
ID: 20427691
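In a string literal the backslash is itself an escape character, so it seems each \ needs to be escaped with another \ when you use the RegExp constructor: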
var re = new RegExp('<p[^>]*>([\\S\\s]*?)<\\/p>', 'ig');
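The effect is easy to see by printing the strings before they reach the constructor. This is a small sketch with made-up variable names and a made-up sample string; the two pattern strings are the ones quoted in the thread.

```javascript
// "\S", "\s" and "\/" are not recognised string escapes, so the string
// parser silently drops the backslash before RegExp ever sees it.
var single = '<p[^>]*>([\S\s]*?)<\/p>';
// Doubling each backslash keeps one literal backslash in the string.
var doubled = '<p[^>]*>([\\S\\s]*?)<\\/p>';

console.log(single);  // <p[^>]*>([Ss]*?)</p>
console.log(doubled); // <p[^>]*>([\S\s]*?)<\/p>

// Hypothetical sample input with a multi-line paragraph.
var html = "<p>\n  text\n</p>";

// [Ss] only matches the letters S and s, so line breaks still block the match.
var singleWorks = new RegExp(single, 'ig').test(html);
// [\S\s] matches any character, the same as the literal /.../ig form.
var doubledWorks = new RegExp(doubled, 'ig').test(html);

console.log(singleWorks);  // false
console.log(doubledWorks); // true
```

The regex literal syntax avoids the issue because its contents are not parsed as a string first.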
 
LVL 1

Author Comment

by:wwarby
ID: 20428615
Of course ;) I've hit that problem before - should have noticed it.

Thanks very much for this :)
 
LVL 27

Expert Comment

by:ddrudik
ID: 20428738
Thanks for the question and the points.