Solved

Regexp help - getting content between two html tags

Posted on 2010-09-22
Last Modified: 2012-05-10
I need to get the content between an opening and a closing H3 tag. Here is an example source:

 <h3 class="post-title">
       <a href="http://www.example.com">
       Legal Support Services
       </a>
</h3>

I was using this regexp: <h3 class="post-title">([^<]*?)</h3>

It worked fine until I realized that there were occasionally HTML tags within the H3 tags. I have changed the regexp to: <h3 class="post-title">(.*?)</h3>

But it never works.

:'(
Question by:Xponex
12 Comments
 

Expert Comment

by:hprasad123
ID: 33736879
This would work for you:
<h3 .*</h3>
 

Expert Comment

by:hprasad123
ID: 33736896
or
<h3 class="post-title".*</h3>
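A caveat on the two patterns above, sketched in Python rather than ASP (the sample string is made up for illustration): a greedy `.*` runs to the last `</h3>` in the input, while the lazy `.*?` stops at the first one. When the page holds more than one heading, the greedy form may swallow everything in between.

```python
import re

# Hypothetical single-line page containing two headings.
html = ('<h3 class="post-title">First</h3>'
        '<p>text</p>'
        '<h3 class="post-title">Second</h3>')

# Greedy: .* grabs as much as possible, then backtracks to the LAST </h3>.
greedy_match = re.search(r'<h3 class="post-title".*</h3>', html).group(0)

# Lazy: .*? grows one character at a time and stops at the FIRST </h3>.
lazy_match = re.search(r'<h3 class="post-title".*?</h3>', html).group(0)

print(greedy_match)
print(lazy_match)
```

Neither form crosses line breaks unless the engine lets the dot match newlines, so on the multi-line sample from the question both would still need further adjustment.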
 
LVL 3

Expert Comment

by:beezleinc
ID: 33736960
Simple Perl to find center tag content
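use strict;
use warnings;

my $foo = <<EOF;
<h3 class="post-title">
       <a href="http://www.example.com">
       Legal Support Services
       </a>
</h3>
EOF

# Each pass replaces $foo with the content of its first matching tag pair,
# so the loop drills inward until no enclosing tag pair is left.
# /s lets the dot match newlines; /i makes tag names case-insensitive.
while ($foo =~ /<(\w+)(.*?)>(.*?)<\/\1>/si)
{
        $foo = $3;
}

print "Center match is \"$foo\"\r\n";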


Author Comment

by:Xponex
ID: 33737035
This is being coded in ASP. Thanks for the code, beezleinc, but there will be a lot more text before (and after) the H3 tag. I need to isolate just the text between the opening and closing H3 tags.
 
LVL 3

Expert Comment

by:beezleinc
ID: 33737105
Do you have nested <h3> tags?
 

Author Comment

by:Xponex
ID: 33737125
Never. It will always be:

Lots of html
<h3 class="post-title">
some text and MAYBE an <a> tag
</h3>
lots more html
 
LVL 3

Expert Comment

by:beezleinc
ID: 33737253
Then "<h3 .+?>(.+?)</h3>" should work.

Make sure your regex call is case-insensitive if need be and can span multiple lines. Not too familiar with ASP syntax but regex expressions are pretty universal.

You may have to escape the "/", "<" and ">" characters in the regex string, e.g. "\<h3 .+?\>(.+?)\<\/h3\>"

It is not perfect: if you have multiple <h3></h3> tag sets in the input, it will return only the first match (or should).

Also watch out for additional whitespace, which can break the regex if it is not accounted for; e.g. "</h3>" will not match "</h3  >".
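The whitespace and case caveats can be illustrated with a short Python sketch (Python stands in for ASP here, and the input string is invented). Adding `\s*` before the closing `>` tolerates stray spaces, and the `IGNORECASE` and `DOTALL` flags cover the "case-insensitive" and "span multiple lines" advice.

```python
import re

# Invented input: upper-case tag names and extra spaces in the closing tag.
html = ('lots of html\n'
        '<H3 class="post-title">\n'
        'Some text\n'
        '</H3  >\n'
        'lots more html')

flags = re.IGNORECASE | re.DOTALL

# Strict closing tag: "</h3>" never appears literally, so nothing matches.
strict_match = re.search(r'<h3 .+?>(.+?)</h3>', html, flags)

# Tolerant closing tag: \s* allows optional whitespace before ">".
tolerant_match = re.search(r'<h3 .+?>(.+?)</h3\s*>', html, flags)
title = tolerant_match.group(1).strip()

print(strict_match)
print(title)
```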



 
LVL 75

Expert Comment

by:käµfm³d 👽
ID: 33738018
@beezleinc

>>  ASP syntax but regex expressions are pretty universal.

LOL.  In which universe?  ;)


You have to enable single-line mode for the dot to match newlines. Also, your pattern will not catch tags like:

    <h3>...</h3>

This would be a better option:
(?s)<h3[^>]*>.+?</h3>
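To see the effect of single-line mode, here is a small Python sketch (Python's `re` accepts the same `(?s)` inline flag; whether a given ASP regex library does is a separate question). It runs the pattern above on the sample from the question with and without the flag.

```python
import re

# The sample markup from the question.
sample = """<h3 class="post-title">
       <a href="http://www.example.com">
       Legal Support Services
       </a>
</h3>"""

# Without (?s) the dot stops at the first newline, so there is no match.
without_dotall = re.search(r'<h3[^>]*>.+?</h3>', sample)

# With (?s) the dot also matches newlines and the whole block is found.
with_dotall = re.search(r'(?s)<h3[^>]*>.+?</h3>', sample)

print(without_dotall)
print(with_dotall.group(0))
```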

 

Author Comment

by:Xponex
ID: 33738068
@kaufmed

So should I have multiline on or off? Also, I don't want to match ALL h3's, just the one with the class="post-title" attribute and nothing more. So the [^>]* would be counterproductive, I think...
 
LVL 75

Accepted Solution

by:käµfm³d 👽 (earned 500 total points)
ID: 33738202
I believe you will need to set Global to true.

Multiline affects the behavior of ^ and $ in a regex. It will not benefit you here.

I think you are using the 5.5 regex library in your code. There is no dot-matches-newline option that I can see. You can work around this with the pattern below, where [\s\S] (whitespace or non-whitespace) matches any character, including a newline:
<h3\s+class=""post-title"">[\s\S]+?</h3>
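A minimal Python sketch of the same idea follows. It rests on a few assumptions: the doubled quotes in the pattern above look like VBScript string escaping, so they appear as single quotes here; the capturing group and the tag-stripping step are additions for illustration; and the page text is invented around the sample from the question.

```python
import re

# Invented page: the target heading surrounded by other markup.
page = """<div>lots of html</div>
<h3>Other heading</h3>
<h3 class="post-title">
       <a href="http://www.example.com">
       Legal Support Services
       </a>
</h3>
<p>lots more html</p>"""

# No dot-matches-newline flag needed: [\s\S] matches any character.
# The parentheses (an addition) capture only the content between the tags.
pattern = re.compile(r'<h3\s+class="post-title">([\s\S]+?)</h3>')

match = pattern.search(page)
inner_html = match.group(1).strip()

# Optional extra step: drop nested tags such as <a> to keep only the text.
title_text = re.sub(r'<[^>]+>', '', inner_html).strip()

print(inner_html)
print(title_text)
```

Because the pattern requires whitespace and the class attribute right after `h3`, the plain `<h3>Other heading</h3>` is skipped.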

 

Author Comment

by:Xponex
ID: 33738281
Ah ha! That's what I was looking for: [\s\S]

That did the trick! Thanks!
 
LVL 75

Expert Comment

by:käµfm³d 👽
ID: 33738383
NP. Glad to help  :)