disrupt asked:

VB .NET 2008 HTML Source Regex

I have the following HTML document and I need to pull the text out of the <p> tag in the code below. How can I use regular expressions to pick up "text here" and save it to a string?
<div id="abc" class="container container-a">
<div class="desc">
<p>text here</p>
</div>
</div>

kaufmed replied:

Try the following. It should handle a plain <p>...</p>, a <p> with attributes (<p attributes="">...</p>), and either form with other tags nested inside it. It will not handle nested <p> tags.
Imports System.Text.RegularExpressions

...

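' The lookbehind and lookahead keep the <p ...> and </p> tags out of the match,
' so m.Value holds only the content between them.
Dim m As Match = Regex.Match(source_text, "(?<=<p(?:>| [^>]*>))(?:[^<]|<(?!/p>))*(?=</p>)")
Dim text As String = String.Empty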

If m.Success Then
    text = m.Value
End If
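
For readers who find the pattern above hard to parse, here is a minimal sketch that splits it into commented pieces using RegexOptions.IgnorePatternWhitespace. The module name AnnotatedPattern, the function FirstParagraph, and the sample HTML are hypothetical and exist only for illustration. The one change to the pattern is that the literal space after "<p" is written as [ ], because a bare space would be ignored in this mode.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module AnnotatedPattern
    ' Same pattern as in the answer above, one piece per line.
    ' With IgnorePatternWhitespace, unescaped whitespace is skipped and
    ' "#" starts a comment that runs to the end of the line.
    Private ReadOnly ParagraphPattern As String = _
        "(?<=<p(?:>|[ ][^>]*>))  # preceded by <p> or <p ...attributes>" & vbLf & _
        "(?:[^<]|<(?!/p>))*      # any non-< character, or a < that does not start </p>" & vbLf & _
        "(?=</p>)                # followed by the closing </p>"

    ' Returns the content of the first <p> element, or Nothing if there is none.
    Function FirstParagraph(ByVal html As String) As String
        Dim m As Match = Regex.Match(html, ParagraphPattern, _
                                     RegexOptions.IgnorePatternWhitespace)
        If m.Success Then
            Return m.Value
        End If
        Return Nothing
    End Function

    Sub Main()
        ' Hypothetical sample with an attribute and a nested tag.
        Dim sample As String = _
            "<div class=""desc""><p class=""x"">text <b>here</b></p></div>"
        Console.WriteLine(FirstParagraph(sample))   ' prints: text <b>here</b>
    End Sub
End Module
```

The nested <b> tag stays in the result because the pattern only stops at </p>; stripping inner tags would need a separate step.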

If you will only ever have a plain <p>...</p> with no attributes and no nested tags, you can simplify the pattern to:
Imports System.Text.RegularExpressions

...

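' [^<]* stops at the first "<", so this only matches when the content contains no tags.
Dim m As Match = Regex.Match(source_text, "(?<=<p>)[^<]*(?=</p>)")
Dim text As String = String.Empty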

If m.Success Then
    text = m.Value
End If

disrupt (ASKER):

If I have multiple <p> tags, how could I handle that?

disrupt (ASKER):

For example, if I wanted to loop through all the <p> tags.

I tried using a MatchCollection, but had no luck :/
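
One way to loop over every <p> tag is sketched below. It is not the accepted answer from this thread, only an illustration built on the pattern posted earlier. It assumes Regex.Matches with a For Each over the returned MatchCollection is what is wanted. The module name ParagraphExtractor, the function ExtractParagraphs, and the sample HTML are hypothetical, and RegexOptions.IgnoreCase is an addition so that <P> is matched too.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module ParagraphExtractor
    ' Pattern copied from the first reply in this thread.
    Private Const ParagraphPattern As String = _
        "(?<=<p(?:>| [^>]*>))(?:[^<]|<(?!/p>))*(?=</p>)"

    ' Returns the content of every <p>...</p> found in html, in document order.
    Function ExtractParagraphs(ByVal html As String) As List(Of String)
        Dim results As New List(Of String)()

        ' Regex.Matches (plural) returns a MatchCollection with every
        ' non-overlapping match; Regex.Match returns only the first one.
        Dim matches As MatchCollection = _
            Regex.Matches(html, ParagraphPattern, RegexOptions.IgnoreCase)

        For Each m As Match In matches
            results.Add(m.Value)
        Next

        Return results
    End Function

    Sub Main()
        ' Hypothetical sample with two paragraphs, one carrying an attribute.
        Dim sample As String = _
            "<div class=""desc"">" & _
            "<p>first</p>" & _
            "<p class=""x"">second</p>" & _
            "</div>"

        For Each paragraph As String In ExtractParagraphs(sample)
            Console.WriteLine(paragraph)
        Next
    End Sub
End Module
```

Collecting the values into a List(Of String) keeps the calling code independent of the regex types; iterating the MatchCollection directly works just as well if only a loop is needed.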

ASKER CERTIFIED SOLUTION by kaufmed

This solution is only available to members.