
Regular Expression

Hi

Does anybody know a good regular expression that matches and extracts <a> tags?

I.e., if I have a document, I would like to extract all the <a> elements, assuming that each <a> has a closing tag (</a>).

Thanks

Allan
acadenilla Asked:
 
Bruce_1975 Commented:
<(?<a>\w*)>(?<text>.*)</\k<a>>

Regards,
Bruce
 
acadenilla (Author) Commented:
Bruce,

It fails when I try a simple link:

<a href='asdfasdf.com'>first tag text</a>

Could you explain the expression to me?

I might need to handle some more complicated links, e.g.

<a id='asdfas' onmouseclick='asdfasdf' href='asdfasdf'><font><b>asdfasdfas</b><font></a>

or

<a href='aasdfasdf'><img></img></a>

thanks
 
Fernando Soto Commented:
Hi acadenilla;

This pattern will give you what you want.

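' Requires at the top of the file:
'   Imports System.IO
'   Imports System.Text.RegularExpressions

' Test data in a file; read the whole file into a string
Dim input As String
Using sr As New StreamReader("HtmlData.htm")
    input = sr.ReadToEnd()
End Using
' Find all the matches for the pattern "<a.*?/a>"
' Note: "." does not match line breaks by default, so links that
' span several lines need RegexOptions.Singleline
Dim mc As MatchCollection = Regex.Matches(input, "<a.*?/a>")
For Each m As Match In mc
    ' Display the result in the output window of the IDE
    Console.WriteLine(m.Value)
Next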
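
The pattern "<a.*?/a>" can also start a match at tags such as <abbr>, and it is case-sensitive. A slightly stricter variant might look like the sketch below. The module name, the function name ExtractAnchors, and the sample string are made up for illustration; a regex is still not a full HTML parser, so this assumes no ">" appears inside attribute values.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module AnchorTagDemo
    ' Returns every <a ...>...</a> element found in the given HTML text.
    Function ExtractAnchors(html As String) As String()
        ' \b keeps <abbr> and similar tags from matching,
        ' Singleline lets "." cross line breaks, IgnoreCase accepts <A>.
        Dim options As RegexOptions = RegexOptions.Singleline Or RegexOptions.IgnoreCase
        Dim mc As MatchCollection = Regex.Matches(html, "<a\b[^>]*>.*?</a\s*>", options)
        Dim result(mc.Count - 1) As String
        For i As Integer = 0 To mc.Count - 1
            result(i) = mc(i).Value
        Next
        Return result
    End Function

    Sub Main()
        ' Hypothetical sample: one <abbr> (skipped) and two links, one spanning lines
        Dim nl As String = Environment.NewLine
        Dim sample As String = "<abbr>x</abbr> <a href='asdfasdf.com'>first tag text</a>" & nl & "<A href='aasdfasdf'><img>" & nl & "</img></A>"
        For Each anchor As String In ExtractAnchors(sample)
            Console.WriteLine(anchor)
        Next
    End Sub
End Module
```

Here [^>]* consumes whatever attributes follow the tag name, and the lazy .*? stops at the first closing </a>.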


Fernando
 
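Bruce_1975 Commented:
Just leave out the ?<text> and use

<(?<a>\w*)>(.*)</\k<a>>

<(?<a>\w*)>    matches an opening tag: the named group "a" captures the tag name (any run of alphanumeric characters), and the tag has to close with > right after the name
(.*)           matches any character, repeated any number of times
</\k<a>>       matches the closing tag whose name is whatever group "a" captured, e.g. </a>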
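
Because the pattern above expects > immediately after the tag name, it cannot match an opening tag that carries attributes such as href. One possible adjustment, keeping the named group and the \k backreference, is sketched below. The group is renamed to "tag" for readability, the named "text" group is kept to capture the link content, and the module and function names are hypothetical.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module BackreferenceDemo
    ' "tag" captures the element name, [^>]* skips any attributes,
    ' "text" captures the content lazily, and \k<tag> requires the
    ' closing tag to repeat the captured name.
    Const Pattern As String = "<(?<tag>a)\b[^>]*>(?<text>.*?)</\k<tag>\s*>"

    ' Returns the inner content of the first <a> element, or Nothing if none is found.
    Function FirstAnchorContent(html As String) As String
        Dim m As Match = Regex.Match(html, Pattern, RegexOptions.Singleline Or RegexOptions.IgnoreCase)
        If m.Success Then Return m.Groups("text").Value
        Return Nothing
    End Function

    Sub Main()
        ' Prints: first tag text
        Console.WriteLine(FirstAnchorContent("<a href='asdfasdf.com'>first tag text</a>"))
        ' Prints: <font><b>asdfasdfas</b></font>
        Console.WriteLine(FirstAnchorContent("<a id='asdfas' href='asdfasdf'><font><b>asdfasdfas</b></font></a>"))
    End Sub
End Module
```

The lazy .*? matters here: with the greedy .* a single match would run from the first <a> to the last </a> on the line.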