Solved

Regular Expression

Posted on 2008-06-26
Last Modified: 2010-04-15
Hi,

Does anybody know a nice regular expression that matches and extracts <a> tags?

That is, if I have a document, I would like to extract all the <a> elements, assuming that each <a> has a closing tag (</a>).

Thanks

Allan
Question by:acadenilla
4 Comments
 
LVL 6

Expert Comment

by:Bruce_1975
ID: 21876377
<(?<a>\w*)>(?<text>.*)</\k<a>>

Regards,
Bruce
 

Author Comment

by:acadenilla
ID: 21876622
Bruce,

It fails when I try a simple link:

<a href='asdfasdf.com'>first tag text</a>

Could you explain the expression to me?

I might also need to handle some crazy links, e.g.

<a id='asdfas' onmouseclick='asdfasdf' href='asdfasdf'><font><b>asdfasdfas</b><font></a>

or

<a href='aasdfasdf'><img></img></a>

thanks
 
LVL 63

Accepted Solution

by:
Fernando Soto earned 250 total points
ID: 21876721
Hi acadenilla;

This pattern will give you what you want.

' These Imports go at the top of the source file
Imports System.IO
Imports System.Text.RegularExpressions

' Read the test data from a file into a string
Dim input As String = File.ReadAllText("HtmlData.htm")
' Find all the matches for the pattern "<a.*?/a>"
' (the lazy .*? stops at the first "/a>" that follows each "<a")
Dim mc As MatchCollection = Regex.Matches(input, "<a.*?/a>")
For Each m As Match In mc
    ' Display the result in the output window of the IDE
    Console.WriteLine(m.Value)
Next
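
The pattern above treats anything starting with "<a" as a link (so <abbr> would match too) and, without RegexOptions.Singleline, it does not match links that span several lines. A minimal sketch of a stricter variant follows; the module name, the ExtractAnchors helper and the sample string are made up for illustration and are not part of the original answer.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module AnchorDemo
    ' Hypothetical helper, not from the original answer.
    Function ExtractAnchors(html As String) As MatchCollection
        ' <a\b      "<a" followed by a word boundary, so <abbr> is skipped
        ' [^>]*     any attributes up to the end of the opening tag
        ' (.*?)     lazily captured inner content
        ' </a\s*>   the closing tag
        Dim pattern As String = "<a\b[^>]*>(.*?)</a\s*>"
        ' Singleline lets "." match newlines; IgnoreCase accepts <A ...>
        Dim options As RegexOptions = RegexOptions.IgnoreCase Or RegexOptions.Singleline
        Return Regex.Matches(html, pattern, options)
    End Function

    Sub Main()
        ' Made-up sample input: one <abbr> element and two anchors
        Dim parts As String() = {
            "<abbr>x</abbr>",
            "<a href='asdfasdf.com'>first tag text</a>",
            "<A href='aasdfasdf'><img>",
            "</img></A>"
        }
        Dim html As String = String.Join(Environment.NewLine, parts)

        For Each m As Match In ExtractAnchors(html)
            Console.WriteLine(m.Value)
        Next
    End Sub
End Module
```

With this input the <abbr> element is skipped and both anchors are returned, including the one that spans two lines. A regex like this still breaks on nested <a> elements or on a ">" inside an attribute value; for arbitrary HTML a real parser is the safer choice.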


Fernando
 
LVL 6

Expert Comment

by:Bruce_1975
ID: 21876781
Just leave out the ?<text> and use

<(?<a>\w*)>(.*)</\k<a>>

<(?<a>\w*)>   matches <, captures the tag name (any run of word characters) in the group named a, and has to close with >
(.*)          matches any character, any number of times
</\k<a>>      has to end with the closing tag for the captured name, e.g. </a>
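
Because the opening part requires > immediately after the tag name, this pattern cannot match an opening tag that carries attributes, which is why the href example fails. The sketch below shows that, plus one possible relaxed variant; the [^>]* part, the lazy quantifier and the module name are assumptions added for illustration, not something posted in this thread.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module NamedGroupDemo
    Sub Main()
        ' Sample link taken from the author's comment
        Dim sample As String = "<a href='asdfasdf.com'>first tag text</a>"

        ' Pattern as posted: no room for attributes after the tag name
        Dim original As String = "<(?<a>\w*)>(.*)</\k<a>>"
        Console.WriteLine(Regex.IsMatch(sample, original))   ' False

        ' Assumed variant: [^>]* lets attributes follow the tag name,
        ' and .*? stops at the first matching closing tag
        Dim relaxed As String = "<(?<a>\w+)\b[^>]*>(?<text>.*?)</\k<a>>"
        Dim m As Match = Regex.Match(sample, relaxed)
        Console.WriteLine(m.Groups("a").Value)      ' a
        Console.WriteLine(m.Groups("text").Value)   ' first tag text
    End Sub
End Module
```

Like the posted pattern, the relaxed one matches any element, not only links. To restrict it to <a>, replace \w+ with a literal a.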