
RegEx: Everything up to but not including (extracting hyperlink text)

Posted on 2009-04-02
Last Modified: 2012-05-06
I'm working on a program that gets a directory listing from a secure WebDAV server (using HttpWebResponse). The listing is returned as HTML. I'm trying to use RegEx to extract the text of each hyperlink (which is the name of a folder or file). I'm close, but not quite there...

Here's an example of what is returned on the request:

"<html><head><title>securetransfer.xxx.com - /localuser/</title></head><body><H1>securetransfer.xxx.com - /localuser/</H1><hr>  <pre><A HREF="/">[To Parent Directory]</A><br><br> 2/26/2009  2:38 PM        &lt;dir&gt; <A HREF="/localuser/Albe/">Albe</A><br>  3/5/2009  4:00 PM        &lt;dir&gt; <A HREF="/localuser/Art/">Art</A><br> 3/23/2009 12:31 PM        &lt;dir&gt; <A HREF="/localuser/Castle/">Castle</A><br> 2/19/2009  5:25 PM        &lt;dir&gt; <A HREF="/localuser/CF/">CF</A><br> 3/16/2009  8:43 PM        &lt;dir&gt; <A HREF="/localuser/CHI/">CHI</A><br> 2/19/2009  5:43 PM        &lt;dir&gt; <A HREF="/localuser/CSE/">CSE</A><br></pre><hr></body></html>"


I've come up with a regular expression that matches the complete hyperlink:
"<a.*?>.*</a>"  
     - returns <A HREF="/localuser/Albe/">Albe</A>

With a small adjustment to the expression, I can exclude the opening tag and get just the text followed by the closing tag:
"(?<=(<a.*?>)).*</a>"
     - returns Albe</A>


I can't figure out how to get rid of the closing tag </A>. I've tried the following, to no avail:

"(?<=(<a.*?>)).*[^</a>]"
     - returns Albe</A><br

"(?<=(<a.*?>)).*(?<=</a>)"
     - returns Albe</A>
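The last two attempts fail for different reasons. [^</a>] is a character class, meaning "one character that is not <, /, a or >", not "anything except the string </a>". (?<=</a>) is a lookbehind, so it only checks that </a> has already been consumed, which is why it stays in the result. What the end of the pattern needs is a lookahead, (?=</a>), which requires </a> to follow without including it in the match, together with a lazy .*? so each match stops at the first closing tag. The .NET engine accepts a variable-length lookbehind such as (?<=<a[^>]*>), which not every regex flavor does. A minimal sketch, assuming the match is made case-insensitive with RegexOptions.IgnoreCase (the listing uses uppercase <A); the module and function names are hypothetical:

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module LookaheadDemo

    ' Hypothetical helper: returns the text of every <a ...>...</a> element.
    ' (?<=<a[^>]*>) : lookbehind, an opening anchor tag must precede the match (not included)
    ' .*?           : lazy, so the match ends at the first closing tag
    ' (?=</a>)      : lookahead, a closing tag must follow the match (not included)
    Function GetLinkTexts(html As String) As List(Of String)
        Dim pattern As String = "(?<=<a[^>]*>).*?(?=</a>)"
        Dim texts As New List(Of String)()
        For Each m As Match In Regex.Matches(html, pattern, RegexOptions.IgnoreCase)
            ' Both tags sit outside the match, so m.Value is already just the link text
            texts.Add(m.Value)
        Next
        Return texts
    End Function

    Sub Main()
        Dim sample As String = "<pre><A HREF=""/"">[To Parent Directory]</A><br><br> &lt;dir&gt; <A HREF=""/localuser/Albe/"">Albe</A><br> &lt;dir&gt; <A HREF=""/localuser/Art/"">Art</A><br></pre>"
        For Each t As String In GetLinkTexts(sample)
            Console.WriteLine(t)
        Next
    End Sub

End Module
```

Because both tags are outside the match, m.Value can be used directly here; no capture group is needed.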


I appreciate anything you can think of.
Question by:VBRocks

Accepted Solution

by: Fernando Soto
ID: 24053710
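Hi VBRocks,

Here is some sample code that does what you want. The parentheses in the pattern create a capture group, and m.Groups(1).Value returns just the text inside the link.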

Fernando
Imports System
Imports System.Text.RegularExpressions

Module Module1

    Sub Main()
        ' Sample directory listing as returned by the server (quotes doubled inside the VB string)
        Dim htmlData As String = "<html><head><title>securetransfer.xxx.com - /localuser/</title></head><body><H1>securetransfer.xxx.com - /localuser/</H1><hr>  <pre><A HREF=""/"">[To Parent Directory]</A><br><br> 2/26/2009  2:38 PM        &lt;dir&gt; <A HREF=""/localuser/Albe/"">Albe</A><br>  3/5/2009  4:00 PM        &lt;dir&gt; <A HREF=""/localuser/Art/"">Art</A><br> 3/23/2009 12:31 PM        &lt;dir&gt; <A HREF=""/localuser/Castle/"">Castle</A><br> 2/19/2009  5:25 PM        &lt;dir&gt; <A HREF=""/localuser/CF/"">CF</A><br> 3/16/2009  8:43 PM        &lt;dir&gt; <A HREF=""/localuser/CHI/"">CHI</A><br> 2/19/2009  5:43 PM        &lt;dir&gt; <A HREF=""/localuser/CSE/"">CSE</A><br></pre><hr></body></html>"

        ' [^>]+ stops at the end of the opening tag; (.*?) lazily captures the link text as group 1
        Dim mc As MatchCollection = Regex.Matches(htmlData, "<[aA][^>]+>(.*?)</[aA]>")

        For Each m As Match In mc
            ' m.Value is the whole <A ...>...</A>; m.Groups(1).Value is only the captured text
            Console.WriteLine("hyperlink text = " & m.Groups(1).Value)
        Next
    End Sub

End Module


Author Closing Comment

by:VBRocks
ID: 31565946
Sweet!  Thank you so much for the help!  I've been researching this most of the morning.

Expert Comment

by:Fernando Soto
ID: 24053970
Not a problem, glad to help.  ;=)

Author Comment

by:VBRocks
ID: 24054650
The big tip was "Groups" (m.Groups(1).Value); I was just using m.Value.
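m.Value is the text of the whole match, while m.Groups(n) holds what each parenthesized group captured. The same idea extends to named groups, which can pull out both the folder path and its display name in one pass. A minimal sketch; the group names href and name, the ListEntries function, and the assumption that HREF values are always double-quoted (as in the listing above) are illustrative choices, not from the thread:

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module NamedGroupDemo

    ' Hypothetical helper: returns (href, link text) pairs.
    ' Assumes HREF values are double-quoted, as in the server's listing.
    Function ListEntries(html As String) As List(Of KeyValuePair(Of String, String))
        Dim pattern As String = "<a\s[^>]*href=""(?<href>[^""]*)""[^>]*>(?<name>.*?)</a>"
        Dim entries As New List(Of KeyValuePair(Of String, String))()
        For Each m As Match In Regex.Matches(html, pattern, RegexOptions.IgnoreCase)
            ' m.Value would be the whole <A ...>...</A>; the named groups hold just the pieces
            entries.Add(New KeyValuePair(Of String, String)(m.Groups("href").Value, m.Groups("name").Value))
        Next
        Return entries
    End Function

    Sub Main()
        Dim sample As String = "<A HREF=""/"">[To Parent Directory]</A><br> <A HREF=""/localuser/Albe/"">Albe</A><br>"
        For Each entry As KeyValuePair(Of String, String) In ListEntries(sample)
            Console.WriteLine(entry.Key & " -> " & entry.Value)
        Next
    End Sub

End Module
```

Named groups keep the code readable if the pattern later gains more groups, since m.Groups("name") does not shift the way a numeric index can.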