Solved

RegEx Everything up to but not including (extracting hyperlink text)

Posted on 2009-04-02
Last Modified: 2012-05-06
I'm working on a program that gets a directory list from a secure WebDAV server (using HttpWebResponse).  The list is returned as HTML.  I'm trying to use RegEx to get the text that appears in each hyperlink (which is the name of a folder or file).  I'm close, but not quite there...

Here's an example of what is returned on the request:

"<html><head><title>securetransfer.xxx.com - /localuser/</title></head><body><H1>securetransfer.xxx.com - /localuser/</H1><hr>  <pre><A HREF="/">[To Parent Directory]</A><br><br> 2/26/2009  2:38 PM        &lt;dir&gt; <A HREF="/localuser/Albe/">Albe</A><br>  3/5/2009  4:00 PM        &lt;dir&gt; <A HREF="/localuser/Art/">Art</A><br> 3/23/2009 12:31 PM        &lt;dir&gt; <A HREF="/localuser/Castle/">Castle</A><br> 2/19/2009  5:25 PM        &lt;dir&gt; <A HREF="/localuser/CF/">CF</A><br> 3/16/2009  8:43 PM        &lt;dir&gt; <A HREF="/localuser/CHI/">CHI</A><br> 2/19/2009  5:43 PM        &lt;dir&gt; <A HREF="/localuser/CSE/">CSE</A><br></pre><hr></body></html>"


I've come up with a regular expression that will parse out the complete hyperlink:
"<a.*?>.*</a>"  
     - returns <A HREF="/localuser/Albe/">Albe</A>

With a small adjustment to the expression, I can exclude the opening anchor tag and get just the text plus the closing tag:
"(?<=(<a.*?>)).*</a>"
     - returns Albe</A>


I can't figure out how to get rid of the closing tag </A>.  I've tried the following, to no avail:

"(?<=(<a.*?>)).*[^</a>]"
     - returns Albe</A><br

"(?<=(<a.*?>)).*(?<=</a>)"
     - returns Albe</A>


I appreciate anything you can think of.
Question by:VBRocks

Accepted Solution

by:
Fernando Soto earned 500 total points
ID: 24053710
Hi VBRocks;

Here is some sample code to do what you want.

Fernando
Imports System
Imports System.Text.RegularExpressions

Module Module1
    Sub Main()
        ' Directory listing HTML from the question (inner quotes doubled for a VB string literal)
        Dim htmlData As String = "<html><head><title>securetransfer.xxx.com - /localuser/</title></head><body><H1>securetransfer.xxx.com - /localuser/</H1><hr>  <pre><A HREF=""/"">[To Parent Directory]</A><br><br> 2/26/2009  2:38 PM        &lt;dir&gt; <A HREF=""/localuser/Albe/"">Albe</A><br>  3/5/2009  4:00 PM        &lt;dir&gt; <A HREF=""/localuser/Art/"">Art</A><br> 3/23/2009 12:31 PM        &lt;dir&gt; <A HREF=""/localuser/Castle/"">Castle</A><br> 2/19/2009  5:25 PM        &lt;dir&gt; <A HREF=""/localuser/CF/"">CF</A><br> 3/16/2009  8:43 PM        &lt;dir&gt; <A HREF=""/localuser/CHI/"">CHI</A><br> 2/19/2009  5:43 PM        &lt;dir&gt; <A HREF=""/localuser/CSE/"">CSE</A><br></pre><hr></body></html>"

        ' Group 1 captures only the text between the opening and closing anchor tags
        Dim mc As MatchCollection = Regex.Matches(htmlData, "<[aA][^>]+>(.*?)</[aA]>")

        ' An empty collection simply skips the loop, so no Count check is needed
        For Each m As Match In mc
            Console.WriteLine("hyperlink text = " & m.Groups(1).Value)
        Next
    End Sub
End Module
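
As a side note on why the two attempts in the question kept the closing tag: [^</a>] is a character class (any one character that is not <, /, a or >), not "anything but the sequence </a>", and (?<=</a>) is a lookbehind, so the </a> it checks for has already been consumed by the match. A lookahead, (?=</a>), checks for the tag without consuming it. The sketch below is one possible variant along those lines and is not part of the accepted answer; the shortened sample string and the module name are made up for illustration, and it assumes every tag starting with <a is a hyperlink.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module LookaroundSketch
    Sub Main()
        ' Shortened, hypothetical sample in the same shape as the listing in the question.
        Dim html As String = "<pre><A HREF=""/"">[To Parent Directory]</A><br> &lt;dir&gt; <A HREF=""/localuser/Albe/"">Albe</A><br> &lt;dir&gt; <A HREF=""/localuser/Art/"">Art</A><br></pre>"

        ' (?<=<a[^>]*>)  lookbehind: an opening anchor tag must end right before the match
        ' .*?            lazy: take as few characters as possible
        ' (?=</a>)       lookahead: </a> must follow, but is not made part of the match
        Dim pattern As String = "(?<=<a[^>]*>).*?(?=</a>)"

        ' IgnoreCase lets the lowercase pattern match the uppercase <A ...> and </A> tags.
        For Each m As Match In Regex.Matches(html, pattern, RegexOptions.IgnoreCase)
            ' m.Value is now just the link text, so no capture group is required.
            Console.WriteLine(m.Value)
        Next
    End Sub
End Module
```

With this pattern m.Value alone holds the link text. The capture-group approach in the accepted answer is arguably easier to read, and the variable-length lookbehind used here is a .NET feature that many other regex engines do not support.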


Author Closing Comment

by:VBRocks
ID: 31565946
Sweet!  Thank you so much for the help!  I've been researching this most of the morning.

Expert Comment

by:Fernando Soto
ID: 24053970
Not a problem, glad to help.  ;=)

Author Comment

by:VBRocks
ID: 24054650
The big tip was "Groups" (m.Groups(1).Value).  I was just using m.Value.
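
To make that difference concrete, here is a minimal sketch that applies the accepted pattern to a single link from the sample listing and prints both properties. The module name is made up for illustration.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module GroupsSketch
    Sub Main()
        ' One link taken from the sample listing in the question.
        Dim html As String = "<A HREF=""/localuser/Albe/"">Albe</A><br>"
        Dim m As Match = Regex.Match(html, "<[aA][^>]+>(.*?)</[aA]>")

        If m.Success Then
            ' Value (the same as Groups(0).Value) is the entire matched text, tags included.
            Console.WriteLine("m.Value           = " & m.Value)
            ' Groups(1) is the first parenthesized group: only the text between the tags.
            Console.WriteLine("m.Groups(1).Value = " & m.Groups(1).Value)
        End If
    End Sub
End Module
```

The first line printed contains the whole anchor element, while the second contains only Albe.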