?
Solved

RegEx Everything up to but not including (extracting hyperlink text)

Posted on 2009-04-02
4
Medium Priority
?
898 Views
Last Modified: 2012-05-06
I have a program that I'm working on that gets a directory list from a WebDav secure server (using HttpWebResponse).  This list is returned as HTML.  I'm trying to use RegEx to get the text that appears in the hyperlink (which is the name of a folder or file).  I'm close, but not quite there...

Here's an example of what is returned on the request:

"<html><head><title>securetransfer.xxx.com - /localuser/</title></head><body><H1>securetransfer.xxx.com - /localuser/</H1><hr>  <pre><A HREF="/">[To Parent Directory]</A><br><br> 2/26/2009  2:38 PM        &lt;dir&gt; <A HREF="/localuser/Albe/">Albe</A><br>  3/5/2009  4:00 PM        &lt;dir&gt; <A HREF="/localuser/Art/">Art</A><br> 3/23/2009 12:31 PM        &lt;dir&gt; <A HREF="/localuser/Castle/">Castle</A><br> 2/19/2009  5:25 PM        &lt;dir&gt; <A HREF="/localuser/CF/">CF</A><br> 3/16/2009  8:43 PM        &lt;dir&gt; <A HREF="/localuser/CHI/">CHI</A><br> 2/19/2009  5:43 PM        &lt;dir&gt; <A HREF="/localuser/CSE/">CSE</A><br></pre><hr></body></html>"


I've come up with a RegEx expression that will parse out the complete hyperlink:
"<a.*?>.*</a>"  
     - returns <A HREF=""/localuser/Albe/"">Albe</A>

With a small adjustment to the expression, I can exclude the anchor, and just get the text with the closing bracket:
"(?<=(<a.*?>)).*</a>"
     - returns Albe</A>


I can't figure out how to get rid of the last bracket </A>.  I've tried the following, to no avail:

"(?<=(<a.*?>)).*[^</a>]"
     - returns Albe</A><br

"(?<=(<a.*?>)).*(?<=</a>)"
     - returns Albe</A>


I appreciate anything you can think of.
0
Comment
Question by:VBRocks
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 2
  • 2
4 Comments
 
LVL 63

Accepted Solution

by:
Fernando Soto earned 2000 total points
ID: 24053710
Hi VBRocks;

Here is some sample code to do what you want.

Fernando
Imports System.Text.RegularExpressions
 
Dim xmlData As String = "<html><head><title>securetransfer.xxx.com - /localuser/</title></head><body><H1>securetransfer.xxx.com - /localuser/</H1><hr>  <pre><A HREF="" / "">[To Parent Directory]</A><br><br> 2/26/2009  2:38 PM        &lt;dir&gt; <A HREF="" / localuser / Albe / "">Albe</A><br>  3/5/2009  4:00 PM        &lt;dir&gt; <A HREF="" / localuser / Art / "">Art</A><br> 3/23/2009 12:31 PM        &lt;dir&gt; <A HREF="" / localuser / Castle / "">Castle</A><br> 2/19/2009  5:25 PM        &lt;dir&gt; <A HREF="" / localuser / CF / "">CF</A><br> 3/16/2009  8:43 PM        &lt;dir&gt; <A HREF="" / localuser / CHI / "">CHI</A><br> 2/19/2009  5:43 PM        &lt;dir&gt; <A HREF="" / localuser / CSE / "">CSE</A><br></pre><hr></body></html>"
 
Dim mc As MatchCollection = Regex.Matches(xmlData, "<[aA][^>]+>(.*?)</[aA]>")
 
If mc.Count > 0 Then
    For Each m As Match In mc
        Console.WriteLine("hyperlink text = " + m.Groups(1).Value)
    Next
End If

Open in new window

0
 
LVL 27

Author Closing Comment

by:VBRocks
ID: 31565946
Sweet!  Thank you so much for the help!  I've been researching this most of the morning.
0
 
LVL 63

Expert Comment

by:Fernando Soto
ID: 24053970
Not a problem, glad to help.  ;=)
0
 
LVL 27

Author Comment

by:VBRocks
ID: 24054650
The big tip was "Groups" (m.Groups(1).Value)  I was just using m.Value.
0

Featured Post

Free Tool: Site Down Detector

Helpful to verify reports of your own downtime, or to double check a downed website you are trying to access.

One of a set of tools we are providing to everyone as a way of saying thank you for being a part of the community.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Introduction Hi all and welcome to my first article on Experts Exchange. A while ago, someone asked me if i could do some tutorials on object oriented programming. I decided to do them on C#. Now you may ask me, why's that? Well, one of the re…
Calculating holidays and working days is a function that is often needed yet it is not one found within the Framework. This article presents one approach to building a working-day calculator for use in .NET.
Learn how to match and substitute tagged data using PHP regular expressions. Demonstrated on Windows 7, but also applies to other operating systems. Demonstrated technique applies to PHP (all versions) and Firefox, but very similar techniques will w…
Explain concepts important to validation of email addresses with regular expressions. Applies to most languages/tools that uses regular expressions. Consider email address RFCs: Look at HTML5 form input element (with type=email) regex pattern: T…
Suggested Courses

762 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question