Solved

RegEx Everything up to but not including (extracting hyperlink text)

Posted on 2009-04-02
Last Modified: 2012-05-06
I'm working on a program that gets a directory listing from a secure WebDAV server (using HttpWebResponse).  The listing is returned as HTML.  I'm trying to use a regular expression to extract the text that appears in each hyperlink (which is the name of a folder or file).  I'm close, but not quite there...

Here's an example of what is returned on the request:

"<html><head><title>securetransfer.xxx.com - /localuser/</title></head><body><H1>securetransfer.xxx.com - /localuser/</H1><hr>  <pre><A HREF="/">[To Parent Directory]</A><br><br> 2/26/2009  2:38 PM        &lt;dir&gt; <A HREF="/localuser/Albe/">Albe</A><br>  3/5/2009  4:00 PM        &lt;dir&gt; <A HREF="/localuser/Art/">Art</A><br> 3/23/2009 12:31 PM        &lt;dir&gt; <A HREF="/localuser/Castle/">Castle</A><br> 2/19/2009  5:25 PM        &lt;dir&gt; <A HREF="/localuser/CF/">CF</A><br> 3/16/2009  8:43 PM        &lt;dir&gt; <A HREF="/localuser/CHI/">CHI</A><br> 2/19/2009  5:43 PM        &lt;dir&gt; <A HREF="/localuser/CSE/">CSE</A><br></pre><hr></body></html>"


I've come up with a regular expression that matches the complete hyperlink:
"<a.*?>.*</a>"  
     - returns <A HREF="/localuser/Albe/">Albe</A>

With a small adjustment to the expression, I can exclude the opening anchor tag and get just the text followed by the closing tag:
"(?<=(<a.*?>)).*</a>"
     - returns Albe</A>


I can't figure out how to get rid of the closing tag </A>.  I've tried the following, to no avail:

"(?<=(<a.*?>)).*[^</a>]"
     - returns Albe</A><br

"(?<=(<a.*?>)).*(?<=</a>)"
     - returns Albe</A>


I appreciate anything you can think of.
Question by:VBRocks
 
LVL 62

Accepted Solution

by:
Fernando Soto earned 500 total points
ID: 24053710
Hi VBRocks;

Here is some sample code to do what you want.

Fernando
Imports System
Imports System.Text.RegularExpressions

Module Module1
    Sub Main()
        ' Sample directory listing. Quotes inside a VB string literal are doubled ("").
        Dim xmlData As String = "<html><head><title>securetransfer.xxx.com - /localuser/</title></head><body><H1>securetransfer.xxx.com - /localuser/</H1><hr>  <pre><A HREF=""/"">[To Parent Directory]</A><br><br> 2/26/2009  2:38 PM        &lt;dir&gt; <A HREF=""/localuser/Albe/"">Albe</A><br>  3/5/2009  4:00 PM        &lt;dir&gt; <A HREF=""/localuser/Art/"">Art</A><br> 3/23/2009 12:31 PM        &lt;dir&gt; <A HREF=""/localuser/Castle/"">Castle</A><br> 2/19/2009  5:25 PM        &lt;dir&gt; <A HREF=""/localuser/CF/"">CF</A><br> 3/16/2009  8:43 PM        &lt;dir&gt; <A HREF=""/localuser/CHI/"">CHI</A><br> 2/19/2009  5:43 PM        &lt;dir&gt; <A HREF=""/localuser/CSE/"">CSE</A><br></pre><hr></body></html>"

        ' Group 1, the parenthesized (.*?), captures only the text between the anchor tags.
        Dim mc As MatchCollection = Regex.Matches(xmlData, "<[aA][^>]+>(.*?)</[aA]>")

        ' Looping over an empty MatchCollection does nothing, so no Count check is needed.
        For Each m As Match In mc
            Console.WriteLine("hyperlink text = " & m.Groups(1).Value)
        Next
    End Sub
End Module
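
The attempts in the question can also be made to work with lookarounds, so that m.Value alone holds the link text. In "[^</a>]" the brackets form a negated character class (any one character other than <, /, a, or >), and "(?<=</a>)" is a lookbehind that requires the match to end just after </a>, so neither one stops the match before the closing tag. A lookahead, "(?=</a>)", does. The following is only a minimal sketch and not part of the accepted answer: the helper name ExtractLinkText, the shortened sample string, and the use of RegexOptions.IgnoreCase are assumptions made for illustration.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module LinkTextDemo
    ' Hypothetical helper: returns the text between each <a ...> and </a>.
    Function ExtractLinkText(ByVal html As String) As List(Of String)
        ' (?<=<a\b[^>]*>)  lookbehind: an opening anchor tag ends right before the match
        ' .*?              lazy: take as few characters as possible
        ' (?=</a>)         lookahead: </a> must follow, but is not part of the match
        Dim pattern As String = "(?<=<a\b[^>]*>).*?(?=</a>)"
        Dim results As New List(Of String)
        For Each m As Match In Regex.Matches(html, pattern, RegexOptions.IgnoreCase)
            results.Add(m.Value)
        Next
        Return results
    End Function

    Sub Main()
        ' Shortened version of the listing from the question (an assumption for brevity).
        Dim html As String = "<pre><A HREF=""/"">[To Parent Directory]</A><br>" & _
            "<A HREF=""/localuser/Albe/"">Albe</A><br>" & _
            "<A HREF=""/localuser/Art/"">Art</A><br></pre>"
        For Each linkText As String In ExtractLinkText(html)
            Console.WriteLine(linkText)
        Next
    End Sub
End Module
```

On the shortened sample this should print [To Parent Directory], Albe, and Art on separate lines. The capture-group version in the accepted answer is usually easier to read; the lookaround form is mainly useful when only m.Value is available to the calling code.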

 
LVL 27

Author Closing Comment

by:VBRocks
ID: 31565946
Sweet!  Thank you so much for the help!  I've been researching this most of the morning.
 
LVL 62

Expert Comment

by:Fernando Soto
ID: 24053970
Not a problem, glad to help.  ;=)
 
LVL 27

Author Comment

by:VBRocks
ID: 24054650
The big tip was "Groups" (m.Groups(1).Value).  I was just using m.Value, which returns the entire match, tags included.
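
To illustrate the difference, here is a small sketch that applies the pattern from the accepted solution to a single link. The module name GroupsDemo and the one-line sample string are made up for the example. m.Value is the whole match, while m.Groups(1).Value is only what the parenthesized (.*?) captured.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module GroupsDemo
    Sub Main()
        ' One link taken from the listing in the question.
        Dim html As String = "<A HREF=""/localuser/Albe/"">Albe</A>"
        Dim m As Match = Regex.Match(html, "<[aA][^>]+>(.*?)</[aA]>")

        ' Group 0 (same as m.Value) is the entire match, tags included.
        Console.WriteLine(m.Value)            ' <A HREF="/localuser/Albe/">Albe</A>
        ' Group 1 is the first parenthesized sub-expression: the link text.
        Console.WriteLine(m.Groups(1).Value)  ' Albe
    End Sub
End Module
```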