Regex help

Posted on 2014-08-18
Last Modified: 2014-08-18
Hi all,

I want to scan an HTML string and pull out the URL and anchor text of every link it contains.

I have the following regex:

<a href=\"(?<url>.*)\">(?<name>.*)</a>


Example text I have pulled back:

<html><head><title>website - /fonts/</title></head><body><H1>website - /fonts/</H1><hr>

<pre><A HREF="/">[To Parent Directory]</A><br><br> 4/28/2014  4:47 PM       242004 <A HREF="/fonts/BillionStars.ttf">BillionStars.ttf</A><br> 4/28/2014  4:47 PM       100380 <A HREF="/fonts/elriott2.ttf">elriott2.ttf</A><br></pre><hr></body></html>

However, no matches are being returned by the following code:

HttpWebRequest request = WebRequest.Create(fonts_path) as HttpWebRequest;
request.Accept = "*/*";
WebResponse response = request.GetResponse();
Regex regex = new Regex("<a href=\".*\">(?<name>.*)</a>");
List<KeyValuePair<String, String>> files = new List<KeyValuePair<String, String>>();

using (var reader = new StreamReader(response.GetResponseStream()))
{
    string result = reader.ReadToEnd();

    Global.log.Info("Debug: streamreader :" + result + ":");

    MatchCollection matches = regex.Matches(result);

    Global.log.Info("Debug: GetFonts 3:" + matches.Count);

    if (matches.Count == 0)
    {
        Console.WriteLine("parse failed.");
        return null;
    }

    foreach (Match match in matches)
    {
        if (!match.Success) { continue; }

        Global.log.Info("Debug: MATCH :" + match.Groups["url"].ToString() + ":" + match.Groups["name"].ToString() + ":");
        files.Add(new KeyValuePair<String, String>(match.Groups["url"].ToString(), match.Groups["name"].ToString()));
    }
}


Any ideas what I am doing wrong here?
Question by:flynny

    Accepted Solution

    I would just look for either a double- or single-quote, chevron, or a space (if the HTML is malformed) at the end of the URL.


    (?i)<a [^>]*?href=["']?(?<url>[^"'> ]+)
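A minimal C# sketch of the accepted pattern run against the sample HTML from the question, assuming C# 9+ top-level statements. It also shows why the original code found nothing: .NET regexes are case-sensitive unless told otherwise, so "<a href" never matches the uppercase "<A HREF" in the listing, while the accepted pattern opens with (?i). The name group and the ExtractLinks helper are hypothetical additions for illustration, not part of the accepted answer; the lazy .*? stops each match at the first closing </A> instead of running on to the last one on the line, as a greedy .* would.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Sample HTML copied from the question (directory listing).
const string html =
    "<pre><A HREF=\"/\">[To Parent Directory]</A><br><br> 4/28/2014  4:47 PM       242004 " +
    "<A HREF=\"/fonts/BillionStars.ttf\">BillionStars.ttf</A><br> 4/28/2014  4:47 PM       100380 " +
    "<A HREF=\"/fonts/elriott2.ttf\">elriott2.ttf</A><br></pre>";

// The question's pattern: case-sensitive, so "<a href" never matches "<A HREF".
var original = new Regex("<a href=\".*\">(?<name>.*)</a>");

// The accepted pattern, extended (hypothetical, for illustration) with a lazy
// "name" group that captures the anchor text up to the first closing </a>.
var links = new Regex(
    @"(?i)<a [^>]*?href=[""']?(?<url>[^""'> ]+)[^>]*>(?<name>.*?)</a>");

// Hypothetical helper: returns (url, name) pairs for every link in the input.
List<(string Url, string Name)> ExtractLinks(string input)
{
    var result = new List<(string Url, string Name)>();
    foreach (Match m in links.Matches(input))
    {
        result.Add((m.Groups["url"].Value, m.Groups["name"].Value));
    }
    return result;
}

Console.WriteLine($"original pattern matches: {original.Matches(html).Count}");
foreach (var (url, name) in ExtractLinks(html))
{
    Console.WriteLine($"{url} -> {name}");
}
```

Run as written, it should print a count of 0 for the original pattern, then the three URL/anchor-text pairs from the listing.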



    Author Closing Comment

    Perfect, thank you.

    Expert Comment

    Regex is probably not the best tool for this (do you have to use it?).
    The Html Agility Pack makes this a breeze.

    Expert Comment

    by:käµfm³d 👽

    Ordinarily I would agree. But since what the OP is after is very specific and really has a finite number of configurations, regex should be fine. If he were trying to match nested constructs (e.g. <div><div></div></div>), then I would certainly advise something else (most likely HAP).
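A short C# sketch of why nested or repeated tags are hard for a plain regex, assuming C# 9+ top-level statements; the patterns are illustrative only. A lazy pattern pairs an opening tag with the wrong closing tag, and a greedy one swallows sibling elements.

```csharp
using System;
using System.Text.RegularExpressions;

// Nested elements: the outer div's real content is "<div></div>".
const string nested = "<div><div></div></div>";

// A lazy pattern pairs the outer <div> with the FIRST </div> it meets,
// which belongs to the inner element, so the captured content is wrong.
Match lazy = Regex.Match(nested, "<div>(?<inner>.*?)</div>");
Console.WriteLine($"lazy, nested:     '{lazy.Groups["inner"].Value}'");   // '<div>'

// Switching to a greedy pattern does not fix the general case: with sibling
// elements it runs from the first <div> to the LAST </div>.
const string siblings = "<div>a</div><div>b</div>";
Match greedy = Regex.Match(siblings, "<div>(?<inner>.*)</div>");
Console.WriteLine($"greedy, siblings: '{greedy.Groups["inner"].Value}'"); // 'a</div><div>b'
```

Flat, non-nested tags such as the <A HREF> links in the question avoid both problems, which is why a regex is workable there.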
