Regex help: extracting link URLs and anchor text from HTML

Hi all,

I want to scan an HTML string and pull out the URL and anchor text of every link it contains.

I have the following pattern:

<a href=\"(?<url>.*)\">(?<name>.*)</a>

Here is an example of the text I get back:

<html><head><title>website - /fonts/</title></head><body><H1>website - /fonts/</H1><hr>

<pre><A HREF="/">[To Parent Directory]</A><br><br> 4/28/2014  4:47 PM       242004 <A HREF="/fonts/BillionStars.ttf">BillionStars.ttf</A><br> 4/28/2014  4:47 PM       100380 <A HREF="/fonts/elriott2.ttf">elriott2.ttf</A><br></pre><hr></body></html>

However, the following code returns no matches:

HttpWebRequest request = WebRequest.Create(fonts_path) as HttpWebRequest;
request.Accept = "*/*";
Regex regex = new Regex("<a href=\"(?<url>.*)\">(?<name>.*)</a>");
List<KeyValuePair<String, String>> files = new List<KeyValuePair<String, String>>();

using (WebResponse response = request.GetResponse())
using (var reader = new StreamReader(response.GetResponseStream()))
{
    string result = reader.ReadToEnd();

    Global.log.Info("Debug: streamreader :" + result + ":");

    MatchCollection matches = regex.Matches(result);

    Global.log.Info("Debug: GetFonts 3:" + matches.Count);

    if (matches.Count == 0)
    {
        Console.WriteLine("parse failed.");
        return null;
    }

    foreach (Match match in matches)
    {
        if (!match.Success) { continue; }

        Global.log.Info("Debug: MATCH :" + match.Groups["url"].ToString() + ":" + match.Groups["name"].ToString() + ":");
        files.Add(new KeyValuePair<String, String>(match.Groups["url"].ToString(), match.Groups["name"].ToString()));
    }
}

Any ideas what I am doing wrong here?
käµfm³d 👽 commented:
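I would just look for a double quote, a single quote, a closing chevron, or a space (in case the HTML is malformed) at the end of the URL. The (?i) at the start of the pattern makes the match case-insensitive; your page uses uppercase <A HREF>, so a case-sensitive pattern starting with <a href= never matches it.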


(?i)<a [^>]*?href=["']?(?<url>[^"'> ]+)
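A minimal, self-contained C# sketch of the pattern above, run against a trimmed copy of the directory listing from the question. The class and method names (LinkUrlDemo, ExtractUrls, CountOriginalMatches) are hypothetical. For comparison it also runs the question's original pattern: because that pattern is case-sensitive and the listing uses uppercase <A HREF>, it should report zero matches, while the (?i) pattern should find the three URLs.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class LinkUrlDemo
{
    // The answer's pattern as a C# verbatim string ("" is an escaped double quote).
    // (?i) turns on case-insensitive matching, so <A HREF=...> matches <a href=...>.
    const string UrlPattern = @"(?i)<a [^>]*?href=[""']?(?<url>[^""'> ]+)";

    // The question's pattern, for comparison: case-sensitive, with greedy .* groups.
    const string OriginalPattern = "<a href=\"(?<url>.*)\">(?<name>.*)</a>";

    // Collects the value of the "url" group from every match.
    public static List<string> ExtractUrls(string html)
    {
        var urls = new List<string>();
        foreach (Match m in Regex.Matches(html, UrlPattern))
        {
            urls.Add(m.Groups["url"].Value);
        }
        return urls;
    }

    public static int CountOriginalMatches(string html) =>
        Regex.Matches(html, OriginalPattern).Count;

    static void Main()
    {
        // Trimmed copy of the directory listing from the question.
        string html =
            "<pre><A HREF=\"/\">[To Parent Directory]</A><br><br> 4/28/2014  4:47 PM       242004 " +
            "<A HREF=\"/fonts/BillionStars.ttf\">BillionStars.ttf</A><br> 4/28/2014  4:47 PM       100380 " +
            "<A HREF=\"/fonts/elriott2.ttf\">elriott2.ttf</A><br></pre>";

        Console.WriteLine("original pattern matches: " + CountOriginalMatches(html));
        foreach (string url in ExtractUrls(html))
        {
            Console.WriteLine(url);
        }
    }
}
```

The URL stops at the first quote, chevron, or space, so quoted, single-quoted, and unquoted href values are all handled by the same character class.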

flynny (Author) commented:
Perfect, thank you.
Regex is probably not the best tool for this (do you have to use it?).
The Html Agility Pack makes this a breeze.
käµfm³d 👽 commented:

Ordinarily I would agree. But what the OP is after is very specific and really has a finite number of configurations, so regex should be fine. If he were trying to match nested constructs (e.g. <div><div></div></div>), then I would certainly advise something else (most likely HAP, the Html Agility Pack).
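The accepted pattern captures only the URL, while the question also asked for the anchor text. One possible extension, not taken from the thread and sketched here with hypothetical names (LinkPairDemo, ExtractLinks), skips the rest of the opening tag and then captures plain text up to </a>. Because the anchor text is restricted to [^<]*, a link whose text contains nested tags is skipped rather than mis-parsed, which is exactly the kind of input where a real HTML parser is the safer choice.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class LinkPairDemo
{
    // Hypothetical extension of the answer's pattern: after the URL, skip the rest of
    // the opening tag, then capture plain anchor text up to the closing </a>.
    // [^<]* means the anchor text must not contain nested tags.
    static readonly Regex LinkRegex = new Regex(
        @"<a\s[^>]*?href=[""']?(?<url>[^""'> ]+)[""']?[^>]*>(?<name>[^<]*)</a>",
        RegexOptions.IgnoreCase);

    // Returns (url, anchor text) pairs, mirroring the question's List<KeyValuePair<...>>.
    public static List<KeyValuePair<string, string>> ExtractLinks(string html)
    {
        var files = new List<KeyValuePair<string, string>>();
        foreach (Match m in LinkRegex.Matches(html))
        {
            files.Add(new KeyValuePair<string, string>(
                m.Groups["url"].Value, m.Groups["name"].Value));
        }
        return files;
    }

    static void Main()
    {
        string html =
            "<A HREF=\"/fonts/BillionStars.ttf\">BillionStars.ttf</A><br>" +
            "<A HREF=\"/fonts/elriott2.ttf\">elriott2.ttf</A><br>" +
            "<a href=\"/nested\"><b>Bold</b></a>"; // nested tag: not matched

        foreach (var pair in ExtractLinks(html))
        {
            Console.WriteLine(pair.Key + " -> " + pair.Value);
        }
    }
}
```

On this input it should list the two font links and silently skip the nested one; if links like that matter, a parser such as the Html Agility Pack is the better fit.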

