joshuadavidlee

Need a regex pattern for finding href links in C#

Need a regex pattern for finding href links in an html document in C#

a good one please
jonorossi

ASKER

umm i asked for href links and those examples are all http
so you want to find the href attributes of the a tags within a html file
i want the whole href links found yes
i have clients uploading html files and i have to modify ALL the href links
This one looks like it will do what you want, if not:
http://regexlib.com/REDetails.aspx?regexp_id=984

Then this one may:
http://www.mambers.com/archive/index.php/t-1849.html
try this

(?:[hH][rR][eE][fF]\s*=)
(?:[\s""']*)
(?!#|[Mm]ailto|[lL]ocation.|[jJ]avascript|.*css|.*this\.)
(.*?)(?:[\s>""'])
ok, the first 2 links do not compile for some weird reason, escape sequence issues or something
what do i do with the multiline one? put pluses in between?
(?<HTML><a[^>]*href\s*=\s*[\"\']?(?<HRef>[^"'>\s]*)[\"\']?[^>]*>(?<Title>[^<]+|.*?)?</a\s*>)

You need to copy the regex from the test page since it is html encoded on the other page:
http://regexlib.com/RETester.aspx?regexp_id=984

I tried it and it picked up the links in a page.
       Regex r = new Regex("(?<HTML><a[^>]*href\s*=\s*[\"\']?(?<HRef>[^"'>\s]*)[\"\']?[^>]*>(?<Title>[^<]+|.*?)?</a\s*>)");

does not form properly or compile?
Regex r = new Regex(@"(?<HTML><a[^>]*href\s*=\s*[\"\']?(?<HRef>[^"'>\s]*)[\"\']?[^>]*>(?<Title>[^<]+|.*?)?</a\s*>)");
                                 ^

Prevent it from trying to escape the characters
joshuadavidlee,

the one i gave you is actually a single regex, not multiple ones.
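
One way to use the four-line pattern as a single regex is to join the pieces into one string. The doubled quotes ("") in the pattern suggest it was written for a C# verbatim literal (@"..."), so the sketch below assumes that. The names HrefFinder and FindHrefs are made up for illustration and do not come from the thread.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class HrefFinder
{
    // The four lines from the post, concatenated into one pattern.
    // In a verbatim string (@"...") backslashes are literal and "" is one quote.
    private const string Pattern =
        @"(?:[hH][rR][eE][fF]\s*=)" +
        @"(?:[\s""']*)" +
        @"(?!#|[Mm]ailto|[lL]ocation.|[jJ]avascript|.*css|.*this\.)" +
        @"(.*?)(?:[\s>""'])";

    // Hypothetical helper: returns the non-empty values captured by group 1.
    public static List<string> FindHrefs(string html)
    {
        var urls = new List<string>();
        foreach (Match m in Regex.Matches(html, Pattern))
        {
            string url = m.Groups[1].Value;
            if (url.Length > 0)   // skip matches where the capture is empty
            {
                urls.Add(url);
            }
        }
        return urls;
    }
}
```

Calling HrefFinder.FindHrefs(html) on an uploaded page would give the captured link values. The empty-capture check is there because the pattern appears able to match with nothing captured when the lookahead rejects a link such as "#top".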
sorry getting confused with who is who, ok the

Regex r = new Regex(@"(?<HTML><a[^>]*href\s*=\s*[\"\']?(?<HRef>[^"'>\s]*)[\"\']?[^>]*>(?<Title>[^<]+|.*?)?</a\s*>)");

with the @ symbol is still not working, the \s and others, and anv, how would i put that in the

Regex r = new Regex( ??? ) format?  this is driving me nuts
sorry for the confusion..

Try this expression..

href\s*=\s*(?:(?:\"(?<url>[^\"]*)\")|(?<url>[^\s>]+))
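
Since the stated goal is to modify every link in the uploaded files, a pattern of this kind can be paired with Regex.Replace and a MatchEvaluator. The sketch below is deliberately simplified to double-quoted href values only. The names LinkRewriter and RewriteHrefs, the group name open, and the "/files/" prefix in the usage line are placeholders, not something from the thread.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LinkRewriter
{
    // Simplified variant of the expression above: double-quoted values only.
    // "open" captures everything up to and including the opening quote,
    // "url" captures the link value itself.
    private static readonly Regex HrefPattern = new Regex(
        @"(?<open>href\s*=\s*"")(?<url>[^""]*)""",
        RegexOptions.IgnoreCase);

    // Hypothetical helper: rewrites each href value with the given function
    // and leaves the rest of the document untouched.
    public static string RewriteHrefs(string html, Func<string, string> transform)
    {
        return HrefPattern.Replace(html, m =>
            m.Groups["open"].Value + transform(m.Groups["url"].Value) + "\"");
    }
}
```

For example, LinkRewriter.RewriteHrefs(html, u => "/files/" + u) would prefix every double-quoted link. Single-quoted and unquoted values would need extra alternatives in the pattern.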
ASKER CERTIFIED SOLUTION
anv

well that accomplished something, thank u anv. any bigger, longer, better ones? a few odd things showed up in it and they all had " on the end of them, but it looked ok
Try this one, I fixed the problem with the escaping.

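            // Requires: using System.Text.RegularExpressions;
            string text = textBox1.Text;
            // Regular (non-verbatim) string: backslashes are doubled and quotes are escaped
            string pattern = "(?<HTML><a[^>]*href\\s*=\\s*[\"\']?(?<HRef>[^\"'>\\s]*)[\"\']?[^>]*>(?<Title>[^<]+|.*?)?</a\\s*>)";
            string result = "";
            Regex r = new Regex(pattern);
            // Use the Regex instance built above instead of the static Regex.Matches
            foreach (Match m in r.Matches(text))
            {
                // m.ToString() is the whole <a ...>...</a> match, not just the URL
                result += m.ToString() + "\n";
            }
            MessageBox.Show(result);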
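
If only the URLs are wanted rather than the whole <a>...</a> match, the named HRef group can be read from each match. The sketch below also writes the same pattern as a verbatim string, where backslashes stay as they are and each literal quote is doubled. The names AnchorScanner and GetHRefs are hypothetical.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class AnchorScanner
{
    // Same pattern as above, written as a verbatim string:
    // \s needs no extra backslash, and a literal " is written as "".
    private static readonly Regex AnchorPattern = new Regex(
        @"(?<HTML><a[^>]*href\s*=\s*[""']?(?<HRef>[^""'>\s]*)[""']?[^>]*>(?<Title>[^<]+|.*?)?</a\s*>)");

    // Hypothetical helper: collects just the HRef capture from every match.
    public static List<string> GetHRefs(string html)
    {
        var hrefs = new List<string>();
        foreach (Match m in AnchorPattern.Matches(html))
        {
            hrefs.Add(m.Groups["HRef"].Value);
        }
        return hrefs;
    }
}
```

Reading m.Groups["HRef"].Value instead of m.ToString() leaves out the link text and the surrounding tag.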
the last one anv gave seems to provide clearer results, while yours jon provides the link name and sometimes other weird sentences that are not links
SOLUTION
ok i'd like to split the points between u guys if i can figure out how ha
There is a split link at the bottom and you go to a page that has textboxes allowing you to enter how much for each person/post.