joshuadavidlee
asked on
Need a regex pattern for finding href links in C#
Need a regex pattern for finding href links in an html document in C#
a good one please
ASKER
umm i asked for href links and those examples are all http
So you want to find the href attributes of the a tags within an HTML file?
ASKER
i want the whole href links found yes
ASKER
i have clients uploading html files and i have to modify ALL the href links
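For the task described here (rewriting every href in an uploaded file), one possible approach is Regex.Replace with a match evaluator, so each URL can be changed in place. This is only a sketch: the class name HrefRewriter, the method RewriteHrefs and the transform callback are made up for illustration, and the pattern only handles quoted attribute values.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HrefRewriter
{
    // pre = "href=" (with optional spaces), q = opening quote,
    // url = attribute value, \k<q> = the same quote character again.
    private static readonly Regex Href = new Regex(
        @"(?<pre>href\s*=\s*)(?<q>[""'])(?<url>.*?)\k<q>",
        RegexOptions.IgnoreCase);

    // Returns the HTML with every quoted href value passed through transform.
    public static string RewriteHrefs(string html, Func<string, string> transform)
    {
        return Href.Replace(html, m =>
            m.Groups["pre"].Value
            + m.Groups["q"].Value
            + transform(m.Groups["url"].Value)
            + m.Groups["q"].Value);
    }
}
```

Example call: HrefRewriter.RewriteHrefs(html, url => "/clients/" + url). Unquoted href values are left untouched by this sketch, and a regex will not cope with every malformed page, so an HTML parser may be the safer choice for messy uploads.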
This one looks like it will do what you want:
http://regexlib.com/REDetails.aspx?regexp_id=984
If not, then this one may:
http://www.mambers.com/archive/index.php/t-1849.html
try this
(?:[hH][rR][eE][fF]\s*=)
(?:[\s""']*)
(?!#|[Mm]ailto|[lL]ocation.|[jJ]avascript|.*css|.*this\.)
(.*?)(?:[\s>""'])
ASKER
ok the first 2 links do not compile for some weird reason, escape sequence issues or something
ASKER
what do i do with the multiline one? put pluses in between?
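On the multiline question: the four lines read as pieces of one pattern, so one way to use them is to join them with + into a single verbatim string, with nothing added between the pieces. The sketch below assumes the stray spaces inside the posted fragments are copy-paste damage and leaves them out; the class name HrefPatternDemo and the method AllHrefs are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class HrefPatternDemo
{
    // The four fragments joined into one pattern. In a verbatim (@) string
    // a double quote is written as "" and backslashes are left alone.
    public const string Pattern =
        @"(?:[hH][rR][eE][fF]\s*=)" +
        @"(?:[\s""']*)" +
        @"(?!#|[Mm]ailto|[lL]ocation.|[jJ]avascript|.*css|.*this\.)" +
        @"(.*?)(?:[\s>""'])";

    // Collects capture group 1 (the text after href=) from every match.
    public static List<string> AllHrefs(string html)
    {
        var hrefs = new List<string>();
        foreach (Match m in Regex.Matches(html, Pattern))
        {
            hrefs.Add(m.Groups[1].Value);
        }
        return hrefs;
    }
}
```

The link itself is in m.Groups[1], not in the whole match, which also includes the href= prefix and the closing delimiter.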
(?<HTML><a[^>]*href\s*=\s*[\"\']?(?<HRef>[^"'>\s]*)[\"\']?[^>]*>(?<Title>[^<]+|.*?)?</a\s*>)
You need to copy the regex from the test page since it is html encoded on the other page:
http://regexlib.com/RETester.aspx?regexp_id=984
I tried it and it picked up the links in a page.
ASKER
Regex r = new Regex("(?<HTML><a[^>]*href\s*=\s*[\"\']?(?<HRef>[^"'>\s]*)[\"\']?[^>]*>(?<Title>[^<]+|.*?)?</a\s*>)");
does not form properly or compile?
Regex r = new Regex(@"(?<HTML><a[^>]*href\s*=\s*[\"\']?(?<HRef>[^"'>\s]*)[\"\']?[^>]*>(?<Title>[^<]+|.*?)?</a\s*>)");
Put an @ in front of the string to prevent it from trying to escape the characters.
joshuadavidlee
the one i gave you..
actually is a single regex..
they are not multiple
ASKER
sorry getting confused with who is who, ok the
Regex r = new Regex(@"(?<HTML><a[^>]*href\s*=\s*[\"\']?(?<HRef>[^"'>\s]*)[\"\']?[^>]*>(?<Title>[^<]+|.*?)?</a\s*>)");
with the @ symbol is still not working, the \s and others, and anv, how would i put that in the
Regex r = new Regex( ??? ) format? this is driving me nuts
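A likely reason the @ version still fails: inside a verbatim (@) string a backslash is no longer an escape character, so the \" in the pattern ends the string at the quote. In a verbatim string a double quote has to be written as "", while in a regular string every regex backslash has to be doubled instead. The sketch below writes the same pattern both ways; the class name LinkPatterns and the helper FirstHref are hypothetical, and the \" and \' inside the character classes are simplified to plain quote characters, which mean the same thing to the regex engine.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LinkPatterns
{
    // Verbatim string: backslashes stay as typed, a double quote is written as "".
    public const string Verbatim =
        @"(?<HTML><a[^>]*href\s*=\s*[""']?(?<HRef>[^""'>\s]*)[""']?[^>]*>(?<Title>[^<]+|.*?)?</a\s*>)";

    // Regular string: each regex backslash is doubled, a double quote is written as \".
    public const string Regular =
        "(?<HTML><a[^>]*href\\s*=\\s*[\"']?(?<HRef>[^\"'>\\s]*)[\"']?[^>]*>(?<Title>[^<]+|.*?)?</a\\s*>)";

    // Returns the HRef group of the first match, or an empty string if nothing matches.
    public static string FirstHref(string html, string pattern)
    {
        Match m = new Regex(pattern).Match(html);
        return m.Success ? m.Groups["HRef"].Value : string.Empty;
    }
}
```

Both constants hold exactly the same characters, so either one can go into new Regex( ... ).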
sorry for the confusion..
Try this expression..
href\s*=\s*(?:(?:\"(?<url>[^\"]*)\")|(?<url>[^\s*] ))
ASKER
well that accomplished something, thank u anv. any bigger longer better ones? a few odd things showed up in it and they all had " on the end of them but it looked ok
Try this one, I fixed the problem with the escaping.
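// Requires: using System.Text.RegularExpressions;
string text = textBox1.Text;
// Regular string literal: each regex backslash is doubled, each double quote is written as \"
string pattern = "(?<HTML><a[^>]*href\\s*=\\s*[\"\']?(?<HRef>[^\"'>\\s]*)[\"\']?[^>]*>(?<Title>[^<]+|.*?)?</a\\s*>)";
string result = "";
Regex r = new Regex(pattern);
// Use the Regex instance created above instead of the static Regex.Matches
foreach (Match m in r.Matches(text))
{
    // m.ToString() is the whole <a ...>...</a> match, one per line
    result += m.ToString() + "\n";
}
MessageBox.Show(result);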
ASKER
the last one ans gave seems to provide clearer results, while yours jon provides the link name and sometimes other weird sentences that are not links
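The link names and extra text probably show up because m.ToString() returns the whole match, which with this pattern is the full anchor tag including the text between <a ...> and </a>. Reading only the named HRef group should give just the URLs. A rough sketch, using the pattern posted above; the class name HrefLister and the method ListHrefs are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class HrefLister
{
    // Same pattern as in the snippet above, written as a verbatim string.
    private static readonly Regex Anchor = new Regex(
        @"(?<HTML><a[^>]*href\s*=\s*[""']?(?<HRef>[^""'>\s]*)[""']?[^>]*>(?<Title>[^<]+|.*?)?</a\s*>)");

    // Returns only the href values, not the surrounding tag or link text.
    public static List<string> ListHrefs(string html)
    {
        var hrefs = new List<string>();
        foreach (Match m in Anchor.Matches(html))
        {
            hrefs.Add(m.Groups["HRef"].Value);
        }
        return hrefs;
    }
}
```

m.Groups["Title"].Value holds the link text if that is needed as well.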
ASKER
ok i'd like to split the points between u guys if i can figure out how ha
There is a split link at the bottom and you go to a page that has textboxes allowing you to enter how much for each person/post.
http://geekswithblogs.net/casualjim/archive/2005/12/01/61722.aspx
Another one:
http://www.truerwords.net/articles/ut/urlactivation.html
One that no one can read:
http://aspn.activestate.com/ASPN/Cookbook/Rx/Recipe/59864