  • Status: Solved

Need a regex pattern for finding href links in C#

I need a regex pattern for finding href links in an HTML document in C#.

A good one, please.
Asked by: joshuadavidlee
 
joshuadavidlee (Author) commented:
Umm, I asked for href links, and those examples are all http.
 
jonorossi commented:
So you want to find the href attributes of the a tags within an HTML file?
 
joshuadavidlee (Author) commented:
Yes, I want the whole href links found.
 
joshuadavidlee (Author) commented:
I have clients uploading HTML files, and I have to modify ALL the href links.
 
jonorossi commented:
This one looks like it will do what you want:
http://regexlib.com/REDetails.aspx?regexp_id=984

If not, then this one may:
http://www.mambers.com/archive/index.php/t-1849.html
 
anv commented:
Try this:

(?:[hH][rR][eE][fF]\s*=)
(?:[\s""']*)
(?!#|[Mm]ailto|[lL]ocation.|[jJ]avascript|.*css|.*this\.)
(.*?)(?:[\s>""'])
 
joshuadavidlee (Author) commented:
OK, the first 2 links do not compile for some weird reason, escape sequence issues or something.
 
joshuadavidlee (Author) commented:
What do I do with the multiline one? Put pluses in between?
 
jonorossi commented:
(?<HTML><a[^>]*href\s*=\s*[\"\']?(?<HRef>[^"'>\s]*)[\"\']?[^>]*>(?<Title>[^<]+|.*?)?</a\s*>)

You need to copy the regex from the test page, since it is HTML-encoded on the other page:
http://regexlib.com/RETester.aspx?regexp_id=984

I tried it and it picked up the links in a page.
 
joshuadavidlee (Author) commented:
       Regex r = new Regex("(?<HTML><a[^>]*href\s*=\s*[\"\']?(?<HRef>[^"'>\s]*)[\"\']?[^>]*>(?<Title>[^<]+|.*?)?</a\s*>)");

This does not form properly or compile?
 
jonorossi commented:
Regex r = new Regex(@"(?<HTML><a[^>]*href\s*=\s*[\"\']?(?<HRef>[^"'>\s]*)[\"\']?[^>]*>(?<Title>[^<]+|.*?)?</a\s*>)");
                    ^

Add the @ to prevent it from trying to escape the characters.
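A note on the verbatim literal above, offered as a hedged sketch rather than a confirmed fix: inside a C# @"..." string a backslash is taken literally, but a double quote has to be written as "", so a \" in the pattern ends the string early and the line will not compile. The version below doubles the quotes instead. The class name HrefVerbatimSketch and the method FirstHref are made up for illustration and do not come from this thread.

```csharp
using System.Text.RegularExpressions;

public static class HrefVerbatimSketch   // hypothetical name, for illustration only
{
    // Same pattern, written as a verbatim string: \s stays as typed, and each
    // literal double quote is doubled ("") instead of escaped with a backslash.
    public const string Pattern =
        @"(?<HTML><a[^>]*href\s*=\s*[""']?(?<HRef>[^""'>\s]*)[""']?[^>]*>(?<Title>[^<]+|.*?)?</a\s*>)";

    // Returns the HRef group of the first match, or an empty string when nothing matches.
    public static string FirstHref(string html)
    {
        Match m = Regex.Match(html, Pattern, RegexOptions.IgnoreCase);
        return m.Success ? m.Groups["HRef"].Value : "";
    }
}
```

Calling HrefVerbatimSketch.FirstHref on a page's HTML gives the first link target; the HTML and Title groups are available on the same Match if the whole tag or the link text is needed.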
 
anv commented:
joshuadavidlee,

The one I gave you is actually a single regex.
The lines are not multiple patterns.
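One way to put the four-line expression into a single C# string, as a minimal sketch: join the lines as verbatim literals with +, which yields one pattern with no line breaks in it. The "" pairs in the posted expression are already the verbatim-string spelling of a double quote, so the lines can be pasted unchanged. The names MultiLinePatternSketch and FindLinks are assumptions for illustration. The pattern can produce empty captures (for example when the lookahead rejects a link), so the sketch skips those.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class MultiLinePatternSketch   // hypothetical name, for illustration only
{
    // The four posted lines, concatenated into one pattern.
    public const string Pattern =
        @"(?:[hH][rR][eE][fF]\s*=)" +
        @"(?:[\s""']*)" +
        @"(?!#|[Mm]ailto|[lL]ocation.|[jJ]avascript|.*css|.*this\.)" +
        @"(.*?)(?:[\s>""'])";

    // Collects the first capture group (the link target) of every match.
    public static List<string> FindLinks(string html)
    {
        var links = new List<string>();
        foreach (Match m in Regex.Matches(html, Pattern))
        {
            string url = m.Groups[1].Value;
            if (url.Length > 0)   // skip empty captures
            {
                links.Add(url);
            }
        }
        return links;
    }
}
```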
 
joshuadavidlee (Author) commented:
Sorry, I'm getting confused with who is who. OK, the

Regex r = new Regex(@"(?<HTML><a[^>]*href\s*=\s*[\"\']?(?<HRef>[^"'>\s]*)[\"\']?[^>]*>(?<Title>[^<]+|.*?)?</a\s*>)");

with the @ symbol is still not working (the \s and others). And anv, how would I put that in the

Regex r = new Regex( ??? ) format? This is driving me nuts.
 
anv commented:
Sorry for the confusion.

Try this expression:

href\s*=\s*(?:(?:\"(?<url>[^\"]*)\")|(?<url>[^\s]* ))
 
anv commented:
This is how you need to use it:

Regex r = new Regex("href\\s*=\\s*(?:(?:\\\"(?<url>[^\\\"]*)\\\")|(?<url>[^\\s]* ))");
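A short sketch of reading results from this expression, under the assumption that only the link target is wanted: the whole match (m.Value) covers href="..." including the closing quote, while the named group url holds just the link. The names UrlGroupSketch and ExtractUrls are hypothetical. The Trim() call is there because the unquoted branch of the pattern captures a trailing space inside the group.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class UrlGroupSketch   // hypothetical name, for illustration only
{
    // The expression as an escaped (non-verbatim) C# string.
    public const string Pattern =
        "href\\s*=\\s*(?:(?:\\\"(?<url>[^\\\"]*)\\\")|(?<url>[^\\s]* ))";

    // Returns the value of the "url" group for every match, without surrounding whitespace.
    public static List<string> ExtractUrls(string html)
    {
        var urls = new List<string>();
        foreach (Match m in Regex.Matches(html, Pattern, RegexOptions.IgnoreCase))
        {
            // m.Value would be the full href="..." text; the group is only the link.
            urls.Add(m.Groups["url"].Value.Trim());
        }
        return urls;
    }
}
```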
 
joshuadavidlee (Author) commented:
Well, that accomplished something, thank you anv. Any bigger, longer, better ones? A few odd things showed up in it, and they all had " on the end of them, but it looked OK.
 
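jonorossi commented:
Try this one; I fixed the problem with the escaping.

            // Requires: using System.Text.RegularExpressions;
            string text = textBox1.Text;
            // Regular (non-verbatim) string: backslashes are doubled and quotes are written as \".
            string pattern = "(?<HTML><a[^>]*href\\s*=\\s*[\"\']?(?<HRef>[^\"'>\\s]*)[\"\']?[^>]*>(?<Title>[^<]+|.*?)?</a\\s*>)";
            string result = "";
            Regex r = new Regex(pattern);
            // Use the Regex instance created above instead of the static Regex.Matches(text, pattern).
            foreach (Match m in r.Matches(text))
            {
                result += m.ToString() + "\n";
            }
            MessageBox.Show(result);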
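The loop above only lists matches. Since the stated goal in this thread is to modify every href, here is a hedged sketch of rewriting links in place with Regex.Replace and a MatchEvaluator. It uses a deliberately simplified pattern that handles only double-quoted href values, which is an assumption and not one of the patterns posted here. The names HrefRewriteSketch and RewriteLinks, and the rewrite callback, are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HrefRewriteSketch   // hypothetical name, for illustration only
{
    // Simplified on purpose: matches href="..." with double quotes only.
    private const string Pattern = @"(?<pre>href\s*=\s*"")(?<url>[^""]*)(?<post>"")";

    // Replaces each link target with rewrite(target), leaving the rest of the HTML untouched.
    public static string RewriteLinks(string html, Func<string, string> rewrite)
    {
        return Regex.Replace(
            html,
            Pattern,
            m => m.Groups["pre"].Value + rewrite(m.Groups["url"].Value) + m.Groups["post"].Value,
            RegexOptions.IgnoreCase);
    }
}
```

For example, HrefRewriteSketch.RewriteLinks(html, url => "/files/" + url) would prefix every double-quoted link with /files/. Single-quoted and unquoted attributes would need extra alternatives in the pattern.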
 
joshuadavidlee (Author) commented:
The last one anv gave seems to provide clearer results, while yours, jon, provides the link name and sometimes other weird sentences that are not links.
 
jonorossi commented:
They are very similar. I'm not a whiz at regex, so anv might be better at merging them together (which really is just adding the title to his).

                            href\\s*=\\s*(?:(?:\\\"(?<url>[^\\\"]*)\\\")|(?<url>[^\\s]* ))
(?<HTML><a[^>]*href\\s*=\\s*[\"\']?(?<HRef>[^\"'>\\s]*)[\"\']?[^>]*>(?<Title>[^<]+|.*?)?</a\\s*>)
 
joshuadavidlee (Author) commented:
OK, I'd like to split the points between you guys, if I can figure out how, ha.
 
jonorossi commented:
There is a split link at the bottom. It takes you to a page with text boxes where you enter how many points go to each person/post.
