
.NET C# - Read HTML to find links with a specific class

Hi,

I have written a simple .NET C# application that downloads the source of an HTML web page and reads the response stream line by line.

The HTML contains many links like this:
<a class="org" href="/business/linkaddress">Link Text</a>


I need to extract the link address (e.g. "/business/linkaddress") for all links that have class="org" and put the addresses into an array.

Can I do this with Regex?

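using System;
using System.IO;
using System.Net;
using System.Text;
using System.Text.RegularExpressions;

string target = @"HTTP ADDRESS";

HttpWebRequest request = (HttpWebRequest)WebRequest.Create(target);
HttpWebResponse response = (HttpWebResponse)request.GetResponse();

string resultline;
int i = 0;

// Both usings are disposed when the block below ends.
using (Stream responseStream = response.GetResponseStream())
using (StreamReader htmlStream = new StreamReader(responseStream, Encoding.UTF8))
{
    // Braces are required here: without them only the first statement belongs to the loop.
    while ((resultline = htmlStream.ReadLine()) != null)
    {
        i = i + 1;
        ClearCurrentConsoleLine(); // helper defined elsewhere (not shown)
        Console.WriteLine(i);

        // "XXXXX" is a placeholder for the pattern I still need.
        if (Regex.IsMatch(resultline, "XXXXX"))
        {
            Console.WriteLine(resultline.Trim(new char[] { ' ', '\t' }));
        }
    }
}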
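
For reference, the extraction described above could be sketched with a capture group along these lines. This is only a minimal sketch, not a definitive answer: the class name OrgLinkExtractor and the method ExtractOrgLinks are hypothetical, and the pattern assumes that class="org" comes before href inside the same <a> tag, that both attributes are double-quoted, and that "org" is the only class on the link, as in the sample line. Regex tends to be brittle on real-world HTML, so a proper HTML parser may be the safer choice if the markup varies.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original code.
public static class OrgLinkExtractor
{
    // Assumption: class="org" precedes href in the same <a> tag and both
    // attributes use double quotes, as in the sample link.
    private static readonly Regex OrgLink = new Regex(
        "<a\\s+[^>]*?class=\"org\"[^>]*?href=\"([^\"]*)\"",
        RegexOptions.IgnoreCase);

    // Returns the href value of every matching link found in the given text.
    public static List<string> ExtractOrgLinks(string html)
    {
        var links = new List<string>();
        foreach (Match m in OrgLink.Matches(html))
        {
            // Group 1 is the text between the quotes of the href attribute.
            links.Add(m.Groups[1].Value);
        }
        return links;
    }
}
```

Inside the read loop this could be called once per line, e.g. OrgLinkExtractor.ExtractOrgLinks(resultline), collecting the results into one list and calling ToArray() at the end. A tag that is split across two lines would be missed that way; reading the whole page with htmlStream.ReadToEnd() and passing it in a single call avoids that.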
