• Status: Solved

Extracting specific links from a web page

Using HTML Agility Pack, I am trying to extract links from a web page. I can extract all links using the following code, but there is a table on this page with id="table1", and I only want the links inside that table. What should I change in this code to extract links from just that part of the page? I guess I only need to modify my XPath expression, but how should I write it?
protected void Page_Load(object sender, EventArgs e)
{
    HtmlAgilityPack.HtmlWeb web = new HtmlWeb();
    HtmlAgilityPack.HtmlDocument doc = web.Load("http://www.eppi.ioe.ac.uk/cms/Default.aspx?tabid=62");

    List<string> hrefTags = new List<string>();

    foreach (HtmlNode link in doc.DocumentNode.SelectNodes("//a[@href]"))
    {
        HtmlAttribute att = link.Attributes["href"];
        hrefTags.Add(att.Value);
    }

    int i = 0;
}

Asked by: mmalik15
 
käµfm³d 👽Commented:
Try this:

foreach (HtmlNode link in doc.DocumentNode.SelectNodes("//table[@id=\"table1\"//a[@href]"))

 
mmalik15 (Author) Commented:
Thanks for the comment, but I am getting the following exception:

'//table[@id="table1"//a[@href]' has an invalid token.
 
mmalik15 (Author) Commented:
The complete method is:

 
protected void Page_Load(object sender, EventArgs e)
    {

        HtmlAgilityPack.HtmlWeb web = new HtmlWeb();
        HtmlAgilityPack.HtmlDocument doc = web.Load("http://www.eppi.ioe.ac.uk/cms/Default.aspx?tabid=62");

        List<string> hrefTags = new List<string>();

        foreach (HtmlNode link in doc.DocumentNode.SelectNodes("//table[@id=\"table1\"//a[@href]"))
        {
            HtmlAttribute att = link.Attributes["href"];
            hrefTags.Add(att.Value);
        }

        int i = 0;

    }


 
mmalik15 (Author) Commented:
I have read about a couple of issues online where this problem was caused by invalid characters in the XML, but it works fine with the following line of code, so I assume it is not an invalid character in that XML:

foreach (HtmlNode link in doc.DocumentNode.SelectNodes("//a[@href]"))
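
One way to check whether the expression itself is at fault, independently of the page content, is to compile it on its own. The following is a minimal sketch, assuming the "invalid token" message comes from the .NET XPath parser in System.Xml.XPath; the class and method names (XPathCheck, IsValid) are hypothetical and not part of HTML Agility Pack.

```csharp
using System;
using System.Xml.XPath;

// Hypothetical helper: reports whether an XPath string parses at all.
public static class XPathCheck
{
    public static bool IsValid(string expression)
    {
        try
        {
            // Compile only parses the expression; no document is needed.
            XPathExpression.Compile(expression);
            return true;
        }
        catch (XPathException)
        {
            // Thrown for syntax errors such as an unbalanced bracket.
            return false;
        }
    }
}
```

Calling XPathCheck.IsValid with the failing expression and with "//a[@href]" separates a syntax problem in the expression from a problem in the downloaded markup.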
 
käµfm³d 👽Commented:
My fault. There should be a closing bracket on the first attribute selector:

foreach (HtmlNode link in doc.DocumentNode.SelectNodes("//table[@id=\"table1\"]//a[@href]"))
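
For reference, the corrected expression can be wrapped in a small helper. This is only a sketch: the names TableLinkExtractor and ExtractLinks are hypothetical, and the null check is there because SelectNodes returns null rather than an empty collection when nothing matches, which would otherwise make the foreach throw.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

// Hypothetical helper: collects href values of links inside one table.
public static class TableLinkExtractor
{
    public static List<string> ExtractLinks(HtmlDocument doc, string tableId)
    {
        var hrefs = new List<string>();

        // Assumes tableId contains no single quote; the XPath is built by
        // plain concatenation and is not escaped.
        string xpath = "//table[@id='" + tableId + "']//a[@href]";

        HtmlNodeCollection links = doc.DocumentNode.SelectNodes(xpath);
        if (links == null)
        {
            // No matching table or no links inside it.
            return hrefs;
        }

        foreach (HtmlNode link in links)
        {
            hrefs.Add(link.GetAttributeValue("href", string.Empty));
        }
        return hrefs;
    }
}
```

With the code from the question, this would be called as TableLinkExtractor.ExtractLinks(doc, "table1") after web.Load has returned the document.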

 
mmalik15 (Author) Commented:
Sorry, I should have spotted the missing bracket :(. But thanks, kaufmed. You're a star, buddy :)
 
käµfm³d 👽Commented:
NP. Glad to help  = )
Question has a verified solution.