  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 2363

Using HTML agility pack to select a node

Hi guys,

I am trying to extract the text '16 line items found.' from the page at the link below.

http://www.netcomponents.com/results.htm?t=f&r=1&pn1=12345J

My code so far is below; it keeps returning 0 results.

    HtmlWeb NC = new HtmlWeb();
    HtmlAgilityPack.HtmlDocument doc = NC.Load(NetComsURL.NetComURL);
    HtmlNodeCollection NetlinkNodes = doc.DocumentNode.SelectNodes("//div[@ID=\"livesearch\"]");

    if (NetlinkNodes != null)
    {
        // Loop through the nodes and grab the last one
        foreach (var node in NetlinkNodes)
        {
            txtResult.Text += node.InnerText;
        }
    }

In case you cannot load the link, here is a snippet of the page source:

<tr><td colspan="2"></td></tr></table></td><TD ALIGN=CENTER VALIGN=MIDDLE NOWRAP WIDTH=100% >&nbsp;<div ID="livesearch" style="display:none;;background: infobackground;"></div>16 line items found.</td> 

Many Thanks,

Dean
 
AndyAinscowCommented:
I'd put a breakpoint on line 5 (if (NetlinkNodes != null)), inspect the values of the variables you have, and then single-step the code to see what is happening.
0
 
deanlee17Author Commented:
Hi Andy, I have done that. It's not finding anything, hence the result is 0.
0
 
AndyAinscowCommented:
Have you iterated through the nodes in the document and inspected their values?
0
 
deanlee17Author Commented:
Not exactly, as I'm not sure how to do that. I'm new to this, so I'm still finding my way.
0
 
AndyAinscowCommented:
Something like:
foreach (var node in doc.DocumentNode.Descendants())
{
    // check what is in the node
}
0
 
käµfm³d 👽Commented:
Are you sure you are bringing back the correct page? When I click on your link above, I am taken to a login screen. Clicking the search button takes me to the page with the "16 line items found." line. The only difference I see between the target page and the link you posted is the "d=1" querystring parameter.

In addition to that, the "livesearch" <div> does not contain the text you are seeking; the <td> does.

(Formatted)
<TD ALIGN=CENTER VALIGN=MIDDLE NOWRAP WIDTH=100% >
    &nbsp;<div ID="livesearch" style="display:none;position:absolute;background: infobackground;"></div>
    16 line items found.
</td>
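
A minimal sketch of that point, assuming the correct page has already been fetched: find the "livesearch" div, then read the text from its parent cell rather than from the div itself. ResultCountReader and ReadResultCount are hypothetical names, and the lower-case @id in the XPath rests on the assumption that HAP exposes attribute names to XPath in lower case, however they are written in the markup.

```csharp
using HtmlAgilityPack;

public static class ResultCountReader
{
    // Returns the text of the cell that contains the "livesearch" div,
    // or null when the div is not present in the document.
    public static string ReadResultCount(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Assumption: attribute names are lower-cased by HAP, so use @id, not @ID.
        HtmlNode div = doc.DocumentNode.SelectSingleNode("//div[@id='livesearch']");
        if (div == null || div.ParentNode == null)
        {
            return null;
        }

        // The wanted text is a sibling of the (empty) div, so read the parent cell.
        string raw = div.ParentNode.InnerText;

        // Decode entities such as &nbsp; and trim the surrounding whitespace.
        return HtmlEntity.DeEntitize(raw).Trim();
    }
}
```

This only helps once the right HTML is coming back; against the login page the div is absent and the method returns null.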

0
 
deanlee17Author Commented:
hi kaufmed,

We have an account with that site, so when my user clicks to go to it they are logged in automatically. Yes, you are correct, it's the TD data that I need to navigate to.

Any ideas?
0
 
käµfm³d 👽Commented:
Again, I'd make sure that HAP is pulling the correct document. When I try to run against the page, I don't get the correct HTML because apparently the site requires cookies:

Screenshot
There are apparently some ways to overcome this within HAP, but I don't have time at the moment to craft an example. I can work one up later this morning.
0
 
deanlee17Author Commented:
An example would be fantastic if you do get the time.

Hey, how did you generate the HTML visualiser image?
0
 
AndyAinscowCommented:
>>Are you sure you are bringing back the correct page?

That is why I suggested, in an earlier comment, checking what actually was being processed.
0
 
deanlee17Author Commented:
Yes, sorry Andy. I need to get Agility Pack to show me what is being read into it; I'm looking into how to do this.
0
 
käµfm³d 👽Commented:
Click the little arrow next to the magnifying glass for the property you are interested in:

Screenshot
0
 
deanlee17Author Commented:
Guys, you are absolutely right, it seems to be a cookie problem. I got the same error as in the screenshot you posted earlier.

Earlier in my code I do this:

            string BrokerPrefix = "http://www.brokerforum.com/electronic-components-search-en.jsa?originalFullPartNumber=";
            string BrokerSuffix = "&x=50&y=16&hasNoSearchCriteria=false";
            string ConcatBroker = BrokerPrefix + TxtboxSearch + BrokerSuffix;
            BroBrowser.Navigate(ConcatBroker);


            BrokerSearch = new SearchResults ();
            BrokerSearch.BrokerURL = ConcatBroker;

That loads correctly, so I assumed the page was also loading correctly in HTML Agility Pack.
0
 
käµfm³d 👽Commented:
Well the WebBrowser control (I assume that's what "BroBrowser" is) will handle cookies on its own. It's a scaled down version of IE, effectively. HAP relies on HttpWebRequest/Response, and so cookies need to be handled manually.
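
A minimal sketch of handling the cookies manually, assuming one CookieContainer is kept for the whole session and attached to every request. CookieSession, Jar and Attach are hypothetical names; the method shape (HttpWebRequest in, bool out) is chosen to fit HAP's PreRequest hook.

```csharp
using System.Net;

public sealed class CookieSession
{
    // One container shared by every request made in this session.
    private readonly CookieContainer _jar = new CookieContainer();

    public CookieContainer Jar
    {
        get { return _jar; }
    }

    // Attaches the shared container so cookies set by earlier responses
    // are sent with this request, and new ones are stored for later.
    public bool Attach(HttpWebRequest request)
    {
        request.CookieContainer = _jar;
        return true; // true lets the request go ahead
    }
}
```

It could be wired up as new HtmlWeb { PreRequest = session.Attach }. Because the same container is reused, HttpWebRequest stores cookies from each response in it, so a separate PostResponse step should not be needed for cookie handling.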
0
 
deanlee17Author Commented:
Yes, you are correct, it is a WebBrowser control. I see.
0
 
käµfm³d 👽Commented:
This appears to work for grabbing the page:

using System.Net;
using HtmlAgilityPack;

namespace ConsoleApplication56
{
    class Program
    {
        private static CookieCollection _cookies = new CookieCollection();

        static void Main(string[] args)
        {
            HtmlWeb web = new HtmlWeb() { PreRequest = BeforeHAPRequest, PostResponse = AfterHAPRequest };
            HtmlDocument doc = web.Load("http://www.netcomponents.com/results.htm?t=f&r=1&pn1=12345J");

            doc = web.Load("http://www.netcomponents.com/results.htm?d=1&t=f&r=1&pn1=12345J");
        }

        static bool BeforeHAPRequest(HttpWebRequest request)
        {
            request.CookieContainer = new CookieContainer();

            foreach (Cookie cookie in _cookies)
            {
                request.CookieContainer.Add(cookie);
            }

            return true;
        }

        static void AfterHAPRequest(HttpWebRequest request, HttpWebResponse response)
        {
            _cookies = request.CookieContainer.GetCookies(request.RequestUri);
        }
    }
}

Try executing your search against the HTML that is returned by the above.
0
 
deanlee17Author Commented:
Oh excellent.

OK, so now I need to get the returned data inside 'doc' into my code below?

 public void NetComponents()
        {

            HtmlWeb NC = new HtmlWeb();
            HtmlAgilityPack.HtmlDocument doc = NC.Load(NetComsURL.NetComURL);
            HtmlNodeCollection NetlinkNodes = doc.DocumentNode.SelectNodes("//div[@ID=\"livesearch\"]//td");

            if (NetlinkNodes != null)
            {
                // Loop through the nodes in and grab the last one
                foreach (var node in NetlinkNodes)
                {
                    //Create instance of class and load result into it
                    NetCommsReturnResults = new NetCommsForumSearch();
                    NetCommsReturnResults.NetCommsResult += node.InnerText;

                }
           
            }

How can I get that value out of the static class?

Thanks.
0
 
käµfm³d 👽Commented:
What static class?
0
 
AndyAinscowCommented:
                    //Create instance of class and load result into it
                    NetCommsReturnResults = new NetCommsForumSearch();
                    NetCommsReturnResults.NetCommsResult += node.InnerText;

Each time you assign a new object to an existing variable (the first code line, line 2 above), the previous object is replaced. This code would only ever give the value of the final node.InnerText; the values from all the other nodes are thrown away.
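
A minimal sketch of the alternative, assuming every node's text is wanted: create the result object once, before the loop, and only append inside it. SearchResult and ResultCollector are hypothetical stand-ins for the NetCommsForumSearch class, which is not shown in the thread.

```csharp
using System.Collections.Generic;

// Hypothetical stand-in for NetCommsForumSearch.
public class SearchResult
{
    public string Text { get; set; }
}

public static class ResultCollector
{
    public static SearchResult Collect(IEnumerable<string> nodeTexts)
    {
        // Created once, outside the loop, so earlier values are not discarded.
        var result = new SearchResult { Text = string.Empty };

        foreach (string text in nodeTexts)
        {
            result.Text += text;
        }

        return result;
    }
}
```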
0
 
deanlee17Author Commented:
Oh yes, I understand that it would only get the final node. Basically I'm struggling to integrate your code into my project. Ignore what I said about a static class.

So the result of your class is the HTML stored in 'doc'?

How can I use 'doc' in the code below?

  public void NetComponents()
        {

            HtmlWeb NC = new HtmlWeb();
            HtmlAgilityPack.HtmlDocument doc = NC.Load(NetComsURL.NetComURL);
            HtmlNodeCollection NetlinkNodes = doc.DocumentNode.SelectNodes("//div[@ID=\"livesearch\"]//td");

            if (NetlinkNodes != null)
            {
                // Loop through the nodes in and grab the last one
                foreach (var node in NetlinkNodes)
                {
                    //Create instance of class and load result into it
                    NetCommsReturnResults = new NetCommsForumSearch();
                    NetCommsReturnResults.NetCommsResult += node.InnerText;

                }

            }
}

Many Thanks.
0
 
käµfm³d 👽Commented:
using System.Net;

...

private CookieCollection _cookies = new CookieCollection();

public void NetComponents()
{
    HtmlWeb NC = new HtmlWeb() { PreRequest = BeforeHAPRequest, PostResponse = AfterHAPRequest };
    HtmlAgilityPack.HtmlDocument doc = NC.Load(NetComsURL.NetComURL);
    HtmlNodeCollection NetlinkNodes = doc.DocumentNode.SelectNodes("//div[@ID=\"livesearch\"]//td");

    if (NetlinkNodes != null)
    {
        // Loop through the nodes in and grab the last one
        foreach (var node in NetlinkNodes)
        {
            //Create instance of class and load result into it
            NetCommsReturnResults = new NetCommsForumSearch();
            NetCommsReturnResults.NetCommsResult += node.InnerText;

        }

    }
}

private bool BeforeHAPRequest(HttpWebRequest request)
{
    request.CookieContainer = new CookieContainer();

    foreach (Cookie cookie in _cookies)
    {
        request.CookieContainer.Add(cookie);
    }

    return true;
}

private void AfterHAPRequest(HttpWebRequest request, HttpWebResponse response)
{
    _cookies = request.CookieContainer.GetCookies(request.RequestUri);
}

I think you'll need to preload the cookies collection by navigating to the base URL first. This is why you saw two requests in my example. Try it without that, and if it doesn't work, add another call to preload the cookies.
0
 
deanlee17Author Commented:
Invalid URI: The format of the URI could not be determined.

On line....

 HtmlAgilityPack.HtmlDocument doc = NC.Load(NetComsURL.NetComURL);

==========================================

OK, ignore the above; I was sending the wrong link. I've sorted that and added a breakpoint to view what is being passed in: it's the site login page.
0
 
käµfm³d 👽Commented:
Are you handling logins within your app? I recall you mentioning so earlier in the discussion.
0
 
AndyAinscowCommented:
If you only require the last node check if this works:
    if (NetlinkNodes != null)
    {
        // Get last node
        NetCommsReturnResults = new NetCommsForumSearch();
        NetCommsReturnResults.NetCommsResult = NetlinkNodes[NetlinkNodes.Count - 1].InnerText;
    }

If it does, no looping is required and only one new statement is executed.
0
 
deanlee17Author Commented:
kaufmed: I am logging into the website manually (the first time the app loads) within the web browser and saving the credentials.

AndyAinscow: I shall try this when I have sorted out logging into the site. You are right, it makes no sense doing all that looping to get the last node.
0
 
käµfm³d 👽Commented:
Ah. I don't think that is going to work. I believe the WebBrowser and HAP's cookies will be mutually exclusive. Neither is accessible to the other. Try this out:

using System.IO;
using System.Net;

...

private CookieCollection _cookies = new CookieCollection();

public void NetComponents()
{
    HtmlWeb NC = new HtmlWeb() { PostResponse = AfterHAPRequest };
    HtmlAgilityPack.HtmlDocument doc;
    HtmlNodeCollection NetlinkNodes;

    NC.PreRequest = BeforeHAPRequestLogin;
    NC.Load("http://www.netcomponents.com/login.htm");    
    NC.PreRequest = BeforeHAPRequest; 
    doc = NC.Load(NetComsURL.NetComURL);
    NetlinkNodes = doc.DocumentNode.SelectNodes("//div[@ID=\"livesearch\"]//td");
    
    if (NetlinkNodes != null)
    {
        // Loop through the nodes in and grab the last one
        foreach (var node in NetlinkNodes)
        {
            //Create instance of class and load result into it
            NetCommsReturnResults = new NetCommsForumSearch();
            NetCommsReturnResults.NetCommsResult += node.InnerText;

        }

    }
}

private bool BeforeHAPRequestLogin(HttpWebRequest request)
{
    string org = "org=" + your_acct_num;
    string login = "login=" + your_login_name;
    string pwd = "pwd=" + your_password;

    request.Method = "POST";

    using (StreamWriter writer = new StreamWriter(request.GetRequestStream()))
    {
        string requestBody = org + "&" + login + "&" + pwd;

        writer.Write(requestBody);
    }

    return true;
}

private bool BeforeHAPRequest(HttpWebRequest request)
{
    request.CookieContainer = new CookieContainer();

    foreach (Cookie cookie in _cookies)
    {
        request.CookieContainer.Add(cookie);
    }

    return true;
}

private void AfterHAPRequest(HttpWebRequest request, HttpWebResponse response)
{
    _cookies = request.CookieContainer.GetCookies(request.RequestUri);
}

Be sure to change your_acct_num, your_login_name, and your_password accordingly.
0
 
deanlee17Author Commented:
Ok changed it and got...

Object reference not set to an instance of an object.

on line

_cookies = request.CookieContainer.GetCookies(request.RequestUri);
0
 
käµfm³d 👽Commented:
Can you put a breakpoint on that line and see if there is a value in response.Cookies?
0
 
deanlee17Author Commented:
Home now mate, will do it first thing in the morning.
0
 
deanlee17Author Commented:
0
 
käµfm³d 👽Commented:
Sorry, I was asking for the response object  : )
0
 
deanlee17Author Commented:
I'm sorry, I don't know which line you mean :)
0
 
käµfm³d 👽Commented:
In the code above, you have the method AfterHAPRequest. It takes two parameters. The first is the request; the second is the response. Mouse over the second parameter (while at a breakpoint within the method) and see if the Cookies collection is populated.
0
 
deanlee17Author Commented:
Ah ok, see attached...
printscreen.png
0
 
käµfm³d 👽Commented:
Can you expand Cookies? If you see a cookie count of more than zero, then try the following; otherwise, we'll have to try a different approach.

using System.IO;
using System.Net;

...

private CookieCollection _cookies = new CookieCollection();

public void NetComponents()
{
    HtmlWeb NC = new HtmlWeb();
    HtmlAgilityPack.HtmlDocument doc;
    HtmlNodeCollection NetlinkNodes;

    NC.PreRequest = BeforeHAPRequestLogin;
    NC.PostResponse = AfterHAPRequestLogin;
    NC.Load("http://www.netcomponents.com/login.htm");    
    NC.PreRequest = BeforeHAPRequest;
    NC.PostResponse = AfterHAPRequest;
    doc = NC.Load(NetComsURL.NetComURL);
    NetlinkNodes = doc.DocumentNode.SelectNodes("//div[@ID=\"livesearch\"]//td");
    
    if (NetlinkNodes != null)
    {
        // Loop through the nodes in and grab the last one
        foreach (var node in NetlinkNodes)
        {
            //Create instance of class and load result into it
            NetCommsReturnResults = new NetCommsForumSearch();
            NetCommsReturnResults.NetCommsResult += node.InnerText;

        }

    }
}

private bool BeforeHAPRequestLogin(HttpWebRequest request)
{
    string org = "org=" + your_acct_num;
    string login = "login=" + your_login_name;
    string pwd = "pwd=" + your_password;

    request.Method = "POST";

    using (StreamWriter writer = new StreamWriter(request.GetRequestStream()))
    {
        string requestBody = org + "&" + login + "&" + pwd;

        writer.Write(requestBody);
    }

    return true;
}

private void AfterHAPRequestLogin(HttpWebRequest request, HttpWebResponse response)
{
    _cookies = response.Cookies;
}

private bool BeforeHAPRequest(HttpWebRequest request)
{
    request.CookieContainer = new CookieContainer();

    foreach (Cookie cookie in _cookies)
    {
        request.CookieContainer.Add(cookie);
    }

    return true;
}

private void AfterHAPRequest(HttpWebRequest request, HttpWebResponse response)
{
    _cookies = request.CookieContainer.GetCookies(request.RequestUri);
}

0
 
deanlee17Author Commented:
Cookie count was zero, sadly :(
0
 
käµfm³d 👽Commented:
If you log in with your WebBrowser control, does [WebBrowser Control].Document.Cookie contain anything?
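
If that string does contain something, a rough sketch of carrying it over to the HttpWebRequest side could look like the following. It assumes the usual "name=value; other=value" layout of Document.Cookie; BrowserCookieImporter and Import are hypothetical names. Note that cookies flagged HttpOnly are not visible through Document.Cookie, so a session cookie may well be missing from it.

```csharp
using System;
using System.Net;

public static class BrowserCookieImporter
{
    // cookieHeader: the "name=value; other=value" string taken from the browser.
    // Values containing ',' or ';' are not handled by this sketch.
    public static CookieContainer Import(Uri siteUri, string cookieHeader)
    {
        var jar = new CookieContainer();
        if (string.IsNullOrEmpty(cookieHeader))
        {
            return jar;
        }

        foreach (string pair in cookieHeader.Split(';'))
        {
            int eq = pair.IndexOf('=');
            if (eq < 0)
            {
                continue; // no "=" in this fragment, nothing to import
            }

            string name = pair.Substring(0, eq).Trim();
            string value = pair.Substring(eq + 1).Trim();
            if (name.Length == 0)
            {
                continue; // a cookie needs a name
            }

            // The domain and path are taken from siteUri.
            jar.Add(siteUri, new Cookie(name, value));
        }

        return jar;
    }
}
```

The returned container can then be assigned to request.CookieContainer in the PreRequest handler.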
0
 
deanlee17Author Commented:
Hi, sorry I was off sick yesterday.

Are you asking me to set a break point?

I cannot run my program as it stands because it errors as mentioned above, since 'HttpWebResponse response' is empty.
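
One possible explanation, offered as an assumption and not a confirmed diagnosis: the login handler in the code above never assigns request.CookieContainer, so it is still null when AfterHAPRequest dereferences it, and as far as I know response.Cookies is only populated when the request carried a container. A minimal sketch of preparing the login request with a container attached follows. LoginRequest, BuildBody and Configure are hypothetical names; the org, login and pwd field names are taken from the earlier code, and the form content type is an assumption about what the site expects.

```csharp
using System;
using System.Net;

public static class LoginRequest
{
    // Builds the URL-encoded form body from the three login fields.
    public static string BuildBody(string org, string login, string pwd)
    {
        return "org=" + Uri.EscapeDataString(org)
             + "&login=" + Uri.EscapeDataString(login)
             + "&pwd=" + Uri.EscapeDataString(pwd);
    }

    // Prepares the request before the body is written and the request is sent.
    public static void Configure(HttpWebRequest request, CookieContainer jar)
    {
        // Without a container, cookies from the response are not captured.
        request.CookieContainer = jar;
        request.Method = "POST";
        // Assumption: the login page accepts a standard form post.
        request.ContentType = "application/x-www-form-urlencoded";
    }
}
```

Inside BeforeHAPRequestLogin, Configure would be called first, and the string from BuildBody written to request.GetRequestStream() as before. Reusing the same container for the following search request should then send the login cookies along with it.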
0
