• Status: Solved

Not using axWebBrowser

Hi,

I have a little application that navigates through a given site using the axWebBrowser control. This works fine, and you can watch the software move through the whole site like a speedy slideshow.

                                axWebBrowser1.Navigate( (string)location );

But now, to increase performance, I was required to redo this without the visual stuff, i.e. removing the axWebBrowser component.
I recall reading somewhere that axWebBrowser is merely a wrapper that does the rendering on top of another class that does all the HTML parsing work. But I can't find where I saw that.

So, basically, I need access to a class that returns the list of links for each given URL, so I can perform the navigation.

Thanks.

Asked by: fischermx

1 Solution
 
MaximKammerer Commented:
For getting the data from the internet you should use the System.Net.WebRequest class.
Sample code from MSDN:

// Initialize the WebRequest.
WebRequest myRequest = WebRequest.Create("http://www.contoso.com");

// Return the response.
WebResponse myResponse = myRequest.GetResponse();

// Code to use the WebResponse goes here.

// Close the response to free resources.
myResponse.Close();

If you need both an HTML parser and a downloader, you can use the HTMLDocumentClass from the Microsoft.mshtml wrapper supplied with DevStudio. For an example have a look at:
http://www.dotnet247.com/247reference/msgs/15/75674.aspx

This is a problem description, but it contains some sample code you could use.


Best regards,
Maxim
 
fischermx (Author) Commented:
That HTMLDocumentClass is what I was looking for, thanks! I thought it had different functionality, but it is pretty much the same as axWebBrowser, just without the browsing.

Now, let me ask you: I already tried using WebRequest, but that puts all the parsing work on my side, right? I mean, I didn't find a way to get document elements from it... or am I missing something?

So the question is: is there a way to combine WebRequest and the document class? Something like getting the WebResponse and handing it to the document class so it gets parsed without revisiting the page?

 
MaximKammerer Commented:
Yes, the webrequest only gets you the document. You can find an html parser written in C# at:

http://www.planetsourcecode.com/vb/scripts/ShowCode.asp?txtCodeId=2201&lngWId=10

Best regards,
Maxim
 
fischermx (Author) Commented:
Maxim:

Thanks for your help.

The sample in the first link you sent me doesn't work at all. I followed up on that thread on Google and it seems the guy never got it working.
I tested it, too. The event handler used there never fires, and searching Google again I found only one reference to it, which is that same thread, so I would guess that's not the path to follow.


Now, this method also works, without a parser:

System.Net.WebRequest webRequest = WebRequest.Create(url);
System.Net.WebResponse webResponse = webRequest.GetResponse();
System.IO.StreamReader streamReader = new System.IO.StreamReader(
    webResponse.GetResponseStream(), System.Text.Encoding.Default);

HTMLDocument hd = new HTMLDocument();
IHTMLDocument2 ihd2 = (IHTMLDocument2)hd;
ihd2.write(streamReader.ReadToEnd());
ihd2.close();

But it has two problems:
1.- It does not resolve relative paths correctly, which may be solved by the parser you're pointing to now.
2.- The response is a bit erratic with this method. Let me explain: if I point to http://www.google.com, I get the Google page for my country. This is the default behavior of the Google site the first time you visit; you then get a "Google in English" link, and once you click it you are never sent back to the other page. I have no idea how to control this or what the cause is. Maybe some extra parameters?

 
MaximKammerer Commented:
Ad 1.) Perhaps the IHTMLDocument2 interface resolves relative links correctly if you also set the url property to the address you got the document from (otherwise it has no information about what the links are relative to).
Ad 2.) Google probably stores this kind of information in cookies. You could try something like this:

System.Net.CookieContainer jar = new System.Net.CookieContainer();

System.Net.WebRequest webRequest = WebRequest.Create(url);
webRequest.CookieContainer = jar;     // is null by default == cookie handling disabled

System.Net.WebResponse webResponse = webRequest.GetResponse();

// continue processing...
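
On point 1, relative links can also be resolved without mshtml by combining each href with the page address through System.Uri. The following is only a minimal sketch under stated assumptions: a naive regular expression stands in for a real HTML parser (it will miss unquoted attributes and other edge cases), and LinkExtractor and ExtractLinks are made-up names for illustration, not part of any library.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LinkExtractor
{
    // Assumption: anchors look like <a ... href="..."> or <a ... href='...'>.
    // This is a rough pattern, not a full HTML parser.
    private static readonly Regex AnchorHref = new Regex(
        "<a\\b[^>]*?\\bhref\\s*=\\s*[\"'](?<url>[^\"']+)[\"']",
        RegexOptions.IgnoreCase);

    // Returns the absolute http/https URLs of all anchors found in html,
    // resolving relative hrefs against the address the page came from.
    public static List<string> ExtractLinks(string html, string pageUrl)
    {
        Uri baseUri = new Uri(pageUrl);
        List<string> links = new List<string>();

        foreach (Match match in AnchorHref.Matches(html))
        {
            Uri absolute;
            // Uri.TryCreate combines the base address with a relative reference.
            if (!Uri.TryCreate(baseUri, match.Groups["url"].Value, out absolute))
                continue;

            // Skip mailto:, javascript: and similar non-navigable schemes.
            if (absolute.Scheme == Uri.UriSchemeHttp || absolute.Scheme == Uri.UriSchemeHttps)
                links.Add(absolute.AbsoluteUri);
        }
        return links;
    }
}
```

The html argument would be the string read from the response stream, and pageUrl the address passed to WebRequest.Create, so no second visit to the page is needed.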

Best regards,
Maxim
 
fischermx (Author) Commented:
Thanks!

Now my code looks like this:

CookieContainer jar = new CookieContainer();
WebRequest webRequest = WebRequest.Create(url);
// webRequest.CookieContainer = jar;     // webRequest does not contain CookieContainer
WebResponse webResponse = webRequest.GetResponse();
StreamReader streamReader = new StreamReader(webResponse.GetResponseStream(), System.Text.Encoding.Default);

HTMLDocument hd = new HTMLDocument();
IHTMLDocument2 ihd2 = (IHTMLDocument2)hd;

// ((IHTMLDocument2)ihd2).write("<html></html>");
ihd2.write(streamReader.ReadToEnd());
ihd2.url = url;
ihd2.close();

But I have two problems:
1.- Adding "ihd2.url = url" causes IE to open and navigate to the URL!! :) Weird, isn't it?
I tried the write("<html></html>") call that you see commented out because I saw it somewhere else, but the result is the same.
If I set ihd2.url = url first and then read from the stream, I get an object reference error.


2.- The cookie thing had a little problem. It seems that WebRequest does not contain a CookieContainer member. I'm looking at the help file now, and that section does not say which class it belongs to.



 
MaximKammerer Commented:
Ad 2) - Sorry - my fault. The CookieContainer is a member of HttpWebRequest. So the code should look something like this:

WebRequest webRequest = WebRequest.Create(url);
((HttpWebRequest) webRequest).CookieContainer = jar;     // CookieContainer is defined on HttpWebRequest, hence the cast
WebResponse webResponse = webRequest.GetResponse();
 
Ad 1) - Yep, these classes are only wrappers for IE COM controls. Perhaps you really could use the html parser from the link above.
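
To make the cookie fix from Ad 2 a bit more concrete, here is a small sketch. It assumes the same CookieContainer is reused for every request of the crawl, so that cookies set by one response (such as a language preference) are sent along with the next request. CrawlerRequests and Create are hypothetical names chosen for this example.

```csharp
using System;
using System.Net;

public static class CrawlerRequests
{
    // Creates an HTTP request whose cookies are stored in, and sent from, the shared jar.
    public static HttpWebRequest Create(string url, CookieContainer jar)
    {
        // For http:// and https:// addresses, WebRequest.Create returns an HttpWebRequest.
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);

        // CookieContainer is null by default, which disables cookie handling.
        request.CookieContainer = jar;
        return request;
    }
}
```

Creating the jar once, passing it to CrawlerRequests.Create for each page, and then calling GetResponse() as before should let later requests carry the cookies received earlier; whether that is enough to keep Google on the English page is not confirmed in this thread.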

Best regards,
Matthias