
Navigating MSHTML from C# without a WebBrowser control

I would appreciate some advice on how to correctly exploit MSHTML from within a C# application (without the use of a WebBrowser control).

Specifically, I am trying to use MSHTML to load a sequence of web pages that will be programmatically scraped. I understand that it is normal to use a Windows Forms or WPF WebBrowser control with MSHTML, but this is not possible in my case because the application I am developing is a class library and (as far as I can tell) the WebBrowser controls cannot operate without a window handle assigned by a hosting form.

Therefore, the approach I have adopted is the one described here:

http://radio.javaranch.com/balajidl/2006/01/18/1137606354980.html

This technique works as described and I am able to load pages and to access their DOM content through the MSHTML API. I have also succeeded in injecting JavaScript into a loaded page and invoking functions using the invokeScript() method, so I know that the scripting engine is running correctly.

Unfortunately, although this approach seems almost perfect for my needs, it fails when attempting to navigate from a loaded page (either by invoking click() on a hyperlink or submit() on a form). I have tried this both from JavaScript (within a loaded page) and via the API, but neither method results in MSHTML sending out a request for a new page (verified using Fiddler2).

I have proved that MSHTML is capable of issuing its own requests in this configuration, because I have succeeded in loading a sequence of pages by successive calls to the Navigate() method, but I can't find any way to get it to navigate from within a loaded document.

Any ideas would be very much appreciated.

Thanks,
Tim
Asked by: Tim85
Todd Gerbert (IT Consultant) commented:
I can't answer your question directly; however, you should be able to use the WebBrowser control in a class library without any forms. You will need a reference to System.Windows.Forms. The example below is a simple class that just gets the title of a web page:
using System;
using System.Windows.Forms;

namespace WebBrowserClassLibraryTest
{
	public class MyWebBrowser
	{
		// Call this from an STA thread: WebBrowser wraps an ActiveX control.
		public string GetTitleOfUrl(string url)
		{
			// Dispose the control when done so its handle and COM resources are released.
			using (WebBrowser wb = new WebBrowser())
			{
				wb.Navigate(url);

				// Pump the message loop until the document has finished loading.
				while (wb.ReadyState != WebBrowserReadyState.Complete)
					Application.DoEvents();

				return wb.Document.Title;
			}
		}
	}
}
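
One caveat when calling a class like this from a library: the Windows Forms WebBrowser can only be created on a single-threaded apartment (STA) thread, and a library cannot assume its caller provides one. The following is a minimal sketch of one way to handle that, not part of the original answer. The StaRunner class and its Run method are hypothetical names introduced for illustration.

```csharp
using System;
using System.Threading;

public static class StaRunner
{
    // Runs the given work on a dedicated STA thread and returns its result.
    public static T Run<T>(Func<T> work)
    {
        T result = default(T);
        Exception error = null;

        Thread thread = new Thread(() =>
        {
            try { result = work(); }
            catch (Exception ex) { error = ex; }
        });

        // The apartment state must be set before the thread is started.
        thread.SetApartmentState(ApartmentState.STA);
        thread.Start();
        thread.Join();

        // Surface any failure from the worker thread to the caller.
        if (error != null)
            throw new InvalidOperationException("Work on the STA thread failed.", error);

        return result;
    }
}
```

With this in place, a caller on any thread could write something like: string title = StaRunner.Run(() => new MyWebBrowser().GetTitleOfUrl(url));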

Tim85 (author) commented:
Thank you for your very quick solution - it works!

Now that you have persuaded me that it is feasible to host a WebBrowser in a class library, I am tempted to use the GeckoWebBrowser (see http://geckofx.org/) because it operates completely independently of IE. However, when I try your code above with GeckoWebBrowser, it refuses to Navigate() until the window handle has been assigned.

Can you suggest any elegant way of faking this in a class library? (Sorry, I appreciate that this is a completely different question and I will post it separately if necessary).
Todd Gerbert (IT Consultant) commented:
You can call CreateControl() on the GeckoWebBrowser object to manually create a handle for it.

You could also download/modify the source so this isn't necessary (I'm thinking this probably should be in the constructor for GeckoWebBrowser).
using System;
using System.Threading;
using System.Windows.Forms;
using Skybound.Gecko;

namespace WebBrowserClassLibraryTest
{
	public class MyWebBrowser
	{
		private readonly GeckoWebBrowser gwb = new GeckoWebBrowser();

		public MyWebBrowser()
		{
			// Point GeckoFX at the local XULRunner directory.
			Xpcom.Initialize(@"C:\Program Files\XULRunner");

			// Force creation of the window handle; no hosting form is needed.
			gwb.CreateControl();
		}

		public string GetTitleOfUrl(string url)
		{
			gwb.Navigate(url);

			// Pump the message loop until the browser is no longer busy.
			while (gwb.IsBusy)
			{
				Application.DoEvents();
				Thread.Sleep(100);
			}

			return gwb.Document.Title;
		}
	}
}
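
Both wait loops in this thread spin until the browser reports that it is done, so a page that never finishes loading would block the caller indefinitely. A hedged sketch of a bounded wait is shown below; it is an addition, not something from the original answers. MessagePump and WaitWhile are hypothetical names, and the 50 ms sleep is an arbitrary choice.

```csharp
using System;
using System.Diagnostics;
using System.Threading;
using System.Windows.Forms;

public static class MessagePump
{
    // Pumps window messages while isBusy() returns true.
    // Returns true if the busy condition cleared, false if the timeout elapsed first.
    public static bool WaitWhile(Func<bool> isBusy, TimeSpan timeout)
    {
        Stopwatch clock = Stopwatch.StartNew();

        while (isBusy())
        {
            if (clock.Elapsed > timeout)
                return false;

            // Let the browser control process its pending messages.
            Application.DoEvents();
            Thread.Sleep(50);
        }

        return true;
    }
}
```

In the class above, the loop could then be replaced with a call such as MessagePump.WaitWhile(() => gwb.IsBusy, TimeSpan.FromSeconds(30)), with the caller deciding what to do when it returns false.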

Tim85 (author) commented:
Incredible !!!!! Thank you so much.

Tim